
Zhejiang University: Bioinformatics (2nd edition), companion PPT slides: 3 Analysis and alignment of sequences (3.1 Compositional bias in biological sequences; 3.2 Alignment of pairs of sequences)


Bioinformatics (《生物信息学》, 2nd edition; edited by Fan Longjiang, 2021), companion slides 3-1

3. Analysis and alignment of sequences
• 3.1 Compositional bias in biological sequences
• 3.2 Alignment of pairs of sequences
• 3.3 Database searching for similar sequences
• 3.4 Multiple sequence alignment and domain finding

3.1 Compositional bias in biological sequences

[Slide background: a long stretch of raw genomic DNA sequence]

An obvious first summary of a DNA sequence is just the distribution of the four base types. Almost all empirical studies show an unequal distribution of the four bases.
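As a minimal sketch of this "first summary", the base distribution of a sequence can be tallied as below. The function name `base_composition` and the toy sequence are illustrative assumptions, not part of the course material; ambiguous symbols such as N are simply skipped.

```python
from collections import Counter

def base_composition(seq):
    """Return the fraction of A, C, G and T among the unambiguous bases in seq."""
    # Count only the four standard bases; anything else (N, gaps, spaces) is ignored.
    counts = Counter(b for b in seq.upper() if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return {b: counts[b] / total for b in "ACGT"}

# Hypothetical toy sequence, for illustration only.
example = "ATTATTATTAGCAGGCTTCC"
composition = base_composition(example)
gc_content = composition["G"] + composition["C"]
print(composition, gc_content)
```

On real genomic data the four fractions typically differ from 0.25 each, which is the unequal distribution the slide refers to.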

Promoter sequences

Base content as a function of cDNA position, relative to the transcription start site (TSS), averaged over all cDNAs with a 10-bp sliding window.

[Figure: Rice. x-axis: cDNA coordinate (×100 bp), from -4 to 1, with the TSS marked; y-axis: base content, 0 to 0.7; curves: GC, A, T, G, C]
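A sliding-window base-content profile of this kind could be computed roughly as follows. This is a sketch under assumptions: the slides do not say whether the 10-bp windows overlap, so a step of 1 bp is assumed here, and the helper names `sliding_base_content` and `average_profiles` are hypothetical.

```python
def sliding_base_content(seq, window=10, bases="GC"):
    """Fraction of `bases` in every window of length `window`, stepping 1 bp (assumed)."""
    seq = seq.upper()
    if window <= 0 or window > len(seq):
        raise ValueError("window must be between 1 and len(seq)")
    # 1 where the base belongs to the set of interest, 0 elsewhere.
    hits = [1 if b in bases else 0 for b in seq]
    current = sum(hits[:window])
    profile = [current / window]
    # Slide the window: add the entering base, drop the leaving one.
    for i in range(window, len(seq)):
        current += hits[i] - hits[i - window]
        profile.append(current / window)
    return profile

def average_profiles(profiles):
    """Position-wise mean over equal-length profiles (one per cDNA, aligned at the TSS)."""
    n = len(profiles)
    return [sum(column) / n for column in zip(*profiles)]

# Toy usage: GC content in 4-bp windows of a short made-up sequence.
print(sliding_base_content("GGGGAAAA", window=4))
```

Calling `sliding_base_content` with `bases="A"`, `"T"`, `"G"` or `"C"` gives the single-base curves; averaging the per-cDNA profiles position by position gives one curve per species.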

[Figure: Arabidopsis. x-axis: cDNA coordinate (×100 bp), from -4 to 1; y-axis: base content, 0 to 0.5; curves: GC, A, T, G, C]

[Figure: Human. x-axis: cDNA coordinate (×100 bp), from -4 to 1; y-axis: base content, 0 to 0.7; curves: GC, A, T, G, C]

Three patterns of base contents

[Figure: the rice, Arabidopsis, and human base-content profiles shown together, aligned at the TSS]

Neighboring bases are not independent

Example: dinucleotide frequencies in some vertebrate sequences, based on 166 vertebrate sequences totaling 136,731 bases (Nussinov, 1984).

Pair   Observed/Expected
TG     1.29
CT     1.26
CC     1.18
AG     1.16
AA     1.15
CA     1.15
GG     1.14
TT     1.07
GA     1.04
TC     1.00
GC     0.99
AT     0.85
AC     0.84
GT     0.82
TA     0.65
CG     0.42

P_uv ≠ P_u P_v
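The observed/expected ratio can be sketched as below, assuming the expected frequency of a pair uv is the product of the single-base frequencies, P_u · P_v, and that pairs are counted in overlapping fashion along one strand. Whether the cited study used exactly this definition is not stated on the slide, and the function name `dinucleotide_odds` is hypothetical.

```python
from collections import Counter
from itertools import product

def dinucleotide_odds(seq):
    """Observed/expected ratio for each of the 16 dinucleotides.

    observed = frequency of uv among overlapping adjacent pairs
    expected = f(u) * f(v), i.e. what independence of neighbors would predict
    """
    seq = seq.upper()
    mono = Counter(b for b in seq if b in "ACGT")
    # Count a pair only when both positions hold an unambiguous base.
    pairs = Counter(
        seq[i:i + 2]
        for i in range(len(seq) - 1)
        if seq[i] in "ACGT" and seq[i + 1] in "ACGT"
    )
    n_mono = sum(mono.values())
    n_pairs = sum(pairs.values())
    if n_pairs == 0:
        raise ValueError("sequence contains no adjacent A/C/G/T pair")
    ratios = {}
    for u, v in product("ACGT", repeat=2):
        expected = (mono[u] / n_mono) * (mono[v] / n_mono)
        observed = pairs[u + v] / n_pairs
        # A pair built from a base that never occurs has no defined ratio.
        ratios[u + v] = observed / expected if expected > 0 else float("nan")
    return ratios

# Toy usage on a made-up sequence; real estimates need long sequences.
for pair, ratio in sorted(dinucleotide_odds("ACACGTTGCA").items()):
    print(pair, round(ratio, 2))
```

A ratio near 1 means the pair occurs about as often as independence predicts; values well below 1, such as CG in the table above, indicate depletion.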

Observed/expected frequencies* of adjacent base pairs (dinucleotides)

Pair   Human   Rice
CC     1.27    1.05
GG     1.22    1.03
CA     1.20    1.11
TG     1.19    1.11
AG     1.18    0.99
CT     1.15    0.99
TT     1.13    1.13
AA     1.13    1.11
GC     1.02    1.11
GA     0.99    1.05
TC     0.96    1.00
AT     0.88    1.02
GT     0.84    0.84
AC     0.83    0.86
TA     0.75    0.77
CG     0.26    0.83

* Data are from the DNA sequences of all genes currently annotated in the two species, totaling 168,717,208 and 1,506,657,427 bases, respectively (Qiu Jie, 2016).

3.2 Alignment of pairs of sequences
• The most basic sequence analysis task is to ask if two sequences are related.
• This is usually done by first aligning the sequences (or parts of them) and deciding whether that alignment is more likely to have occurred because the sequences are related, or just by chance.
• Sequence alignment is the procedure of comparing two (pairwise alignment) or more (multiple sequence alignment) sequences by searching for a series of individual characters or character patterns that are in the same order in the sequences.
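To make the idea of a pairwise alignment concrete, here is a minimal sketch of global alignment by dynamic programming (the Needleman-Wunsch approach) with a linear gap penalty. The slides above do not specify an algorithm or scoring scheme at this point, so the scores `match=1`, `mismatch=-1`, `gap=-2` and the function name `global_align` are assumptions chosen only for illustration.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns (score, aligned_a, aligned_b); '-' marks a gap.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))

# Toy usage with made-up sequences.
s, top, bottom = global_align("ACGT", "AGT")
print(s)
print(top)
print(bottom)
```

One common way to judge whether such a score could have arisen by chance is to compare it with scores obtained after shuffling one of the sequences many times; that step is not shown here.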

[Figure: screenshot of the NCBI BLAST home page. Web BLAST: Nucleotide BLAST, Protein BLAST, blastx, tblastn. Other sections: BLAST Genomes; Standalone and API BLAST (download BLAST, use the BLAST API); Specialized searches, including SmartBLAST, Primer-BLAST, Global Align, CD-search, VecScreen, Multiple Alignment, and MOLE-BLAST]
