Predicting Genes in Eukaryotic Genomes by Computer
Hao Bailin (郝柏林)
T-Life Research Center, Fudan University
Beijing Genomics Institute, Academia Sinica
Institute of Theoretical Physics, Academia Sinica
(www.itp.ac.cn/~hao/)
The Central Dogma of Molecular Biology
• Replication: DNA → DNA
• Transcription: DNA → mRNA
• Reverse transcription: mRNA → cDNA
• Translation: mRNA → Protein/Enzyme
• Folding: Protein/Enzyme → Structure, Function, Interaction
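The transcription and translation steps of the diagram can be illustrated with a minimal sketch. It assumes the input is the coding strand of a gene with no introns, and it uses the standard genetic code; the function names `transcribe` and `translate` are hypothetical and chosen only for this illustration.

```python
# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def transcribe(coding_strand):
    """Return the mRNA for a coding-strand DNA string (T replaced by U)."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna):
    """Translate mRNA codon by codon from position 0 until a stop codon."""
    dna = mrna.upper().replace("U", "T")
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE.get(dna[pos:pos + 3], "X")  # X = unknown codon
        if amino_acid == "*":  # stop codon ends the protein
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("atggcctaa")
protein = translate(mrna)
```

In eukaryotes the transcript is spliced before translation, so this sketch only applies to an already intron-free coding sequence.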
[Figure: eukaryotic vs. prokaryotic cell. Labels: nucleolus, nucleus, nucleoid, capsule, cell wall, ribosomes, cell membrane, flagellum, mitochondrion.]
[Figure: Human family. Nuclear genome (chromosomes numbered 1 to 22) and mitochondrial genomes.]
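DNA (deoxyribonucleic acid) sequences
• Made up of 4 letters (nucleotides, or bases: a, c, g, t)
• Length: a single chromosome runs from a few thousand to tens of millions of letters
• Humans have 23 pairs of chromosomes; chimpanzees have 24 pairs; mice have 19 pairs; rice has 12 pairs; kiwifruit has 300 pairs
• Part of each chromosome codes for proteins; the rest consists of control signals, repeated segments, "random" strings of letters whose meaning is unknown, and so on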
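Since a DNA sequence is a string over the four letters a, c, g, t, its basic statistics are easy to compute. The following is a minimal sketch; the function name `base_composition` is hypothetical, and it assumes that any other character (such as `n` for an unresolved base) is simply left out of the totals.

```python
from collections import Counter


def base_composition(seq):
    """Count a, c, g, t in a sequence and return (counts, GC fraction)."""
    cleaned = "".join(seq.split()).lower()  # drop whitespace, normalise case
    counts = Counter(cleaned)
    total = sum(counts[base] for base in "acgt")  # ignore n and other symbols
    gc_fraction = (counts["g"] + counts["c"]) / total if total else 0.0
    return {base: counts[base] for base in "acgt"}, gc_fraction


counts, gc_fraction = base_composition("atgc gcn")
```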
Large-Scale DNA Sequencing Since 1977
• Sanger method: polymerization stopping
• Maxam-Gilbert: chemical degradation
• Each reaction: 500-600 bp (a single read)
• Clone by clone vs. whole-genome shotgun
• Sequence assembly: reads → contigs → scaffolds → superscaffolds
• Automatic sequencer: MegaBACE, 96 or 384 channels
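The first assembly step, joining overlapping reads into contigs, can be sketched with a toy greedy merge. This is an illustration only, not the method of any particular assembler: it assumes error-free reads from a single strand and no repeats, and the names `overlap`, `greedy_assemble` and `min_len` are hypothetical.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that is a prefix of b (0 if below min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:  # nothing left to merge
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)] + [merged]


contigs = greedy_assemble(["atggcgt", "gcgtacc", "taccgga"])
```

Real reads contain sequencing errors, come from both strands, and cover repeated regions, which is why production assemblers are far more elaborate and why contigs are further ordered into scaffolds using additional information.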
Letter production at BGI (Beijing + Hangzhou)
Daily: 5 × 10⁷
Yearly: 10¹⁰
Eukaryotic genomes sequenced so far
• Baker's yeast (Saccharomyces cerevisiae)
• Fission yeast (Schizosaccharomyces pombe)
• Nematode (Caenorhabditis elegans)
• Fruit fly (Drosophila melanogaster)
• Malaria parasite (Plasmodium falciparum)
• Malaria mosquito (Anopheles gambiae)
• Human (Homo sapiens), chimpanzee (Pan troglodytes)
• Mouse (Mus musculus), rat (Rattus norvegicus)
• Dog (Canis familiaris), chicken (Gallus gallus), pig (Sus scrofa)
• Pufferfish (Fugu rubripes)
• Silkworm (Bombyx mori), honeybee (Apis mellifera)
• Thale cress (Arabidopsis thaliana), rice (Oryza sativa)
• Maize (Zea mays)
cccaatatcttgcttcagcaagatattgggtatttctagctttcctttcttcaaaaattgctatatgttagcagaaaagccttatccattaagagat ggaacttcaagagcagctaggtctagagggaagttgtgagcattacgttcgtgcattacttccataccaagattagcacggttgat gatatcagccc aagtattaataacgcgaccttggctatcaactacagattggttgaaattgaatccgtttagattgaaagccatagtactaatacctaaagcagt gaaccaaatccctactacaggccaagcagcc aagaagaagt gtaaagaacgagagttgttaaaactagcatattggaagattaatcggccaaa ataaccatgagcggccacaatattataagtttcttcctcttgaccaaatctgtaaccctcattagcagattcgttttcagtggtttccctgatcaaactagaggttaccaaggaaccatgcatagcactgaatagggaaccgccgaatacaccagctacacctaacat gtgaaatggatgcataaggat gttatgctctgcctggaatacaatcataaagttgaaagtaccagatattcctaaaggcataccatcagagaaacttccttgaccaatagggtaaatcaagaaaacagcagtagcagctgcaacaggagct gaatatgcaacagcaatccaaggacgcatacccagacggaaactcagttcccact cacgacccatataacaagctacaccaagtaagaagtgtagaacaattagctcataaggaccaccattgtataaccactcatcaacagat gcagcttcccaaattgggtaaaagtgcaatccgatc gccgcagaagtaggaataat ggcaccagagataatattgtttccgtaaagtaaagaacca gaaacaggctcacgaataccatcaatatctactggaggggcagcgat gaaggcgataataaatacagaagttgcggtcaataaggtagggatcatcaaaacaccgaaccatccgatgtaaagacggttttcggtgctagttatccagttgcagaagcgaccccacaggcttgtactttcgcgtct ctctaaaattgcagtcatggtaagatcttggtttattcaaattgcaaggactcccaagcacacgtattaactagaaagataatagaaggcttgttatttaacagtataatatagactatataccaat gtcaaccaagccagccccgacagttgtatatccatacaacaaaatttaccaaaccaaaaaatttt gtaaatgaagtgagtgaaaaatcaaaactcagattgctcctttctagtttccatatgggttgcccgggactcgaacccggaactagtcggatggagtagataattattccttgttacaatagagaaaaaacctctccccaaatcgt gcttgcatttttcattgcacacgactttccctatgtagaaataggc tatttctattccgaagaggaagtctactaatttttttagtagtaagttgattcacttactatttattatagtacagagaacatttcagaatggaaact gtgaaagttttaccttgatcatttatcaatcatttctagtttattagttttgtttaatgattaattaagaggattcaccagatcattgatacggagaatatcca aataccaaatacgctcactgtgcgatccacggaaagaaaagtaagttgttttggcgaacatcaaagaaaaaacttgctcttcttccgtaaaaaattcttctaaaaataccgaacccaaccattgcataaaagctcgtaccgtgcttttatgtttacgagctaaagttctagcgcatgaaagtcgaagtata 
tactttagtcgatacaaagtcttcttttttgaagatccactgtgataatgaaaaagatttctacatatccgaccaaaccgatcaagaatatcccaatccgataaatcggtccaaattggtttact aataggatgccccgatccagtacaaaattgggcttttgctaaagatccaat gagaggagtaacagg gactttggtatcgaattttttcatttgagtatctattagaaatgaattctccagcatttgattccttactaacaaagaatttattggtacacttgaaaagtaccccagaaaatcgaagcaagagttttctaattggtttagatggatcctttgcggttgagtccaaaaagagaaagaatattgccacaaacggac aaggtaacatttccatttcttcttcaaaagaagagttccttttgatgcaagaattgcctttccttgatatcgaacataatgcataaggggatccataacgaaccatatggttttccgaaaaaaagcagggtacattaacccaaaat gttccatcttcctagaaaagatgattcgttccagaaaggttccgga agaagttaatcgcaagcaagaagattgtttacgaagaaacaacaagaaaaattcatattctgatacataagagttatataggaaccgaaatagtcttttattttcttttttcaaaataaaaatggatttcattgaagtaataaaactattccaattcgagtagtagttgagaaagaatcgcaataaatgcaag gatggaacatcttggatccggtattgaaggagttgaagcaagatatccaaat ggataggatagggtatttctatatgtgctagataatgtaagt gcaaaaatttgtcttctaaaaaaggaaatat tgaatgaatagatcgtaaattctgaaactttggtatttctttttcttccggacaagactgttctcgtagc gagaatgggatttctacaacgatcgcaaacccctcagatagaatctgagaataaaactcagaataaaaaaaattgttgtaatccaataatcgatcttggttaggatgattaaccaaattaatccaaaaattctgctgatacattcgaatcattaaccgtttcacaagtagtgaactaaatttcttgttattaga accaataatttcgacaagttcggaaccatttaatccataatcatgggcaaacacataaat gtactcctgaaagagtagtgggtagacgaaatattgtctaggaaatttaagtttttctgaataaccctcgaatttttccatttgtatttctacttgaatcagagagagagaaatatttctcggtttatcaaatggt gatacatagtacaatatggtcagaacagggt gttgcattttttaatacaaacccctggggaagaaaaggagtctaatccacggatctttttccgctccttttctatccaatttgtttatgtttgt tctaattacaaaagagaacaaatcctttatttttgcaggccaattgctcttttgactttgggatacagtctctt tatcaatatactgcttcttttacacattcaatccataacatccttttcaatccaaaatcaagaataattaggatttctaaaaaaaaaagaaaaaatcaaaggtctactcataggaaaaccagcttttccctacatcaggcactaatctatttttaacgtctaattagatcagggagttcttccaattaagaagtta agctcgttgctttttgttttaccagaattggagccaggctctatccatttattcattagacccagaaaatcagaatttttttattccattccaaaaatccaaaataagaaattgattttattacgacatgctattttttccattcattacccttgaggatcagtcgcggtcttatagactctaccaagagtctggacg 
aattttttgcttcatccaaatgtgtaaaagatcatagtcgcacttaaaagccgagtactctaccattgagttagcaacccagataaactaggatcttagatacgatcgaaatccaaaaatcaatggaattacaccgcacacccctgtcaaaatcttaaaatagcaagacattaaaagaaagattttatcac cattgaaaacactcagataccaaaaggaacgggtctggttaaatttcactaaggttaaaagt ggcaccaatcacgatcgtaaaattgtcatttttttagcatttttatttaaataaataaataaatcttgtatgagagtacaaacaagagggacaaccctaccatttgagcaaagtgtaggcaaaaaacct aatagggagtgaggataaagagacttatccatctacaaattctagatgttcaatggacctttgtcaatggaaatacaatggtaagaaaaaaattagatagaaaaactcaaaaaaataaaggcttatgttggattggcacgacataaatccagtcaaaaataggattaagaaagaggcaaattatttcta aatagttagacaacaagggatactagtgagcctctcctagttttttattcatttagttcttcaattaactcaaagttctttctttttctttaaagaattccgccttccttaaaatatcagaaacggttcttgtaggttgagcacctttttcaaggaaatagagaatagctggaacatttaaacaagtttgattctttatc ggatcataaaaacctacttttcgaagatctcttccttctcttcgagatcgaacatcaattgcaacgattcgatagacagcttattgggatagatgtagataaataaagccccccctagaaacgtataggaggttttctcctcatacggctcgagaatatgacttgcattaatttccgtacagaaaaaacaaa tttcatttatactcatgactcaagttgactaattttgattgacagacttgaaagaaaaaaatcctttgaaattttttgagtcgtctctaaactcttttctttgcctcatctcgaacaaattcacttttattccttattccggtccaattctattgttgagacagttgaaaatcgtgtttacttgttcgggaatcctttatcttt gatttgtgaaatccttgggtttaaacattacttcgggaattcttattcttttttctttcaaaagagtagcaacatacccttttttcttatttccttcgataaagcatttccctcttctatagaaatcgaatatgagcgattgattctgatagactttaatcaaaagagttttcccatatcttccaaaattggactttcttctta ttttaaccttttgatttctatattatttcgatttctatattaagggtagaatgacaaagttggcctaatttattagttttcactaaccctagattctttcccttgataaaaaataaattctgtcctctcgagctccatcgtgtactatttacttagcttacttacaaacaacccagcgaaaattcggttcgggacgaatag aacagactatgtcgagccaagagcattttcattactatggaaaatggt ggatagcaaaatccacaatcgatcgt gtccttcaagtcgcacgttgctttctaccacatcgttttaaacgaagtttt aacataacattcctctaatttcattgcaaagtgttatagggaattgatccaatatggat ggaatcat ga atagtcattagtttcgttttttgtatactaattcaaacttgctttgctatctatggagaaatatgaataaaagaaattaagtatttatcgggaaagactccgcaaagagccaatttatttaaacccatattctatcatatgaatgaaatatagttcgaaaaaagggaataaacaagtttgcttaagacttatttattat 
ggaatttccatcctcaacagaggactcgagatgatcaatccaatcctgaaatgataagagaagaattgactcttctccaacaaataaactatcaacctcccgtttaattaatttaattaatatattagattagcaatctatttttccataccatttttccgtaacaaaactaattaactattaactagttaaactatt gcaatgaaaagaaagttttttggtagttatagaattctcgtatttcttcgactcgaataccaaaagaaagaaaaaaat gaagtaaaaaaaacgcatttcctgtaaagtaaaattaaggtctttgcttttacttattttttcttttacctaaaagaagcaactccaaatcaaaattgaatccattctatctaacgag cagttcttatcttatctttaccgggatggatcattctggatatttaaaaaatcgcggatcgagatcgtttttgcttaaccaaagaaagaaaaagaagaaggaaccttttttactaataaaatact ataaaaaaaatttatctctatcataaatctatctctaccataaaggaataggtctcgttttttatacaatgtt ctacgtcaagtttaaaattttttcatgaaaaaaagattttcaatttgactggacttgacactggattatgttttctgagacagaaaat gaacgcattaggactgcatcgaatctaagagtttataagagaaaaaaattctctttaataaactttatgtctcgtgcagaatacaatacgatttcatctttcgtttcatc agaaaaaatctgggacggaaggattcgaacctccgagtaacgggaccaaaacccgctgccttaccacttggccacgccccatttcgggttttatgcgacactaataaacagtattatgtttattt cttattcgtcaatcctacttcaattacataaaaatggggggtattctcttggtaggattctagacat gcgaataatatagaatccaaaaaatgcattgatcattacatggaattctattaagatattatatgaaagtcgaatttcttccactctcatttgagagtgcgaatacaaggaggtattttgtgttt gggaaagtccgaagaaaaaaggattttgaatcctccttttcctttttcccttagaaaaataactcaatcaa aatccaattatctactctacaagaacgaaacgcttgttatgcctaatatacttagtttaacctgtatttgttttaattctgttatttatccgactagttttttcttcgccaaattgcccgaagct tatgccattttcaatccaatcgtggattttatgcctgtcatacctgtactcttttttctattagcctttgtttggcaag ctgctgtaagttttcgatgaaatctttactactctgtctgccaaattgaatcatgtattcattctaaaaaaattcgaaaaatggataagagccgagaagtcttatattatgaaccttcgattctaaaattcaaattcttctacattgaatgtatagctgcagcaataaatttggatcagcctttctactccctgcatc tacgttgagcaggtatctttaggtaaccgcacaatacctaacctaatttattgataagagt gcttattataaatcaattcttgcaatttttttcaaaaattgatttttgcatttttaggtgtcaaaataaacaaaacccatcctagtggatttgtgtggtaaggaaaaacgggtaatctattccttaaaaaaaaatct tggagattatgtaatgcttactctcaaactttttgtttatacagtagtgatattctttgtttccctctttatctttggattcttatctaatgatccaggacgtaatcctgggcgt gacgagtaaaaatccaaaattttttcttacaaattggatttgtttcatacatttatctacgagaaaatccgggggtcagaattcctt 
ccaattcgaaagtcccaaacgatccgagggggcggaaagagagggattcgaaccctcggtacaaaaaaattgtacaacggattagcaatccgccgctttagtccactcagccatctctccccgtt ccaaatcgaaaggtttccgtgatatgacagaggcaagaaataacgattgcaaaaaatcc ttcctttttctttcaaaagttcaaaaaaattatattgccaattccattttagttatattcttttttcttaatgttaataaaaaaaagaagaaaattcttcttttttctttctaattctaaaattggatattggctaaaagacaatcagatagattttctcttcagcaggcatttccatataggacttgttataataaaacaagca ggttatagaaaaaaactcttttttttattatttatcaacaaagcaaaaaggggtcttatcaaaccaacccaccccataaaattggaaagaaagataaagtaagtggacctgactccttgaat gaggcctctatccgctattctgatatataaattcgatgtagatgaaattgtataagtggatttttttgtatttc cttagacttagaccacgcaaggcaagaatttctcgctatttactatttcatattcttgttactagatgttctataggaataagaagaaatcgcaacccctttccgctacacataaaaatggattt cgaaagtcaatttttcttttcaatatctttactttttttcagaatcctatttttgttcttatacccatgcaataga gagcgagtgggaaaagggaggttactttttttcattttttccttaaaaaataggctttcttggaaataggaatcatggaataatctgaattccaatgtttatttctatagtataagaaaaactaattgaatcaaattcatggatttaccacgacctcggct gtgaccccatagataaaaatgcaaaatttctatct tcgagaccattgaaaaaaggcattgaacgagaaaaaatcgtccacagataatctatcgtatgccttggaagtgatataaggtgctcggaaat ggttgaagtaattgaataggaggatcactatga ctatagcccttggtagagttactaaagaagaaaat gatttatttgatattatggacgactggtta cgaagggaccgttttgtttttgtaggatggtctggcctattgctttttccttgtgcttatttcgctttaggaggttggtttacagggacaacttttgtaacttcttggtatacccatggattggc gagttcctatttggaaggttgcaatttcttaaccgcagcagtttccacccct gccaatagtttagcacactct ttgttgctactatggggcccggaagcacaaggggattttactcgttggtgtcaattaggt ggtctgtggacttttgttgctctccatggggcttttgcactaataggtttcatgttacgtcaatt tgaacttgctcggtctgttcaattgcggccttataatgcaatttcattctctggcccaatcgctgtttttgttt ccgtattcctgatttatccactggggcaatccggttggttctttgcgccgagttttggcgtagcagcgatatttcgattcatcctcttcttccaaggatttcataattggacgttgaacccattt catatgatgggagttgccggagtattaggcgcggctctgctatgcgctattcatggggcaaccgtgga