The remaining 39 contigs containing 0.3% of the sequence were not positioned at all. We then merged the sequences from overlapping sequenced clones (Fig. 6), using the computer program GigAssembler (ref. 104). The program considers nearby sequenced clones, detects overlaps between the initial sequence contigs in these clones, merges the overlapping sequences and attempts to order and orient the sequence contigs. It begins by aligning the initial sequence contigs from one clone with those from other clones in the same fingerprint clone contig on the basis of length of alignment, per cent identity of the alignment, position in the sequenced clone layout and other factors. Alignments are limited to one end of each initial sequence contig for partially overlapping contigs or to both ends of an initial sequence contig contained entirely within another; this eliminates internal alignments that may reflect repeated sequence or possible misassembly (Fig. 6b). Beginning with the highest scoring pairs, initial sequence contigs are then integrated to produce 'merged sequence contigs' (usually referred to simply as 'sequence contigs'). The program refines the arrangement of the clones within the fingerprint clone contig on the basis of the extent of sequence overlap between them and then rebuilds the sequence contigs. Next, the program selects a sequence path through the sequence contigs (Fig. 6c). It tries to use the highest quality data by preferring longer initial sequence contigs and avoiding the first and last 250 bases of initial sequence contigs where possible.
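The end-restricted alignment filter and the highest-score-first merging described above can be illustrated with a minimal sketch. It is not GigAssembler's code: the `Alignment` record, the `slop` tolerance, the single numeric `score` and the rule that each contig end may be claimed by only one join are all assumptions made for illustration, and strand orientation is ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alignment:
    a: str        # first initial sequence contig (hypothetical name)
    a_start: int  # aligned interval on a, 0-based, half-open
    a_end: int
    a_len: int
    b: str        # second initial sequence contig
    b_start: int
    b_end: int
    b_len: int
    score: float  # assumed combined score (length, identity, layout position)


def ends_touched(start, end, length, slop):
    """Return which ends ('L', 'R') of a contig the aligned interval reaches."""
    ends = set()
    if start <= slop:
        ends.add("L")
    if end >= length - slop:
        ends.add("R")
    return ends


def greedy_merge(alignments, slop=10):
    """Group contigs by accepting end alignments from the best score downwards."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    used_ends = set()  # (contig, end) pairs already claimed by a join
    for al in sorted(alignments, key=lambda x: x.score, reverse=True):
        root_a, root_b = find(al.a), find(al.b)
        ea = ends_touched(al.a_start, al.a_end, al.a_len, slop)
        eb = ends_touched(al.b_start, al.b_end, al.b_len, slop)
        contained = ea == {"L", "R"} or eb == {"L", "R"}
        if not contained and not (ea and eb):
            continue  # internal alignment: possible repeat or misassembly
        if root_a == root_b:
            continue  # already in the same merged sequence contig
        if not contained:
            joins = {(al.a, e) for e in ea} | {(al.b, e) for e in eb}
            if joins & used_ends:
                continue  # a better-scoring join already claimed this end
            used_ends |= joins
        parent[root_a] = root_b

    groups = {}
    for name in list(parent):
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in groups.values())


alignments = [
    Alignment("A", 800, 1000, 1000, "B", 0, 200, 1000, 0.90),    # end-to-end overlap
    Alignment("A", 850, 1000, 1000, "C", 0, 150, 1000, 0.50),    # competes for A's right end
    Alignment("D", 0, 200, 200, "B", 400, 600, 1000, 0.80),      # D contained within B
    Alignment("B", 300, 500, 1000, "C", 400, 600, 1000, 0.99),   # internal on both contigs
]
print(greedy_merge(alignments))  # [['A', 'B', 'D'], ['C']]
```

In this toy input the highest-scoring alignment is discarded because it is internal to both contigs, and the weaker A-C overlap is skipped because the A-B join has already used the right end of A. The sketch only groups contigs; it does not build consensus sequence or choose a sequence path.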
Finally, it attempts to order and orient the sequence contigs by using additional information, including sequence data from paired-end plasmid and BAC reads, known messenger RNAs and ESTs, as well as additional linking information provided by centres. The sequence contigs are thereby linked together to create 'sequence-contig scaffolds' (Fig. 6d). The process also joins overlapping sequenced clones into sequenced-clone contigs and links sequenced-clone contigs to form sequenced-clone-contig scaffolds. A fingerprint clone contig may contain several sequenced-clone contigs, because bridging clones remain to be sequenced. The assembly contained 4,884 sequenced-clone

articles 870 NATURE | VOL 409 | 15 FEBRUARY 2001 | www.nature.com

[Figure 7 diagram labels: Fingerprint clone contig; Pick clones for sequencing; Sequenced-clone contig; Sequenced clone A; Sequenced clone B; Sequence to at least draft coverage; Initial sequence contig; Merge data; Merged sequence contig; Order and orient with mRNA, paired end reads, other information; Sequence-contig scaffold; Sequenced-clone-contig scaffold]

Figure 7 Levels of clone and sequence coverage. A 'fingerprint clone contig' is assembled by using the computer program FPC (refs 84, 451) to analyse the restriction enzyme digestion patterns of many large-insert clones. Clones are then selected for sequencing to minimize overlap between adjacent clones. For a clone to be selected, all of its restriction enzyme fragments (except the two vector-insert junction fragments) must be shared with at least one of its neighbours on each side in the contig. Once these overlapping clones have been sequenced, the set is a 'sequenced-clone contig'. When all selected clones from a fingerprint clone contig have been sequenced, the sequenced-clone contig will be the same as the fingerprint clone contig. Until then, a fingerprint clone contig may contain several sequenced-clone contigs.
After individual clones (for example, A and B) have been sequenced to draft coverage and the clones have been mapped, the data are analysed by GigAssembler (Fig. 6), producing merged sequence contigs from initial sequence contigs, and linking these to form sequence-contig scaffolds (see Box 1).

Table 5 The draft genome sequence

            Sequence from clones (kb)           Sequence from contigs (kb)
            Finished    Draft       Pre-draft   Contigs     Deep        Draft/predraft
            clones      clones      clones      containing  coverage    sequence
                                                finished    sequence    contigs
Chromosome                                      clones      contigs
All         826,441     1,734,995   131,476     958,922     840,815     893,175
1           50,851      149,027     12,356      61,001      78,773      72,461
2           46,909      167,439     7,210       53,775      81,569      86,214
3           22,350      152,840     11,057      26,959      79,649      79,638
4           15,914      134,973     17,261      19,096      66,165      82,887
5           37,973      129,581     2,160       48,895      61,387      59,431
6           75,312      76,082      6,696       93,458      28,204      36,428
7           94,845      47,328      4,047       103,188     14,434      28,597
8           14,538      102,484     7,236       16,659      47,198      60,400
9           18,401      77,648      10,864      24,030      42,653      40,230
10          16,889      99,181      11,066      21,421      54,054      51,662
11          13,162      111,092     4,352       16,145      65,147      47,314
12          32,156      84,653      7,651       37,519      43,995      42,946
13          16,818      68,983      7,136       22,191      38,319      32,429
14          58,989      27,370      565         78,302      3,267       5,355
15          2,739       67,453      3,211       3,112       34,758      35,533
16          22,987      48,997      1,143       27,751      20,892      24,484
17          29,881      36,349      6,600       33,531      14,671      24,628
18          5,128       65,284      2,352       6,656       40,947      25,160
19          28,481      26,568      369         32,228      7,188       16,003
20          54,217      5,302       976         56,534      1,065       2,896
21          33,824      0           0           33,824      0           0
22          33,786      0           0           33,786      0           0
X           77,630      45,100      4,941       83,796      14,056      29,820
Y           18,169      3,221       363         20,222      333         1,198
NA          2,434       1,858       844         2,446       122         2,568
UL          2,056       6,182       1,020       2,395       1,969       4,894
The table presents summary statistics for the draft genome sequence over the entire genome and by individual chromosome. NA, clones that could not be placed into the sequenced clone layout. UL, clones that could be placed in the layout, but that could not reliably be placed on a chromosome. First three columns, data from finished clones, draft clones and predraft clones. The last three columns break the data down according to the type of sequence contig. Contigs containing finished clones represent sequence contigs that consist of finished sequence plus any (small) extensions from merged sequence contigs that arise from overlap with flanking draft clones. Deep coverage sequence contigs include sequence from two or more overlapping unfinished clones; they consist of roughly full shotgun coverage and thus are longer than the average unfinished sequence contig. Draft/predraft sequence contigs are all of the other sequence contigs in unfinished clones. Thus, the draft genome sequence consists of approximately one-third finished sequence, one-third deep coverage sequence and one-third draft/pre-draft coverage sequence. In all of the statistics, we count only nonoverlapping bases in the draft genome sequence.

© 2001 Macmillan Magazines Ltd
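The 'approximately one-third' statement in the table notes can be checked against the genome-wide ('All') row of Table 5. The short sketch below only redoes that arithmetic; the dictionary keys and variable names are illustrative choices, not part of the original analysis.

```python
# Genome-wide ("All") row of Table 5, in kb.
clones_kb = {
    "finished clones": 826_441,
    "draft clones": 1_734_995,
    "pre-draft clones": 131_476,
}
contigs_kb = {
    "contigs containing finished clones": 958_922,
    "deep coverage sequence contigs": 840_815,
    "draft/predraft sequence contigs": 893_175,
}

# Both breakdowns describe the same nonoverlapping bases, so the totals agree.
total_kb = sum(contigs_kb.values())
assert total_kb == sum(clones_kb.values())

# Share of the draft genome sequence held by each type of sequence contig.
fractions = {name: kb / total_kb for name, kb in contigs_kb.items()}
for name, frac in fractions.items():
    print(f"{name}: {frac:.1%}")
```

The three shares come out at roughly 36%, 31% and 33% of about 2.69 million kb, consistent with the one-third description.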