well as targeted regions of mammalian genomes [34–37]. These projects showed that large-scale sequencing was feasible and developed the two-phase paradigm for genome sequencing. In the first, 'shotgun', phase, the genome is divided into appropriately sized segments and each segment is covered to a high degree of redundancy (typically, eight- to tenfold) through the sequencing of randomly selected subfragments. The second is a 'finishing' phase, in which sequence gaps are closed and remaining ambiguities are resolved through directed analysis. The results also showed that complete genomic sequence provided information about genes, regulatory regions and chromosome structure that was not readily obtainable from cDNA studies alone.

In 1995, genome scientists considered a proposal [38] that would have involved producing a draft genome sequence of the human genome in a first phase and then returning to finish the sequence in a second phase. After vigorous debate, it was decided that such a plan was premature for several reasons. These included the need first to prove that high-quality, long-range finished sequence could be produced from most parts of the complex, repeat-rich human genome; the sense that many aspects of the sequencing process were still rapidly evolving; and the desirability of further decreasing costs.
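The redundancy figures in the two-phase paradigm can be related to how much of a segment the shotgun phase is expected to leave unsequenced. The sketch below is a minimal illustration, assuming the idealized Poisson model of random shotgun sampling (read start points uniform and independent, no repeated sequence, every overlap detectable). The 150-kb segment length and 500-base read length are hypothetical values chosen only for illustration, not figures from this paper, and the function names are likewise hypothetical.

```python
import math

def reads_needed(target_len: int, read_len: int, coverage: float) -> int:
    """Number of random reads N so that N * read_len / target_len reaches `coverage`."""
    return math.ceil(coverage * target_len / read_len)

def shotgun_estimates(target_len: int, read_len: int, coverage: float) -> dict:
    """Idealized Poisson estimates for shotgun sequencing of one segment.

    Assumes read start points are uniform and independent, the segment has
    no repeats, and any overlap between two reads can be detected.
    """
    n = reads_needed(target_len, read_len, coverage)
    c = n * read_len / target_len          # redundancy actually achieved
    p_missed = math.exp(-c)                # P(a given base lies in no read)
    return {
        "reads": n,
        "fraction_covered": 1.0 - p_missed,
        "bases_uncovered": target_len * p_missed,
        # A read ends a contig when no other read starts within its length,
        # which happens with probability ~e^{-c}; each such contig end is
        # counted as a gap in this approximation.
        "expected_gaps": n * p_missed,
    }

SEGMENT = 150_000   # hypothetical segment length in bases (assumption)
READ = 500          # hypothetical read length in bases (assumption)
for cov in (4, 5, 8, 10):
    est = shotgun_estimates(SEGMENT, READ, cov)
    print(f"{cov:>2}x  reads={est['reads']:>5}  "
          f"covered={est['fraction_covered']:.4%}  "
          f"gaps~{est['expected_gaps']:.1f}")
```

Under these assumptions, eight- to tenfold redundancy leaves only a few hundredths of a percent of bases uncovered and, on average, fewer than one gap per segment, whereas four- to fivefold coverage leaves roughly 0.7–1.8% of bases uncovered and on the order of 10–20 gaps (the gap count scales with the assumed read length). Real data fall short of this model because of repeats and cloning bias, which is why a directed finishing phase is still needed.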
Instead, pilot projects were launched to demonstrate the feasibility of cost-effective, large-scale sequencing, with a target completion date of March 1999. The projects successfully produced finished sequence with 99.99% accuracy and no gaps [39]. They also introduced bacterial artificial chromosomes (BACs) [40], a new large-insert cloning system that proved to be more stable than the cosmids and yeast artificial chromosomes (YACs) [41] that had been used previously. The pilot projects drove the maturation and convergence of sequencing strategies, while producing 15% of the human genome sequence. With successful completion of this phase, the human genome sequencing effort moved into full-scale production in March 1999.

The idea of first producing a draft genome sequence was revived at this time, both because the ability to finish such a sequence was no longer in doubt and because there was great hunger in the scientific community for human sequence data. In addition, some scientists favoured prioritizing the production of a draft genome sequence over regional finished sequence because of concerns about commercial plans to generate proprietary databases of human sequence that might be subject to undesirable restrictions on use [42–44].

The consortium focused on an initial goal of producing, in a first production phase lasting until June 2000, a draft genome sequence covering most of the genome. Such a draft genome sequence, although not completely finished, would rapidly allow investigators to begin to extract most of the information in the human sequence. Experiments showed that sequencing clones covering about 90% of the human genome to a redundancy of about four- to fivefold ('half-shotgun' coverage; see Box 1) would accomplish this [45,46]. The draft genome sequence goal has been achieved, as described below.

The second sequence production phase is now under way.
Its aims are to achieve full-shotgun coverage of the existing clones during 2001, to obtain clones to fill the remaining gaps in the physical map, and to produce a finished sequence (apart from regions that cannot be cloned or sequenced with currently available techniques) no later than 2003.

Strategic issues

Hierarchical shotgun sequencing

Soon after the invention of DNA sequencing methods [47,48], the shotgun sequencing strategy was introduced [49–51]; it has remained the fundamental method for large-scale genome sequencing [52–54] for the past 20 years. The approach has been refined and extended to make it more efficient. For example, improved protocols for fragmenting and cloning DNA allowed construction of shotgun libraries with more uniform representation. The practice of sequencing from both ends of double-stranded clones ('double-barrelled' shotgun sequencing) was introduced by Ansorge and others [37] in 1990, allowing the use of 'linking information' between sequence fragments. The application of shotgun sequencing was also extended by applying it to larger and larger DNA molecules: from plasmids (<4 kilobases (kb)) to cosmid clones [37] (40 kb), to artificial chromosomes cloned in bacteria and yeast [55] (100–500 kb) and bacterial genomes [56] (1–2 megabases (Mb)). In principle, a genome of arbitrary size may be directly sequenced by the shotgun method, provided that it contains no repeated sequence and can be uniformly sampled at random. The genome can then be assembled using the simple computer science technique of 'hashing' (in which one detects overlaps by consulting an alphabetized look-up table of all k-letter words in the data). Mathematical analysis of the expected number of gaps as a function of coverage is similarly straightforward [57].

Practical difficulties arise because of repeated sequences and cloning bias. Small amounts of repeated sequence pose little problem for shotgun sequencing.
For example, one can readily assemble typical bacterial genomes (about 1.5% repeat) or the euchromatic portion of the fly genome (about 3% repeat). By contrast, the human genome is filled (>50%) with repeated sequences, including interspersed repeats derived from transposable elements, and long genomic regions that have been duplicated in tandem, palindromic or dispersed fashion (see below). These include large duplicated segments (50–500 kb) with high sequence identity (98–99.9%), at which mispairing during recombination creates deletions responsible for genetic syndromes. Such features complicate the assembly of a correct and finished genome sequence.

There are two approaches for sequencing large repeat-rich genomes. The first is a whole-genome shotgun sequencing approach, as has been used for the repeat-poor genomes of viruses, bacteria and flies, using linking information and computational

articles | NATURE | VOL 409 | 15 FEBRUARY 2001 | www.nature.com | 863
© 2001 Macmillan Magazines Ltd

[Figure 2 image. Title: Hierarchical shotgun sequencing. Panels: Genomic DNA; BAC library; Organized mapped large clone contigs; BAC to be sequenced; Shotgun clones; Shotgun sequence; Assembly (reads ...ACCGTAAATGGGCTGATCATGCTTAAA and TGATCATGCTTAAACCCTGTGCATCCTACTG... assembled into ...ACCGTAAATGGGCTGATCATGCTTAAACCCTGTGCATCCTACTG...).]

Figure 2 Idealized representation of the hierarchical shotgun sequencing strategy. A library is constructed by fragmenting the target genome and cloning it into a large-fragment cloning vector; here, BAC vectors are shown. The genomic DNA fragments represented in the library are then organized into a physical map and individual BAC clones are selected and sequenced by the random shotgun strategy. Finally, the clone sequences are assembled to reconstruct the sequence of the genome.
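The 'hashing' approach to overlap detection described in the text can be sketched on the two reads drawn in the assembly panel of Figure 2. This is a minimal illustration under simplifying assumptions: error-free reads, a single strand, and a word length k of 8 chosen here only for the example; the function names and the third read are hypothetical. A Python dictionary plays the role of the look-up table of all k-letter words (sorting its keys would give the alphabetized version); real assemblers add error tolerance, reverse complements and much more.

```python
from collections import defaultdict

def build_word_table(reads, k):
    """Map every k-letter word to the (read index, offset) pairs where it occurs."""
    table = defaultdict(list)
    for r, seq in enumerate(reads):
        for i in range(len(seq) - k + 1):
            table[seq[i:i + k]].append((r, i))
    return table

def overlaps_onto(reads, table, b, k):
    """Return {a: length} for every read a whose suffix equals a prefix of reads[b].

    Only positions listed under the first k-letter word of reads[b] are
    checked, so the table replaces an all-against-all comparison. More than
    one entry means the join is ambiguous, as happens with repeated sequence.
    """
    seq_b = reads[b]
    found = {}
    for a, i in table.get(seq_b[:k], []):
        if a == b:
            continue
        length = len(reads[a]) - i          # candidate overlap length
        if length <= len(seq_b) and reads[a][i:] == seq_b[:length]:
            found[a] = max(found.get(a, 0), length)
    return found

def merge(left, right, overlap):
    """Join two reads that overlap by `overlap` letters."""
    return left + right[overlap:]

# The two reads from the assembly panel of Figure 2 (leading/trailing '...' dropped).
reads = ["ACCGTAAATGGGCTGATCATGCTTAAA", "TGATCATGCTTAAACCCTGTGCATCCTACTG"]
K = 8
table = build_word_table(reads, K)
hits = overlaps_onto(reads, table, 1, K)
print(hits)        # {0: 14}
contig = merge(reads[0], reads[1], hits[0])
print(contig)      # ACCGTAAATGGGCTGATCATGCTTAAACCCTGTGCATCCTACTG

# Hypothetical third read from another genomic location that ends in the same
# 14 letters: the word table now offers two equally good left neighbours.
with_repeat = reads + ["CCCCCCCCTGATCATGCTTAAA"]
print(overlaps_onto(with_repeat, build_word_table(with_repeat, K), 1, K))  # {0: 14, 2: 14}
```

Because candidate positions come only from the table entry for a shared word, only reads that share that word are ever compared. The last call illustrates the practical difficulty raised by repeats: when the same sequence occurs at two locations, word look-up and exact matching alone cannot tell the true join from the false one, so additional information, such as linking information from read pairs or confining each assembly to a single mapped clone, is needed to choose between them.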