THE HUMAN GENOME
1306 16 FEBRUARY 2001 VOL 291 SCIENCE www.sciencemag.org

[...] simultaneously map and sequence the human genome by means of end sequences from 150-kbp bacterial artificial chromosomes (BACs) (17, 18). The end sequences spanned by known distances provide long-range continuity across the genome. A modification of the BAC end-sequencing (BES) method was applied successfully to complete chromosome 2 from the Arabidopsis thaliana genome (19).

In 1997, Weber and Myers (20) proposed whole-genome shotgun sequencing of the human genome. Their proposal was not well received (21). However, by early 1998, as less than 5% of the genome had been sequenced, it was clear that the rate of progress in human genome sequencing worldwide was very slow (22), and the prospects for finishing the genome by the 2005 goal were uncertain.

In early 1998, PE Biosystems (now Applied Biosystems) developed an automated, high-throughput capillary DNA sequencer, subsequently called the ABI PRISM 3700 DNA Analyzer. Discussions between PE Biosystems and TIGR scientists resulted in a plan to undertake the sequencing of the human genome with the 3700 DNA Analyzer and the whole-genome shotgun sequencing techniques developed at TIGR (23). Many of the principles of operation of a genome-sequencing facility were established in the TIGR facility (24). However, the facility envisioned for Celera would have a capacity roughly 50 times that of TIGR, and thus new developments were required for sample preparation and tracking and for whole-genome assembly.
Some argued that the required 150-fold scale-up from the H. influenzae genome to the human genome with its complex repeat sequences was not feasible (25). The Drosophila melanogaster genome was thus chosen as a test case for whole-genome assembly on a large and complex eukaryotic genome. In collaboration with Gerald Rubin and the Berkeley Drosophila Genome Project, the nucleotide sequence of the 120-Mbp euchromatic portion of the Drosophila genome was determined over a 1-year period (26–28). The Drosophila genome-sequencing effort resulted in two key findings: (i) that the assembly algorithms could generate chromosome assemblies with highly accurate order and orientation with substantially less than 10-fold coverage, and (ii) that undertaking multiple interim assemblies in place of one comprehensive final assembly was not of value.

These findings, together with the dramatic changes in the public genome effort subsequent to the formation of Celera (29), led to a modified whole-genome shotgun sequencing approach to the human genome. We initially proposed to do 10-fold sequence coverage of the genome over a 3-year period and to make interim assembled sequence data available quarterly. The modifications included a plan to perform random shotgun sequencing to ~5-fold coverage and to use the unordered and unoriented BAC sequence fragments and subassemblies published in GenBank by the publicly funded genome effort (30) to accelerate the project. We also abandoned the quarterly announcements in the absence of interim assemblies to report.

Although this strategy provided a reasonable result very early that was consistent with a whole-genome shotgun assembly with eight-fold coverage, the human genome sequence is not as finished as the Drosophila genome was with an effective 13-fold coverage.
However, it became clear that even with this reduced coverage strategy, Celera could generate an accurately ordered and oriented scaffold sequence of the human genome in less than 1 year. Human genome sequencing was initiated 8 September 1999 and completed 17 June 2000. The first assembly was completed 25 June 2000, and the assembly reported here was completed 1 October 2000.

Here we describe the whole-genome random shotgun sequencing effort applied to the human genome. We developed two different assembly approaches for assembling the ~3 billion bp that make up the 23 pairs of chromosomes of the Homo sapiens genome. Any GenBank-derived data were shredded to remove potential bias to the final sequence from chimeric clones, foreign DNA contamination, or misassembled contigs. Insofar as a correctly and accurately assembled genome sequence with faithful order and orientation of contigs is essential for an accurate analysis of the human genetic code, we have devoted a considerable portion of this manuscript to the documentation of the quality of our reconstruction of the genome. We also describe our preliminary analysis of the human genetic code on the basis of computational methods.

Figure 1 (see fold-out chart associated with this issue; files for each chromosome can be found in Web fig. 1 on Science Online at www.sciencemag.org/cgi/content/full/291/5507/1304/DC1) provides a graphical overview of the genome and the features encoded in it. The detailed manual curation and interpretation of the genome are just beginning.

To aid the reader in locating specific analytical sections, we have divided the paper into seven broad sections. A summary of the major results appears at the beginning of each section.
1 Sources of DNA and Sequencing Methods
2 Genome Assembly Strategy and Characterization
3 Gene Prediction and Annotation
4 Genome Structure
5 Genome Evolution
6 A Genome-Wide Examination of Sequence Variations
7 An Overview of the Predicted Protein-Coding Genes in the Human Genome
8 Conclusions

1 Sources of DNA and Sequencing Methods

Summary. This section discusses the rationale and ethical rules governing donor selection to ensure ethnic and gender diversity along with the methodologies for DNA extraction and library construction. The plasmid library construction is the first critical step in shotgun sequencing. If the DNA libraries are not uniform in size, are chimeric, or do not randomly represent the genome, then the subsequent steps cannot accurately reconstruct the genome sequence. We used automated high-throughput DNA sequencing and the computational infrastructure to enable efficient tracking of enormous amounts of sequence information (27.3 million sequence reads; 14.9 billion bp of sequence). Sequencing and tracking from both ends of plasmid clones from 2-, 10-, and 50-kbp libraries were essential to the computational reconstruction of the genome. Our evidence indicates that the accurate pairing rate of end sequences was greater than 98%.

Various policies of the United States and the World Medical Association, specifically the Declaration of Helsinki, offer recommendations for conducting experiments with human subjects. We convened an Institutional Review Board (IRB) (31) that helped us establish the protocol for obtaining and using human DNA and the informed consent process used to enroll research volunteers for the DNA-sequencing studies reported here. We adopted several steps and procedures to protect the privacy rights and confidentiality of the research subjects (donors).
These included a two-stage consent process, a secure random alphanumeric coding system for specimens and records, circumscribed contact with the subjects by researchers, and options for off-site contact of donors. In addition, Celera applied for and received a Certificate of Confidentiality from the Department of Health and Human Services. This Certificate authorized Celera to protect the privacy of the individuals who volunteered to be donors as provided in Section 301(d) of the Public Health Service Act, 42 U.S.C. 241(d).

Celera and the IRB believed that the initial version of a completed human genome should be a composite derived from multiple donors of diverse ethnic backgrounds. Prospective donors were asked, on a voluntary basis, to self-designate an ethnogeographic category (e.g., African-American, Chinese, Hispanic, Caucasian, etc.). We enrolled 21 donors (32).

Three basic items of information from each donor were recorded and linked by confidential code to the donated sample: age, sex, and self-designated ethnogeographic group. From females, ~130 ml of whole, heparinized blood was collected. From males, ~130 ml of whole, heparinized blood was [...]
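
The sequencing totals quoted in the section 1 Summary (27.3 million reads; 14.9 billion bp) can be checked against the ~5-fold coverage target with simple arithmetic. The sketch below does that in Python, using the round figure of ~3 billion bp from the text as the genome size. The function names are illustrative only. The last function rests on an assumption that is not taken from the paper: an idealized Poisson model of uniformly random reads, which ignores repeats and cloning bias, so its output is a rough intuition for the 5-fold versus 10-fold trade-off and not a description of the actual assembly.

```python
import math

# Figures quoted in the text above.
READS = 27.3e6          # sequence reads
SEQUENCED_BP = 14.9e9   # total bp of sequence
GENOME_BP = 3.0e9       # "~3 billion bp": a round figure, not an exact genome size


def fold_coverage(sequenced_bp: float, genome_bp: float) -> float:
    """Average number of times each base of the genome is sequenced."""
    return sequenced_bp / genome_bp


def mean_read_length(sequenced_bp: float, reads: float) -> float:
    """Average number of bp contributed by one read."""
    return sequenced_bp / reads


def expected_unsampled_fraction(coverage: float) -> float:
    """Fraction of bases expected to be hit by no read at all.

    ASSUMPTION (not from the paper): reads land uniformly at random, so the
    number of reads covering a base is Poisson with mean `coverage`, and
    P(zero reads) = exp(-coverage).
    """
    return math.exp(-coverage)


if __name__ == "__main__":
    c = fold_coverage(SEQUENCED_BP, GENOME_BP)
    print(f"fold coverage    : {c:.2f}")
    print(f"mean read length : {mean_read_length(SEQUENCED_BP, READS):.0f} bp")
    for target in (5.0, 10.0):
        frac = expected_unsampled_fraction(target)
        print(f"{target:>4.0f}-fold, idealized unsampled fraction: {frac:.2e}")
```

Under these assumptions the quoted totals correspond to just under 5-fold coverage of a 3-billion-bp genome, which is in line with the ~5-fold plan described in the introduction.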