Initial sequencing and analysis of the human genome
International Human Genome Sequencing Consortium*

* A partial list of authors appears on the opposite page. Affiliations are listed at the end of the paper.

The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence.

The rediscovery of Mendel's laws of heredity in the opening weeks of the 20th century (refs 1–3) sparked a scientific quest to understand the nature and content of genetic information that has propelled biology for the last hundred years. The scientific progress made falls naturally into four main phases, corresponding roughly to the four quarters of the century. The first established the cellular basis of heredity: the chromosomes. The second defined the molecular basis of heredity: the DNA double helix.
The third unlocked the informational basis of heredity, with the discovery of the biological mechanism by which cells read the information contained in genes and with the invention of the recombinant DNA technologies of cloning and sequencing by which scientists can do the same.

The last quarter of a century has been marked by a relentless drive to decipher first genes and then entire genomes, spawning the field of genomics. The fruits of this work already include the genome sequences of 599 viruses and viroids, 205 naturally occurring plasmids, 185 organelles, 31 eubacteria, seven archaea, one fungus, two animals and one plant.

Here we report the results of a collaboration involving 20 groups from the United States, the United Kingdom, Japan, France, Germany and China to produce a draft sequence of the human genome. The draft genome sequence was generated from a physical map covering more than 96% of the euchromatic part of the human genome and, together with additional sequence in public databases, it covers about 94% of the human genome. The sequence was produced over a relatively short period, with coverage rising from about 10% to more than 90% over roughly fifteen months. The sequence data have been made available without restriction and updated daily throughout the project. The task ahead is to produce a finished sequence, by closing all gaps and resolving all ambiguities. Already about one billion bases are in final form and the task of bringing the vast majority of the sequence to this standard is now straightforward and should proceed rapidly.

The sequence of the human genome is of interest in several respects. It is the largest genome to be extensively sequenced so far, being 25 times as large as any previously sequenced genome and eight times as large as the sum of all such genomes. It is the first vertebrate genome to be extensively sequenced. And, uniquely, it is the genome of our own species.
Much work remains to be done to produce a complete finished sequence, but the vast trove of information that has become available through this collaborative effort allows a global perspective on the human genome. Although the details will change as the sequence is finished, many points are already clear.

• The genomic landscape shows marked variation in the distribution of a number of features, including genes, transposable elements, GC content, CpG islands and recombination rate. This gives us important clues about function. For example, the developmentally important HOX gene clusters are the most repeat-poor regions of the human genome, probably reflecting the very complex coordinate regulation of the genes in the clusters.
• There appear to be about 30,000–40,000 protein-coding genes in the human genome, only about twice as many as in worm or fly. However, the genes are more complex, with more alternative splicing generating a larger number of protein products.
• The full set of proteins (the 'proteome') encoded by the human genome is more complex than those of invertebrates. This is due in part to the presence of vertebrate-specific protein domains and motifs (an estimated 7% of the total), but more to the fact that vertebrates appear to have arranged pre-existing components into a richer collection of domain architectures.
• Hundreds of human genes appear likely to have resulted from horizontal transfer from bacteria at some point in the vertebrate lineage. Dozens of genes appear to have been derived from transposable elements.
• Although about half of the human genome derives from transposable elements, there has been a marked decline in the overall activity of such elements in the hominid lineage. DNA transposons appear to have become completely inactive and long-terminal repeat (LTR) retroposons may also have done so.
• The pericentromeric and subtelomeric regions of chromosomes are filled with large recent segmental duplications of sequence from elsewhere in the genome. Segmental duplication is much more frequent in humans than in yeast, fly or worm.
• Analysis of the organization of Alu elements explains the longstanding mystery of their surprising genomic distribution, and suggests that there may be strong selection in favour of preferential retention of Alu elements in GC-rich regions and that these 'selfish' elements may benefit their human hosts.
• The mutation rate is about twice as high in male as in female meiosis, showing that most mutation occurs in males.
• Cytogenetic analysis of the sequenced clones confirms suggestions that large GC-poor regions are strongly correlated with 'dark G-bands' in karyotypes.
• Recombination rates tend to be much higher in distal regions (around 20 megabases (Mb)) of chromosomes and on shorter chromosome arms in general, in a pattern that promotes the occurrence of at least one crossover per chromosome arm in each meiosis.
• More than 1.4 million single nucleotide polymorphisms (SNPs) in the human genome have been identified. This collection should allow the initiation of genome-wide linkage disequilibrium mapping of the genes in the human population.

In this paper, we start by presenting background information on the project and describing the generation, assembly and evaluation of the draft genome sequence. We then focus on an initial analysis of the sequence itself: the broad chromosomal landscape; the repeat elements and the rich palaeontological record of evolutionary and biological processes that they provide; the human genes and proteins and their differences and similarities with those of other

articles 860 © 2001 Macmillan Magazines Ltd NATURE | VOL 409 | 15 FEBRUARY 2001 | www.nature.com