ARTICLES

Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project

The ENCODE Project Consortium*

We report the generation and analysis of functional data from multiple, diverse experiments performed on a targeted 1% of the human genome as part of the pilot phase of the ENCODE Project. These data have been further integrated and augmented by a number of evolutionary and computational analyses. Together, our results advance the collective knowledge about human genome function in several major areas. First, our studies provide convincing evidence that the genome is pervasively transcribed, such that the majority of its bases can be found in primary transcripts, including non-protein-coding transcripts, and those that extensively overlap one another. Second, systematic examination of transcriptional regulation has yielded new understanding about transcription start sites, including their relationship to specific regulatory sequences and features of chromatin accessibility and histone modification. Third, a more sophisticated view of chromatin structure has emerged, including its inter-relationship with DNA replication and transcriptional regulation. Finally, integration of these new sources of information, in particular with respect to mammalian evolution based on inter- and intra-species sequence comparisons, has yielded new mechanistic and evolutionary insights concerning the functional landscape of the human genome. Together, these studies are defining a path for pursuit of a more comprehensive characterization of human genome function.

The human genome is an elegant but cryptic store of information. The roughly three billion bases encode, either directly or indirectly, the instructions for synthesizing nearly all the molecules that form each human cell, tissue and organ. Sequencing the human genome¹⁻³ provided highly accurate DNA sequences for each of the 24 chromosomes.
However, at present, we have an incomplete understanding of the protein-coding portions of the genome, and markedly less understanding of both non-protein-coding transcripts and genomic elements that temporally and spatially regulate gene expression. To understand the human genome, and by extension the biological processes it orchestrates and the ways in which its defects can give rise to disease, we need a more transparent view of the information it encodes.

The molecular mechanisms by which genomic information directs the synthesis of different biomolecules have been the focus of much of molecular biology research over the last three decades. Previous studies have typically concentrated on individual genes, with the resulting general principles then providing insights into transcription, chromatin remodelling, messenger RNA splicing, DNA replication and numerous other genomic processes. Although many such principles seem valid as additional genes are investigated, they generally have not provided genome-wide insights about biological function.

The first genome-wide analyses that shed light on human genome function made use of observing the actions of evolution. The ever-growing set of vertebrate genome sequences⁴⁻⁸ is providing increasing power to reveal the genomic regions that have been most and least acted on by the forces of evolution. However, although these studies convincingly indicate the presence of numerous genomic regions under strong evolutionary constraint, they have less power in identifying the precise bases that are constrained and provide little, if any, insight into why those bases are biologically important. Furthermore, although we have good models for how protein-coding regions evolve, our present understanding about the evolution of other functional genomic regions is poorly developed. Experimental studies that augment what we learn from evolutionary analyses are key for solidifying our insights regarding genome function.

The Encyclopedia of DNA Elements (ENCODE) Project⁹ aims to provide a more biologically informative representation of the human genome by using high-throughput methods to identify and catalogue the functional elements encoded. In its pilot phase, 35 groups provided more than 200 experimental and computational data sets that examined in unprecedented detail a targeted 29,998 kilobases (kb) of the human genome. These roughly 30 Mb, equivalent to ~1% of the human genome, are sufficiently large and diverse to allow for rigorous pilot testing of multiple experimental and computational methods. These 30 Mb are divided among 44 genomic regions; approximately 15 Mb reside in 14 regions for which there is already substantial biological knowledge, whereas the other 15 Mb reside in 30 regions chosen by a stratified random-sampling method (see http://www.genome.gov/10506161). The highlights of our findings to date include:

• The human genome is pervasively transcribed, such that the majority of its bases are associated with at least one primary transcript and many transcripts link distal regions to established protein-coding loci.
• Many novel non-protein-coding transcripts have been identified, with many of these overlapping protein-coding loci and others located in regions of the genome previously thought to be transcriptionally silent.
• Numerous previously unrecognized transcription start sites have been identified, many of which show chromatin structure and sequence-specific protein-binding properties similar to well-understood promoters.

*A list of authors and their affiliations appears at the end of the paper.

Nature | Vol 447 | 14 June 2007 | doi:10.1038/nature05874 | 799
©2007 Nature Publishing Group