ARTICLES NATURE | Vol 447 | 14 June 2007
transcripts nor easily explained as structural non-coding RNAs. Other studies have noted complex transcription around specific loci or chimaeric-gene structures (for example refs 28–30), but these have often been considered exceptions; our data show that complex intercalated transcription is common at many loci. The results presented in the next section show extensive amounts of regulatory factors around novel TSSs, which is consistent with this extensive transcription. The biological relevance of these unannotated transcripts remains unanswered by these studies.
Evolutionary information (detailed below) is mixed in this regard; for example, it indicates that unannotated transcripts show weaker evolutionary conservation than many other annotated features. As with other ENCODE-detected elements, it is difficult to identify clear biological roles for the majority of these transcripts; such experiments are challenging to perform on a large scale and, furthermore, it seems likely that many of the corresponding biochemical events may be evolutionarily neutral (see below).

Regulation of transcription

Overview. A significant challenge in biology is to identify the transcriptional regulatory elements that control the expression of each transcript and to understand how the function of these elements is coordinated to execute complex cellular processes. A simple, commonplace view of transcriptional regulation involves five types of cis-acting regulatory sequences: promoters, enhancers, silencers, insulators and locus control regions (ref. 31). Overall, transcriptional regulation involves the interplay of multiple components, whereby the availability of specific transcription factors and the accessibility of specific genomic regions determine whether a transcript is generated (ref. 31). However, the current view of transcriptional regulation is known to be overly simplified, with many details remaining to be established. For example, the consensus sequences of transcription factor binding sites (typically 6 to 10 bases) have relatively little information content and are present numerous times in the genome, with the great majority of these not participating in transcriptional regulation. Does chromatin structure then determine whether such a sequence has a regulatory role? Are there complex inter-factor interactions that integrate the signals from multiple sites? How are signals from different distal regulatory elements coupled without affecting all neighbouring genes?
Meanwhile, our understanding of the repertoire of transcriptional events is becoming more complex, with an increasing appreciation of alternative TSSs (refs 32, 33) and the presence of non-coding (refs 27, 34) and anti-sense transcripts (refs 35, 36).

To better understand transcriptional regulation, we sought to begin cataloguing the regulatory elements residing within the 44 ENCODE regions. For this pilot project, we mainly focused on the binding of regulatory proteins and chromatin structure involved in transcriptional regulation. We analysed over 150 data sets, mainly from ChIP-chip (refs 37–39), ChIP-PET and STAGE (refs 40, 41) studies (see Supplementary Information sections 3.1 and 3.2). These methods use chromatin immunoprecipitation with specific antibodies to enrich for DNA in physical contact with the targeted epitope. This enriched DNA can then be analysed using either microarrays (ChIP-chip) or high-throughput sequencing (ChIP-PET and STAGE). The assays included 18 sequence-specific transcription factors and components of the general transcription machinery (for example, RNA polymerase II (Pol II), TAF1 and TFIIB/GTF2B). In addition, we tested more than 600 potential promoter fragments for transcriptional activity by transient-transfection reporter assays that used 16 human cell lines (ref. 33). We also examined chromatin structure by studying the ENCODE regions for DNaseI sensitivity (by quantitative PCR (ref. 42) and tiling arrays (refs 43, 44); see Supplementary Information section 3.3), histone composition (ref. 45), histone modifications (using ChIP-chip assays; refs 37, 46), and histone displacement (using FAIRE; see Supplementary Information section 3.4). Below, we detail these analyses, starting with the efforts to define and classify the 5′ ends of transcripts with respect to their associated regulatory signals. Following that are summaries of generated data about sequence-specific transcription factor binding and clusters of regulatory elements.
Finally, we describe how this information can be integrated to make predictions about transcriptional regulation.

Transcription start site catalogue. We analysed two data sets to catalogue TSSs in the ENCODE regions: the 5′ ends of GENCODE-annotated transcripts and the combined results of two 5′-end-capture technologies, CAGE and PET-tagging. The initial results suggested the potential presence of 16,051 unique TSSs. However, in many cases, multiple TSSs resided within a single small segment (up to ~200 bases); this was due to some promoters containing TSSs with many very close precise initiation sites (ref. 47). To normalize for this effect, we grouped TSSs that were 60 or fewer bases apart into a single cluster, and in each case considered the most frequent CAGE or PET tag (or the 5′-most TSS in the case of TSSs identified only from GENCODE data) as representative of that cluster for downstream analyses.

The above effort yielded 7,157 TSS clusters in the ENCODE regions. We classified these TSSs into three categories: known (present at the end of GENCODE-defined transcripts), novel (supported by other evidence) and unsupported. The novel TSSs were further subdivided on the basis of the nature of the supporting evidence (see Table 3 and Supplementary Information section 3.5), with all four of the resulting subtypes showing significant overlap with experimental evidence using the GSC statistic. Although there is a larger relative proportion of singleton tags in the novel category, when analysis is restricted to only singleton tags, the novel TSSs continue to have highly significant overlap with supporting evidence (see Supplementary Information section 3.5.1).

Correlating genomic features with chromatin structure and transcription factor binding. By measuring relative sensitivity to DNaseI digestion (see Supplementary Information section 3.3), we identified DNaseI hypersensitive sites (DHSs) throughout the ENCODE regions.
DHSs and TSSs both reflect genomic regions thought to be enriched for regulatory information and many DHSs reside at or near TSSs. We partitioned DHSs into those within 2.5 kb of a TSS (958; 46.5%) and the remaining ones, which were classified as distal (1,102; 53.5%). We then cross-analysed the TSSs and DHSs with data sets relating to histone modifications, chromatin accessibility and sequence-specific transcription factor binding by summarizing these signals in aggregate relative to the distance from TSSs or DHSs. Figure 5 shows representative profiles of specific histone modifications, Pol II and selected transcription factor binding for the different categories of TSSs. Further profiles and statistical analysis of these studies can be found in Supplementary Information section 3.6.

In the case of the three TSS categories (known, novel and unsupported), known and novel TSSs are both associated with similar signals for multiple factors (ranging from histone modifications through DNaseI accessibility), whereas unsupported TSSs are not.

Table 3 | Different categories of TSSs defined on the basis of support from different transcript-survey methods

Category      Transcript survey method        Number of TSS clusters   P value†     Singleton
                                              (non-redundant)*                      clusters‡ (%)
Known         GENCODE 5′ ends                 1,730                    2 × 10⁻⁷⁰    25 (74 overall)
Novel         GENCODE sense exons             1,437                    6 × 10⁻³⁹    64
              GENCODE antisense exons         521                      3 × 10⁻⁸     65
              Unbiased transcription survey   639                      7 × 10⁻⁶³    71
              CpG island                      164                      4 × 10⁻⁹⁰    60
Unsupported   None                            2,666                    -            83.4

* Number of TSS clusters with this support, excluding TSSs from higher categories.
† Probability of overlap between the transcript support and the PET/CAGE tags, as calculated by the Genome Structure Correction statistic (see Supplementary Information section 1.3).
‡ Per cent of clusters with only one tag. For the ‘known’ category this was calculated as the per cent of GENCODE 5′ ends with tag support (25%) or overall (74%).
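The two positional rules described in the text, grouping TSSs that lie 60 or fewer bases apart and splitting DHSs by whether they fall within 2.5 kb of a TSS, can be illustrated with a minimal Python sketch. This is not the authors' pipeline. The function names, the chaining of neighbouring positions into one cluster, the tie-break by lowest coordinate, the treatment of each DHS as a single point position and the inclusive distance cut-offs are all assumptions made for illustration. Strand and chromosome handling, and the 5′-most rule for GENCODE-only TSSs, are omitted.

```python
from bisect import bisect_left
from collections import Counter


def cluster_tss(tag_positions, max_gap=60):
    """Group TSS tag positions (one chromosome, one strand) into clusters.

    Assumption: neighbouring positions are chained, so a position joins the
    current cluster if it is at most `max_gap` bases from the previous one.
    Returns a list of (positions_in_cluster, representative_position).
    """
    counts = Counter(tag_positions)  # tag count per distinct position
    clusters, current = [], []
    for pos in sorted(counts):
        if current and pos - current[-1] > max_gap:
            clusters.append(current)
            current = []
        current.append(pos)
    if current:
        clusters.append(current)
    # Representative: most frequent tag; ties go to the lowest coordinate
    # (the tie-break is an assumption, not stated in the text).
    return [(c, max(c, key=lambda p: (counts[p], -p))) for c in clusters]


def partition_dhs(dhs_positions, tss_positions, window=2500):
    """Split DHS positions into TSS-proximal and distal lists.

    Assumption: each DHS is a single coordinate and "within 2.5 kb" means
    an absolute distance of at most `window` bases from the nearest TSS.
    """
    tss_sorted = sorted(tss_positions)
    proximal, distal = [], []
    for d in dhs_positions:
        i = bisect_left(tss_sorted, d)
        # Only the TSSs immediately left and right of d can be the nearest.
        neighbours = tss_sorted[max(i - 1, 0): i + 1]
        near = any(abs(d - t) <= window for t in neighbours)
        (proximal if near else distal).append(d)
    return proximal, distal


if __name__ == "__main__":
    tags = [100, 100, 130, 150, 150, 150, 400, 1000, 1050]  # made-up data
    clusters = cluster_tss(tags)
    representatives = [rep for _, rep in clusters]
    print(clusters)
    print(partition_dhs([200, 3000, 10000], representatives))
```

The coordinates in the usage section are invented and carry no biological meaning. A real analysis would apply the functions separately per chromosome and strand.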
©2007 Nature Publishing Group