Of these 127 transcripts, all but seven were common to the 317 transcripts derived from the regression analysis at the probe-set level; therefore, our final dataset comprised 324 transcripts predicted to have expression changes at the meta–probe set and/or probe set level.

We examined the 324 transcripts in greater detail (Fig. 1; examples in Fig. 2) to determine the nature of the isoform changes on a transcript level (summarized in Supplementary Table 2 and Supplementary Fig. 2 online). Expression changes were automatically classified on the basis of the positions of the variable probe sets, followed by manual curation based on visualization of the entire transcript (Supplementary Fig. 2). A large number of genes (127, or 39%) showed whole-gene expression changes. However, an even larger proportion (55%) of genes showed transcript-isoform changes only, without an accompanying change in the expression of the entire locus. Nearly half of these transcript variations were at the splicing level (85, or 26%), with the remaining changes at the level of transcript termination (57, or 18%) and initiation (35, or 11%) (Fig. 3). It should be noted that some of the genes showing changes in the expression level of the whole gene also showed further changes in splicing, transcript termination and/or transcript initiation, suggesting that transcript isoform variation constitutes a large part of the genetic variation we have observed. A small number (20, or 6%) of genes showed very complex patterns of isoform variation that were difficult to interpret. Notably, when we compare the proportion (18%) of significant probe sets within the 3′ untranslated regions (UTRs) with the proportion of all 3′ UTR core probe sets (13%) on the array, we found a significant over-representation (Pearson's chi-squared test, P = 5.73 × 10⁻⁶) of probe sets in this region, indicating that transcript termination variations may occur more frequently than expected.
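As a quick arithmetic check on this breakdown, the following Python sketch recomputes the reported percentages from the counts given in the text. It also includes a generic Pearson chi-squared helper for a 2 × 2 table. The helper is only an assumption about how such an over-representation test could be set up (1 degree of freedom, no continuity correction); the probe-set counts behind the reported P-value are not given here, so none are filled in. All function and variable names are illustrative.

```python
from math import erfc, sqrt

# Transcript counts per category, as reported in the text (324 in total).
CATEGORY_COUNTS = {
    "whole-gene expression": 127,
    "splicing": 85,
    "transcript termination": 57,
    "transcript initiation": 35,
    "complex": 20,
}

ISOFORM_ONLY = ("splicing", "transcript termination", "transcript initiation")


def percentages(counts):
    """Return each category's share of the total, rounded to a whole percent."""
    total = sum(counts.values())
    return {name: round(100 * n / total) for name, n in counts.items()}


def isoform_only_share(counts):
    """Fraction of transcripts with isoform-level changes only."""
    return sum(counts[name] for name in ISOFORM_ONLY) / sum(counts.values())


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared test for the 2 x 2 table [[a, b], [c, d]].

    Assumed layout (hypothetical): rows are significant vs. other probe sets,
    columns are 3' UTR vs. elsewhere. Returns (statistic, p_value) for
    1 degree of freedom, without continuity correction.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 d.f. the upper-tail probability equals erfc(sqrt(chi2 / 2)).
    return chi2, erfc(sqrt(chi2 / 2))


if __name__ == "__main__":
    for name, pct in percentages(CATEGORY_COUNTS).items():
        print(f"{name}: {CATEGORY_COUNTS[name]} ({pct}%)")
    print(f"isoform changes only: {isoform_only_share(CATEGORY_COUNTS):.0%}")
```

The 55% figure corresponds to the splicing, termination and initiation categories taken together (177 of 324), and splicing accounts for 85 of those 177, which is the "nearly half" mentioned in the text.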
Because predicted changes to the 3′ UTR may affect mRNA stability and subcellular localization, this type of isoform variation may have important regulatory roles. These findings illustrate a very complex pattern of expression changes associated with genetic variation, encompassing alterations at the whole-gene expression level and/or differences in transcript isoforms.

We proceeded, using two different methods, to validate 32 of our top candidate events distributed among the coding (16), 5′ UTR (6), and 3′ UTR (10) regions. For alternative splicing events of internally located probe sets, we performed RT-PCR on our entire panel of cell lines using exon-body primers in the two exons flanking the candidate probe set (Fig. 1c). We confirmed 15 probe sets showing SNP association to splicing of a cassette exon or intron (Table 1) and classified them as follows: eight probe sets corresponded to splicing of a coding exon, four probe sets were located in the 5′ UTR and resulted in the removal of potential promoter sequences or alternative start codon use, two probe sets were found within intronic regions and resulted in intron retention, and the remaining probe set was located in the 3′ UTR and altered its length. The second, more sensitive validation method using quantitative real-time RT-PCR was applied to differentially expressed probe sets within the 5′ or 3′ UTR and to those in which one of the flanking probe sets was missing in one of the alternative isoforms. We designed sets of primers to amplify the differentially expressed probe set itself and compared the resulting PCR products to ones corresponding to adjacent probe sets showing no association to the SNP and also expected to have similar expression levels across all cell lines.
Quantitative PCR data was used to perform a linear regression fit with the original associated SNP and confirm the significance and direction of the association analysis with the microarray data at a nominal P-value of 0.05/N, where N is the number of candidates tested in the real-time RT-PCR. Using this method, we validated six UTR-located probe sets showing SNP association: four in the 3′ UTR (alternative polyadenylation) and two in the 5′ UTR (differential transcriptional initiation). We also used this method on the candidate probe sets that failed our initial validation method owing potentially to low sensitivity of endpoint PCR of minor isoforms, and we were able to validate another four probe sets: two within coding regions and two within the 3′ UTRs. In total, 25 of 32 candidate probe sets were validated, for a success rate of 78%. The remaining 7 probe sets failed validation, which can be partially accounted for by unannotated SNPs located within the probe sets possibly leading to altered

[Figure 1 image, panels a to d. Panel a, "PS 3527423 versus rs4981998": probe-set expression against genotype (CC, CT, TT), P = 2.81 × 10⁻³⁰. Panel b, "MPS 3527418 versus rs4981998": −log10(P-value) and log2(fold change) for each probe set of the transcript. Remaining labels: transcripts NM_005484 and NM_001042618.]

Figure 1 Analysis steps from identification of significant probe set in the PARP2 gene to validation.
(a) Linear regression analysis of expression scores for probe set (PS) 3527423 with genotypes of SNP rs4981998, giving a P-value of 2.81 × 10⁻³⁰. Probe set scores for each individual are shown in red and regression line is indicated with blue dashes. (b) Visualization of probe set 3527423 in the context of all other probe sets belonging to the same transcript (meta–probe set 3527418). For each probe set, the significance level (P-value) is graphed (red line), along with fold change expression between the mean scores of the two homozygous genotypes (meanTT / meanCC) (vertical blue bars). The solid horizontal red and blue lines represent the significance and fold change expression for the regression analysis at the meta–probe set level against SNP rs4981998. Arrow, probe set 3527423. (c) RT-PCR validation of probe set 3527423 using flanking exon-body primers. Individuals are highlighted by color according to their genotype for SNP rs4981998: CC (red), CT (black), TT (blue). (d) Schematic of 5′ end of two isoforms of PARP2 with exon array probe sets shown below the exons. The significant probe set 3527423 is highlighted in red and corresponds to alternative 5′ splice site use resulting in a larger second exon for NM_005484.

Letters. Nature Genetics, Volume 40, Number 2, February 2008, page 226. © 2008 Nature Publishing Group http://www.nature.com/naturegenetics
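The regression, fold-change and threshold calculations described in the text and in Figure 1a,b can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the additive genotype coding (CC = 0, CT = 1, TT = 2) is an assumption, since the coding is not stated here, and all function names are hypothetical. The sketch returns a t-statistic rather than a P-value; converting it to a P-value needs a Student's t distribution with n − 2 degrees of freedom, which the Python standard library does not provide.

```python
from math import log2, sqrt

# Assumption: additive coding by number of T alleles (not stated in the text).
GENOTYPE_CODE = {"CC": 0, "CT": 1, "TT": 2}


def regress_on_genotype(genotypes, scores):
    """Least-squares fit of expression score on coded genotype.

    Returns (slope, intercept, t_statistic); the t-statistic has
    n - 2 degrees of freedom.
    """
    x = [GENOTYPE_CODE[g] for g in genotypes]
    n = len(x)
    if n < 3:
        raise ValueError("need at least three individuals")
    mean_x = sum(x) / n
    mean_y = sum(scores) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("all individuals share one genotype")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, scores))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares gives the standard error of the slope.
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, scores))
    se = sqrt(rss / (n - 2) / sxx)
    t_stat = slope / se if se > 0 else float("inf")
    return slope, intercept, t_stat


def log2_fold_change(genotypes, scores):
    """log2 of meanTT / meanCC, the two homozygous genotype means."""
    tt = [s for g, s in zip(genotypes, scores) if g == "TT"]
    cc = [s for g, s in zip(genotypes, scores) if g == "CC"]
    return log2((sum(tt) / len(tt)) / (sum(cc) / len(cc)))


def bonferroni_threshold(n_candidates, alpha=0.05):
    """Nominal P-value cutoff of alpha / N for N tested candidates."""
    return alpha / n_candidates
```

With made-up scores such as `[100, 140, 200, 240, 420, 460]` for genotypes `["CC", "CC", "CT", "CT", "TT", "TT"]`, `regress_on_genotype` gives the per-allele change in expression score as the slope; these numbers are illustrative only and are not taken from the paper.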