Genome-wide analysis of transcript isoform variation in humans

Tony Kwan¹,², David Benovoy¹,², Christel Dias¹, Scott Gurd², Cathy Provencher², Patrick Beaulieu³, Thomas J Hudson¹,²,⁴, Rob Sladek¹,² & Jacek Majewski¹,²

We have performed a genome-wide analysis of common genetic variation controlling differential expression of transcript isoforms in the CEU HapMap population using a comprehensive exon tiling microarray covering 17,897 genes. We detected 324 genes with significant associations between flanking SNPs and transcript levels. Of these, 39% reflected changes in whole gene expression and 55% reflected transcript isoform changes such as splicing variants (exon skipping, alternative splice site use, intron retention), differential 5′ UTR (initiation of transcription) use, and differential 3′ UTR (alternative polyadenylation) use. These results demonstrate that the regulatory effects of genetic variation in a normal human population are far more complex than previously observed. This extra layer of molecular diversity may account for natural phenotypic variation and disease susceptibility.

Alternative pre-mRNA processing increases the complexity of eukaryotic transcriptomes, allowing multiple transcripts and protein isoforms with distinct functions to be produced from a single genomic locus¹. Within an organism, tissue specific gene isoforms are known to have important functions in development and proper functioning of diverse cell types². Across individuals, changes in normal isoform structure have phenotypic consequences and have been associated with disease³,⁴. Splicing defects in a number of genes, such as the cystic fibrosis transmembrane conductance regulator, CFTR, result in several known mendelian disorders⁵.
More subtle changes, such as alternative 3′ processing and polyadenylation, have recently been associated with complex disorders: OAS1 in severe acute respiratory syndrome⁶, TAP2 in type I diabetes⁷, and IRF5 in susceptibility to systemic lupus erythematosus⁸,⁹.

Several recent studies have suggested that natural variation at the level of whole-gene expression is common in humans and is associated with genetic variants, such as SNPs or copy number variants (CNVs)¹⁰⁻¹³. Studying variation in gene expression is becoming increasingly important because of its contribution to phenotypic differences among individuals and its possible regulatory and functional relationships to diseases. However, little is known at present about the genetic variation at the sub-transcript level or about differences in multiple transcript isoforms of the same gene. Here, we interrogated transcripts across their entire length, using the Affymetrix GeneChip Human Exon 1.0 ST Array, which can detect splicing differences between various types of samples¹⁴⁻¹⁶.

Exons within a gene are represented on the microarray by individual probe sets, and were considered discrete units for our analysis of transcript isoform-processing differences. We used triplicate samples of lymphoblastoid cell lines (LCLs) derived from 57 unrelated Centre d’Etudes du Polymorphisme Humain (CEPH) CEU individuals (Utah residents with northern and western European ancestry) genotyped by the HapMap consortium¹⁷, allowing us to establish a possible genetic basis for any observed variations in transcript isoforms with associated SNPs.

A linear regression analysis under a codominant model was carried out to associate probe set expression intensities with the genotypes of all SNP markers within a window of 50 kb flanking the boundaries of the transcript cluster (meta–probe set) containing the probe set.
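The window selection and regression described above can be illustrated with a minimal sketch. It is not the authors' code. It assumes genotypes are coded as 0, 1 or 2 copies of one allele and that expression intensities are on a log2 scale, so the fold change between the two homozygous genotypes is 2 raised to twice the slope. The function names, SNP identifiers and data layout are hypothetical.

```python
import math


def snps_in_window(snp_positions, cluster_start, cluster_end, flank=50_000):
    """Return the SNPs lying within `flank` bp of the transcript cluster boundaries.

    `snp_positions` maps a (hypothetical) SNP name to its base-pair position.
    """
    lo = cluster_start - flank
    hi = cluster_end + flank
    return [name for name, pos in snp_positions.items() if lo <= pos <= hi]


def regress_on_genotype(genotypes, expression):
    """Least-squares fit of probe set expression on genotype.

    Assumptions (not stated in the source): genotypes are allele counts
    (0, 1, 2) and expression values are log2 intensities.
    """
    n = len(genotypes)
    if n != len(expression) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mean_g = sum(genotypes) / n
    mean_e = sum(expression) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    if sxx == 0:
        raise ValueError("SNP is monomorphic in this sample")
    sxy = sum((g - mean_g) * (e - mean_e) for g, e in zip(genotypes, expression))
    slope = sxy / sxx
    intercept = mean_e - slope * mean_g
    # Residual sum of squares gives the standard error of the slope.
    rss = sum((e - intercept - slope * g) ** 2 for g, e in zip(genotypes, expression))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    t_stat = slope / std_err if std_err > 0 else math.copysign(math.inf, slope)
    # Homozygotes differ by two allele copies, i.e. 2 * slope on the log2 scale.
    fold_change = 2.0 ** (2.0 * slope)
    return {"slope": slope, "intercept": intercept,
            "t_stat": t_stat, "fold_change": fold_change}
```

Converting the t-statistic to an asymptotic P-value requires a Student t distribution with n - 2 degrees of freedom, which a statistics library would normally supply. It is omitted here to keep the sketch dependency-free.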
We assessed the statistical significance of the variation using the t-statistic, and used the regression equation to estimate the fold change in expression between the two homozygous genotypes. We used permutation testing¹⁸ to determine empirical P-values corresponding to the asymptotic P-values obtained from the regression. Subsequently, we applied the false discovery rate (FDR) correction to establish a cutoff P-value of 9.73 × 10⁻⁹, corresponding to the 0.05 FDR level (see Methods). This yielded 757 unique probe sets showing significant SNP associations, belonging to 317 unique meta–probe sets (Supplementary Table 1 online). Although the most significant SNPs may not be the causative polymorphisms responsible for these differences in probe set expression, they are very probably in linkage disequilibrium with the causative polymorphism(s). This is reflected in the distance distribution of associated polymorphisms, most of which are in close proximity to the probe sets (Supplementary Fig. 1 online). The association analysis at the transcript (meta–probe set) level resulted in a 0.05 FDR cutoff of 6.02 × 10⁻⁷, yielding 127 unique transcripts with significant genetic association at the gene expression

Received 9 July 2007; accepted 31 October 2007; published online 13 January 2008; doi:10.1038/ng.2007.57

¹Department of Human Genetics, McGill University and ²McGill University and Génome Québec Innovation Centre, 740 Dr. Penfield, Room 7210, Montréal, Québec H3A 1A4, Canada. ³Division of Hematology-Oncology, Research Centre, Sainte-Justine Hospital, Montréal, Québec H3T 1C5, Canada. ⁴Ontario Institute for Cancer Research, MaRS Centre, South Tower, 101 College Street, Suite 800, Toronto, Ontario M5G 1L7, Canada. Correspondence should be addressed to J.M. (jacek.majewski@mcgill.ca).

LETTERS | NATURE GENETICS | VOLUME 40 | NUMBER 2 | FEBRUARY 2008 | 225
© 2008 Nature Publishing Group http://www.nature.com/naturegenetics
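The permutation and FDR steps described in the text can be sketched as follows. This is an illustration under stated assumptions rather than the authors' procedure. The source does not name the FDR method, so the Benjamini-Hochberg step-up rule is assumed here, and all function names are hypothetical. The permutation test shuffles expression values across individuals and compares absolute regression slopes. For a single predictor this ranks permutations in the same order as the absolute t-statistic.

```python
import random


def slope(genotypes, expression):
    """Least-squares slope of expression on genotype (allele counts 0/1/2)."""
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_e = sum(expression) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    if sxx == 0:
        raise ValueError("SNP is monomorphic in this sample")
    sxy = sum((g - mean_g) * (e - mean_e) for g, e in zip(genotypes, expression))
    return sxy / sxx


def permutation_p_value(genotypes, expression, n_perm=1000, seed=0):
    """Empirical two-sided P-value from shuffling expression across individuals."""
    rng = random.Random(seed)
    observed = abs(slope(genotypes, expression))
    shuffled = list(expression)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(slope(genotypes, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


def benjamini_hochberg_cutoff(p_values, fdr=0.05):
    """Largest sorted P-value p_(k) satisfying p_(k) <= fdr * k / m.

    Returns None when no test passes. Assumed procedure; the source only
    says that an FDR correction was applied.
    """
    m = len(p_values)
    cutoff = None
    for k, p in enumerate(sorted(p_values), start=1):
        if p <= fdr * k / m:
            cutoff = p
    return cutoff
```

Every test with a P-value at or below the returned cutoff is called significant at the chosen FDR level, which is how a single cutoff P-value such as the ones quoted in the text can correspond to a 0.05 FDR.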