NATURE|Vol 437|27 October 2005 ARTICLES
© 2005 Nature Publishing Group

estimated from the ENCODE data, where deep sequencing reduces bias due to SNP ascertainment. Consistent with previous studies, most SNPs observed in the ENCODE regions are rare: 46% had MAF < 0.05, and 9% were seen in only a single individual (Fig. 4). Although most varying sites in the population are rare, most heterozygous sites within any individual are due to common SNPs. Specifically, in the ENCODE data, 90% of heterozygous sites in each individual were due to common variants (Fig. 4). With ever-deeper sequencing of DNA samples the number of rare variants will rise linearly, but the vast majority of heterozygous sites in each person will be explained by a limited set of common SNPs now contained (or captured through LD) in existing databases (Fig. 3).

Consistent with previous descriptions, the CEU, CHB and JPT samples show fewer low-frequency alleles when compared to the YRI samples (Fig. 5), a pattern thought to be due to bottlenecks in the history of the non-YRI populations.

In contrast to the ENCODE data, the distribution of allele frequencies for the genome-wide data is flat (Fig. 5), with much more similarity in the distributions observed in the three analysis panels.
These patterns are well explained by the inherent and intentional bias in the rules used for SNP selection: we prioritized using validated SNPs in order to focus resources on common (rather than rare or false positive) candidate SNPs from the public databases. For a fuller discussion of ascertainment issues, including a shift in frequencies over time and an excess of high-frequency derived alleles due to inclusion of chimpanzee data in determination of double-hit status, see the Supplementary Information (Supplementary Fig. 3).

SNP allele frequencies across population samples. Of the 1.007 million SNPs successfully genotyped and polymorphic across the three analysis panels, only a subset were polymorphic in any given panel: 85% in YRI, 79% in CEU, and 75% in CHB+JPT. The joint distribution of frequencies across populations is presented in Fig. 6 (for the ENCODE data) and Supplementary Fig. 4 (for the genome-wide map). We note the similarity of allele frequencies in the CHB and JPT samples, which motivates analysing them jointly as a single analysis panel in the remainder of this report.

A simple measure of population differentiation is Wright's FST, which measures the fraction of total genetic variation due to between-population differences (ref. 40). Across the autosomes, FST estimated from the full set of Phase I data is 0.12, with CEU and CHB+JPT showing the lowest level of differentiation (FST = 0.07), and YRI and CHB+JPT the highest (FST = 0.12). These values are slightly higher than previous reports (ref. 41), but differences in the types of variants (SNPs versus microsatellites) and the samples studied make comparisons difficult.

As expected, we observed very few fixed differences (that is, cases in which alternate alleles are seen exclusively in different analysis panels). Across the 1 million SNPs genotyped, only 11 have fixed differences between CEU and YRI, 21 between CEU and CHB+JPT, and 5 between YRI and CHB+JPT, for the autosomes.
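As a rough illustration of what FST measures, the sketch below computes it for a single biallelic SNP from the allele frequencies in two analysis panels, using the textbook form FST = (HT - HS) / HT, where HS is the mean within-panel expected heterozygosity and HT is the expected heterozygosity of the pooled sample. The equal weighting of panels, the absence of any sample-size correction, the function names and the example frequencies are all assumptions made for illustration; the text does not state which estimator produced the Phase I values. The last helper simply applies the definition of a fixed difference given above.

```python
def expected_heterozygosity(p: float) -> float:
    """Expected heterozygosity 2p(1-p) for a biallelic SNP with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def fst_two_panels(p1: float, p2: float) -> float:
    """Textbook FST = (HT - HS) / HT for one SNP in two panels.

    Assumption (not from the article): panels are weighted equally and no
    correction for sample size is applied.
    """
    # Mean expected heterozygosity within the two panels.
    h_s = (expected_heterozygosity(p1) + expected_heterozygosity(p2)) / 2.0
    # Expected heterozygosity of the pooled sample.
    p_bar = (p1 + p2) / 2.0
    h_t = expected_heterozygosity(p_bar)
    if h_t == 0.0:
        raise ValueError("SNP is monomorphic in the pooled sample; FST is undefined")
    return (h_t - h_s) / h_t


def is_fixed_difference(p1: float, p2: float) -> bool:
    """True when alternate alleles are seen exclusively in different panels."""
    return (p1 == 0.0 and p2 == 1.0) or (p1 == 1.0 and p2 == 0.0)


if __name__ == "__main__":
    # Hypothetical allele frequencies, chosen only to show the range of the statistic.
    for p1, p2 in [(0.5, 0.5), (0.2, 0.8), (0.0, 1.0)]:
        print(p1, p2, round(fst_two_panels(p1, p2), 3), is_fixed_difference(p1, p2))
```

Identical frequencies in both panels give FST = 0, and a fixed difference gives FST = 1, which is why the handful of completely differentiated SNPs stand out against a genome-wide value near 0.12.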
The extent of differentiation is similar across the autosomes, but higher on the X chromosome (FST = 0.21). Interestingly, 123 SNPs on the X chromosome were completely differentiated between YRI and CHB+JPT, but only two between CEU and YRI and one between CEU and CHB+JPT. This seems to be largely due to a single region near the centromere, possibly indicating a history of natural

Table 4 | mtDNA and Y chromosome haplogroups

                           DNA sample*
MtDNA haplogroup           YRI (60)   CEU (60)   CHB (45)   JPT (44)
L1                         0.22       –          –          –
L2                         0.35       –          –          –
L3                         0.43       –          –          –
A                          –          –          0.13       0.04
B                          –          –          0.33       0.30
C                          –          –          0.09       0.07
D                          –          –          0.22       0.34
M/E                        –          –          0.22       0.25
H                          –          0.45       –          –
V                          –          0.07       –          –
J                          –          0.08       –          –
T                          –          0.12       –          –
K                          –          0.03       –          –
U                          –          0.23       –          –
W                          –          0.02       –          –

                           DNA sample*
Y chromosome haplogroup    YRI (30)   CEU (30)   CHB (22)   JPT (22)
E1                         0.07       –          –          –
E3a                        0.93       –          –          –
F, H, K                    –          0.03       0.23       0.14
I                          –          0.27       –          –
R1                         –          0.70       –          –
C                          –          –          0.09       0.09
D                          –          –          –          0.45
NO                         –          –          0.68       0.32

*Number of chromosomes sampled is given in parentheses.

Figure 3 | Allele frequency and completeness of dbSNP for the ENCODE regions. a–c, The fraction of SNPs in dbSNP, or with a proxy in dbSNP, are shown as a function of minor allele frequency for each analysis panel (a, YRI; b, CEU; c, CHB+JPT). Singletons refer to heterozygotes observed in a single individual, and are broken out from other SNPs with MAF < 0.05. Because all ENCODE SNPs have been deposited in dbSNP, for this figure we define a SNP as 'in dbSNP' if it would be in dbSNP build 125 independent of the HapMap ENCODE resequencing project. All remaining SNPs (not in dbSNP) were discovered only by ENCODE resequencing; they are categorized by their correlation (r²) to those in dbSNP. Note that the number of SNPs in each frequency bin differs among analysis panels, because not all SNPs are polymorphic in all analysis panels.
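The frequency classes that Figure 3 and the text rely on (singletons, other SNPs with MAF < 0.05, and common SNPs) could be assigned to a SNP roughly as follows. This is only a sketch: the genotype encoding (count of one chosen allele per individual, 0, 1 or 2), the function names and the "monomorphic" label are assumptions for illustration, and only the 0.05 cut-off and the definition of a singleton come from the text.

```python
def minor_allele_frequency(genotypes):
    """Minor allele frequency for one SNP in one analysis panel.

    genotypes: one entry per individual, giving the number of copies
    (0, 1 or 2) of an arbitrarily chosen allele (assumed encoding).
    """
    n_chromosomes = 2 * len(genotypes)
    if n_chromosomes == 0:
        raise ValueError("no genotypes supplied")
    freq = sum(genotypes) / n_chromosomes
    return min(freq, 1.0 - freq)


def frequency_class(genotypes, rare_threshold=0.05):
    """Classify a SNP as 'monomorphic', 'singleton', 'rare' or 'common'."""
    n_chromosomes = 2 * len(genotypes)
    allele_copies = sum(genotypes)
    minor_copies = min(allele_copies, n_chromosomes - allele_copies)
    if minor_copies == 0:
        return "monomorphic"  # not polymorphic in this panel
    if minor_copies == 1:
        return "singleton"    # one heterozygote in a single individual
    if minor_allele_frequency(genotypes) < rare_threshold:
        return "rare"         # other SNPs with MAF < 0.05
    return "common"


if __name__ == "__main__":
    # Hypothetical panel of 30 individuals (60 chromosomes).
    print(frequency_class([0] * 29 + [1]))       # singleton
    print(frequency_class([0] * 20 + [1] * 10))  # common
```

Because a SNP can be polymorphic in one panel and monomorphic in another, running such a classification per panel gives different bin counts for each panel, as the caption notes.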