ARTICLES NATURE|Vol 437|27 October 2005
© 2005 Nature Publishing Group

The typical SNP is highly correlated with many of its neighbours. The ENCODE data reveal that SNPs are typically perfectly correlated to several nearby SNPs, and partially correlated to many others.

We use the term proxy to mean a SNP that shows a strong correlation with one or more others. When two variants are perfectly correlated, testing one is exactly equivalent to testing the other; we refer to such collections of SNPs (with pairwise r² = 1.0 in the HapMap samples) as 'perfect proxy sets'.

Considering only common SNPs (the target of study for the HapMap Project) in CEU in the ENCODE data, one in five SNPs has 20 or more perfect proxies, and three in five have five or more.
In contrast, one in five has no perfect proxies. As expected, perfect proxy sets are smaller in YRI, with twice as many SNPs (two in five) having no perfect proxy, and a quarter as many (5%) having 20 or more (Figs 11 and 12). These patterns are largely consistent across the range of frequencies studied by the project, with a trend towards fewer proxies at MAF < 0.10 (Fig. 11). Put another way, the average common SNP in ENCODE is perfectly redundant with three other SNPs in the YRI samples, and nine to ten other SNPs in the other sample sets (Fig. 13).

Of course, to be detected through LD in an association study, correlation need not be complete between the genotyped SNP and the causal variant. For example, under a multiplicative disease model and a single-locus χ² test, the sample size required to detect association to an allele scales as 1/r². That is, if the causal SNP has an r² = 0.5 to one tested in the disease study, full power can be maintained if the sample size is doubled.

The number of SNPs showing such substantial but incomplete correlation is much larger. For example, using a looser threshold for declaring correlation (r² ≥ 0.5), the average number of proxies found for a common SNP in CHB+JPT is 43, and the average in YRI is 16 (Fig. 12). These partial correlations can be exploited through haplotype analysis to increase power to detect putative causal alleles, as discussed below.

Evaluating performance of the Phase I map. To estimate the proportion of all common SNPs captured by the Phase I map, we

Figure 10 | The relationship among recombination rates, haplotype lengths and gene locations. Recombination rates in cM Mb⁻¹ (blue). Non-redundant haplotypes with frequency of at least 5% in the combined sample (bars) and genes (black segments) are shown in an example gene-dense region of chromosome 19 (19q13). Haplotypes are coloured by the number of detectable recombination events they span, with red indicating many events and blue few.
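
The proxy counts and the 1/r² sample-size scaling discussed in the main text can be made concrete with a short sketch. It is illustrative only and is not the analysis pipeline used by the project: it assumes phased haplotypes coded as 0/1 per SNP, uses the standard definition r² = D² / (p_A(1 − p_A) p_B(1 − p_B)), and the function names (`r_squared`, `count_proxies`, `required_sample_size`) and the toy data are hypothetical.

```python
from math import ceil


def r_squared(hap_a, hap_b):
    """r^2 between two biallelic SNPs, given phased haplotypes coded 0/1.

    Assumption: both lists cover the same chromosomes in the same order.
    """
    n = len(hap_a)
    if n == 0 or n != len(hap_b):
        raise ValueError("haplotype vectors must be non-empty and of equal length")
    p_a = sum(hap_a) / n                      # frequency of allele 1 at SNP A
    p_b = sum(hap_b) / n                      # frequency of allele 1 at SNP B
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic in the sample")
    d = p_ab - p_a * p_b                      # LD coefficient D
    return d * d / denom


def count_proxies(snps, target, threshold=1.0, tol=1e-12):
    """Number of other SNPs whose r^2 with `target` is at least `threshold`.

    threshold=1.0 counts perfect proxies; threshold=0.5 is the looser cutoff.
    """
    return sum(
        1
        for name, hap in snps.items()
        if name != target and r_squared(snps[target], hap) >= threshold - tol
    )


def required_sample_size(n_direct, r2):
    """Sample size for a proxy SNP, scaling the direct-test size by 1/r^2."""
    if not 0 < r2 <= 1:
        raise ValueError("r2 must lie in (0, 1]")
    return ceil(n_direct / r2)


# Toy data (hypothetical): four chromosomes, three SNPs.
toy = {
    "a": [0, 0, 1, 1],
    "b": [1, 1, 0, 0],   # allele labels swapped relative to "a"; still r^2 = 1
    "c": [0, 1, 0, 1],   # uncorrelated with "a" in this sample
}
perfect = count_proxies(toy, "a", threshold=1.0)
doubled = required_sample_size(1000, 0.5)   # 2000: r^2 = 0.5 doubles the sample
```

Because r² is symmetric in the allele labels, a SNP whose alleles are coded the other way round still counts as a perfect proxy, which is why the sketch compares r² rather than the raw haplotype vectors.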
Figure 9 | The distribution of recombination events over the ENCODE regions. Proportion of sequence containing a given fraction of all recombination for the ten ENCODE regions (coloured lines) and combined (black line). For each line, SNP intervals are placed in decreasing order of estimated recombination rate⁴⁶, combined across analysis panels, and the cumulative recombination fraction is plotted against the cumulative proportion of sequence. If recombination rates were constant, each line would lie exactly along the diagonal, and so lines further to the right reveal the fraction of regions where recombination is more strongly locally concentrated.
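
The construction described in the Figure 9 legend (order intervals by decreasing recombination rate, then accumulate) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: each SNP interval is assumed to be given as a (length in bp, rate in cM/Mb) pair, the recombination in an interval is taken as length × rate, and the function name `cumulative_recombination_curve` is hypothetical. The sketch returns the paired cumulative values and makes no claim about which one the published figure puts on which axis.

```python
def cumulative_recombination_curve(intervals):
    """Return (cumulative sequence proportion, cumulative recombination fraction)
    pairs, with intervals ordered by decreasing recombination rate.

    intervals: iterable of (length_bp, rate_cM_per_Mb) tuples (assumed input format).
    """
    ordered = sorted(intervals, key=lambda iv: iv[1], reverse=True)
    total_len = sum(length for length, _ in ordered)
    # length * rate is proportional to genetic distance; the constant unit
    # factor cancels once everything is expressed as a fraction of the total.
    total_rec = sum(length * rate for length, rate in ordered)
    if total_len <= 0 or total_rec <= 0:
        raise ValueError("need positive total length and total recombination")

    points = [(0.0, 0.0)]
    cum_len = 0.0
    cum_rec = 0.0
    for length, rate in ordered:
        cum_len += length
        cum_rec += length * rate
        points.append((cum_len / total_len, cum_rec / total_rec))
    return points


# Constant rate: every point lies on the diagonal.
flat = cumulative_recombination_curve([(100, 1.0), (300, 1.0)])

# A single hotspot: 10% of the sequence carries all of the recombination.
hotspot = cumulative_recombination_curve([(900, 0.0), (100, 10.0)])
```

The further a curve departs from the diagonal, the more of the total recombination is concentrated in a small share of the sequence.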