ARTICLES NATURE|Vol 437|27 October 2005
© 2005 Nature Publishing Group

selection at this locus (see below; M. L. Freedman et al., personal communication).

Haplotype sharing across populations. We next examined the extent to which haplotypes are shared across populations. We used a hidden Markov model in which each haplotype is modelled in turn as an imperfect mosaic of other haplotypes (see Supplementary Information)⁴². In essence, the method infers probabilistically which other haplotype in the sample is the closest relative (nearest neighbour) at each position along the chromosome.

Unsurprisingly, the nearest neighbour most often is from the same analysis panel, but about 10% of haplotypes were found most closely to match a haplotype in another panel (Supplementary Fig. 5). All individuals have at least some segments over which the nearest neighbour is in a different analysis panel. These results indicate that although analysis panels are characterized both by different haplotype frequencies and, to some extent, different combinations of alleles, both common and rare haplotypes are often shared across populations.
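The nearest-neighbour idea can be illustrated with a small copying HMM decoded by the Viterbi algorithm. This is only a minimal sketch and not the method of the Supplementary Information: the function name `nearest_neighbour_path`, the per-site switch probability `switch`, and the mismatch probability `miss` are assumptions made for the example.

```python
import math

def nearest_neighbour_path(target, panel, switch=0.05, miss=0.01):
    """Most likely panel haplotype copied at each site (Viterbi decoding).

    target: list of 0/1 alleles; panel: list of equal-length 0/1 lists.
    switch and miss are illustrative values, not estimates from the paper.
    """
    n, n_sites = len(panel), len(target)

    def emit(k, s):
        # Log-probability of the target allele given that haplotype k is copied.
        return math.log(1 - miss if panel[k][s] == target[s] else miss)

    # On a switch the new template is drawn uniformly from the panel,
    # so staying on the same template has probability 1 - switch + switch/n.
    stay = math.log(1 - switch + switch / n)
    jump = math.log(switch / n)

    score = [math.log(1 / n) + emit(k, 0) for k in range(n)]
    back = []
    for s in range(1, n_sites):
        best_prev = max(range(n), key=lambda k: score[k])
        new_score, ptr = [], []
        for k in range(n):
            from_stay = score[k] + stay
            from_jump = score[best_prev] + jump
            if from_stay >= from_jump:
                new_score.append(from_stay + emit(k, s))
                ptr.append(k)
            else:
                new_score.append(from_jump + emit(k, s))
                ptr.append(best_prev)
        score = new_score
        back.append(ptr)

    # Trace back from the best final state.
    state = max(range(n), key=lambda k: score[k])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]
```

Running this for each haplotype in turn against all the others, and recording which analysis panel the chosen template belongs to at each site, would give a crude analogue of the sharing summary described above.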
Properties of LD in the human genome

Traditionally, descriptions of LD have focused on measures calculated between pairs of SNPs, averaged as a function of physical distance. Examples of such analyses for the HapMap data are presented in Supplementary Fig. 6. After adjusting for known confounders such as sample size, allele frequency distribution, marker density, and length of sampled regions, these data are highly similar to previously published surveys⁴³.

Because LD varies markedly on scales of 1–100 kb, and is often discontinuous rather than declining smoothly with distance, averages obscure important aspects of LD structure. A fuller exploration of the fine-scale structure of LD offers both insight into the causes of LD and understanding of its application to disease research.

LD patterns are simple in the absence of recombination. The most natural path to understanding LD structure is first to consider the simplest case in which there is no recombination (or gene conversion), and then to add recombination to the model. (For simplicity we ignore genotyping error and recurrent mutation in this discussion, both of which seem to be rare in these data.)

In the absence of recombination, diversity arises solely through mutation. Because each SNP arose on a particular branch of the genealogical tree relating the chromosomes in the current populations, multiple haplotypes are observed. SNPs that arose on the same branch of the genealogy are perfectly correlated in the sample, whereas SNPs that occurred on different branches have imperfect correlations, or no correlation at all.

We illustrate these concepts using empirical genotype data from 36 adjacent SNPs in an ENCODE region (ENr131.2q37), selected because no obligate recombination events were detectable among them in CEU (Fig. 7).
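One standard way to check for obligate recombination between biallelic SNPs is the four-gamete test: if recurrent mutation is ignored, a pair of sites that shows all four two-site haplotypes cannot be explained by mutation on a single tree. The text does not state which procedure was used for Fig. 7, so the sketch below is an assumption, and the function name `has_obligate_recombination` is hypothetical.

```python
from itertools import combinations

def has_obligate_recombination(haplotypes):
    """Four-gamete test on phased biallelic haplotypes.

    haplotypes: equal-length sequences of alleles, e.g. strings of '0'/'1'.
    Returns True if any pair of sites carries all four allele combinations.
    """
    n_sites = len(haplotypes[0])
    for i, j in combinations(range(n_sites), 2):
        gametes = {(h[i], h[j]) for h in haplotypes}
        if len(gametes) == 4:
            return True
    return False
```

As the text cautions, a negative result in a small sample does not prove that no recombinants have occurred; it only means that none is required to explain the observed haplotypes.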
(We note that the lack of obligate recombination events in a small sample does not guarantee that no recombinants have occurred, but it provides a good approximation for illustration.)

In principle, 36 such SNPs could give rise to 2³⁶ different haplotypes. Even with no recombination, gene conversion or recurrent mutation, up to 37 different haplotypes could be formed. Despite this great potential diversity, only seven haplotypes are observed (five seen more than once) among the 120 parental CEU chromosomes studied, reflecting shared ancestry since their most recent common ancestor among apparently unrelated individuals.

In such a setting, it is easy to interpret the two most common pairwise measures of LD: D′ and r². (See the Supplementary Information for fuller definitions of these measures.) D′ is defined to be 1 in the absence of obligate recombination, declining only due to recombination or recurrent mutation²⁷. In contrast, r² is simply

Figure 4 | Minor allele frequency distribution of SNPs in the ENCODE data, and their contribution to heterozygosity. This figure shows the polymorphic SNPs from the HapMap ENCODE regions according to minor allele frequency (blue), with the lowest minor allele frequency bin (<0.05) separated into singletons (SNPs heterozygous in one individual only, shown in grey) and SNPs with more than one heterozygous individual. For this analysis, MAF is averaged across the analysis panels. The sum of the contribution of each MAF bin to the overall heterozygosity of the ENCODE regions is also shown (orange).

Figure 5 | Allele frequency distributions for autosomal SNPs. For each analysis panel we plotted (bars) the MAF distribution of all the Phase I SNPs with a frequency greater than zero. The solid line shows the MAF distribution for the ENCODE SNPs, and the dashed line shows the MAF distribution expected for the standard neutral population model with constant population size and random mating without ascertainment bias.
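The haplotype counts and the two pairwise LD measures discussed in the text can be made concrete with a short sketch. It uses the textbook definitions of D′ and r² (with r² as the squared correlation between the two sites), which may differ in detail from the definitions in the Supplementary Information. The function name `pairwise_ld` and the toy sample `toy` are hypothetical. The count of 37 reflects that, on a single tree, each of the 36 mutations can add at most one new haplotype to the ancestral one.

```python
def pairwise_ld(haplotypes, i, j):
    """Return (D', r^2) for sites i and j from phased 0/1 haplotypes.

    Both sites must be polymorphic in the sample, otherwise r^2 is undefined.
    """
    n = len(haplotypes)
    p_a = sum(h[i] for h in haplotypes) / n          # frequency of allele 1 at site i
    p_b = sum(h[j] for h in haplotypes) / n          # frequency of allele 1 at site j
    p_ab = sum(h[i] * h[j] for h in haplotypes) / n  # frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d_prime, r2

# Upper bounds quoted in the text for 36 biallelic SNPs.
N_SNPS = 36
max_with_free_recombination = 2 ** N_SNPS  # every allele combination
max_without_recombination = N_SNPS + 1     # each mutation adds at most one haplotype

# Hypothetical toy sample with no recombination: sites 0 and 1 arose on the
# same branch, site 2 on a branch nested inside it.
toy = [(1, 1, 1)] * 2 + [(1, 1, 0)] * 2 + [(0, 0, 0)] * 4
```

In the toy sample, sites 0 and 1 give D′ = 1 and r² = 1, whereas sites 0 and 2 give D′ = 1 but r² = 1/3. This matches the point made above: without recombination D′ stays at 1, while the correlation between SNPs depends on whether they arose on the same branch.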