NATURE|Vol 437|27 October 2005 ARTICLES 1305
© 2005 Nature Publishing Group

the squared correlation coefficient between the two SNPs. Thus, r² is 1 when two SNPs arose on the same branch of the genealogy and remain undisrupted by recombination, but has a value less than 1 when the SNPs arose on different branches, or if an initially strong correlation has been disrupted by crossing over. In this region, D′ = 1 for all marker pairs, as there is no evidence of historical recombination. In contrast, and despite the great simplicity of the haplotype structure, r² values display a complex pattern, varying from 0.0003 to 1.0, with no relationship to physical distance. This makes sense, however, because without recombination, correlations among SNPs depend on the historical order in which they arose, not on the physical order of SNPs on the chromosome. Most importantly, the seeming complexity of r² values can be deconvolved in a simple manner: only seven different SNP configurations exist in this region, with all but two chromosomes matching five common haplotypes, which can be distinguished from each other by typing a specific set of four SNPs. That is, only a small minority of sites need be examined to capture fully the information in this region.

Variation in local recombination rates is a major determinant of LD. Recombination in the ancestors of the current population has typically disrupted the simple picture presented above. In the human genome, as in yeast⁴⁴, mouse⁴⁵ and other genomes, recombination rates typically vary dramatically on a fine scale, with hotspots of recombination explaining much of the crossing over in each region²⁸. The generality of this model has recently been demonstrated through computational methods that allow estimation of recombination rates (including hotspots and coldspots) from genotype data⁴⁶,⁴⁷.
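To make the contrast between D′ and r² concrete, the sketch below computes both measures for every pair of SNPs in a small, hypothetical set of phased haplotypes generated without recombination, so that at most three of the four possible two-SNP haplotypes occur for any pair. It uses the standard definitions D = p_AB − p_A·p_B, D′ = |D|/D_max and r² = D²/[p_A(1 − p_A)p_B(1 − p_B)]. The haplotype strings and the function name `ld_stats` are illustrative assumptions, not the HapMap analysis code.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical phased haplotypes (1 = derived allele) with no recombination:
# SNPs 0 and 1 arose on the same branch; SNP 2 arose on another branch,
# and SNP 3 arose later within the lineage carrying SNP 2.
HAPLOTYPES = ["1100", "1100", "0010", "0011", "0000", "0000"]


def ld_stats(haplotypes, i, j):
    """Return (|D'|, r^2) for SNPs i and j as exact fractions."""
    n = len(haplotypes)
    p_a = Fraction(sum(h[i] == "1" for h in haplotypes), n)
    p_b = Fraction(sum(h[j] == "1" for h in haplotypes), n)
    p_ab = Fraction(sum(h[i] == "1" and h[j] == "1" for h in haplotypes), n)
    if p_a in (0, 1) or p_b in (0, 1):
        raise ValueError("both SNPs must be polymorphic")
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium
    # D_max is the largest |D| allowed by the two allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d_prime, r2


for i, j in combinations(range(len(HAPLOTYPES[0])), 2):
    d_prime, r2 = ld_stats(HAPLOTYPES, i, j)
    print(f"SNPs {i},{j}: D'={d_prime}, r2={r2}")
```

Every pair gives D′ = 1, while r² takes the values 1, 1/4, 1/10 and 2/5 depending on whether the two SNPs arose on the same branch and on their allele frequencies, the same qualitative pattern described for the region above. Exact fractions are used so that D′ = 1 is not blurred by floating-point rounding.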
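The observation that five common haplotypes can be told apart by typing four SNPs can be checked mechanically. The sketch below is an exhaustive search, practical only for a handful of SNPs, that returns the smallest set of SNP positions whose alleles still separate every distinct haplotype. The haplotypes and the function name are hypothetical, and this is a simplified illustration rather than the tag-SNP selection procedure used in the study.

```python
from itertools import combinations

# Hypothetical set of five distinct haplotypes over eight SNPs
# (1 = derived allele), consistent with a tree and no recombination.
COMMON_HAPLOTYPES = [
    "11000000",
    "11100000",
    "00011000",
    "00011110",
    "00000001",
]


def minimal_distinguishing_snps(haplotypes):
    """Return the smallest tuple of SNP indices that separates all distinct haplotypes."""
    distinct = sorted(set(haplotypes))
    n_snps = len(distinct[0])
    # Try subsets in order of increasing size; the first that works is minimal.
    for k in range(n_snps + 1):
        for subset in combinations(range(n_snps), k):
            keys = {tuple(h[s] for s in subset) for h in distinct}
            if len(keys) == len(distinct):
                return subset
    # Not reached: the full set of SNPs always separates distinct strings.
    raise ValueError("haplotypes could not be distinguished")


print(minimal_distinguishing_snps(COMMON_HAPLOTYPES))
```

For these toy haplotypes the search returns (0, 2, 3, 5). No three SNPs suffice: SNP 2 is the only site separating the first two haplotypes, SNP 5 or 6 is needed to separate the third and fourth, and those two sites leave three haplotypes that a single further biallelic SNP cannot all tell apart.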
The availability of nearly complete information about common DNA variation in the ENCODE regions allowed a more precise estimation of recombination rates across large regions than in any previous study. We estimated recombination rates and identified recombination hotspots in the ENCODE data, using methods previously described⁴⁶ (see Supplementary Information for details). Hotspots are short regions (typically spanning about 2 kb) over which recombination rates rise dramatically above local background rates. Whereas the average recombination rate over 500 kb across the human genome is about 0.5 cM⁴⁸, the estimated recombination rate across the 500-kb ENCODE regions varied nearly tenfold, from a minimum of 0.19 cM (ENm013.7q21.13) to a maximum of 1.25 cM (ENr232.9q34.11). Even this tenfold variation obscures much more dramatic variation over a finer scale: 88 hotspots of recombination were identified (Fig. 8; see also Supplementary Fig. 7), that is, one per 57 kb, with hotspots detected in each of the ten regions (from 4 in 12q12 to 14 in 2q37.1). Across the 5 Mb, we estimate that about 80% of all recombination has taken place in about 15% of the sequence (Fig. 9, see also refs 46, 49).

A block-like structure of human LD. With most human recombination occurring in recombination hotspots, the breakdown of LD is often discontinuous. A ‘block-like’ structure of LD is visually apparent in Fig. 8 and Supplementary Fig. 7: segments of consistently high D′ that break down where high recombination rates, recombination hotspots and obligate recombination events⁵⁰ all cluster. When haplotype blocks are more formally defined in the ENCODE data (using a method based on a composite of local D′

[Figure 6 image: panels a–d plot the minor-allele frequency in one analysis panel or sample set against another; axis labels include YRI, CEU and CHB allele frequency (0 to 1.0); the colour scale runs from 0 to 600+ SNPs per bin.]

Figure 6 | Comparison of allele frequencies in the ENCODE data for all pairs of analysis panels and between the CHB and JPT sample sets. For each polymorphic SNP we identified the minor allele across all panels (a–d) and then calculated the frequency of this allele in each analysis panel/sample set. The colour in each bin represents the number of SNPs that display each given set of allele frequencies. The purple regions show that very few SNPs are common in one panel but rare in another. The red regions show that there are many SNPs that have similar low frequencies in each pair of analysis panels/sample sets.
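A statistic such as "about 80% of all recombination in about 15% of the sequence" can be read off a fine-scale recombination map by ranking intervals from highest to lowest rate and accumulating genetic length until the target share is reached. The sketch below assumes a map given as (length in kb, rate in cM/Mb) pairs and allows the last interval to be used in part. The function name and the toy map (five 2-kb hotspots at 40 cM/Mb in a 500-kb region with a background of 0.2 cM/Mb) are hypothetical and are not the ENCODE estimates, which come from the methods of ref. 46.

```python
def fraction_of_sequence_for(intervals, target=0.8):
    """Smallest fraction of total length carrying `target` of the genetic length.

    `intervals` is a list of (length_kb, rate_cM_per_Mb) pairs.
    """
    total_kb = sum(length for length, _ in intervals)
    # Genetic length of each interval in cM: rate (cM/Mb) * length (Mb).
    pieces = [(rate, length, rate * length / 1000.0) for length, rate in intervals]
    total_cm = sum(cm for _, _, cm in pieces)
    if total_cm <= 0:
        raise ValueError("map has no recombination")
    needed = target * total_cm
    acc_cm = 0.0
    acc_kb = 0.0
    # Hottest intervals first, so the fraction of sequence is as small as possible.
    for rate, length, cm in sorted(pieces, reverse=True):
        if acc_cm + cm >= needed:
            # Use only the part of this interval needed to reach the target.
            acc_kb += (needed - acc_cm) / cm * length
            return acc_kb / total_kb
        acc_cm += cm
        acc_kb += length
    return acc_kb / total_kb


toy_map = [(2, 40.0)] * 5 + [(490, 0.2)]
print(fraction_of_sequence_for(toy_map))
```

For the toy map the result is about 0.02: the hotspots hold 0.4 of the 0.498 cM in the region, so 80% of the genetic length is reached within just under 2% of the sequence. Sorting by rate is what makes the answer the minimum, since any other ordering would spend sequence on cooler intervals first.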
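The procedure in the Figure 6 caption can be sketched for one pair of analysis panels. We assume that "the minor allele across all panels" means the allele with the lower count when all panels are pooled, and that frequencies are binned on a regular grid before colouring. The allele counts, panel sizes, bin count and function names below are hypothetical.

```python
from collections import Counter


def minor_allele_freqs(counts_by_panel):
    """Frequency of the pooled minor allele in each panel.

    `counts_by_panel` maps panel name -> (count of allele A, count of allele B)
    for one SNP.
    """
    total_a = sum(a for a, _ in counts_by_panel.values())
    total_b = sum(b for _, b in counts_by_panel.values())
    minor = 0 if total_a <= total_b else 1  # ties go to allele A (arbitrary choice)
    return {panel: c[minor] / (c[0] + c[1]) for panel, c in counts_by_panel.items()}


def bin_pair(snps, panel_x, panel_y, n_bins=10):
    """Count SNPs per cell of an n_bins x n_bins grid of (freq in x, freq in y)."""
    grid = Counter()
    for counts in snps:
        freqs = minor_allele_freqs(counts)
        # Clamp a frequency of exactly 1.0 into the last bin.
        bx = min(int(freqs[panel_x] * n_bins), n_bins - 1)
        by = min(int(freqs[panel_y] * n_bins), n_bins - 1)
        grid[(bx, by)] += 1
    return grid


# Hypothetical allele counts out of 120 chromosomes per panel.
snps = [
    {"YRI": (6, 114), "CEU": (90, 30)},    # rare in YRI, common in CEU
    {"YRI": (30, 90), "CEU": (27, 93)},    # similar frequencies in both panels
    {"YRI": (100, 20), "CEU": (110, 10)},  # allele B is the pooled minor allele
]
print(bin_pair(snps, "YRI", "CEU"))
```

With real data the same grid would be filled for every pair of analysis panels, and for CHB against JPT, and then drawn as a heat map; the number of bins and the colour scale are presentation choices.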