Nature | Vol 437 | 27 October 2005 | doi:10.1038/nature04226
© 2005 Nature Publishing Group

ARTICLES

A haplotype map of the human genome

The International HapMap Consortium*

Inherited genetic variation has a critical but as yet largely uncharacterized role in human disease. Here we report a public database of common variation in the human genome: more than one million single nucleotide polymorphisms (SNPs) for which accurate and complete genotypes have been obtained in 269 DNA samples from four populations, including ten 500-kilobase regions in which essentially all information about common DNA variation has been extracted. These data document the generality of recombination hotspots, a block-like structure of linkage disequilibrium and low haplotype diversity, leading to substantial correlations of SNPs with many of their neighbours. We show how the HapMap resource can guide the design and analysis of genetic association studies, shed light on structural variation and recombination, and identify loci that may have been subject to natural selection during human evolution.

Despite the ever-accelerating pace of biomedical research, the root causes of common human diseases remain largely unknown, preventative measures are generally inadequate, and available treatments are seldom curative.
Family history is one of the strongest risk factors for nearly all diseases (including cardiovascular disease, cancer, diabetes, autoimmunity, psychiatric illnesses and many others), providing the tantalizing but elusive clue that inherited genetic variation has an important role in the pathogenesis of disease. Identifying the causal genes and variants would represent an important step in the path towards improved prevention, diagnosis and treatment of disease.

More than a thousand genes for rare, highly heritable ‘mendelian’ disorders have been identified, in which variation in a single gene is both necessary and sufficient to cause disease. Common disorders, in contrast, have proven much more challenging to study, as they are thought to be due to the combined effect of many different susceptibility DNA variants interacting with environmental factors. Studies of common diseases have fallen into two broad categories: family-based linkage studies across the entire genome, and population-based association studies of individual candidate genes. Although there have been notable successes, progress has been slow due to the inherent limitations of the methods; linkage analysis has low power except when a single locus explains a substantial fraction of disease, and association studies of one or a few candidate genes examine only a small fraction of the ‘universe’ of sequence variation in each patient.

A comprehensive search for genetic influences on disease would involve examining all genetic differences in a large number of affected individuals and controls. It may eventually become possible to accomplish this by complete genome resequencing. In the meantime, it is increasingly practical to systematically test common genetic variants for their role in disease; such variants explain much of the genetic diversity in our species, a consequence of the historically small size and shared ancestry of the human population.

Recent experience bears out the hypothesis that common variants have an important role in disease, with a partial list of validated examples including HLA (autoimmunity and infection)¹, APOE4 (Alzheimer’s disease, lipids)², Factor V Leiden (deep vein thrombosis)³, PPARG (encoding PPARγ; type 2 diabetes)⁴,⁵, KCNJ11 (type 2 diabetes)⁶, PTPN22 (rheumatoid arthritis and type 1 diabetes)⁷,⁸, insulin (type 1 diabetes)⁹, CTLA4 (autoimmune thyroid disease, type 1 diabetes)¹⁰, NOD2 (inflammatory bowel disease)¹¹,¹², complement factor H (age-related macular degeneration)¹³⁻¹⁵ and RET (Hirschsprung disease)¹⁶,¹⁷, among many others.

Systematic studies of common genetic variants are facilitated by the fact that individuals who carry a particular SNP allele at one site often predictably carry specific alleles at other nearby variant sites. This correlation is known as linkage disequilibrium (LD); a particular combination of alleles along a chromosome is termed a haplotype. LD exists because of the shared ancestry of contemporary chromosomes. When a new causal variant arises through mutation (whether a single nucleotide change, insertion/deletion, or structural alteration), it is initially tethered to a unique chromosome on which it occurred, marked by a distinct combination of genetic variants. Recombination and mutation subsequently act to erode this association, but do so slowly (each occurring at an average rate of about 10⁻⁸ per base pair (bp) per generation) as compared to the number of generations (typically 10⁴ to 10⁵) since the mutational event.

The correlations between causal mutations and the haplotypes on which they arose have long served as a tool for human genetic research: first finding association to a haplotype, and then subsequently identifying the causal mutation(s) that it carries.
This was pioneered in studies of the HLA region, extended to identify causal genes for mendelian diseases (for example, cystic fibrosis¹⁸ and diastrophic dysplasia¹⁹), and most recently for complex disorders such as age-related macular degeneration¹³⁻¹⁵.

Early information documented the existence of LD in the human genome²⁰,²¹; however, these studies were limited (for technical reasons) to a small number of regions with incomplete data, and general patterns were challenging to discern. With the sequencing of the human genome and development of high-throughput genomic methods, it became clear that the human genome generally displays more LD²² than under simple population genetic models²³, and that LD is more varied across regions, and more segmentally structured²⁴⁻³⁰, than had previously been supposed. These observations indicated that LD-based methods would generally have great value (because nearby SNPs were typically correlated with many of their neighbours), and also that LD relationships would

*Lists of participants and affiliations appear at the end of the paper.
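
The definition of LD given above (alleles at one SNP predicting alleles at a nearby SNP) and the remark that recombination erodes it only slowly can be made concrete with a short calculation. The following Python sketch is an illustration under stated assumptions, not the Consortium's method: the function names `ld_stats` and `expected_ld_retention` and the toy haplotypes are hypothetical, the measures D, |D′| and r² are the standard textbook pairwise statistics, and the decay model assumes the textbook relation D_t = (1 − c)^t · D_0 with a recombination fraction c approximated as distance × 10⁻⁸ per bp, ignoring mutation, drift and recombination hotspots.

```python
from collections import Counter
from typing import Iterable, Tuple


def ld_stats(haplotypes: Iterable[Tuple[str, str]], a: str, b: str) -> Tuple[float, float, float]:
    """Return (D, |D'|, r^2) for allele `a` at site 1 and allele `b` at site 2.

    `haplotypes` holds one phased two-site haplotype per chromosome,
    e.g. ("A", "G"). Both sites are assumed to be biallelic.
    """
    counts = Counter(haplotypes)
    n = sum(counts.values())
    if n == 0:
        raise ValueError("no haplotypes supplied")

    # Allele and haplotype frequencies.
    p_a = sum(c for (x, _), c in counts.items() if x == a) / n
    p_b = sum(c for (_, y), c in counts.items() if y == b) / n
    p_ab = counts[(a, b)] / n

    # D: excess of the a-b haplotype over its expectation under independence.
    d = p_ab - p_a * p_b

    # |D'|: |D| scaled by the largest value it could take given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0

    # r^2: squared correlation between the two allelic states.
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


def expected_ld_retention(distance_bp: float, generations: float, rate_per_bp: float = 1e-8) -> float:
    """Expected fraction of the initial D remaining after `generations`.

    Assumption: the recombination fraction between the sites is
    distance_bp * rate_per_bp (reasonable only for short distances),
    and D decays by a factor (1 - c) each generation.
    """
    c = distance_bp * rate_per_bp
    return (1.0 - c) ** generations


if __name__ == "__main__":
    # Hypothetical sample of ten chromosomes typed at two nearby SNPs.
    sample = [("A", "G")] * 4 + [("C", "T")] * 4 + [("A", "T"), ("C", "G")]
    d, d_prime, r2 = ld_stats(sample, "A", "G")
    print(f"D = {d:.3f}, |D'| = {d_prime:.3f}, r^2 = {r2:.3f}")

    for t in (10**4, 10**5):
        kept = expected_ld_retention(distance_bp=1000, generations=t)
        print(f"1 kb apart, {t} generations: {kept:.3f} of initial D retained")
```

For the toy sample, the A–G haplotype is seen in 4 of 10 chromosomes against an expectation of 2.5 under independence, giving D = 0.15, |D′| = 0.6 and r² = 0.36. Under the simplified decay model, two sites 1 kb apart would retain roughly 90% of their initial D after 10⁴ generations and roughly 37% after 10⁵, which is consistent with the text's point that the association erodes slowly relative to the age of most common variants.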