ARTICLES NATURE | Vol 437 | 27 October 2005
© 2005 Nature Publishing Group

need to be empirically determined across the genome by studying polymorphisms at high density in population samples.
The International HapMap Project was launched in October 2002 to create a public, genome-wide database of common human sequence variation, providing information needed as a guide to genetic studies of clinical phenotypes³¹. The project had become practical by the confluence of the following: (1) the availability of the human genome sequence; (2) databases of common SNPs (subsequently enriched by this project) from which genotyping assays could be designed; (3) insights into human LD; (4) development of inexpensive, accurate technologies for high-throughput SNP genotyping; (5) web-based tools for storing and sharing data; and (6) frameworks to address associated ethical and cultural issues³². The project follows the data release principles of an international community resource project (http://www.wellcome.ac.uk/doc_WTD003208.html), sharing information rapidly and without restriction on its use.

The HapMap data were generated with the primary aim of guiding the design and analysis of medical genetic studies. In addition, the advent of genome-wide variation resources such as the HapMap opens a new era in population genetics, offering an unprecedented opportunity to investigate the evolutionary forces that have shaped variation in natural populations.

The Phase I HapMap

Phase I of the HapMap Project set as a goal genotyping at least one common SNP every 5 kilobases (kb) across the genome in each of 269 DNA samples. For the sake of practicality, and motivated by the allele frequency distribution of variants in the human genome, a minor allele frequency (MAF) of 0.05 or greater was targeted for study. (For simplicity, in this paper we will use the term ‘common’ to mean a SNP with MAF ≥ 0.05.) The project has a Phase II, which is attempting genotyping of an additional 4.6 million SNPs in each of the HapMap samples.
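As an illustration of the ‘common’ threshold defined above (MAF of 0.05 or greater), the following minimal Python sketch computes the minor allele frequency of a single biallelic SNP from called genotypes. It is not the project's own pipeline: the function names, the two-letter genotype encoding (for example "AG"), the use of `None` for a missing call, and the treatment of all individuals as unrelated are assumptions made purely for illustration.

```python
from collections import Counter

# Threshold taken from the text: a SNP is 'common' if MAF >= 0.05.
COMMON_MAF_THRESHOLD = 0.05


def minor_allele_frequency(genotypes):
    """Return the minor allele frequency of one biallelic SNP.

    `genotypes` is a list of two-character strings such as "AG"
    (hypothetical encoding); missing calls are given as None and skipped.
    Every individual is counted, i.e. samples are assumed unrelated.
    """
    counts = Counter()
    for genotype in genotypes:
        if genotype is None:
            continue  # skip missing genotype calls
        counts.update(genotype)  # adds one count per allele character

    total = sum(counts.values())
    if total == 0:
        raise ValueError("no called genotypes")
    if len(counts) > 2:
        raise ValueError("more than two alleles observed")
    if len(counts) < 2:
        return 0.0  # monomorphic in this sample
    return min(counts.values()) / total


def is_common(genotypes, threshold=COMMON_MAF_THRESHOLD):
    """True if the SNP meets the 'common' definition (MAF >= threshold)."""
    return minor_allele_frequency(genotypes) >= threshold


# 17 "AA" and 3 "AG" individuals: 3 G alleles out of 40, so MAF = 0.075.
example = ["AA"] * 17 + ["AG"] * 3
print(minor_allele_frequency(example), is_common(example))
```

In this toy example the SNP counts as common; with only one heterozygote among 20 individuals (1 minor allele out of 40, MAF = 0.025) it would not.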
To compare the genome-wide resource to a more complete database of common variation (one in which all common SNPs and many rarer ones have been discovered and tested), a representative collection of ten regions, each 500 kb in length, was selected from the ENCODE (Encyclopedia of DNA Elements) Project³³. Each 500-kb region was sequenced in 48 individuals, and all SNPs in these regions (discovered or in dbSNP) were genotyped in the complete set of 269 DNA samples.

The specific samples examined are: (1) 90 individuals (30 parent-offspring trios) from the Yoruba in Ibadan, Nigeria (abbreviation YRI); (2) 90 individuals (30 trios) in Utah, USA, from the Centre d’Etude du Polymorphisme Humain collection (abbreviation CEU); (3) 45 Han Chinese in Beijing, China (abbreviation CHB); (4) 44 Japanese in Tokyo, Japan (abbreviation JPT).

Because none of the samples was collected to be representative of a larger population such as ‘Yoruba’, ‘Northern and Western European’, ‘Han Chinese’, or ‘Japanese’ (let alone of all populations from ‘Africa’, ‘Europe’, or ‘Asia’), we recommend using a specific local identifier (for example, ‘Yoruba in Ibadan, Nigeria’) to describe the samples initially. Because the CHB and JPT allele frequencies are generally very similar, some analyses below combine these data sets. When doing so, we refer to three ‘analysis panels’ (YRI, CEU, CHB+JPT) to avoid confusing this analytical approach with the concept of a ‘population’.

Important details about the design of the HapMap Project are presented in the Methods, including: (1) organization of the project; (2) selection of DNA samples for study; (3) increasing the number and annotation of SNPs in the public SNP map (dbSNP) from 2.6 million to 9.2 million (Fig.
1); (4) targeted sequencing of the ten ENCODE regions, including evaluations of false-positive and false-negative rates; (5) genotyping for the genome-wide map; (6) intense efforts that monitored and established the high quality of the data; and (7) data coordination and distribution through the project Data Coordination Center (DCC) (http://www.hapmap.org).

Description of the data. The Phase I HapMap contains 1,007,329 SNPs that passed a set of quality control (QC) filters (see Methods) in each of the three analysis panels, and are polymorphic across the 269 samples. SNP genotyping was distributed across centres by chromosomal region, with several technologies employed (Table 1). Each centre followed the same standard rules for SNP selection, quality control and data release; all SNPs were genotyped in the full set of 269 samples. Some centres genotyped more SNPs than required by the rules.

Extensive, blinded quality assessment (QA) exercises documented that these data are highly accurate (99.7%) and complete (99.3%, see

Table 1 | Genotyping centres

Centre | Chromosomes | Technology
RIKEN | 5, 11, 14, 15, 16, 17, 19 | Third Wave Invader
Wellcome Trust Sanger Institute | 1, 6, 10, 13, 20 | Illumina BeadArray
McGill University and Génome Québec Innovation Centre | 2, 4p | Illumina BeadArray
Chinese HapMap Consortium* | 3, 8p, 21 | Sequenom MassExtend, Illumina BeadArray
Illumina | 8q, 9, 18q, 22, X | Illumina BeadArray
Broad Institute of Harvard and MIT | 4q, 7q, 18p, Y, mtDNA | Sequenom MassExtend, Illumina BeadArray
Baylor College of Medicine with ParAllele BioScience | 12 | ParAllele MIP
University of California, San Francisco, with Washington University in St Louis | 7p | PerkinElmer AcycloPrime-FP
Perlegen Sciences | 5 Mb (ENCODE) on 2, 4, 7, 8, 9, 12, 18 in CEU | High-density oligonucleotide array

*The Chinese HapMap Consortium consists of the Beijing Genomics Institute, the Chinese National Human Genome Center at Beijing, the University of Hong Kong, the Hong Kong University of Science and Technology, the Chinese University of Hong Kong, and the Chinese National Human Genome Center at Shanghai.

Figure 1 | Number of SNPs in dbSNP over time. The cumulative number of non-redundant SNPs (each mapped to a single location in the genome) is shown as a solid line, as well as the number of SNPs validated by genotyping (dotted line) and double-hit status (dashed line). Years are divided into quarters (Q1–Q4).