Mapping Human Genetic Diversity in Asia
The HUGO Pan-Asian SNP Consortium
Science 326, 1541 (2009); DOI: 10.1126/science.1177074

This copy is for your personal, non-commercial use only.

If you wish to distribute this article to others, you can order high-quality copies for your colleagues, clients, or customers by clicking here.

Permission to republish or repurpose articles or portions of articles can be obtained by following the guidelines here.

The following resources related to this article are available online at www.sciencemag.org (this information is current as of March 23, 2014):

Updated information and services, including high-resolution figures, can be found in the online version of this article at: http://www.sciencemag.org/content/326/5959/1541.full.html

Supporting Online Material can be found at: http://www.sciencemag.org/content/suppl/2009/12/10/326.5959.1541.DC1.html

A list of selected additional articles on the Science Web sites related to this article can be found at: http://www.sciencemag.org/content/326/5959/1541.full.html#related

This article cites 24 articles, 5 of which can be accessed free: http://www.sciencemag.org/content/326/5959/1541.full.html#ref-list-1

This article has been cited by 26 articles hosted by HighWire Press; see: http://www.sciencemag.org/content/326/5959/1541.full.html#related-urls

This article appears in the following subject collections: Genetics http://www.sciencemag.org/cgi/collection/genetics

Science (print ISSN 0036-8075; online ISSN 1095-9203) is published weekly, except the last week in December, by the American Association for the Advancement of Science, 1200 New York Avenue NW, Washington, DC 20005. Copyright 2009 by the American Association for the Advancement of Science; all rights reserved. The title Science is a registered trademark of AAAS.
hybrid sterility involves both the unusual abundance and retention of OdsHmau protein in the D. simulans testis, as well as an unusual localization and possibly decondensation of the D. simulans Y chromosome. We conclude on the basis of these data that hybrid male sterility is caused by a gain-of-function interaction between OdsHmau and some component of the D. simulans Y chromosome heterochromatin, with this protein-DNA interaction representing the Dobzhansky-Muller incompatibility.

OdsH shares similarities with the hybrid sterility genes Prdm9 (or Meisetz) in mouse (23) and Overdrive (Ovd) in Drosophila (24), all of which encode proteins with putative DNA-binding domains. Satellite DNAs have also been implicated in hybrid inviability, including a pericentric satellite locus (Zhr) (25, 26) and a gene encoding a heterochromatin-binding protein (Lhr) (27). Thus, rapidly evolving repetitive DNA elements driven by genetic conflict may represent a major evolutionary force driving sequence divergence of speciation genes that would ultimately result in hybrid incompatibilities (13, 14, 28).

References and Notes
1. E. Mayr, Systematics and the Origin of Species from the Viewpoint of a Zoologist (Columbia Univ. Press, New York, 1942).
2. J. A. Coyne, H. A. Orr, Speciation (Sinauer Associates, Sunderland, MA, 2004).
3. C. C. Laurie, Genetics 147, 937 (1997).
4. R. M. Kliman et al., Genetics 156, 1913 (2000).
5. C. T. Ting, S. C. Tsaur, M. L. Wu, C. I. Wu, Science 282, 1501 (1998).
6. S. Sun, C. T. Ting, C. I. Wu, Science 305, 81 (2004).
7. D. E. Perez, C. I. Wu, Genetics 140, 201 (1995).
8. D. E. Perez, C. I. Wu, N. A. Johnson, M. L. Wu, Genetics 134, 261 (1993).
9. S. D. Hueber, I. Lohmann, Bioessays 30, 965 (2008).
10. C. T. Ting et al., Proc. Natl. Acad. Sci. U.S.A. 101, 12232 (2004).
11. K. Tabuchi, S. Yoshikawa, Y. Yuasa, K. Sawamoto, H. Okano, Neurosci. Lett. 257, 49 (1998).
12. M. Nei, J. Zhang, Science 282, 1428 (1998).
13. S. Henikoff, K. Ahmad, H. S. Malik, Science 293, 1098 (2001).
14. S. Henikoff, H. S. Malik, Nature 417, 227 (2002).
15. L. Fishman, A. Saunders, Science 322, 1559 (2008).
16. A. Daniel, Am. J. Med. Genet. 111, 450 (2002).
17. N. Aulner et al., Mol. Cell. Biol. 22, 1218 (2002).
18. M. Ashburner, K. G. Golic, R. S. Hawley, Drosophila: A Laboratory Handbook (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, ed. 2, 2005).
19. G. Cenci, S. Bonaccorsi, C. Pisano, F. Verni, M. Gatti, J. Cell Sci. 107, 3521 (1994).
20. B. D. McKee, Curr. Top. Dev. Biol. 37, 77 (1998).
21. J. E. Tomkiel, Genetica 109, 95 (2000).
22. J. Forejt, Trends Genet. 12, 412 (1996).
23. O. Mihola, Z. Trachtulec, C. Vlcek, J. C. Schimenti, J. Forejt, Science 323, 373 (2009).
24. N. Phadnis, H. A. Orr, Science 323, 376 (2009).
25. K. Sawamura, M. T. Yamamoto, T. K. Watanabe, Genetics 133, 307 (1993).
26. P. M. Ferree, D. A. Barbash, PLoS Biol. 7, e1000234 (2009).
27. N. J. Brideau et al., Science 314, 1292 (2006).
28. H. S. Malik, S. Henikoff, Cell 138, 1067 (2009).
29. We thank C-I. Wu for the D. simulans fertile and sterile introgression lines; C. Ting for scientific discussions and sharing data; G. Findlay for initial observations on OdsH cytology; and K. Ahmad, S. Biggins, N. Elde, S. Henikoff, N. Phadnis, T. Tsukiyama, and D. Vermaak for comments on the manuscript. Supported by NIH training grant PHS NRSA T32 GM07270 (J.J.B.), and grants from the Mathers foundation and NIH R01-GM74108 (H.S.M.). H.S.M. is an Early-Career Scientist of the Howard Hughes Medical Institute.

Supporting Online Material
www.sciencemag.org/cgi/content/full/1181756/DC1
Materials and Methods
Figs. S1 to S8
References

10 September 2009; accepted 13 October 2009
Published online 22 October 2009; 10.1126/science.1181756
Include this information when citing this paper.
Mapping Human Genetic Diversity in Asia

The HUGO Pan-Asian SNP Consortium*†

Asia harbors substantial cultural and linguistic diversity, but the geographic structure of genetic variation across the continent remains enigmatic. Here we report a large-scale survey of autosomal variation from a broad geographic sample of Asian human populations. Our results show that genetic ancestry is strongly correlated with linguistic affiliations as well as geography. Most populations show relatedness within ethnic/linguistic groups, despite prevalent gene flow among populations. More than 90% of East Asian (EA) haplotypes could be found in either Southeast Asian (SEA) or Central-South Asian (CSA) populations and show clinal structure with haplotype diversity decreasing from south to north. Furthermore, 50% of EA haplotypes were found in SEA only and 5% were found in CSA only, indicating that SEA was a major geographic source of EA populations.

Several genome-wide studies of human genetic diversity focusing primarily on broad continental relationships, or fine-scale structure in Europe, have been published recently (1–8). We have extended this approach to Southeast Asian (SEA) and East Asian (EA) populations by using the Affymetrix GeneChip Human Mapping 50K Xba Array. Stringently quality-controlled genotypes were obtained at 54,794 autosomal single-nucleotide polymorphisms (SNPs) in 1928 individuals representing 73 Asian and two non-Asian HapMap populations (9). Apart from developing a general description of Asian population structure and its relation to geography, language, and demographic history, we concentrated on uncovering the geographic source(s) of EA and SEA populations.

We first performed a Bayesian clustering procedure using the STRUCTURE algorithm (10) to examine the ancestry of each individual. Each person is posited to derive from an arbitrary number of ancestral populations, denoted by K.
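The idea that each person derives from K ancestral populations can be made concrete as a likelihood. What follows is a minimal sketch only, not the STRUCTURE program itself; the function name `admixture_log_likelihood`, the arrays `q` (ancestry proportions) and `p` (ancestral allele frequencies), and the treatment of each genotype as two independent allele draws are all assumptions introduced here for illustration.

```python
import numpy as np


def admixture_log_likelihood(genotypes, q, p):
    """Log-likelihood of diploid genotypes under a simple admixture model.

    genotypes: (n_individuals, n_snps) array of 0/1/2 allele counts.
    q: (n_individuals, K) ancestry proportions; each row sums to 1.
    p: (K, n_snps) allele frequencies in the K hypothetical ancestral populations.
    """
    genotypes = np.asarray(genotypes)
    # Expected allele frequency for each individual at each SNP:
    # a mixture of the K ancestral frequencies weighted by ancestry.
    f = np.asarray(q) @ np.asarray(p)
    # Keep frequencies away from 0 and 1 so the logarithms stay finite.
    f = np.clip(f, 1e-9, 1 - 1e-9)
    # Binomial(2, f) log-probability; heterozygotes carry a factor of 2.
    log_coeff = np.where(genotypes == 1, np.log(2.0), 0.0)
    log_prob = log_coeff + genotypes * np.log(f) + (2 - genotypes) * np.log1p(-f)
    return float(log_prob.sum())
```

Under these assumptions, comparing the log-likelihood for different candidate values of `q` shows how well a given set of ancestry proportions explains an individual's genotypes; choosing K itself requires additional model comparison that this sketch does not attempt.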
We ran STRUCTURE from K = 2 to K = 14 using both the complete data set and SNP subsets to exclude those in strong linkage disequilibrium (Fig. 1 and figs. S1 to S13). At K = 2 and K = 3, all SEA and EA samples are united by predominant membership in a common cluster, with the other cluster(s) corresponding largely to Indo-European (IE) and African (AF) ancestries. At K = 4, a component most frequently found in Negrito populations that is also shared by all SEA populations emerges, suggesting a common SEA ancestry. Each value of K beyond 4 introduces a new component that tends to be associated with a group of populations united by membership in a linguistic family, by geographic proximity, by a known history of admixture, or, especially at higher Ks, by membership in a small population isolate. The results obtained using frappe (11), a maximum-likelihood–based clustering analysis, showed a general concordance with those of STRUCTURE (figs. S14 to S26).

These analyses show that most individuals within a population share very similar ancestry estimates at all Ks, an observation that is consistent also with a phylogeny relating individuals (fig. S27) based on an allele-sharing distance (12). Therefore, we proceeded to evaluate the relationships among populations. A maximum-likelihood tree of populations, based on 42,793 SNPs whose ancestral states were known (Fig. 1), showed that all the SEA and EA populations make up a monophyletic clade that is supported by 100% of bootstrap replicates. This pattern remained even after data from 51 additional populations and 19,934 commonly typed SNPs from a recent study were integrated into the tree (fig. S28). These observations suggest that SEA and EA populations share a common origin.

STRUCTURE/frappe and principal components analyses (PCA) (13) (Figs. 1 and 2 and figs. S1 to S26) identify as many as 10 main population components.
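As a rough illustration of how principal components can be extracted from SNP genotypes, the sketch below standardizes each SNP by its estimated allele frequency and applies a singular value decomposition. It is not necessarily the procedure of reference (13); the function name `genotype_pca` and the scaling by sqrt(2p(1 - p)) are assumptions made for this example.

```python
import numpy as np


def genotype_pca(genotypes, n_components=2):
    """PCA of a 0/1/2 genotype matrix (rows: individuals, columns: SNPs).

    Returns the per-individual scores on the leading components and the
    fraction of total variance carried by each of those components.
    """
    g = np.asarray(genotypes, dtype=float)
    # Estimated allele frequency of each SNP in the pooled sample.
    freq = g.mean(axis=0) / 2.0
    # Monomorphic SNPs carry no information and would divide by zero.
    keep = (freq > 0) & (freq < 1)
    g, freq = g[:, keep], freq[keep]
    # Center each SNP and scale by its binomial standard deviation.
    z = (g - 2.0 * freq) / np.sqrt(2.0 * freq * (1.0 - freq))
    # Left singular vectors times singular values give individual scores.
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, explained[:n_components]
```

Plotting the first two columns of `scores` against each other would give a scatter of individuals analogous in kind to a plot of the first two PCs, with population labels supplied separately.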
Each component corresponds largely to one of the five major linguistic groups (Altaic, Sino-Tibetan/Tai-Kadai, Hmong-Mien, Austro-Asiatic, and Austronesian), three ethnic categories (Philippine Negritos, Malaysian Negritos, and East Indonesians/Melanesians) and two small population isolates (the Bidayuh of Borneo and the hunter-gatherer Mlabri population of central and northern Thailand).

*All authors with their affiliations appear at the end of this paper.
†To whom correspondence should be addressed. E-mail: ljin007@gmail.com (L.J.); liue@gis.a-star.edu.sg (E.T.L.); seielstadm@gis.a-star.edu.sg (M.S.); xushua@picb.ac.cn (S.X.)

www.sciencemag.org SCIENCE VOL 326 11 DECEMBER 2009 1541 REPORTS

The STRUCTURE results
(Fig. 1 and figs. S1 to S13), population phylogenies (Fig. 1 and figs. S27 and S28), and PCA results (Fig. 2) all show that populations from the same linguistic group tend to cluster together. A Mantel test confirms the correlation between linguistic and genetic affinities (R² = 0.253; P < 0.0001 with 10,000 permutations), even after controlling for geography (partial correlation = 0.136; P < 0.005 with 10,000 permutations).

Fig. 1. Maximum-likelihood tree of 75 populations. A hypothetical most-recent common ancestor (MRCA) composed of ancestral alleles as inferred from the genotypes of one gorilla and 21 chimpanzees was used to root the tree. Branches with bootstrap values less than 50% were condensed. Population identification numbers (IDs), sample collection locations with latitudes and longitudes, ethnicities, language spoken, and size of population samples are shown in the table adjacent to each branch in the tree. Linguistic groups are indicated with colors as shown in the legend. All population IDs except the four HapMap samples are denoted by four characters. The first two letters indicate the country where the samples were collected or (in the case of Affymetrix) genotyped, according to the following convention: AX, Affymetrix; CN, China; ID, Indonesia; IN, India; JP, Japan; KR, Korea; MY, Malaysia; PI, the Philippines; SG, Singapore; TH, Thailand; and TW, Taiwan. The last two letters are unique IDs for the population. To the right of the table, an averaged graph of results from STRUCTURE is shown for K = 14.

Nevertheless, we identified eight population outliers whose linguistic and genetic affinities are inconsistent [Affymetrix-Melanesian (AX-ME), Malaysia-Jehai (MY-JH)
(Negrito), Malaysia-Kensiu (MY-KS) (Negrito), Thailand-Mon (TH-MO), Thailand-Karen (TH-KA), China-Jinuo (CN-JN), India-Spiti (IN-TB), and China-Uyghur (CN-UG); see table S3]. These linguistic outliers tend to cluster with their geographic neighbors or [especially evident in the principal component (PC) plots of Fig. 2] occupy an intermediate position between their geographic neighbors and the more-distant members of their linguistic group. These patterns are consistent either with substantial recent admixture among the populations (14–16), a history of language replacement (17), or uncertainties in the linguistic classifications themselves (for example, the controversial Altaic family, which groups Korean and Japanese with Uyghur).

Considerable gene flow among Asian populations was observed among subpopulations in these clusters, including those groups believed to practice endogamy based on linguistic, cultural, and ethnic information. In fact, most populations studied, even at lower Ks, show evidence of admixture in the STRUCTURE analyses. For example, the Han Chinese have grown to become the largest ethnic group today in a demographic expansion that has occurred mostly within historical times. STRUCTURE reveals that the six Han Chinese population samples in our study show varying degrees of admixture (Fig. 1 and figs. S1 to S26) between a northern Altaic cluster and a Sino-Tibetan/Tai-Kadai cluster, which most frequently appears in the ethnic groups sampled from southern China and northern Thailand. Finally, most of the Indian populations showed evidence of shared ancestry with European populations, which is consistent with the recent observations (18) and our understanding of the expansion of Indo-European–speaking populations (Fig. 1 and figs. S1 to S26).

The geographic source(s) contributing to EA populations have long been debated.
One hypothesis suggests that all SEA and EA populations derive primarily from a single initial migration, which entered the continent along a southern, largely coastal route (19, 20). Another hypothesis argues for at least two independent migrations into East Asia, first along a southern route, followed later by a series of migrations along a more northern route that served to bridge European and EA populations, but with little contribution to populations in Southeast Asia (20). The topology of a maximum-likelihood tree (Fig. 1 and fig. S28) displays a largely south-to-north ordering of the populations, and a plot of the first two PCs (Fig. 2) similarly orients most populations according to their geographic coordinates.

Fig. 2. Analysis of the first two PCs. (A) 1928 individuals representing all 75 populations. (B) 1868 individuals representing 74 populations (excluding YRI). (C) 1471 individuals representing 58 populations (excluding all Indians, CN-UG, TH-MA, AX-ME, and Negritos from Malaysia). (D) 1235 individuals representing 44 populations (excluding Philippine Negritos, PI-MA, and East Indonesians).

The average
value of the first PC is highly correlated with the latitude at which the populations were sampled (R² = 0.79, P < 0.0001). Such a pattern could result simply from isolation-by-distance (IBD), as suggested by Ding et al. (21), although a recent study failed to detect IBD in East Asia with data from the Human Genome Diversity Project (22). In an effort to distinguish between long-term historical divergence and the effects of IBD, we applied partial and multiple Mantel tests to the data (23) [see supporting online material (SOM) text for details]. The primary approach was to ascertain the differential correlation between genetic distance, geographical distance, and a group indicator matrix as an indication of prehistoric population divergence. The partial correlation coefficient of genetic and geographic distances was 0.228 (P < 0.0006), after controlling for the group indicator matrix (inferred from STRUCTURE/frappe analyses), whereas the partial correlation of the genetic and group indicator matrices was 0.403 (P < 0.0001) after controlling for geography. The superior association between genetic distance and the group indicator matrix as measured by the correlation coefficients suggests that prehistorical population divergence is the favored model over IBD in explaining the data (24). This conclusion is supported by simulation studies that also suggest that the observed patterns cannot be explained by simple IBD effects alone (see SOM text for details).

Fig. 3. Analysis of haplotype diversity, haplotype sharing, and population phylogeny. (A) Haplotype diversity versus latitudes. Haplotypes were estimated from combined data, and diversity was measured by heterozygosity of haplotypes. HSa, b, c, and d and the corresponding colors show the percentages of EA group haplotypes in each class: HSa, found in CSA only; HSb, found in neither CSA nor SEA; HSc, found in both CSA and SEA; HSd, found in SEA only. Latitudes (y axis) for groups were obtained from the center of sample collection locations. Circled numbers are as follows: 1, Indonesian; 2, Malay; 3, Philippine; 4, Thai; 5, Southern Chinese minorities; 6, Southern Han Chinese; 7, Japanese and Korean; 8, Northern Han Chinese; 9, Northern Chinese minorities; and 10, Yakut. Haplotype heterozygosity of each group was estimated from 100-kb bins and taking together all haplotypes within each group. R² for the regression line is 0.91 (P < 0.0001). (B) Haplotype sharing analysis for EA populations and groups. YKT, Yakut; N-CM, Northern Chinese minorities; N-HAN, Northern Han Chinese; JP-KR, Japanese and Korean; S-HAN, Southern Han Chinese; S-CM, Southern Chinese minorities; EA, East Asian. (C) Phylogeny of group private haplotypes. EA private haplotypes: haplotypes found only in EA samples; SEA private haplotypes: haplotypes found only in SEA samples; CSA private haplotypes: haplotypes found only in CSA samples; Shared haplotypes: haplotypes found in all EA, SEA, and CSA samples; African haplotypes were used as outgroup. (D) Maximum-likelihood tree of 29 populations. The tree is based on data from 19,934 SNPs. Bootstrap values were based on 100 replicates. Only values on splitting of African and non-African, European and Oceanian and Asian, and Oceanian and Asian are shown.

To further refine the analysis, we looked to haplotype organization to limit the effect of fluctuations in single-nucleotide determinations and to increase the resolution around genetic diversity. The IBD model predicts a correlation of genetic distance with geographical distance but not genetic diversity and geographic distance (24). By contrast, we found (Fig. 3A) that haplotype diversity is strongly correlated with latitude (R² = 0.91, P < 0.0001), with diversity decreasing from south to north, which is consistent with a loss of diversity as populations moved to higher latitudes. In estimating the contribution of SEA and Central-South Asian (CSA) haplotypes to the EA gene pool by haplotype sharing analyses (16), we found that more than 90% of haplotypes in EA populations could be found in SEA and CSA populations, of which about 50% were found in SEA and EA only and 5% found in CSA only (Fig. 3B, see also SOM text). Phylogenetic analysis of private haplotypes indicates greater similarity between EA and SEA populations relative to EA and CSA populations (Fig. 3C). These observations suggest that the geographic source(s) contributing to EA populations were mainly from SEA populations, with rather minor contributions from CSA,
and that this clinal structure of EA populations arose from prehistoric population divergence rather than IBD or gene flow from CSA populations.

On the basis of increased cultural, linguistic, and genetic diversity, the origins of SEA populations are thought to be more complex than the origins of those to their north. Notably, the Negritos of the Philippines and Malaysia differ from neighboring populations in aspects of their physical appearance, prompting intense speculation about models of human settlement in Southeast Asia. The two-wave hypothesis, which suggests that ancestral Negrito populations settled in Southeast Asia, Australia, and Oceania before a more northerly migration originating in or near the Middle East, and spreading both toward Europe and Northeast Asia via Central Asia (25), has been supported by phylogenetic trees constructed from data on a limited number of protein markers (24, 25). The topology of our population trees, both with and without the data from additional European and Asian populations discussed in (1), is inconsistent with regard to this genetic similarity of European and EA populations (Figs. 1 and 3D). Instead, on the basis of variation at a large number of independent SNPs, we observed that there is substantial genetic proximity of SEA and EA populations (fig. S28). An identical pattern is seen in the population tree of Li et al. (1) based on all of their 642,690 SNPs. Our forward-time simulation results under extreme ascertainment scenarios (SOM text) show that the observed phylogeny is not the result of ascertainment bias. Simulation studies also suggest that substantial levels of migration between populations after their initial separation are unlikely to distort the topology of the phylogeny (SOM text).

To unambiguously infer population histories represents a considerable challenge (26).
Although this study does not disprove a two-wave model of migration, the evidence from our autosomal data and the accompanying simulation studies (figs. S29 and S30) point toward a history that unites the Negrito and non-Negrito populations of Southeast and East Asia via a single primary wave of entry of humans into the continent.

References and Notes
1. J. Z. Li et al., Science 319, 1100 (2008).
2. M. Kayser et al., Am. J. Hum. Genet. 82, 194 (2008).
3. N. A. Rosenberg et al., PLoS Genet. 1, e70 (2005).
4. N. A. Rosenberg et al., Science 298, 2381 (2002).
5. J. Novembre et al., Nature 456, 98 (2008).
6. M. Nelis et al., PLoS One 4, e5472 (2009).
7. C. Tian et al., PLoS Genet. 4, e4 (2008).
8. O. Lao et al., Curr. Biol. 18, 1241 (2008).
9. The International HapMap Consortium, Nature 426, 789 (2003).
10. J. K. Pritchard, M. Stephens, P. Donnelly, Genetics 155, 945 (2000).
11. H. Tang, J. Peng, P. Wang, N. J. Risch, Genet. Epidemiol. 28, 289 (2005).
12. J. L. Mountain, L. L. Cavalli-Sforza, Am. J. Hum. Genet. 61, 705 (1997).
13. N. Patterson, A. L. Price, D. Reich, PLoS Genet. 2, e190 (2006).
14. S. Xu, L. Jin, Am. J. Hum. Genet. 83, 322 (2008).
15. S. Xu, W. Huang, J. Qian, L. Jin, Am. J. Hum. Genet. 82, 883 (2008).
16. S. Xu, W. Jin, L. Jin, Mol. Biol. Evol. 26, 2197 (2009).
17. L. Reid, in Language Contact and Change in the Austronesian World, T. Dutton, T. Tryon, Eds. (Mouton de Gruyter, Berlin, 1994), pp. 443–475.
18. Indian Genome Variation Consortium, J. Genet. 87, 3 (2008).
19. J. Y. Chu et al., Proc. Natl. Acad. Sci. U.S.A. 95, 11763 (1998).
20. B. Su et al., Am. J. Hum. Genet. 65, 1718 (1999).
21. Y. C. Ding et al., Proc. Natl. Acad. Sci. U.S.A. 97, 14003 (2000).
22. A. Manica, F. Prugnolle, F. Balloux, Hum. Genet. 118, 366 (2005).
23. M. P. Telles, J. A. Diniz-Filho, Genet. Mol. Res. 4, 742 (2005).
24. L. L. Cavalli-Sforza, P. Menozzi, A. Piazza, The History and Geography of Human Genes (Princeton Univ. Press, Princeton, NJ, 1993).
25. L. L. Cavalli-Sforza, M. W. Feldman, Nat. Genet. 33, 266 (2003).
26. G. Hellenthal, A. Auton, D. Falush, PLoS Genet. 4, e1000078 (2008).
27. The entire consortium thanks all individuals who volunteered their DNA for this project. It is this collaboration between scientists and the public that is essential to progress in our field. All SNP data have been submitted to dbSNP with the submission handle PASNPI and will become accessible in dbSNP Build 131. See SOM text for a complete listing of all acknowledgments.

The HUGO Pan-Asian SNP Consortium
Mahmood Ameen Abdulla,1 Ikhlak Ahmed,2 Anunchai Assawamakin,3,4 Jong Bhak,5 Samir K. Brahmachari,2 Gayvelline C. Calacal,6 Amit Chaurasia,2 Chien-Hsiun Chen,7 Jieming Chen,8 Yuan-Tsong Chen,7 Jiayou Chu,9 Eva Maria C. Cutiongco-de la Paz,10 Maria Corazon A. De Ungria,6 Frederick C. Delfin,6 Juli Edo,1 Suthat Fuchareon,3 Ho Ghang,5 Takashi Gojobori,11,12 Junsong Han,13 Sheng-Feng Ho,7 Boon Peng Hoh,14 Wei Huang,15 Hidetoshi Inoko,16 Pankaj Jha,2 Timothy A. Jinam,1 Li Jin,17,38† Jongsun Jung,18 Daoroong Kangwanpong,19 Jatupol Kampuansai,19 Giulia C. Kennedy,20,21 Preeti Khurana,22 Hyung-Lae Kim,18 Kwangjoong Kim,18 Sangsoo Kim,23 WooYeon Kim,5 Kuchan Kimm,24 Ryosuke Kimura,25 Tomohiro Koike,11 Supasak Kulawonganunchai,4 Vikrant Kumar,8 Poh San Lai,26,27 Jong-Young Lee,18 Sunghoon Lee,5 Edison T. Liu,8† Partha P. Majumder,28 Kiran Kumar Mandapati,22 Sangkot Marzuki,29 Wayne Mitchell,30,31 Mitali Mukerji,2 Kenji Naritomi,32 Chumpol Ngamphiw,4 Norio Niikawa,40 Nao Nishida,25 Bermseok Oh,18 Sangho Oh,5 Jun Ohashi,25 Akira Oka,16 Rick Ong,8 Carmencita D. Padilla,10 Prasit Palittapongarnpim,33 Henry B. Perdigon,6 Maude Elvira Phipps,1,34 Eileen Png,8 Yoshiyuki Sakaki,35 Jazelyn M. Salvador,6 Yuliana Sandraling,29 Vinod Scaria,2 Mark Seielstad,8† Mohd Ros Sidek,14 Amit Sinha,2 Metawee Srikummool,19 Herawati Sudoyo,29 Sumio Sugano,37 Helena Suryadi,29 Yoshiyuki Suzuki,11 Kristina A. Tabbada,6 Adrian Tan,8 Katsushi Tokunaga,25 Sissades Tongsima,4 Lilian P. Villamor,6 Eric Wang,20,21 Ying Wang,15 Haifeng Wang,15 JerYuarn Wu,7 Huasheng Xiao,13 Shuhua Xu,38† Jin Ok Yang,5 Yin Yao Shugart,39 Hyang-Sook Yoo,5 Wentao Yuan,15 Guoping Zhao,15 Bin Alwi Zilfalil,14 Indian Genome Variation Consortium2

1 Department of Molecular Medicine, Faculty of Medicine, and the Department of Anthropology, Faculty of Arts and Social Sciences, University of Malaya, Kuala Lumpur, 50603, Malaysia.
2 Institute of Genomics and Integrative Biology, Council for Scientific and Industrial Research, Mall Road, Delhi 110007, India.
3 Mahidol University, Salaya Campus, 25/25 M. 3, Puttamonthon 4 Road, Puttamonthon, Nakornpathom 73170, Thailand.
4 Biostatistics and Informatics Laboratory, Genome Institute, National Center for Genetic Engineering and Biotechnology, Thailand Science Park, Pathumtani 12120, Thailand.
5 Korean BioInformation Center (KOBIC), Korea Research Institute of Bioscience and Biotechnology (KRIBB), 111 Gwahangno, Yuseong-gu, Deajeon 305-806, Korea.
6 DNA Analysis Laboratory, Natural Sciences Research Institute, University of the Philippines, Diliman, Quezon City 1101, Philippines.
7 Institute of Biomedical Sciences, Academia Sinica, 128 Sec 2 Academia Road Nangang, Taipei City 115, Taiwan.
8 Genome Institute of Singapore, 60 Biopolis Street 02-01, 138672, Singapore.
9 Institute of Medical Biology, Chinese Academy of Medical Science, Kunming, China.
10 Institute of Human Genetics, National Institutes of Health, University of the Philippines Manila, 625 Pedro Gil Street, Ermita Manila 1000, Philippines.
11 Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics, Research Organization of Information and Systems, 1111 Yata, Mishima, Shizuoka 411-8540, Japan.
12 Biomedicinal Information Research Center, National Institute of Advanced Industrial Science and Technology, 2-42 Aomi, Kotoku, Tokyo 135-0064, Japan.
13 National Engineering Center for Biochip at Shanghai, 151 Li Bing Road, Shanghai 201203, China.
14 Human Genome Center, School of Medical Sciences, Universiti Sains Malaysia, 16150 Kubang Kerian, Kelantan, Malaysia.
15 MOST-Shanghai Laboratory of Disease and Health Genomics, Chinese National Human Genome Center Shanghai, 250 Bi Bo Road, Shanghai 201203, China.
16 Department of Molecular Life Science Division of Molecular Medical Science and Molecular Medicine, Tokai University School of Medicine, 143 Shimokasuya, Isehara-A Kanagawa-Pref A259-1193, Japan.
17 State Key Laboratory of Genetic Engineering and MOE Key Laboratory of Contemporary Anthropology, School of Life Sciences, Fudan University, 220 Handan Road, Shanghai 200433, China.
18 Korea National Institute of Health, 194, Tongil-Lo, Eunpyung-Gu, Seoul, 122-701, Korea.
19 Department of Biology, Faculty of Science, Chiang Mai University, 239 Huay Kaew Road, Chiang Mai 50202, Thailand.
20 Genomics Collaborations, Affymetrix, 3420 Central Expressway, Santa Clara, CA 95051, USA.
21 Veracyte, 7000 Shoreline Court, Suite 250, South San Francisco, CA 94080, USA.
22 The Centre for Genomic Applications (an IGIB-IMM Collaboration), 254 Ground Floor, Phase III Okhla Industrial Estate, New Delhi 110020, India.
23 Soongsil University, Sangdo-5-dong 1-1, Dongjak-gu, Seoul 156-743, Korea.
24 Eulji University College of Medicine, 143-5 Yong-dudong Jung-gu, Dae-jeon City 301-832, Korea.
25 Department of Human Genetics, Graduate School of Medicine, University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-0033, Japan.
26 Department of Paediatrics, Yong Loo Lin School of Medicine, National University of Singapore, National University Hospital, 5 Lower Kent Ridge Road, 119074, Singapore.
27 Population Genetics Lab, Defence Medical and Environmental Research Institute, DSO National Laboratories, 27 Medical Drive, 117510, Singapore.
28 Indian Statistical Institute (Kolkata) 203 Barrackpore Trunk Road, Kolkata 700108, India.
29 Eijkman Institute for Molecular Biology, Jl. Diponegoro 69, Jakarta 10430, Indonesia.
30 Informatics Experimental Therapeutic Centre, 31 Biopolis Way, 03-01 Nanos, 138669, Singapore.
31 Division of Information Sciences, School of Computer Engineering, Nanyang Technological University, 50 Nanyang Avenue, 639798, Singapore.
32 Department of Medical Genetics, University of the Ryukyus Faculty of Medicine, Nishihara, 207 Uehara, Okinawa 903-0215, Japan.
33 National Science and Technology Development Agency, 111 Thailand Science Park, Pathumtani 12120, Thailand.
34 Monash University (Sunway Campus), Jalan Lagoon Selatan, 46150 Bandar Sunway, Selangor, Malaysia.
35 RIKEN Genomic Sciences Center, W502, 1-7-22 Suehiro-cho, Tsurumiku, Yokohama 230-0045, Japan.
36 Department of Biochemistry, University of Hong Kong, 3/F Laboratory Block, Faculty of Medicine Building, 21 Sasson Road, Pokfulam, Hong Kong.
37 Laboratory of Functional Genomics, Department of Medical Genome Sciences Graduate School of Frontier Sciences, University of Tokyo (Shirokanedai Laboratory), 4-6-1 Shirokanedai, Minato-ku, Tokyo 108-8639, Japan.
38 Chinese Academy of Sciences-Max Planck Society Partner Institute for Computational Biology, Shanghai Institutes of Biological Sciences, Chinese Academy of Sciences, 320 Yueyang Rd., Shanghai 200031, China.
39 Genomic Research Branch, National Institute of Mental Health, National Institutes of Health, 6001 Executive Boulevard, Bethesda, MD 20892 USA.
40 Research Institute of Personalized Health Sciences, Health Sciences University of Hokkaido, Tobetsu 061-0293, Japan.

Supporting Online Material
www.sciencemag.org/cgi/content/full/326/5959/1541/DC1
Materials and Methods
SOM Text
Figs. S1 to S38
Tables S1 to S4

1 June 2009; accepted 13 October 2009
10.1126/science.1177074

www.sciencemag.org SCIENCE VOL 326 11 DECEMBER 2009
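
Illustrative sketch (not part of the published analysis). The report compares two partial Mantel correlations: genetic distance against geographic distance while controlling for a group indicator matrix, and genetic distance against the group indicator matrix while controlling for geography. The Python sketch below shows one common way such a statistic can be computed, under stated assumptions: the partial correlation is taken as the correlation of regression residuals over the off-diagonal matrix entries, and significance is assessed by jointly permuting the rows and columns of the first matrix. The function names, the permutation scheme, and the toy data are all hypothetical; the procedure actually used by the consortium is described in the SOM text and in reference (23), and may differ.

```python
import numpy as np


def upper_triangle(matrix):
    """Return the strictly upper-triangular entries of a square matrix as a vector."""
    m = np.asarray(matrix, dtype=float)
    rows, cols = np.triu_indices(m.shape[0], k=1)
    return m[rows, cols]


def residuals(y, x):
    """Residuals of an ordinary least-squares fit of y on x (with intercept)."""
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ coef


def partial_mantel(a, b, c, permutations=999, seed=0):
    """Partial Mantel correlation of matrices a and b, controlling for c.

    Returns (r, p): r is the correlation between the residuals of a on c and
    of b on c; p is a one-sided permutation P value obtained by permuting the
    rows and columns of a together (an assumed, commonly used scheme).
    """
    a = np.asarray(a, dtype=float)
    n = a.shape[0]
    vec_c = upper_triangle(c)
    res_b = residuals(upper_triangle(b), vec_c)

    def statistic(mat):
        res_a = residuals(upper_triangle(mat), vec_c)
        return np.corrcoef(res_a, res_b)[0, 1]

    observed = statistic(a)
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(permutations):
        perm = rng.permutation(n)
        if statistic(a[np.ix_(perm, perm)]) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Synthetic toy data (hypothetical, NOT the Pan-Asian SNP data):
# 12 populations in 3 groups, with genetic distance driven mostly by group.
rng = np.random.default_rng(1)
n = 12
coords = rng.uniform(0, 10, size=(n, 2))
geo = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
labels = np.repeat([0, 1, 2], 4)
group = (labels[:, None] != labels[None, :]).astype(float)  # 1 if groups differ
noise = rng.normal(scale=0.05, size=(n, n))
noise = (noise + noise.T) / 2  # keep the distance matrix symmetric
genetic = group + 0.02 * geo + noise
np.fill_diagonal(genetic, 0.0)

r_geo, p_geo = partial_mantel(genetic, geo, group)
r_group, p_group = partial_mantel(genetic, group, geo)
print(f"genetic ~ geography | group: r = {r_geo:.3f}, P = {p_geo:.3f}")
print(f"genetic ~ group | geography: r = {r_group:.3f}, P = {p_group:.3f}")
```

In this toy setup the group term dominates by construction, so the second partial correlation should come out larger than the first. That mirrors the kind of comparison the report describes but says nothing about the real data.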