ARTICLE doi:10.1038/nature10842
8 MARCH 2012 | VOL 483 | NATURE | 169 ©2012 Macmillan Publishers Limited. All rights reserved

Insights into hominid evolution from the gorilla genome sequence

Aylwyn Scally1, Julien Y. Dutheil2†, LaDeana W. Hillier3, Gregory E. Jordan4, Ian Goodhead1†, Javier Herrero4, Asger Hobolth2, Tuuli Lappalainen5, Thomas Mailund2, Tomas Marques-Bonet3,6,7, Shane McCarthy1, Stephen H. Montgomery8, Petra C. Schwalie4, Y. Amy Tang1, Michelle C. Ward9,10, Yali Xue1, Bryndis Yngvadottir1†, Can Alkan3,11, Lars N. Andersen2, Qasim Ayub1, Edward V. Ball12, Kathryn Beal4, Brenda J. Bradley8,13, Yuan Chen1, Chris M. Clee1, Stephen Fitzgerald4, Tina A. Graves14, Yong Gu1, Paul Heath1, Andreas Heger15, Emre Karakoc3, Anja Kolb-Kokocinski1, Gavin K. Laird1, Gerton Lunter16, Stephen Meader15, Matthew Mort12, James C. Mullikin17, Kasper Munch2, Timothy D. O’Connor8, Andrew D. Phillips12, Javier Prado-Martinez6, Anthony S. Rogers1†, Saba Sajjadian3, Dominic Schmidt9,10, Katy Shaw12, Jared T. Simpson1, Peter D. Stenson12, Daniel J. Turner1†, Linda Vigilant18, Albert J. Vilella4, Weldon Whitener1, Baoli Zhu19†, David N. Cooper12, Pieter de Jong19, Emmanouil T. Dermitzakis5, Evan E. Eichler3,11, Paul Flicek4, Nick Goldman4, Nicholas I. Mundy8, Zemin Ning1, Duncan T. Odom1,9,10, Chris P. Ponting15, Michael A. Quail1, Oliver A. Ryder20, Stephen M. Searle1, Wesley C. Warren14, Richard K. Wilson14, Mikkel H. Schierup2, Jane Rogers1†, Chris Tyler-Smith1 & Richard Durbin1

1 Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK.
2 Bioinformatics Research Center, Aarhus University, C.F. Møllers Allé 8, 8000 Aarhus C, Denmark.
3 Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA.
4 European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SD, UK.
5 Department of Genetic Medicine and Development, University of Geneva Medical School, Rue Michel-Servet 1, 1211 Geneva 4, Switzerland.
6 Institut de Biologia Evolutiva (UPF-CSIC), 08003 Barcelona, Catalonia, Spain.
7 Institució Catalana de Recerca i Estudis Avançats, ICREA, 08010 Barcelona, Spain.
8 Department of Zoology, University of Cambridge, Downing Street, Cambridge CB2 3EJ, UK.
9 University of Cambridge, Department of Oncology, Hutchison/MRC Research Centre, Hills Road, Cambridge CB2 0XZ, UK.
10 Cancer Research UK, Cambridge Research Institute, Li Ka Shing Centre, Robinson Way, Cambridge CB2 0RE, UK.
11 Howard Hughes Medical Institute, University of Washington, Seattle, Washington 20815-6789, USA.
12 Institute of Medical Genetics, Cardiff University, Heath Park, Cardiff CF14 4XN, UK.
13 Department of Anthropology, Yale University, 10 Sachem Street, New Haven, Connecticut 06511, USA.
14 The Genome Institute at Washington University, Washington University School of Medicine, Saint Louis, Missouri 63108, USA.
15 MRC Functional Genomics Unit, University of Oxford, Department of Physiology, Anatomy and Genetics, South Parks Road, Oxford OX1 3QX, UK.
16 Wellcome Trust Centre for Human Genetics, Roosevelt Drive, Oxford OX3 7BN, UK.
17 Comparative Genomics Unit, Genome Technology Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland, 20892-2152, USA.
18 Max Planck Institute for Evolutionary Anthropology, Primatology Department, Deutscher Platz 6, Leipzig 04103, Germany.
19 Children’s Hospital Oakland Research Institute, Oakland, California 94609, USA.
20 San Diego Zoo’s Institute for Conservation Research, Escondido, California 92027, USA.
†Present addresses: Institut des Sciences de l’Évolution – Montpellier (I.S.E.-M.), Université de Montpellier II – CC 064, 34095 Montpellier Cedex 05, France (J.Y.D.); Centre for Genomic Research, Institute of Integrative Biology, University of Liverpool, Crown Street, Liverpool L69 7ZB, UK (I.G.); Division of Biological Anthropology, University of Cambridge, Fitzwilliam Street, Cambridge CB2 1QH, UK (B.Y.); EASIH, University of Cambridge, Addenbrooke’s Hospital, Cambridge CB2 0QQ, UK (A.S.R.); Oxford Nanopore Technologies, Edmund Cartwright House, 4 Robert Robinson Avenue, Oxford OX4 4GA, UK (D.J.T.); Institute of Microbiology, Chinese Academy of Sciences, Datun Road, Chaoyang District, Beijing 100101, China (B.Z.); The Genome Analysis Centre, Norwich Research Park, Norwich NR4 7UH, UK (J.R.).

Gorillas are humans’ closest living relatives after chimpanzees, and are of comparable importance for the study of human origins and evolution. Here we present the assembly and analysis of a genome sequence for the western lowland gorilla, and compare the whole genomes of all extant great ape genera. We propose a synthesis of genetic and fossil evidence consistent with placing the human–chimpanzee and human–chimpanzee–gorilla speciation events at approximately 6 and 10 million years ago. In 30% of the genome, gorilla is closer to human or chimpanzee than the latter are to each other; this is rarer around coding genes, indicating pervasive selection throughout great ape evolution, and has functional consequences in gene expression. A comparison of protein coding genes reveals approximately 500 genes showing accelerated evolution on each of the gorilla, human and chimpanzee lineages, and evidence for parallel acceleration, particularly of genes involved in hearing. We also compare the western and eastern gorilla species, estimating an average sequence divergence time 1.75 million years ago, but with evidence for more recent genetic exchange and a population bottleneck in the eastern species. The use of the genome sequence in these and future analyses will promote a deeper understanding of great ape biology and evolution.

Humans share many elements of their anatomy and physiology with both gorillas and chimpanzees, and our similarity to these species was emphasized by Darwin and Huxley in the first evolutionary accounts of human origins1. Molecular studies confirmed that we are closer to the African apes than to orang-utans, and on average closer to chimpanzees than gorillas2 (Fig. 1a). Subsequent analyses have explored functional differences between the great apes and their relevance to human evolution, assisted recently by reference genome sequences for chimpanzee3 and orang-utan4. Here we provide a reference assembly and initial analysis of the gorilla genome sequence, establishing a foundation for the further study of great ape evolution and genetics.

Recent technological developments have substantially reduced the costs of sequencing, but the assembly of a whole vertebrate genome remains a challenging computational problem. We generated a reference assembly from a single female western lowland gorilla (Gorilla gorilla gorilla) named Kamilah, using 5.4 × 10⁹ base pairs (5.4 Gbp) of capillary sequence combined with 166.8 Gbp of Illumina read pairs (Methods Summary). Genes, transcripts and predictions of gene orthologues and paralogues were annotated by Ensembl5, and additional analysis found evidence for 498 functional long (>200-bp) intergenic RNA transcripts. Table 1 summarizes the assembly and annotation properties. An assessment of assembly quality using finished fosmid sequences found that typical (N50; see Table 1 for definition) stretches of error-free sequence are 7.2 kbp in length, with errors tending to be clustered in repetitive regions. Outside repeat-masked regions and away from contig ends, the total
rate of single-base and indel errors is 0.13 per kbp. See Supplementary Information for further details.

We also collected less extensive sequence data for three other gorillas, to enable a comparison of species within the Gorilla genus. Gorillas survive today only within several isolated and endangered populations whose evolutionary relationships are uncertain. In addition to Kamilah, our analysis included two western lowland gorillas, Kwanza (male) and EB(JC) (female), and one eastern lowland gorilla, Mukisi (male).

Speciation of the great apes

We included the Kamilah assembly with human, chimpanzee (Pan troglodytes), orang-utan (Pongo abelii) and macaque (Macaca mulatta) in a five-way whole-genome alignment using the Ensembl EPO pipeline6 (Supplementary Table 3.2). Filtering out low-quality regions of the chimpanzee assembly and regions with many alignment gaps, we obtained 2.01 Gbp of 1:1:1:1 great ape orthologous alignment blocks, to which we then applied a coalescent inference model, CoalHMM, to estimate the timescales and population sizes involved in the speciation of the hominines (African great apes; see Supplementary Table 1.1 for terminology), with orang-utan as an outgroup (Supplementary Information).

Two issues need to be addressed in interpreting the results from CoalHMM (Supplementary Table 4.2). First, the results themselves are obtained in units of sequence divergence rather than years, and so need to be scaled by an appropriate yearly mutation rate. Second, as with any model, CoalHMM makes several simplifying assumptions whose consequences we need to understand in the context of realistic demography. We discuss these issues in turn.
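The scaling step described above amounts to dividing a time expressed in expected substitutions per bp by a yearly mutation rate. The following is a minimal sketch of that arithmetic only, not of CoalHMM itself. The function names are hypothetical, and the two input constants are an assumption: they are back-computed from the speciation times of 3.7 and 5.95 Myr that this article reports for a rate of 10⁻⁹ mutations per bp per year, and they are assumed to measure time along a single lineage.

```python
def scale_to_years(tau, mu_per_year):
    """Convert a time in expected substitutions per bp into years.

    tau         -- time in substitutions per bp along one lineage (assumed unit)
    mu_per_year -- mutation rate in mutations per bp per year
    """
    if mu_per_year <= 0:
        raise ValueError("mutation rate must be positive")
    return tau / mu_per_year


# Illustrative inputs, back-computed from the reported 3.7 and 5.95 Myr
# at 1e-9 mutations per bp per year (an assumption, not CoalHMM output).
TAU_HC = 3.7e6 * 1e-9    # human-chimpanzee speciation
TAU_HCG = 5.95e6 * 1e-9  # human-chimpanzee-gorilla speciation


def speciation_times_myr(mu_per_year):
    """Return (T_HC, T_HCG) in millions of years for a given yearly rate."""
    return (
        scale_to_years(TAU_HC, mu_per_year) / 1e6,
        scale_to_years(TAU_HCG, mu_per_year) / 1e6,
    )


if __name__ == "__main__":
    for mu in (1.0e-9, 0.6e-9, 0.5e-9):
        t_hc, t_hcg = speciation_times_myr(mu)
        print(f"mu = {mu:.1e}: T_HC = {t_hc:.2f} Myr, T_HCG = {t_hcg:.2f} Myr")
```

Because the time scales inversely with the rate, lowering the rate from 10⁻⁹ to 0.6 × 10⁻⁹ moves the two estimates to about 6.2 and 9.9 Myr under these assumptions.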
Using a rate of 10⁻⁹ mutations per bp per year, derived from fossil calibration of the human–macaque sequence divergence and as used in previous calculations, CoalHMM’s results would correspond to speciation time estimates T_HC (for human–chimpanzee) and T_HCG (for human–chimpanzee–gorilla) of 3.7 and 5.95 Myr ago, respectively (Fig. 1b). These dates are consistent with other recent molecular estimates7,8, but are at variance with certain aspects of the fossil record, including several fossils which have been proposed (though not universally accepted9) to be hominins, and therefore to postdate the human–chimpanzee split (Fig. 1b). Indeed, the relationship between molecular and fossil evidence has remained difficult to resolve despite the accumulation of genetic data10. Direct estimates of the per-generation mutation rate in modern human populations, based on the incidence of disease-causing mutations11 or sequencing of familial trios12,13, indicate that a lower value of (0.5–0.6) × 10⁻⁹ bp⁻¹ yr⁻¹ is plausible (based on average hominine generation times of 20–25 yr). This would give substantially older estimates of approximately 6 and 10 Myr ago for T_HC and T_HCG, potentially in better agreement with the fossil record.

[Figure 1. Panel a: phylogeny of human (H), chimpanzee (C), gorilla (G) and orang-utan (O), marked with T_HC, T_HCG and d_HO and with divergences of 1.37%, 1.75% and 3.40%. Panel b: time (Myr ago, 0 to 30, spanning the Pliocene, Miocene and Oligocene) against mutation rate (10⁻⁹ yr⁻¹, 0.0 to 1.5), with curves d_HO, d_HG, d_HC, T_HCG and T_HC; fossil key 1: Ardipithecus, 2: Orrorin, 3: Sahelanthropus, 4: Chororapithecus, 5: Sivapithecus; mutation rate estimates labelled Ref. 11, Ref. 13, Ref. 12 (CEU) and Ref. 12 (YRI).]

Figure 1 | Speciation of the great apes. a, Phylogeny of the great ape family, showing the speciation of human (H), chimpanzee (C), gorilla (G) and orang-utan (O). Horizontal lines indicate speciation times within the hominine subfamily and the sequence divergence time between human and orang-utan. Interior grey lines illustrate an example of incomplete lineage sorting at a particular genetic locus, in this case (((C, G), H), O) rather than (((H, C), G), O). Below are mean nucleotide divergences between human and the other great apes from the EPO alignment. b, Great ape speciation and divergence times. Upper panel, solid lines show how times for the HC and HCG speciation events estimated by CoalHMM vary with average mutation rate; dashed lines show the corresponding average sequence divergence times, as well as the HO sequence divergence. Blue blocks represent hominid fossil species (key at top right): each has a vertical extent spanning the range of dates estimated for it in the literature9,50, and a horizontal position at the maximum mutation rate consistent both with its proposed phylogenetic position and the CoalHMM estimates (including some allowance for ancestral polymorphism in the case of Sivapithecus). The grey shaded region shows that an increase in mutation rate going back in time can accommodate present-day estimates, fossil hypotheses, and a middle Miocene speciation for orang-utan. Lower panel, estimates of the average mutation rate in present-day humans11–13; grey bars show 95% confidence intervals, with black lines at the means. Estimates were made by the 1000 Genomes Project for trios of European (CEU) and Yoruban African (YRI) ancestry.

Table 1 | Assembly and annotation statistics

Assembly                                        Annotation
Total length             3,041,976,159 bp       Protein-coding genes    20,962
Contigs                  465,847                Pseudogenes             1,553
Total contig length      2,829,670,843 bp       RNA genes               6,701
Placed contig length     2,712,844,129 bp       Gene exons              237,216
Unplaced contig length   116,826,714 bp         Gene transcripts        35,727
Max. contig length       191,556 bp             lincRNA transcripts     498
Contig N50               11.8 kbp
Scaffolds                22,164
Max. scaffold length     10,247,101 bp
Scaffold N50             914 kbp

N50: 50% of the genome is in fragments of this length or longer; lincRNA: long intergenic non-coding RNA.
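The N50 statistic defined in the Table 1 footnote can be computed directly from a list of fragment lengths. The sketch below shows one common way to do so; the function name `n50` and the toy lengths are illustrative and are not taken from the gorilla assembly or from the authors' pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of fragment lengths.

    Fragments are taken from longest to shortest until their combined
    length reaches half of the total; the length of the fragment that
    crosses that threshold is the N50.
    """
    if not lengths:
        raise ValueError("at least one fragment length is required")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


if __name__ == "__main__":
    toy_contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]  # hypothetical lengths, total 54
    print(n50(toy_contigs))  # 10 + 9 + 8 = 27 reaches half of 54, so N50 is 8
```

The same function applies to contigs, scaffolds or, as in the fosmid-based quality assessment, stretches of error-free sequence; only the input list changes.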
ARTICLE RESEARCH However,this timetable for hominine speciation must also be recon- ciled with older events,such as the speciation of orang-utan,which is thought to have occurred no earlier than the Middle Miocene(12- 16 Myr ago),as fossil apes before that differ substantially from what we might expect of an early great ape4.This is possible if we allow for 0.2 mutation rates changing over time,with a mutation rate of around 1x 10bpyr in the common ancestor of great apes,decreasing 0.0 to lower values in all extant species(Fig.Ib).Comparable changes in 8910 12 16 8 21X ●nromosome mutation rate have been observed previously in primate evolution on larger timescales,including an approximately 30%branch length decrease in humans compared to baboons since their common 0.140 ancestor's.A decrease within the great apes is also a predicted con- sequence of the observed increase in body sizes over this time period and the association of small size with shorter generation times in other primates,and is consistent with deviations froma molecular clock seen 6 in sequence divergences of the great apes and macaque(Supplemen- tary Table 3.3).We discuss these and other constraints on estimates of 显0.120 great ape speciation times in the Supplementary Information.However we note that Sahelanthropus and Chororapithecus remain difficult to incorporate in this model,and can be accommodated as hominin and 0.110 gorillin genera only if most of the decrease occurred early in great ape -4×105-2×105 2×105 4×105 evolution. 
Physical distance to gene start/stop(bp) An alternative explanation for the apparent discrepancy in fossil Figure 2 Genome-wide incomplete lineage sorting (ILS)and selection and genetic dates (leaving aside the issue of whether fossil taxa have a,Variation in ILS.Each vertical blue line represents the fraction of ILS between been correctly placed)is that ancestral demography may have affected human,chimpanzee and gorilla estimated in a 1-Mbp region.Dashed black the genetic inferences.Certainly CoalHMM's model does not fit the lines show the average ILS across the autosomes and on X;the red line shows data in all respects.Perhaps most importantly,it assumes that ancestral the expected ILS on X,given the autosomal average and assuming neutral population sizes are constant in time and that no gene flow occurred evolution.b,Reduction in ILS around protein coding genes.The blue line between separated populations,approximations that may not hold in shows the mean rate of ILS sites normalized by mutation rate as a function of reality.Simulations (details in Supplementary Information)suggest distance upstream or downstream of the nearest gene(see Supplementary Information).The horizontal dashed line indicates the average value outside that an ancestral population bottleneck would have had limited impact 300 kbp from the nearest gene;error bars are s.e.m. 
on the inference of THc,its influence being captured largely by changes in the model's effective population size.Under conditions expected under a model of genome-wide neutral evolution (Sup- of genetic exchange between populations after the main separation plementary Fig.5.1).This variation reflects local differences in the of the chimpanzee and human lineages,the speciation time estimated ancestral effective population size Ne during the period between the by CoalHMM represents an average weighted by gene flow over the gorilla and chimpanzee speciationevents,most probably due to natural period of separation.This means in some cases it can be substantially selection reducing Ne and making ILS less likely.Within coding exons older than the date of most recent exchange.However it would only be mean ILS drops to 22%,and the suppression of ILS extends out to more recent than the speciation time inferred from fossils if there had several hundred kbp from coding genes,evident even in raw site been strong gene flow between populations after the development of patterns before any model inference (Fig.2b).An analysis of ILS sites derived fossil characteristics.To the extent that this is plausible,for in human segmental duplications suggests that assembly errors do not example as part of a non-allopatric speciation process,it constitutes an contribute significantly to this signal(Supplementary Information). alternative explanation for the dating discrepancy without requiring a We therefore attribute it to the effects of linkage around selected muta- change in mutation rate. 
tions,most probably in the form of background selection'7,observing In summary,although whole-genome comparisons can be strongly that it is greater around genes with lower ratios of non-synonymous to conclusive about the ordering of speciation events,the inability to synonymous mutation rates(dN/ds)(Supplementary Fig.8.4).Given observe past mutation rates means that the timing of events from that more than90%ofthe genome lies within 300 kbp ofa coding gene, genetic data remains uncertain.In our view,possible variation in and noting the similar phenomenon reported for recent human evolu- mutation rates allows hominid genomic data to be consistent with tion,this supports the suggestion that selection has affected almost all values of THc from 5.5 to 7 Myr ago and THcG from 8.5 to 12 Myr ago, of the genome throughout hominid evolution's with ancestral demographic structure potentially adding inherent In fitting the transitions between genealogies along the alignment, ambiguity to both events.Better resolution may come from further CoalHMM also estimates a regional recombination rate.This is primarily integrated analysis of fossil and genetic evidence. 
sensitive to ancestral crossover events before human-chimpanzee spe- ciation,yet despite the expectation of rapid turnover in recombination Incomplete lineage sorting and selection hotspots,averaged over 1-Mbp windows there is a good correlation The genealogy relating human(H),chimpanzee(C)and gorilla (G) with estimates from present-day crossovers in humans (R=0.49: varies between loci across the genome.CoalHMM explicitly models P<10;Supplementary Fig.5.5),consistent with the conservation this and infers the genealogy at each position:either the standard of recombination rates between humans and chimpanzees on the ((H,C),G)relationship or the alternatives ((H,G),C)or ((C,G),H), 1-Mbp scale1 which are the consequences of incomplete lineage sorting(ILS)in As expected,we see reduced ILS(Fig.2a)and human-chimpanzee the ancestral human-chimpanzee population.We can use the pattern sequence divergence dic(Supplementary Fig.6.1)on the X chromosome, of ILS to explore evolutionary forces during the human-chimpanzee- corresponding to a difference in N.between X and the autosomes within gorilla speciation period.Across the genome we find 30%of bases the ancestral human-chimpanzee population.Several factors can con- exhibiting ILS,with no significant difference between the number tribute to this difference2,notably the X chromosome's haploidy in sorting as ((H,G),C)and ((C,G),H).However,the fraction of ILS males,which reduces Ne on X by 0.75,enhances purifying selection in varies with respect to genomic position (Fig.2a)by more than males,and reduces the recombination rate,thereby increasing the 8 MARCH 2012 VOL 483 NATURE 171 2012 Macmillan Publishers Limited.All rights reserved
However, this timetable for hominine speciation must also be reconciled with older events, such as the speciation of orang-utan, which is thought to have occurred no earlier than the Middle Miocene (12–16 Myr ago), as fossil apes before that differ substantially from what we might expect of an early great ape¹⁴. This is possible if we allow for mutation rates changing over time, with a mutation rate of around 1 × 10⁻⁹ bp⁻¹ yr⁻¹ in the common ancestor of great apes, decreasing to lower values in all extant species (Fig. 1b). Comparable changes in mutation rate have been observed previously in primate evolution on larger timescales, including an approximately 30% branch length decrease in humans compared to baboons since their common ancestor¹⁵. A decrease within the great apes is also a predicted consequence of the observed increase in body sizes over this time period and the association of small size with shorter generation times in other primates¹⁶, and is consistent with deviations from a molecular clock seen in sequence divergences of the great apes and macaque (Supplementary Table 3.3). We discuss these and other constraints on estimates of great ape speciation times in the Supplementary Information. However, we note that Sahelanthropus and Chororapithecus remain difficult to incorporate in this model, and can be accommodated as hominin and gorillin genera only if most of the decrease occurred early in great ape evolution.

An alternative explanation for the apparent discrepancy in fossil and genetic dates (leaving aside the issue of whether fossil taxa have been correctly placed) is that ancestral demography may have affected the genetic inferences. Certainly CoalHMM's model does not fit the data in all respects. Perhaps most importantly, it assumes that ancestral population sizes are constant in time and that no gene flow occurred between separated populations, approximations that may not hold in reality.
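The mutation-rate argument above can be illustrated with a short sketch. It shows only the arithmetic of converting a fixed genetic distance into a date under different assumed yearly mutation rates, and of accumulating divergence over epochs with different rates. It is not the CoalHMM procedure, and the branch length and the slower rates used below are hypothetical placeholders rather than values taken from the paper.

```python
# Sketch: how an assumed mutation rate converts a fixed genetic distance into a date.
# All numeric inputs in the demo are hypothetical placeholders, not the paper's estimates.

def split_time_myr(subs_per_bp: float, rate_per_bp_per_yr: float) -> float:
    """Time in Myr needed to accumulate subs_per_bp at a constant yearly rate."""
    if rate_per_bp_per_yr <= 0:
        raise ValueError("mutation rate must be positive")
    return subs_per_bp / rate_per_bp_per_yr / 1e6


def branch_length(epochs: list[tuple[float, float]]) -> float:
    """Substitutions per bp accumulated over (duration_myr, yearly_rate) epochs."""
    return sum(duration_myr * 1e6 * rate for duration_myr, rate in epochs)


if __name__ == "__main__":
    d = 0.004  # hypothetical substitutions per bp along one lineage
    for rate in (1.0e-9, 0.75e-9, 0.5e-9):  # assumed rates per bp per year
        print(f"rate {rate:.2e} -> {split_time_myr(d, rate):.1f} Myr")
```

The same branch length maps to an older date when the assumed rate is lower, which is why a rate that slowed over great ape evolution moves the genetic estimates closer to the fossil dates.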
Simulations (details in Supplementary Information) suggest that an ancestral population bottleneck would have had limited impact on the inference of T_HC, its influence being captured largely by changes in the model's effective population size. Under conditions of genetic exchange between populations after the main separation of the chimpanzee and human lineages, the speciation time estimated by CoalHMM represents an average weighted by gene flow over the period of separation. This means in some cases it can be substantially older than the date of most recent exchange. However, it would only be more recent than the speciation time inferred from fossils if there had been strong gene flow between populations after the development of derived fossil characteristics. To the extent that this is plausible, for example as part of a non-allopatric speciation process, it constitutes an alternative explanation for the dating discrepancy without requiring a change in mutation rate.

In summary, although whole-genome comparisons can be strongly conclusive about the ordering of speciation events, the inability to observe past mutation rates means that the timing of events from genetic data remains uncertain. In our view, possible variation in mutation rates allows hominid genomic data to be consistent with values of T_HC from 5.5 to 7 Myr ago and T_HCG from 8.5 to 12 Myr ago, with ancestral demographic structure potentially adding inherent ambiguity to both events. Better resolution may come from further integrated analysis of fossil and genetic evidence.

Incomplete lineage sorting and selection

The genealogy relating human (H), chimpanzee (C) and gorilla (G) varies between loci across the genome. CoalHMM explicitly models this and infers the genealogy at each position: either the standard ((H,C),G) relationship or the alternatives ((H,G),C) or ((C,G),H), which are the consequences of incomplete lineage sorting (ILS) in the ancestral human–chimpanzee population.
We can use the pattern of ILS to explore evolutionary forces during the human–chimpanzee–gorilla speciation period. Across the genome we find 30% of bases exhibiting ILS, with no significant difference between the number sorting as ((H,G),C) and ((C,G),H). However, the fraction of ILS varies with respect to genomic position (Fig. 2a) by more than expected under a model of genome-wide neutral evolution (Supplementary Fig. 5.1). This variation reflects local differences in the ancestral effective population size N_e during the period between the gorilla and chimpanzee speciation events, most probably due to natural selection reducing N_e and making ILS less likely. Within coding exons mean ILS drops to 22%, and the suppression of ILS extends out to several hundred kbp from coding genes, evident even in raw site patterns before any model inference (Fig. 2b). An analysis of ILS sites in human segmental duplications suggests that assembly errors do not contribute significantly to this signal (Supplementary Information). We therefore attribute it to the effects of linkage around selected mutations, most probably in the form of background selection¹⁷, observing that it is greater around genes with lower ratios of non-synonymous to synonymous mutation rates (dN/dS) (Supplementary Fig. 8.4). Given that more than 90% of the genome lies within 300 kbp of a coding gene, and noting the similar phenomenon reported for recent human evolution¹², this supports the suggestion that selection has affected almost all of the genome throughout hominid evolution¹⁸.

In fitting the transitions between genealogies along the alignment, CoalHMM also estimates a regional recombination rate.
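The link between ILS and ancestral effective population size described above can be made concrete with the standard neutral three-species coalescent result: the expected fraction of discordant genealogies is (2/3)·exp(−T), where T is the time between the two speciation events in units of 2N_e generations. The sketch below is a simplified illustration under that assumption, not the CoalHMM inference used in the paper. It inverts the formula for the two ILS fractions quoted in the text (30% genome-wide, 22% in coding exons) to give the relative local N_e they would imply if the time between speciation events were the same everywhere.

```python
import math

# Sketch under the neutral three-species coalescent (an assumption; the paper
# itself uses CoalHMM). T is the internode time in units of 2*Ne generations.

def ils_fraction(t_coal: float) -> float:
    """Expected fraction of sites whose genealogy disagrees with the species tree."""
    # Lineages fail to coalesce in the internode with probability exp(-T);
    # two of the three equally likely resolutions are then discordant.
    return (2.0 / 3.0) * math.exp(-t_coal)


def internode_time(ils: float) -> float:
    """Invert ils_fraction: internode time (in 2*Ne generations) for a given ILS."""
    if not 0.0 < ils <= 2.0 / 3.0:
        raise ValueError("ILS fraction must lie in (0, 2/3]")
    return math.log((2.0 / 3.0) / ils)


def relative_ne(ils_local: float, ils_reference: float) -> float:
    """Ne_local / Ne_reference, assuming the same internode time in generations."""
    # T is proportional to 1/Ne, so the Ne ratio is the inverse ratio of T values.
    return internode_time(ils_reference) / internode_time(ils_local)


if __name__ == "__main__":
    print(f"T genome-wide (ILS 0.30): {internode_time(0.30):.3f}")
    print(f"T coding exons (ILS 0.22): {internode_time(0.22):.3f}")
    print(f"implied Ne ratio, exons vs genome: {relative_ne(0.22, 0.30):.2f}")
```

Under these simplifying assumptions, the drop from 30% to 22% ILS corresponds to a local N_e in coding exons of roughly 70% of the genome-wide average.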
This recombination rate estimate is primarily sensitive to ancestral crossover events before human–chimpanzee speciation, yet despite the expectation of rapid turnover in recombination hotspots¹⁹, averaged over 1-Mbp windows there is a good correlation with estimates from present-day crossovers in humans (R = 0.49; P < 10⁻¹³; Supplementary Fig. 5.5), consistent with the conservation of recombination rates between humans and chimpanzees on the 1-Mbp scale¹⁹.

Figure 2 | Genome-wide incomplete lineage sorting (ILS) and selection. a, Variation in ILS. Each vertical blue line represents the fraction of ILS between human, chimpanzee and gorilla estimated in a 1-Mbp region. Dashed black lines show the average ILS across the autosomes and on X; the red line shows the expected ILS on X, given the autosomal average and assuming neutral evolution. b, Reduction in ILS around protein coding genes. The blue line shows the mean rate of ILS sites normalized by mutation rate as a function of distance upstream or downstream of the nearest gene (see Supplementary Information). The horizontal dashed line indicates the average value outside 300 kbp from the nearest gene; error bars are s.e.m. (Axes: a, ILS fraction by chromosome; b, scaled rate of ILS sites against physical distance to gene start/stop in bp.)

As expected, we see reduced ILS (Fig. 2a) and human–chimpanzee sequence divergence d_HC (Supplementary Fig. 6.1) on the X chromosome, corresponding to a difference in N_e between X and the autosomes within the ancestral human–chimpanzee population. Several factors can contribute to this difference²⁰, notably the X chromosome's haploidy in males, which reduces N_e on X to 0.75 of its autosomal value, enhances purifying selection in males, and reduces the recombination rate, thereby increasing the
effect of selection via linkage. However, sequence divergence is additionally affected by the mutation rate, which is higher in males than in females, further reducing the relative divergence observed on X²¹. Incorporating the ancestral N_e estimates from CoalHMM, we estimate a ratio of 0.87 ± 0.09 between average mutation rates on X and the autosomes on the human–chimpanzee lineage, corresponding to a male/female mutation rate bias α = 2.3 ± 0.4 (details in Supplementary Information). Previous estimates of α in hominids have ranged from 2 to 7 (refs 22, 23). It is possible that some of the higher values, having been estimated from sequence divergence only and in smaller data sets, were inflated by underestimating the suppression of ancestral N_e on X, in particular due to purifying selection.

Our calculation of α assumes that a single speciation time applies across the genome, attributing differences between the X chromosome and autosomes to the factors mentioned above. An alternative model has been proposed²⁴, involving complex speciation, with more recent human–chimpanzee ancestry on X than elsewhere. Given potential confounding factors in demography, selection, mutation rate bias and admixture, our analyses do not discriminate between these models; however, if the effective human–chimpanzee separation time on X is indeed reduced in this way it would imply a still lower value of α.

Functional sequence evolution

We looked for loss or gain of unique autosomal sequence within humans, chimpanzees and gorillas by comparing raw sequence data for each in the context of their reference assemblies (Supplementary Information). The total amount is small: 3–7 Mbp per species, distributed genome-wide in fragments no more than a few kbp in length (Supplementary Table 7.1).
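Returning briefly to the X-to-autosome mutation rate ratio reported earlier in this section: the step from a ratio of 0.87 to a male/female bias of about 2.3 can be reproduced with the standard relation for sex-biased mutation, which weights male and female rates by the time each chromosome type spends in each sex. The sketch below assumes that relation; the authors' own calculation, including the quoted uncertainty, is described in their Supplementary Information and may differ in detail. The function names are illustrative.

```python
# Sketch assuming the standard sex-bias relation (not necessarily the authors'
# exact calculation): an X chromosome spends 2/3 of its time in females, an
# autosome 1/2, so with alpha = male rate / female rate,
#   mu_X / mu_A = (2/3) * (2 + alpha) / (1 + alpha).

def x_to_autosome_ratio(alpha: float) -> float:
    """Expected ratio of X to autosomal mutation rates for a given male bias."""
    return 2.0 * (2.0 + alpha) / (3.0 * (1.0 + alpha))


def alpha_from_ratio(ratio: float) -> float:
    """Invert the relation above; valid for ratios strictly between 2/3 and 4/3."""
    if not 2.0 / 3.0 < ratio < 4.0 / 3.0:
        raise ValueError("ratio must lie strictly between 2/3 and 4/3")
    return (4.0 - 3.0 * ratio) / (3.0 * ratio - 2.0)


if __name__ == "__main__":
    print(f"alpha for ratio 0.87: {alpha_from_ratio(0.87):.2f}")
```

Because the implied α falls as the ratio rises, any factor that accounts for part of the reduced divergence on X without invoking mutation (such as a more recent separation time on X) pushes the inferred α lower, as the text notes.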
The vast majority (97%) of such material was also found either in orang-utan or a more distant primate, indicating loss, and consistent with the expectation that gain is driven primarily by duplication (which our analysis excludes). Some fragments found only in one species overlap coding exons in annotated genes: 6 genes in human, 5 in chimpanzee and 9 in gorilla (Supplementary Tables 7.2, 7.3, 7.4), the majority being associated with olfactory receptor proteins or other rapidly evolving functions, such as male fertility and immune response.

We did not assemble a gorilla Y chromosome, but by mapping ~6× reads from the male gorillas Kwanza and Mukisi to the human Y, we identified several regions in which human single-copy material is missing in gorilla, comprising almost 10% of the accessible male-specific region. Across the Y chromosome there is considerable variation in the copy number of shared material, and the pattern of coverage is quite different from that of reads from a male bonobo mapped in the same way (Supplementary Fig. 7.1). Some missing or depleted material overlaps coding genes (Supplementary Table 7.5), including for example VCY, a gene expressed specifically in male germ cells which has two copies in human and chimpanzee but apparently only one in gorilla (Supplementary Information). The resulting picture is consistent with rapid structural evolution of the Y chromosome in the great apes, as previously seen in the chimpanzee–human comparison²⁵.

Protein evolution

The Ensembl EPO primate alignment was filtered to produce a high-quality genome-wide set of 11,538 alignments representing orthologous primate coding sequences, which were then scored with codon-based evolutionary models for likelihoods of acceleration or deceleration of dN/dS in the terminal lineages, ancestral branch, and entire hominine subfamily (Supplementary Information).
We find that genes with accelerated rates of evolution across hominines are enriched for functions associated with sensory perception, particularly in relation to hearing and brain development (Supplementary Table 8.4g, h). For example, among the most strongly accelerated genes are OTOF (P = 0.0056), LOXHD1 (P < 0.01) and GPR98 (P = 0.0056), which are all associated with diseases causing human deafness (Supplementary Table 8.5). GPR98, which also shows significant evidence of positive selection under the branch-site test (P = 0.0081), is highly expressed in the developing central nervous system. The gene with the strongest evidence for acceleration along the branch leading to hominines is RNF213 (branch-site P < 2.9 × 10⁻⁹), a gene associated with Moyamoya disease in which blood flow to the brain is restricted due to arterial stenosis²⁶. Given that oxygen and glucose consumption scales with total neuron number²⁷, RNF213 may have played a role in facilitating the evolution of larger brains. Together, these observations are consistent with a major role for adaptive modifications in brain development and sensory perception in hominine evolution.

Turning to lineage-specific selection pressures, we find relatively similar numbers of accelerated genes in humans, chimpanzees and gorillas (663, 562 and 535 respectively at nominal P < 0.05, Supplementary Table 8.3a) and genome-wide dN/dS ratios (0.256, 0.249 and 0.239 in purifying sites, Supplementary Table 8.6). These numbers, which reflect variation in historical effective population sizes as well as environmental pressures, reveal a largely uniform landscape of recent hominine gene evolution, in accordance with previously published analyses in human and chimpanzee³,²⁸ (Supplementary Table 8.7). Genes with accelerated rates of evolution along the gorilla lineage are most enriched for a number of developmental terms, including ear, hair follicle, gonad and brain development, and sensory perception of sound.
Figure 3 | Differences in expression and regulation. a, Mean gene expression distance between human and chimpanzee as a function of the proportion of ILS sites per gene. Each point represents a sliding window of 900 genes (over genes ordered by ILS fraction); s.d. error limits are shown in grey. b, Top row, classification of CTCF sites in the gorilla (EB(JC)) and human (GM12878) LCLs on the basis of species-uniqueness; numbers of alignable CTCF binding sites are shown for each category (as labelled in the panel: 19,451 human-specific, 24,370 shared, 5,228 gorilla-specific). Bottom three rows, sequence changes of CTCF motifs embedded in human-specific, shared and gorilla-specific CTCF binding sites located within shared CpG islands, species-specific CpG islands or outside CpG islands. Numbers of CTCF binding sites are shown for each CpG island category. Gorilla and human motif sequences are compared and represented as indels, disruptions (>4-bp gaps) and substitutions.

Among the most significantly accelerated genes in gorilla is EVPL (P < 2.2 × 10⁻⁵), which encodes a component of the cornified envelope of keratinocytes, and may be related to
increased cornification of knuckle pads in gorilla²⁹. Interestingly, gorilla and human both yielded brain-associated terms enriched for accelerated genes, but chimpanzee did not (Supplementary Table 8.4a–c). Genes expressed in the brain or involved in its development have not typically been associated with positive selection in primates, but our results show that multiple great ape lineages show elevated dN/dS in brain-related genes when evaluated against a primate background.

We also identified cases of pairwise parallel evolution among hominines. Human and chimpanzee show the largest amount, with significantly more shared accelerations than expected by chance, whereas gorilla shares more parallel acceleration with human than with chimpanzee across a range of significance thresholds (Supplementary Fig. 8.3). Genes involving hearing are enriched in parallel accelerations for all three pairs, but most strongly in gorilla–human (Supplementary Table 8.4d–f), calling into question a previous link made between accelerated evolution of auditory genes in humans and language evolution²⁸. It is also interesting to note that ear morphology is one of the few external traits in which humans are more similar to gorillas than to chimpanzees³⁰.

Next we considered gene loss and gain. We found 84 cases of gene loss in gorilla due to the acquisition of a premature stop codon, requiring there to be no close paralogue (Supplementary Table 8.8): one such gene is TEX14, which codes for an intercellular bridge protein essential for spermatogenesis in mice. Genome-wide analysis of gene gain is confounded by the difficulty in assembling closely related paralogues. We therefore resequenced, by finishing overlapping fosmids, three gene clusters known to be under rapid adaptive evolution in primates: the growth hormone cluster³¹, the PRM clusters involved in sperm function and the APOBEC cluster implicated in molecular adaptation to viral defence.
In the growth hormone cluster, we observed four chorionic somatomammotropin (CSH) genes in gorilla compared to three in humans and chimpanzees, with a novel highly similar pair of CSH-like genes in gorilla that share a 3′ end similar to human growth hormone GH2, suggesting a complex evolutionary history as in other primates³¹. We saw sequence but not gene copy number changes in the PRM and APOBEC clusters (Supplementary Information).

In several cases, a protein variant thought to cause inherited disease in humans³² is the only version found in all three gorillas for which we have genome-wide sequence data (Supplementary Table 8.9). Striking examples are the dementia-associated variant Arg432Cys in the growth factor PGRN and the hypertrophic cardiomyopathy-associated variant Arg153His in the muscle Z disk protein TCAP, both of which were corroborated by additional capillary sequencing (Supplementary Table 8.10). Why variants that appear to cause disease in humans might be associated with a normal phenotype in gorillas is unknown; possible explanations are compensatory molecular changes elsewhere, or differing environmental conditions. Such variants have also been found in both the chimpanzee and macaque genomes³,³³.

Gene transcription and regulation

We carried out an analysis of hominine transcriptome variation using total RNA extracted and sequenced from lymphoblastoid cell lines (LCLs) of one gorilla, two chimpanzees and two bonobos (Supplementary Information), and published RNA sequence data for eight human individuals³⁴. After quantifying reads mapping to exons and genes in each species, we calculated the degree of species-specific expression and splicing in 9,746 1:1:1 expressed orthologous genes. On average, expression levels in human and chimpanzee were more similar to each other than either was to gorilla (Supplementary Fig. 10.2).
However, this effect is reduced in genes with a higher proportion of ILS sites, which tend to show greater expression distance between humans and chimpanzees (Fig. 3a). More generally, patterns seen in the relative expression distances between the three species showed a significant overlap with those derived from genomic lineage sorting (P = 0.026; Supplementary Table 10.4), demonstrating that ILS can be reflected in functional differences between primate species.

We also explored species-specific variation in splicing³⁵ by calculating the variance in differential expression of orthologous exons within each gene. In total we found 7% of genes whose between-species variance is significant at the 1% level (based on the distribution of within-human variances, Supplementary Fig. 10.5). For example, Supplementary Fig. 10.6 illustrates gorilla-specific splicing in the SQLE gene, involved in steroid metabolism.

Figure 4 | Gorilla species distribution and divergence. a, Distribution of gorilla species in Africa. The western species (Gorilla gorilla) comprises two subspecies: western lowland gorillas (G. gorilla gorilla) and Cross River gorillas (G. gorilla diehli). Similarly, the eastern species (Gorilla beringei) is subclassified into eastern lowland gorillas (G. beringei graueri) and mountain gorillas (G. beringei beringei). (Based on data in ref. 43.) Areas of water are shown pale blue. Inset, area of main map. b, Western lowland gorilla Kamilah, source of the reference assembly (photograph by J.R.). c, Eastern lowland gorilla Mukisi (photograph by M. Seres). d, Isolation–migration model of the western and eastern species. N_A, N_W and N_E are ancestral, western and eastern effective population sizes; m is the migration rate. e, Likelihood surface for migration and split time parameters in the isolation–migration model; colours from blue (minimum) to red (maximum) indicate the magnitude of likelihood. (Axes in e: migration rate in events per generation against split time in kyr ago.)

We further investigated great ape regulatory evolution by comparing the binding in human and gorilla of CTCF, a protein essential to vertebrate development that is involved in transcriptional regulation, chromatin loop formation and protein scaffolding³⁶. We performed ChIP-seq (chromatin immunoprecipitation sequencing) of CTCF in a gorilla LCL (from EB(JC)), and compared this with matched human experiments³⁷, using the EPO alignments to identify species-specific and shared binding regions (Fig. 3b and Supplementary Information). Consistent with previous results reporting strong CTCF binding conservation³⁸, and in contrast to the rapid turnover of some other transcription factor binding sites³⁹, we found that approximately 70% of
gorilla CTCF binding regions are shared with human. This compares with around 80% pairwise overlaps between three human LCLs (Supplementary Fig. 11.1a). Binding regions that are shared among all three human individuals are three times more likely to be shared with gorilla than individual-specific regions (Supplementary Fig. 11.1b).

The genomic changes leading to loss of CTCF binding differ between regions within CpG islands and those in the rest of the genome. Losses of CTCF binding outside CpG islands and within species-specific CpG regions co-occur with sequence changes in the binding motif, but for shared CpG islands most binding losses have no corresponding motif sequence change (Fig. 3b). It is possible that DNA methylation differences are driving this effect, as CTCF binding can be abolished by methylation of specific target regions³⁶. Alternatively, CTCF binding within CpG islands may also depend more on other regulators' binding and less on the CTCF motif itself.

Genetic diversity within Gorilla

Recent studies of molecular and morphological diversity within the Gorilla genus have supported a classification into two species, eastern (Gorilla beringei) and western (Gorilla gorilla)⁴, with both species further divided into subspecies (Fig. 4a). Although separated today by over 1,000 km, it has been suggested that gene flow has occurred between the eastern and western species since divergence. To investigate this, we collected reduced representation sequence data (Supplementary Information) for another female western lowland gorilla, EB(JC), and a male eastern lowland gorilla, Mukisi.

Table 2 | Nucleotide polymorphism in western and eastern gorillas

Gorilla    Species            Heterozygous site rate (%)    Homozygous site rate (%)    Homozygous:heterozygous ratio
Kamilah    Western lowland    0.189                         0.0015                      0.008
EB(JC)     Western lowland    0.178                         0.10                        0.56
Mukisi     Eastern lowland    0.076                         0.19                        2.5

Rates (columns 3 and 4) are based on variants detected by mapping sequence data to the gorilla reference and filtering sites by depth and mapping quality (Supplementary Information). The homozygosity rate for Kamilah is low (and is effectively an error rate) because the reference was assembled from this gorilla. The low heterozygosity in Mukisi is not due to inbreeding, as there are no long homozygous stretches.

Table 2 summarizes the sequence diversity in these individuals and in Kamilah, based on alignment of sequence data to the gorilla assembly. The ratio of homozygous to heterozygous variant rates for EB(JC) (close to 0.5) is consistent with this gorilla coming from the same population as Kamilah (Supplementary Information), and the rate of heterozygosity for this gorilla matches Kamilah's. Mukisi, on the other hand, has twice the rate of homozygous differences from the assembly, consistent with this gorilla coming from a separate population. Furthermore, heterozygosity in Mukisi is much lower, suggesting a reduced population size in the eastern species. This agrees with previous studies based on fewer loci, and also with estimates of present-day numbers in the wild, which indicate that whereas the western lowland subspecies may number up to 200,000 individuals, the eastern population as a whole is around ten times smaller⁴³. Because it manifests in genetic diversity, this disparity must have existed for many millennia, and the

We also collected whole-genome sequence data from an additional male western lowland gorilla (Kwanza) at 12×, and further whole-genome sequence data for (eastern) Mukisi at 7× (Supplementary Information). Differences between the western gorillas and Mukisi represent a combination of inter-individual and inter-species variants. These include 1,615 non-synonymous SNPs (single nucleotide polymorphisms) in 1,326 genes, seven of which have more than four amino acid differences each (Supplementary Table 12.2), among which are two olfactory receptor genes and EMR3, implicated in immune and inflammatory responses. Nineteen of the genes annotated in Kamilah carry an apparently homozygous premature stop codon in Mukisi. These include the gene encoding the seminal fluid protein SEMG2, implicated in sperm competition and known to be inactivated in some gorillas, where sperm competition is rare⁴⁵. Both EMR3 and SEMG2 were corroborated by additional sequencing (Supplementary Tables 12.3, 12.4).

Finally, we investigated genomic duplication in gorilla using a whole-genome shotgun sequence detection method applied to data from the western lowland gorillas Kamilah and Kwanza (Supplementary Information). This revealed a level of private segmental duplication (0.9 Mbp and 1.5 Mbp in the two gorillas) well outside the range found in pairwise comparisons of humans (Supplementary Fig. 13.1), where a value of ~100 kbp is typical between any two individuals⁴⁶. These results suggest greater copy number diversity in gorillas than in humans, consistent with previous observations in the great apes.

Discussion

Since the middle Miocene, an epoch of abundance and diversity for apes throughout Eurasia and Africa, the prevailing pattern of ape evolution has been one of fragmentation and extinctions. The present-day distribution of non-human great apes, existing only as endangered and subdivided populations in equatorial forest refugia, is a legacy of that process. Even humans, now spread around the world and occupying habitats previously inaccessible to any primate, bear the genetic legacy of past population crises. All other branches of the genus Homo have passed into extinction. It may be that in the condition of Gorilla, Pan and Pongo we see some echo of our own ancestors before the last 100,000 years, and perhaps a condition experienced many times over several million years of evolution. It is notable that species within at least three of these genera continued to exchange genetic material long after separation⁴, a disposition that may have aided their survival in
face of diminishing numbers.As well as teaching us about human cannot have resulted solely from the current pressure of human activity evolution,the study of the great apes connects us to a time when our in central Africa or recent outbreaks of the Ebola virus. existence was more tenuous,and in doing so,highlights the importance On the basis of an alignment of the EB(JC)and Mukisi data to the human reference sequence and comparing high confidence genotype of protecting and conserving these remarkable species. calls for the two individuals,we estimate a mean sequence divergence METHODS SUMMARY time between them of 1.75 Myr ago.However the pattern of shared Assembly.We constructed a hybrid de novo assembly combining 5.4 Gbp of heterozygosity is not consistent with a clean split between western and capillary read pairs with the contigs from an initial short read assembly of eastern gorillas(Supplementary Information).Under a model which 166.8 Gbp of Illumina paired reads.Improvements in long-range structure were allows symmetric genetic exchange between the populations after an then guided by human homology,placing contigs into scaffolds wherever read initial split (Fig.4d;Supplementary Information),the maximum pairs confirmed collinearity between gorilla and human.Base-pair contiguity was likelihood species split time is ~0.5 Myr ago with moderate sub- mproved by local reassembly within each scaffold,merging or extending contigs sequent exchange of~0.2 individuals per generation each way between using Illumina read pairs.Finally we used additional Kamilah bacterial artificial chromosome (BAC)and fosmid end pair capillary sequences to provide longer breeding pools,totalling ~5,000 in each direction over 0.5 Myr range scaffolding.Base errors were corrected by mapping all Illumina reads back (Fig.4e).Different model assumptions and parameterisations would to the assembly and rectifying apparent homozygous variants,while recording the lead to different values.More 
extensive sampling and sequencing of location of heterozygous sites.Further details and other methods are described in both gorilla populations will afford better resolution of this issue. Supplementary Information. 174 NATURE I VOL 483 8 MARCH 2012 2012 Macmillan Publishers Limited.All rights reserved
gorilla CTCF binding regions are shared with human. This compares with around 80% pairwise overlaps between three human LCLs (Supplementary Fig. 11.1a). Binding regions that are shared among all three human individuals are three times more likely to be shared with gorilla than individual-specific regions (Supplementary Fig. 11.1b).

The genomic changes leading to loss of CTCF binding differ between regions within CpG islands and those in the rest of the genome. Losses of CTCF binding outside CpG islands and within species-specific CpG regions co-occur with sequence changes in the binding motif, but for shared CpG islands most binding losses have no corresponding motif sequence change (Fig. 3b). It is possible that DNA methylation differences are driving this effect, as CTCF binding can be abolished by methylation of specific target regions36. Alternatively, CTCF binding within CpG islands may also depend more on other regulators' binding and less on the CTCF motif itself.

Genetic diversity within Gorilla

Recent studies of molecular and morphological diversity within the Gorilla genus have supported a classification into two species, eastern (Gorilla beringei) and western (Gorilla gorilla)40, with both species further divided into subspecies (Fig. 4a). Although the two species are separated today by over 1,000 km, it has been suggested that gene flow has occurred between them since divergence41. To investigate this, we collected reduced representation sequence data (Supplementary Information) for another female western lowland gorilla, EB(JC), and a male eastern lowland gorilla, Mukisi. Table 2 summarizes the sequence diversity in these individuals and in Kamilah, based on alignment of sequence data to the gorilla assembly.
The ratio of homozygous to heterozygous variant rates for EB(JC) (close to 0.5) is consistent with this gorilla coming from the same population as Kamilah (Supplementary Information), and the rate of heterozygosity for this gorilla matches Kamilah's. Mukisi, on the other hand, has twice the rate of homozygous differences from the assembly, consistent with this gorilla coming from a separate population. Furthermore, heterozygosity in Mukisi is much lower, suggesting a reduced population size in the eastern species. This agrees with previous studies based on fewer loci41, and also with estimates of present-day numbers in the wild, which indicate that whereas the western lowland subspecies may number up to 200,000 individuals, the eastern population as a whole is around ten times smaller42,43. Because it manifests in genetic diversity, this disparity must have existed for many millennia, and cannot have resulted solely from the current pressure of human activity in central Africa or recent outbreaks of the Ebola virus.

On the basis of an alignment of the EB(JC) and Mukisi data to the human reference sequence and comparing high confidence genotype calls for the two individuals, we estimate a mean sequence divergence time between them of 1.75 Myr ago. However, the pattern of shared heterozygosity is not consistent with a clean split between western and eastern gorillas (Supplementary Information). Under a model which allows symmetric genetic exchange between the populations after an initial split (Fig. 4d; Supplementary Information), the maximum likelihood species split time is ~0.5 Myr ago with moderate subsequent exchange of ~0.2 individuals per generation each way between breeding pools, totalling ~5,000 in each direction over 0.5 Myr (Fig. 4e). Different model assumptions and parameterisations would lead to different values. More extensive sampling and sequencing of both gorilla populations will afford better resolution of this issue.
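The relationship between the per-generation exchange rate and the cumulative total quoted above can be checked with simple arithmetic. The sketch below is illustrative only: the generation time of 20 years is an assumption chosen because it makes the quoted figures (~0.2 migrants per generation, ~5,000 in total over 0.5 Myr) mutually consistent; the value actually used in the analysis is given in Supplementary Information and may differ. The function name `total_migrants` is hypothetical.

```python
def total_migrants(split_time_years, migrants_per_generation, generation_time_years):
    """Cumulative migrants in one direction since the population split."""
    # Number of generations elapsed since the split
    generations = split_time_years / generation_time_years
    return migrants_per_generation * generations


# Values quoted in the text
SPLIT_TIME_YEARS = 0.5e6          # ~0.5 Myr since the initial split
MIGRANTS_PER_GENERATION = 0.2     # ~0.2 individuals per generation, each way

# ASSUMPTION (not stated in the text): a gorilla generation time of 20 years
GENERATION_TIME_YEARS = 20.0

total = total_migrants(SPLIT_TIME_YEARS, MIGRANTS_PER_GENERATION, GENERATION_TIME_YEARS)
print(f"Approximate migrants in each direction: {total:.0f}")
```

Under this assumed generation time the split corresponds to 25,000 generations, so a shorter or longer generation time would scale the cumulative total proportionally.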
We also collected whole-genome sequence data from an additional male western lowland gorilla (Kwanza) at 12×, and further whole-genome sequence data for (eastern) Mukisi at 7× (Supplementary Information). Differences between the western gorillas and Mukisi represent a combination of inter-individual and inter-species variants. These include 1,615 non-synonymous SNPs (single nucleotide polymorphisms) in 1,326 genes, seven of which have more than four amino acid differences each (Supplementary Table 12.2), among which are two olfactory receptor genes and EMR3, implicated in immune and inflammatory responses44. Nineteen of the genes annotated in Kamilah carry an apparently homozygous premature stop codon in Mukisi. These include the gene encoding the seminal fluid protein SEMG2, implicated in sperm competition and known to be inactivated in some gorillas, where sperm competition is rare45. Both EMR3 and SEMG2 were corroborated by additional sequencing (Supplementary Tables 12.3, 12.4).

Finally, we investigated genomic duplication in gorilla using a whole-genome shotgun sequence detection method applied to data from the western lowland gorillas Kamilah and Kwanza (Supplementary Information). This revealed a level of private segmental duplication (0.9 Mbp and 1.5 Mbp in the two gorillas) well outside the range found in pairwise comparisons of humans (Supplementary Fig. 13.1), where a value of ~100 kbp is typical between any two individuals46. These results suggest greater copy number diversity in gorillas than in humans, consistent with previous observations in the great apes47.

Discussion

Since the middle Miocene, an epoch of abundance and diversity for apes throughout Eurasia and Africa, the prevailing pattern of ape evolution has been one of fragmentation and extinction48. The present-day distribution of non-human great apes, existing only as endangered and subdivided populations in equatorial forest refugia43, is a legacy of that process.
Even humans, now spread around the world and occupying habitats previously inaccessible to any primate, bear the genetic legacy of past population crises. All other branches of the genus Homo have passed into extinction. It may be that in the condition of Gorilla, Pan and Pongo we see some echo of our own ancestors before the last 100,000 years, and perhaps a condition experienced many times over several million years of evolution. It is notable that species within at least three of these genera continued to exchange genetic material long after separation4,49, a disposition that may have aided their survival in the face of diminishing numbers. As well as teaching us about human evolution, the study of the great apes connects us to a time when our existence was more tenuous, and in doing so, highlights the importance of protecting and conserving these remarkable species.

METHODS SUMMARY

Assembly. We constructed a hybrid de novo assembly combining 5.4 Gbp of capillary read pairs with the contigs from an initial short read assembly of 166.8 Gbp of Illumina paired reads. Improvements in long-range structure were then guided by human homology, placing contigs into scaffolds wherever read pairs confirmed collinearity between gorilla and human. Base-pair contiguity was improved by local reassembly within each scaffold, merging or extending contigs using Illumina read pairs. Finally, we used additional Kamilah bacterial artificial chromosome (BAC) and fosmid end pair capillary sequences to provide longer range scaffolding. Base errors were corrected by mapping all Illumina reads back to the assembly and rectifying apparent homozygous variants, while recording the location of heterozygous sites. Further details and other methods are described in Supplementary Information.
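The logic of the base-error correction step described above can be illustrated with a minimal sketch. It assumes that diploid genotype calls per assembly position are already available from an upstream read-mapping step (not shown), and it handles single-base substitutions only. The names `SiteCall` and `polish` are hypothetical and are not the tools used to build the assembly; the actual procedure is described in Supplementary Information.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class SiteCall:
    position: int              # 0-based position in the assembly sequence
    genotype: Tuple[str, str]  # two called alleles, e.g. ("A", "G")


def polish(assembly: str, calls: Iterable[SiteCall]) -> Tuple[str, List[int]]:
    """Correct apparent homozygous variants and record heterozygous sites.

    Because the reads come from the same individual as the assembly, a
    homozygous call that disagrees with the assembly base is treated as an
    assembly error and overwritten. Heterozygous calls are genuine variation
    within the individual, so the base is left alone and the site is recorded.
    """
    bases = list(assembly)
    heterozygous_sites: List[int] = []
    for call in calls:
        allele_a, allele_b = call.genotype
        if allele_a == allele_b:
            if bases[call.position] != allele_a:
                bases[call.position] = allele_a  # rectify apparent homozygous variant
        else:
            heterozygous_sites.append(call.position)
    return "".join(bases), heterozygous_sites


# Toy example: position 1 is a homozygous mismatch, position 2 is heterozygous
corrected, het_sites = polish(
    "ACGT",
    [SiteCall(1, ("T", "T")), SiteCall(2, ("G", "A")), SiteCall(3, ("T", "T"))],
)
print(corrected, het_sites)
```

This also shows why the residual homozygous site rate for the assembled individual is effectively an error rate: after correction, any remaining homozygous difference reflects a miscall rather than true variation.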
Table 2 | Nucleotide polymorphism in western and eastern gorillas

Gorilla   Species           Heterozygous site rate (%)   Homozygous site rate (%)   Homozygous:heterozygous ratio
Kamilah   Western lowland   0.189                        0.0015                     0.008
EB(JC)    Western lowland   0.178                        0.10                       0.56
Mukisi    Eastern lowland   0.076                        0.19                       2.5

Rates (columns 3 and 4) are based on variants detected by mapping sequence data to the gorilla reference and filtering sites by depth and mapping quality (Supplementary Information). The homozygosity rate for Kamilah is low (and is effectively an error rate) because the sequence of this gorilla was used for assembly. Reduced heterozygosity in Mukisi is not due to familial inbreeding, as there are no long homozygous stretches.

RESEARCH ARTICLE 174 | NATURE | VOL 483 | 8 MARCH 2012 ©2012 Macmillan Publishers Limited. All rights reserved
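As a quick consistency check on Table 2, the final column can be recomputed from the two rate columns. The sketch below uses only the values printed in the table; the variable and function names are illustrative.

```python
# (heterozygous site rate %, homozygous site rate %) as printed in Table 2
rates = {
    "Kamilah": (0.189, 0.0015),
    "EB(JC)": (0.178, 0.10),
    "Mukisi": (0.076, 0.19),
}


def hom_het_ratio(het_rate, hom_rate):
    """Ratio of homozygous to heterozygous variant site rates."""
    return hom_rate / het_rate


ratios = {name: hom_het_ratio(het, hom) for name, (het, hom) in rates.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.3g}")
```

The recomputed values agree with the printed ratios to the precision shown: close to 0.5 for EB(JC), which matches the interpretation that this gorilla comes from the same population as Kamilah, and 2.5 for Mukisi.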
Received 16 June 2011; accepted 10 January 2012.

1. Huxley, T. H. Evidence as to Man's Place in Nature (Williams & Norgate, 1863).
2. King, M. C. & Wilson, A. C. Evolution at two levels in humans and chimpanzees. Science 188, 107–116 (1975).
3. Chimpanzee Sequencing and Analysis Consortium. Initial sequence of the chimpanzee genome and comparison with the human genome. Nature 437, 69–87 (2005).
4. Locke, D. P. et al. Comparative and demographic analysis of orang-utan genomes. Nature 469, 529–533 (2011).
5. Hubbard, T. J. et al. Ensembl 2009. Nucleic Acids Res. 37, D690–D697 (2009).
6. Paten, B., Herrero, J., Beal, K., Fitzgerald, S. & Birney, E. Enredo and Pecan: genome-wide mammalian consistency-based multiple alignment with paralogs. Genome Res. 18, 1814–1828 (2008).
7. Bradley, B. J. Reconstructing phylogenies and phenotypes: a molecular view of human evolution. J. Anat. 212, 337–353 (2008).
8. Burgess, R. & Yang, Z. Estimation of hominoid ancestral population sizes under bayesian coalescent models incorporating mutation rate variation and sequencing errors. Mol. Biol. Evol. 25, 1979–1994 (2008).
9. Wood, B. & Harrison, T. The evolutionary context of the first hominins. Nature 470, 347–352 (2011).
10. Steiper, M. E. & Young, N. M. Timing primate evolution: lessons from the discordance between molecular and paleontological estimates. Evol. Anthropol. 17, 179–188 (2008).
11. Lynch, M. Rate, molecular spectrum, and consequences of human mutation. Proc. Natl Acad. Sci. USA 107, 961–968 (2010).
12. The 1000 Genomes Project Consortium. A map of human genome variation from population-scale sequencing. Nature 467, 1061–1073 (2010); correction 473, 544 (2011).
13. Roach, J. C. et al. Analysis of genetic inheritance in a family quartet by whole-genome sequencing. Science 328, 636–639 (2010).
14. Hartwig, W. C. et al. The Primate Fossil Record (Cambridge Univ. Press, 2002).
15. Kim, S. H., Elango, N., Warden, C., Vigoda, E. & Yi, S. V.
Heterogeneous genomic molecular clocks in primates. PLoS Genet. 2, e163 (2006).
16. Fleagle, J. G. Primate Adaptation and Evolution 2nd edn (Academic Press, 1998).
17. Charlesworth, D., Morgan, M. T. & Charlesworth, B. Mutation accumulation in finite populations. J. Hered. 84, 321–325 (1993).
18. McVicker, G., Gordon, D., Davis, C. & Green, P. Widespread genomic signatures of natural selection in hominid evolution. PLoS Genet. 5, e1000471 (2009).
19. Myers, S., Bottolo, L., Freeman, C., McVean, G. & Donnelly, P. A fine-scale map of recombination rates and hotspots across the human genome. Science 310, 321–324 (2005).
20. Vicoso, B. & Charlesworth, B. Evolution on the X chromosome: unusual patterns and processes. Nature Rev. Genet. 7, 645–653 (2006).
21. Ellegren, H. Characteristics, causes and evolutionary consequences of male-biased mutation. Proc. R. Soc. Lond. B 274, 1–10 (2007).
22. Goetting-Minesky, M. P. & Makova, K. D. Mammalian male mutation bias: impacts of generation time and regional variation in substitution rates. J. Mol. Evol. 63, 537–544 (2006).
23. Presgraves, D. C. & Yi, S. V. Doubts about complex speciation between humans and chimpanzees. Trends Ecol. Evol. 24, 533–540 (2009).
24. Patterson, N., Richter, D. J., Gnerre, S., Lander, E. S. & Reich, D. Genetic evidence for complex speciation of humans and chimpanzees. Nature 441, 1103–1108 (2006).
25. Hughes, J. F. et al. Chimpanzee and human Y chromosomes are remarkably divergent in structure and gene content. Nature 463, 536–539 (2010).
26. Kamada, F. et al. A genome-wide association study identifies RNF213 as the first Moyamoya disease gene. J. Hum. Genet. 56, 34–40 (2011).
27. Herculano-Houzel, S. Scaling of brain metabolism with a fixed energy budget per neuron: implications for neuronal activity, plasticity and evolution. PLoS ONE 6, e17514 (2011).
28. Clark, A. G. et al. Inferring nonneutral evolution from human-chimp-mouse orthologous gene trios. Science 302, 1960–1963 (2003).
29. Ellis, R. A.
& Montagna, W. The skin of primates. VI. The skin of the gorilla (Gorilla gorilla). Am. J. Phys. Anthropol. 20, 79–93 (1962).
30. Streeter, G. L. Some uniform characteristics of the primate auricle. Anat. Rec. A 23, 335–341 (1922).
31. Wallis, O. C., Zhang, Y. P. & Wallis, M. Molecular evolution of GH in primates: characterisation of the GH genes from slow loris and marmoset defines an episode of rapid evolutionary change. J. Mol. Endocrinol. 26, 249–258 (2001).
32. Stenson, P. D. et al. The Human Gene Mutation Database: 2008 update. Genome Med. 1, 13 (2009).
33. Gibbs, R. A. et al. Evolutionary and biomedical insights from the rhesus macaque genome. Science 316, 222–234 (2007).
34. Montgomery, S. B. et al. Transcriptome genetics using second generation sequencing in a Caucasian population. Nature 464, 773–777 (2010).
35. Blekhman, R., Marioni, J. C., Zumbo, P., Stephens, M. & Gilad, Y. Sex-specific and lineage-specific alternative splicing in primates. Genome Res. 20, 180–189 (2010).
36. Phillips, J. E. & Corces, V. G. CTCF: master weaver of the genome. Cell 137, 1194–1211 (2009).
37. McDaniell, R. et al. Heritable individual-specific and allele-specific chromatin signatures in humans. Science 328, 235–239 (2010).
38. Kunarso, G. et al. Transposable elements have rewired the core regulatory network of human embryonic stem cells. Nature Genet. 42, 631–634 (2010).
39. Schmidt, D. et al. Five-vertebrate ChIP-seq reveals the evolutionary dynamics of transcription factor binding. Science 328, 1036–1040 (2010).
40. Groves, C. Primate Taxonomy (Smithsonian Institution Press, 2001).
41. Thalmann, O., Fischer, A., Lankester, F., Paabo, S. & Vigilant, L. The complex evolutionary history of gorillas: insights from genomic data. Mol. Biol. Evol. 24, 146–158 (2007).
42. Stokes, E., Malonga, R., Rainey, H. & Strindberg, S. Western Lowland Gorilla Surveys in Northern Republic of Congo 2006–2007. Summary Scientific Report (WCS Global Conservation, 2008).
43. IUCN.
The IUCN Red List of Threatened Species. Version 2010.1 <http://www.iucnredlist.org> (2010).
44. Stacey, M., Lin, H. H., Hilyard, K. L., Gordon, S. & McKnight, A. J. Human epidermal growth factor (EGF) module-containing mucin-like hormone receptor 3 is a new member of the EGF-TM7 family that recognizes a ligand on human macrophages and activated neutrophils. J. Biol. Chem. 276, 18863–18870 (2001).
45. Jensen-Seaman, M. I. & Li, W. H. Evolution of the hominoid semenogelin genes, the major proteins of ejaculated semen. J. Mol. Evol. 57, 261–270 (2003).
46. Alkan, C. et al. Personalized copy number and segmental duplication maps using next-generation sequencing. Nature Genet. 41, 1061–1067 (2009).
47. Gazave, E. et al. Copy number variation analysis in the great apes reveals species-specific patterns of structural variation. Genome Res. 21, 1626–1639 (2011).
48. Begun, D. R. in Handbook of Palaeoanthropology Vol. 2, Primate Evolution and Human Origins (eds Henke, W. & Tattersall, I.) 921–977 (Springer, 2007).
49. Green, R. E. et al. A draft sequence of the Neandertal genome. Science 328, 710–722 (2010).
50. Lebatard, A. E. et al. Cosmogenic nuclide dating of Sahelanthropus tchadensis and Australopithecus bahrelghazali: Mio-Pliocene hominids from Chad. Proc. Natl Acad. Sci. USA 105, 3226–3231 (2008).

Supplementary Information is linked to the online version of the paper at www.nature.com/nature.

Acknowledgements We thank H. Li and E. Birney for discussions, D. Zerbino, J. Stalker, L. Wilming, D. Rajan and H. Clawson for technical assistance, J. Ahringer for comments on the manuscript, K. Leus of the Center for Research and Conservation of the Royal Zoological Society of Antwerp for sample material from Mukisi, and the Marmoset Genome Analysis Consortium for permission to use the unpublished assembly of the marmoset genome.
This research was supported in part by Wellcome Trust grants WT062023 (to J.H., K.B., S.F., A.J.V., P.F.), WT089066 (to R.D.), WT077192 (to R.D., S.M., A.K.-K., J.T.S., W.W.), WT077009 (to Y.X., B.Y., Q.A., Y.C., C.T.-S.), WT077198 (to G.K.L.) and 075491/Z/04 (to G.L.); EMBL grants (to P.C.S., P.F.); scholarships from the Gates Cambridge Trust (to G.E.J. and T.D.O’C.); an MRC Special Fellowship in Biomedical Informatics (to A.S.); funding from the Lundbeck Foundation (to A.H.); the Academy of Finland and the Emil Aaltonen Foundation (to T.L.); a Marie Curie fellowship (to T.M.-B.); the European Community’s Seventh Framework Programme (FP7/2007-2013)/ERC Starting Grant (StG_20091118) (to T.M.-B.); an FPI grant from the Spanish Ministry of Education (BES-2010-032251) (to J.P.-M.); a BBSRC Doctoral Training Grant (to S.H.M.); grants from the UK Medical Research Council (to A.H., S.M., C.P.P.); the Intramural Research Program of the National Human Genome Research Institute, National Institutes of Health (to J.C.M.); the Danish Council for Independent Research, Natural Sciences, grant no. 09-062535 (to K.M., M.H.S.); a Commonwealth Scholarship (to M.C.W.); the Swiss National Science Foundation, Louis Jeantet Foundation (to E.T.D.); an ERC Starting Grant and an EMBO Young Investigator Award, Hutchinson Whampoa (to D.T.O.); NHGRI support (to W.C.W.); support from BIOBASE GmbH (to E.V.B., P.D.S., M.M., A.D.P., K.S., D.N.C.); US National Science Foundation grant DGE-0739133 (to W.W.); NHGRI U54 HG003079 (to R.K.W.); NIH grant HG002385 (to E.E.E.). E.E.E. is an investigator of the Howard Hughes Medical Institute.

Author Contributions Manuscript main text: A.S., R.D., C.T.-S., N.I.M., G.E.J., P.C.S., A.K.-K. Project coordination: A.S., A.S.R., A.K.-K., R.D. Project initiation: J.R., R.D., R.K.W. Library preparation and sequencing: I.G., D.J.T., M.A.Q., C.M.C., B.Z., P.d.J., O.A.R., Q.A., B.Y., Y.X., T.A.G., W.C.W. Assembly: A.S., L.W.H., Y.G., J.T.S., J.C.M., W.W., Z.N.
Fosmid finishing: P.H. Assembly quality: A.S., S. Meader, G.L., C.P.P. Annotation: Y.A.T., G.K.L., A.J.V., A. Heger, S.M.S. Primate multiple alignments: J.H., K.B., S.F. Great ape speciation and ILS: J.Y.D., A.S., T.M., M.H.S., K.M., G.E.J. Sequence loss and gain: A.S., S.M., C.T.-S., Y.A.T., A.J.V. Protein evolution: G.E.J., S.H.M., N.I.M., B.J.B., T.D.O’C., Y.X., Y.C., N.G. Human disease allele analysis: Y.X., Y.C., C.T.-S., P.D.S., E.V.B., A.D.P., M.M., K.S., D.N.C. Transcriptome analysis: T.L., E.T.D. ChIP-seq experiment and analysis: P.C.S., M.C.W., D.S., P.F., D.T.O. Additional gorilla samples: B.Y., Y.X., L.V., C.T.-S. Gorilla species diversity and divergence: A.S., A.H., T.M., L.N.A., B.Y., L.V. Gorilla species functional differences: Y.X., Y.C., C.T.-S. Segmental duplication analysis: T.M.-B., C.A., S.S., E.K., J.P.-M., E.E.E.

Author Information Accession numbers for all primary sequencing data are given in Supplementary Information. The assembly has been submitted to EMBL with accession numbers FR853080 to FR853106, and annotation is available at Ensembl (http://www.ensembl.org/Gorilla_gorilla/Info/Index). Reprints and permissions information is available at www.nature.com/reprints. This paper is distributed under the terms of the Creative Commons Attribution-Non-Commercial-Share Alike licence, and is freely available to all readers at www.nature.com/nature. The authors declare no competing financial interests. Readers are welcome to comment on the online version of this article at www.nature.com/nature. Correspondence and requests for materials should be addressed to R.D. (rd@sanger.ac.uk).

ARTICLE RESEARCH 8 MARCH 2012 | VOL 483 | NATURE | 175 ©2012 Macmillan Publishers Limited. All rights reserved