Sequencing and Analysis of Neanderthal Genomic DNA
James P. Noonan, et al.
Science 314, 1113 (2006); DOI: 10.1126/science.1131412

The following resources related to this article are available online at www.sciencemag.org (this information is current as of April 9, 2007):

Updated information and services, including high-resolution figures, can be found in the online version of this article at:
http://www.sciencemag.org/cgi/content/full/314/5802/1113

Supporting Online Material can be found at:
http://www.sciencemag.org/cgi/content/full/314/5802/1113/DC1

A list of selected additional articles on the Science Web sites related to this article can be found at:
http://www.sciencemag.org/cgi/content/full/314/5802/1113#related-content

This article cites 22 articles, 10 of which can be accessed for free:
http://www.sciencemag.org/cgi/content/full/314/5802/1113#otherarticles

This article has been cited by 5 article(s) on the ISI Web of Science.

This article appears in the following subject collections:
Evolution
http://www.sciencemag.org/cgi/collection/evolution

Information about obtaining reprints of this article or about obtaining permission to reproduce this article in whole or in part can be found at:
http://www.sciencemag.org/about/permissions.dtl

Science (print ISSN 0036-8075; online ISSN 1095-9203) is published weekly, except the last week in December, by the American Association for the Advancement of Science, 1200 New York Avenue NW, Washington, DC 20005. Copyright © 2006 by the American Association for the Advancement of Science; all rights reserved. The title SCIENCE is a registered trademark of AAAS.

Downloaded from www.sciencemag.org on April 9, 2007
RESEARCH ARTICLE

Sequencing and Analysis of Neanderthal Genomic DNA

James P. Noonan,1,2 Graham Coop,3 Sridhar Kudaravalli,3 Doug Smith,1 Johannes Krause,4 Joe Alessi,1 Feng Chen,1 Darren Platt,1 Svante Pääbo,4 Jonathan K. Pritchard,3 Edward M. Rubin1,2*

Our knowledge of Neanderthals is based on a limited number of remains and artifacts from which we must make inferences about their biology, behavior, and relationship to ourselves. Here, we describe the characterization of these extinct hominids from a new perspective, based on the development of a Neanderthal metagenomic library and its high-throughput sequencing and analysis. Several lines of evidence indicate that the 65,250 base pairs of hominid sequence so far identified in the library are of Neanderthal origin, the strongest being the ascertainment of sequence identities between Neanderthal and chimpanzee at sites where the human genomic sequence is different. These results enabled us to calculate the human-Neanderthal divergence time based on multiple randomly distributed autosomal loci. Our analyses suggest that on average the Neanderthal genomic sequence we obtained and the reference human genome sequence share a most recent common ancestor ~706,000 years ago, and that the human and Neanderthal ancestral populations split ~370,000 years ago, before the emergence of anatomically modern humans. Our finding that the Neanderthal and human genomes are at least 99.5% identical led us to develop and successfully implement a targeted method for recovering specific ancient DNA sequences from metagenomic libraries. This initial analysis of the Neanderthal genome advances our understanding of the evolutionary relationship of Homo sapiens and Homo neanderthalensis and signifies the dawn of Neanderthal genomics.

Neanderthals are the closest hominid relatives of modern humans (1). As late as 30,000 years ago, humans and Neanderthals coexisted in Europe and western Asia (2).
Since that time, our species has spread across Earth, far surpassing any previous hominid or primate species in numbers, technological development, and environmental impact, while Neanderthals have vanished. Molecular studies of Neanderthals have so far been limited to comparisons of human and polymerase chain reaction (PCR)-amplified Neanderthal mitochondrial sequences, which suggest that the most recent common ancestor of humans and Neanderthals existed ~500,000 years ago, well before the emergence of modern humans (3–5). Further analyses of mitochondrial data, including the comparison of mitochondrial sequences obtained from several Neanderthals and early modern humans, suggest little or no admixture between Neanderthal and modern human populations in Europe (3, 4, 6, 7). However, a major limitation of all prior molecular studies of Neanderthals is that mitochondrial sequences reflect only maternal inheritance of a single locus. Accordingly, in the absence of Neanderthal autosomal and Y-chromosome sequences, the assessment of human-Neanderthal admixture remains incomplete. Mitochondrial data also provide no access to the gene and gene regulatory sequence differences between humans and Neanderthals that would help to reveal biological features unique to each. These insights await the recovery of Neanderthal genomic sequences.

The introduction of high-throughput sequencing technologies and recent advances in metagenomic analysis of complex DNA mixtures now provide a strategy to recover genomic sequences from ancient remains (8–11). In contrast to previous efforts to obtain ancient sequences by direct analysis of extracts (3–6, 12), metagenomic libraries allow the immortalization of DNA isolated from precious ancient samples, obviating the need for repeated destructive extractions (10).
In addition, once an ancient DNA fragment is cloned into a metagenomic library, the vector sequences linked to each library-derived insert distinguish it from contamination that might be introduced during subsequent PCR amplification or sequencing (Fig. 1).

Recovery of Neanderthal nuclear DNA sequences using a metagenomic approach. In this study, we applied an amplification-independent direct cloning method to construct a Neanderthal metagenomic library, designated NE1, using DNA extracted from a 38,000-year-old specimen from Vindija, Croatia (6, 13). We have recovered 65,250 base pairs (bp) of Neanderthal genome sequence from this library through a combination of Sanger sequencing and massively parallel pyrosequencing. We have also used the metagenomic library as a substrate to isolate specific Neanderthal sequences by direct genomic selection.

Several lines of evidence indicated that the hominid sequences in this library were largely Neanderthal, rather than modern human contamination. Mitochondrial PCR analysis of the extract used to build the library, using an amplicon similar in size to the average hominid sequence identified in the library, revealed that only ~2% of the products were from contaminating modern human DNA, whereas the remaining 98% were Neanderthal. Signatures of damage in the hominid sequences that are characteristic of ancient DNA also suggested that they were ancient. Finally and most importantly, comparison of hominid sequences from the library to orthologous human and chimpanzee genomic sequences identified human-specific substitutions at sites where the hominid sequence was identical to that of the chimpanzee, enabling us to make estimates of the human-Neanderthal divergence time (3, 4, 6).
1 U.S. Department of Energy Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA 94598, USA.
2 Genomics Division, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA 94720, USA.
3 Department of Human Genetics, University of Chicago, 920 East 58th Street, Chicago, IL 60637, USA.
4 Max Planck Institute for Evolutionary Anthropology, Deutscher Platz 6, 04103, Leipzig, Germany.
*To whom correspondence should be addressed. E-mail: emrubin@lbl.gov

Fig. 1. Generation of ancient metagenomic library DNAs for direct selection and pyrosequencing.

www.sciencemag.org SCIENCE VOL 314 17 NOVEMBER 2006 1113

We initially assessed the Neanderthal genomic sequence content of library NE1 by Sanger sequencing of individual clones, which allowed individual library inserts to be completely sequenced and thus provided a direct measure of hominid insert size that could not be obtained from the ~100-bp pyrosequencing reads described below (Table 1). We identified hominid sequences in the library by
BLAST comparison to the reference human genome sequence (13, 14). In many cases, the human BLAST hit covered only part of the insert, because the direct cloning method we employed produces chimeric inserts consisting of smaller fragments ligated into larger concatemers. The small average size of these putatively ancient Neanderthal fragments (52 bp) is similar to results we previously obtained from two Pleistocene cave bear libraries, in which the average library insert size was between 100 and 200 bp, whereas BLAST hits to reference carnivore genome sequences were on average 69 bp (Fig. 2) (10). The small BLAST hit sizes and insert sizes in both cave bear and Neanderthal metagenomic libraries are consistent with the degradation of ancient genomic DNA into small fragments over tens of thousands of years, illustrating the general condition of nuclear DNA in ancient remains.

Sanger sequencing of individual clones from library NE1 suggested that it contained sufficient amounts of Neanderthal sequence to conduct a random sequence survey of the Neanderthal genome. However, the small percentage of clones we identified as containing hominid sequences indicated that we would have to sequence a very large number of clones to obtain enough Neanderthal genome sequence for this analysis. We therefore carried out deep sequencing of pooled inserts from library NE1 using massively parallel pyrosequencing. To obtain pooled inserts, we amplified transformed NE1 library DNA in liquid batch culture and recovered library inserts from purified plasmid DNA by PCR (Fig. 1). We generated 1.47 million pyrosequencing reads, compared each to the human genome sequence with MEGABLAST, and obtained 7880 hits. Assembly of these reads and reanalysis of the resulting scaffolds by BLASTN produced 1126 unique Neanderthal loci, yielding 54,302 bp of Neanderthal genomic sequence (13).

Assessment of pyrosequencing data quality by comparison to Sanger sequence data.
The pyrosequencing approach generates significant amounts of sequence but does so with a higher error rate than Sanger sequencing (11). To assess the quality of Neanderthal pyrosequencing data, we generated consensus sequences from pyrosequencing reads overlapping the same Neanderthal genomic locus and filtered out low-quality positions in the resulting contigs (quality score < 15). To determine whether these contigs contained additional errors not detectable by quality-score filtering, we also used Sanger sequencing to analyze 19,200 clones from the same batch culture used to generate the pyrosequencing data. This sequencing yielded 130 loci (6.2 kb) that were also represented in the pyrosequencing data. Sanger sequencing and pyrosequencing results for these 130 Neanderthal loci agreed at 99.89% of ungapped positions. In addition, Sanger sequencing and pyrosequencing yielded Neanderthal sequences that were nearly equally divergent from the human reference sequence (pyrosequencing = 0.47% divergence, Sanger sequencing = 0.49%). These results indicate that the frequency of single-base errors is probably no greater in Neanderthal genomic sequence obtained from assembled quality-filtered pyrosequencing data than in sequence obtained from Sanger sequencing. The low complexity of library NE1 made these analyses possible, because it resulted in a limited number of clones in the library that were amplified by batch culture and PCR and then sequenced in depth (fig. S1). We estimated that the coverage obtained in library NE1 (~0.002%) is significantly lower than that previously obtained in cave bear metagenomic libraries prepared from samples of similar age as the Neanderthal sample used here (10). The low coverage in library NE1 is more likely due to the quality of this particular library rather than being a general feature of ancient DNA. Nevertheless, we were able to obtain substantial amounts of authentic Neanderthal genomic sequence from the library by deep sequencing. 
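The quality check above reduces to two operations: masking consensus positions whose quality score falls below 15, then measuring how often the masked pyrosequencing consensus and the Sanger sequence agree over ungapped columns. The Python sketch below illustrates only that comparison step. It assumes the two sequences for a locus have already been aligned to equal length with '-' marking gaps; the function names and the use of 'N' as a mask character are illustrative choices, not the authors' pipeline, and the sketch does not perform read assembly or consensus calling itself.

```python
from typing import Sequence

# Threshold taken from the text: consensus positions with quality < 15 are filtered out.
QUALITY_CUTOFF = 15


def mask_low_quality(consensus: str, quals: Sequence[int], cutoff: int = QUALITY_CUTOFF) -> str:
    """Replace consensus bases whose quality score is below `cutoff` with 'N'."""
    if len(consensus) != len(quals):
        raise ValueError("consensus and quality scores must have the same length")
    return "".join(base if q >= cutoff else "N" for base, q in zip(consensus, quals))


def agreement(a: str, b: str) -> float:
    """Fraction of comparable aligned columns at which two sequences carry the same base.

    `a` and `b` must already be aligned (same length, '-' for gaps). Columns with a
    gap or a masked 'N' in either sequence are skipped, so only ungapped,
    quality-filtered positions are counted.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "-N" or y in "-N":
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return matches / compared


if __name__ == "__main__":
    pyro = mask_low_quality("ACGTTAGC", [30, 30, 10, 30, 30, 30, 30, 30])
    sanger = "ACGTCA-C"
    print(pyro)                     # ACNTTAGC
    print(agreement(pyro, sanger))  # 5 of 6 comparable columns agree
```

The same helper can express divergence from a reference: under these assumptions, 1 - agreement(neanderthal_seq, human_seq) over an aligned locus is the per-site mismatch rate, the quantity behind the 0.47% and 0.49% figures quoted above.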
Comparison of orthologous Neanderthal, human, and chimpanzee genomic sequences. To ascertain whether the library NE1 hominid sequence we obtained was a representative sampling of the Neanderthal genome, we identified each NE1 library sequence for which the bit score of the best BLASTN hit in the human genome was higher than the bit scores of all other hits for that sequence. We then determined the distribution of all such best BLASTN hits across human chromosomes [43,946 bp in 1,039 loci (table S1 and Fig. 3A)]. The amount of Neanderthal sequence aligned to each human chromosome was highly correlated with sequenced chromosome length, indicating that the Neanderthal sequences we obtained were randomly drawn from all chromosomes (Pearson correlation coefficient = 0.904, Fig. 3A). The hominid hits included Y-chromosome sequences, demonstrating that our sample was derived from a Neanderthal male. We annotated each Neanderthal locus according to the annotations (known genes, conserved noncoding sequences, and repeats) associated with the aligned human sequence (table S2). Neanderthal sequences obtained by both Sanger sequencing and pyrosequencing showed a distribution of sequence features consistent with the known distribution of these features in the human genome (Fig. 3B). These sequences are therefore likely to represent a random sampling of the Neanderthal genome. Comparison of authentic Neanderthal sequence with orthologous human and chimpanzee genomic sequences will reveal sites at which Neanderthal is identical to chimpanzee but at which the human sequence has undergone a mutation since the human-Neanderthal divergence. Determining the number of human-specific mutations is critical to dating the human-Neanderthal split. To identify these events, we constructed alignments of orthologous human, Neanderthal, and chimpanzee sequences and identified mutations specific to each lineage by parsimony (15). 
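The lineage-assignment rule above can be made concrete. With chimpanzee as the outgroup, an aligned column where Neanderthal matches chimpanzee but human differs is scored as a human-specific change, and a column where human matches chimpanzee but Neanderthal differs is scored as Neanderthal-specific. The Python sketch below assumes pre-aligned, equal-length strings with '-' for gaps and 'N' for masked bases; its function names are hypothetical. The `clock_mrca_time` helper is only a simple molecular-clock ratio, not the maximum-likelihood method the authors use (13). With the counts reported in this paper's coalescence analysis (27 human-specific changes among 502 autosomal human-chimpanzee differences, human-chimpanzee coalescence set to 6.5 million years), it gives roughly 699,000 years, close to the published 706,000-year estimate.

```python
from collections import Counter


def classify_site(human: str, neanderthal: str, chimp: str) -> str:
    """Assign one aligned column to a lineage by parsimony.

    Assumes the tree ((human, Neanderthal), chimpanzee) and at most one change per
    site: the base that disagrees with the other two marks the lineage on which the
    mutation occurred.
    """
    h, n, c = human.upper(), neanderthal.upper(), chimp.upper()
    if any(b in "-N" for b in (h, n, c)):
        return "skipped"           # gaps and masked bases are not scored
    if h == n == c:
        return "invariant"
    if n == c:
        return "human"             # Neanderthal matches chimpanzee; human base is derived
    if h == c:
        return "neanderthal"       # human matches chimpanzee; Neanderthal base is derived
    if h == n:
        return "hominid_or_chimp"  # change on the shared hominid branch or the chimp branch
    return "ambiguous"             # three different bases; parsimony cannot resolve


def tally(human: str, neanderthal: str, chimp: str) -> Counter:
    """Count site classes over a three-way alignment of equal-length strings."""
    if not len(human) == len(neanderthal) == len(chimp):
        raise ValueError("alignment rows must have the same length")
    return Counter(classify_site(h, n, c) for h, n, c in zip(human, neanderthal, chimp))


def clock_mrca_time(human_specific: int, human_chimp_diffs: int, t_human_chimp: float) -> float:
    """Ratio (molecular-clock) estimate of the human-Neanderthal MRCA time.

    Human-chimpanzee differences accumulate along two lineages of length
    t_human_chimp, so the per-lineage rate is diffs / (2 * t_human_chimp).
    Human-specific changes accumulate on the human lineage only since the MRCA.
    """
    rate_per_lineage = human_chimp_diffs / (2.0 * t_human_chimp)
    return human_specific / rate_per_lineage


if __name__ == "__main__":
    counts = tally("ACGTACGT", "GCGAACGT", "ACGAACTT")
    print(counts["human"], counts["neanderthal"], counts["hominid_or_chimp"])  # 1 1 1
    # Counts from the text: 27 human-specific of 502 autosomal human-chimpanzee
    # differences, with human-chimpanzee coalescence set to 6.5 million years.
    print(round(clock_mrca_time(27, 502, 6.5e6)))  # 699203 (about 699,000 years)
```

Because this estimate scales linearly with `t_human_chimp`, an error in the assumed human-chimpanzee divergence time shifts the result in proportion, the same property the authors note for their own estimates. Neanderthal-specific counts are deliberately not used in the clock step, since many of them reflect DNA damage rather than true substitutions.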
We identified 34 human-specific substitutions in 37,636 human, Neanderthal, and chimpanzee aligned positions, including substitutions on chromosomes X and Y that were not considered in subsequent analyses. We also identified 171 sites with Neanderthal-specific substitutions relative to human and chimpanzee. It has been shown that nucleotides in genuine ancient DNA are occasionally chemically damaged, most frequently because of the deamination of cytosine to uracil, resulting in the incorporation of incorrect bases during PCR and sequencing (16). This results in an apparent excess of C-to-T and G-to-A mismatches (which are equivalent events on opposite DNA strands) between the ancient sequence and the modern genomic reference sequence. We observe a significant excess of C-to-T and G-to-A mismatches (relative to T-to-C and A-to-G mismatches) between human and NE1 hominid sequences obtained by both Sanger sequencing and pyrosequencing [P << 0.0005, Fisher’s exact test (Fig. 4 and table S3)]. This accounts for the large number of Neanderthal-specific substitutions we observe and further supports the supposition that the hominid sequences are Neanderthal in origin.

Fig. 2. Size distribution, plotted in 10-bp bins, of Neanderthal and cave bear sequences obtained from metagenomic libraries by Sanger sequencing of individual clones. The average hit size in each case is indicated by a dotted line.

Despite the bias toward C-to-T and G-to-A events in Neanderthal genomic sequence, the overall frequency of these events is
low (~0.37% of all sites), indicating that the vast majority of human-Neanderthal-chimpanzee aligned positions are not likely to be significantly affected by misincorporation errors (13). The length distribution of ancient DNA fragments shown in Fig. 2, when combined with the sequence signatures of ancient DNA described above, offers another metric for assessing the degree of modern human contamination in our library. Based on the assumption that modern contaminating DNA fragments would be longer than authentic ancient DNAs, which is supported by the observation that contaminating modern human DNA fragments in the cave bear libraries were on average much longer than the cave bear sequences (116 versus 69 bp) (10), we examined the distribution of human-Neanderthal mismatches in our data set as a function of alignment length. If a substantial fraction of the hominid sequence recovered from the Neanderthal sample were actually modern human DNA, we would expect to see a lower human-Neanderthal sequence divergence in the longer BLASTN alignments than we observe in the entire data set, because the longer hominid sequences would be enriched in modern human contaminants. The excess of damage-induced Neanderthal-specific mismatches described above would also be expected to decrease as alignment length increases, because individual bases in the longer modern human fragments would show relatively few chemical modifications. However, we did not observe these trends in our Neanderthal sequence. The human-Neanderthal sequence divergence in all autosomal alignments greater than 52 bp (the approximate midpoint of the distribution shown in Fig. 2) was similar to the divergence obtained from the whole data set (0.59% versus 0.52%). The excess of C-to-T and G-to-A mismatches was also maintained in the longer alignments. These results further support the supposition that the hominid sequence we obtained is predominantly Neanderthal in origin.
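The two authenticity checks described above (an excess of damage-type mismatches, and unchanged divergence in long alignments) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names, the `(aligned_length, mismatches)` tuple format, and the layout of any 2x2 table passed to the test are hypothetical, and the counts behind table S3 are not reproduced here. Complementary substitutions are folded into one class, as in Fig. 4.

```python
from math import comb

# Watson-Crick complement of each base.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def substitution_class(ref: str, obs: str) -> str:
    """Canonical label for a ref->obs mismatch, folding each substitution
    together with its complement (C>T and G>A both map to 'C>T')."""
    forward = f"{ref}>{obs}"
    complement = f"{COMPLEMENT[ref]}>{COMPLEMENT[obs]}"
    return min(forward, complement)  # one fixed label per complementary pair


def fisher_one_sided_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]:
    the probability, with margins fixed, of a top-left count at least as
    large as a (upper tail of the hypergeometric distribution)."""
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, col1)
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1))
    return tail / total


def pooled_divergence(alignments, min_length: int = 0) -> float:
    """Mismatches per aligned base, pooled over alignments longer than
    min_length. Each alignment is a hypothetical (aligned_length, mismatches) tuple."""
    kept = [(length, mm) for length, mm in alignments if length > min_length]
    bases = sum(length for length, _ in kept)
    return sum(mm for _, mm in kept) / bases if bases else float("nan")


if __name__ == "__main__":
    # Made-up toy alignments: compare all data with alignments longer than 52 bp.
    toy = [(40, 0), (60, 1), (100, 1)]
    print(pooled_divergence(toy), pooled_divergence(toy, min_length=52))
```

Under the contamination hypothesis, `pooled_divergence(data, 52)` would fall noticeably below `pooled_divergence(data)`; the text instead reports 0.59% versus 0.52%, i.e. no such drop.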
Coalescence time of human and Neanderthal genomic sequences. These data allowed us to examine for the first time the genetic relationship between humans and Neanderthals using nuclear genomic sequences (13). We first considered the average coalescence time for the autosomes between the Neanderthal genomic sequence that we obtained and the reference human genome sequence. We observed 502 human-chimpanzee autosomal differences in the human-Neanderthal-chimpanzee sequence alignments we constructed. Based on comparison to the Neanderthal sequence, 27 of these differences were human-specific and therefore postdate the most recent common ancestor (MRCA) of the human and Neanderthal sequences. Using this information, our maximum likelihood estimate of the average time to the MRCA of these sequences is 706,000 years, with a 95% confidence interval (CI) of 468,000 to 1,015,000 years (Figs. 5A and 6) (13). This calculation does not make use of Neanderthal-specific changes, because many of those events are due to DNA damage as described above. In addition, we restricted our analysis to autosomal data, because these represent 97% of our total data set and population genetic parameters are likely to differ between the autosomes and sex chromosomes. Our estimate uses a mutation rate obtained by setting the average coalescence time for human and chimpanzee autosomes to 6.5 million years ago, a value that falls within the range suggested by recent studies (17, 18). Inaccuracies in the human-chimpanzee divergence time would shift all the time estimates and CIs presented here in proportion to the error. Split time of ancestral human and Neanderthal populations. Our estimate of the average common ancestor time reflects the average time at which the Neanderthal and human reference sequences began to diverge in the common ancestral population, not the actual split time of the ancestral populations that gave rise to Neanderthals and modern humans.
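The coalescence-time estimate above can be approximated with a simple molecular-clock calculation. If the 502 human-chimpanzee differences accumulated along two lineages over 6.5 million years, the per-lineage rate is 502 / (2 x 6.5 My); the 27 human-specific changes accumulated on the human lineage alone since the human-Neanderthal MRCA, giving T = 27 x 2 x 6.5 My / 502, roughly 699,000 years. This is a back-of-envelope moment estimate sketched under an infinite-sites, constant-clock assumption, not the paper's maximum likelihood procedure (which gives 706,000 years), and the exact (Garwood) Poisson interval on the count of 27 is an assumed stand-in for the paper's CI. Because the result is linear in the calibration time, the sketch also shows why an error in the human-chimpanzee divergence time rescales every estimate proportionally.

```python
from math import exp, lgamma, log


def mrca_time(human_specific: float, human_chimp_diffs: int,
              human_chimp_time: float = 6.5e6) -> float:
    """Clock-based time (years) to the human-Neanderthal MRCA.

    Human-chimpanzee differences accumulate on two lineages over
    human_chimp_time, so the per-lineage rate is diffs / (2 * time).
    Human-specific changes accumulate on the human lineage only."""
    rate_per_year = human_chimp_diffs / (2.0 * human_chimp_time)
    return human_specific / rate_per_year


def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k) for X ~ Poisson(lam), with each term computed in log space."""
    if k < 0:
        return 0.0
    if lam == 0:
        return 1.0
    return sum(exp(i * log(lam) - lam - lgamma(i + 1)) for i in range(k + 1))


def _bisect_increasing(f, lo: float, hi: float, iters: int = 200) -> float:
    """Root of an increasing function f on [lo, hi] by bisection."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def poisson_exact_ci(k: int, alpha: float = 0.05) -> tuple:
    """Exact (Garwood) two-sided CI for a Poisson mean after observing k events."""
    bracket = k + 20.0 * (k + 1) ** 0.5 + 20.0
    # Lower limit solves P(X >= k) = alpha / 2; P(X >= k) rises with lam.
    lower = 0.0 if k == 0 else _bisect_increasing(
        lambda lam: (1.0 - poisson_cdf(k - 1, lam)) - alpha / 2, 0.0, bracket)
    # Upper limit solves P(X <= k) = alpha / 2; P(X <= k) falls with lam.
    upper = _bisect_increasing(
        lambda lam: alpha / 2 - poisson_cdf(k, lam), 0.0, bracket)
    return lower, upper


if __name__ == "__main__":
    point = mrca_time(27, 502)
    lo_count, hi_count = poisson_exact_ci(27)
    print(round(point), round(mrca_time(lo_count, 502)), round(mrca_time(hi_count, 502)))
```

Doubling `human_chimp_time` doubles every output, which is the proportional shift described in the text.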
To estimate the actual split time of the ancestral human and Neanderthal populations, we developed a method that incorporated data from the human and Neanderthal reference sequences, as well as genotypes from 210 individuals with genome-wide single-nucleotide polymorphism (SNP) data collected by the International HapMap Consortium (Table 2) (19). We included the HapMap data because they indicate what proportion of sites in the Neanderthal sequence fall within the spectrum of modern human variation. For example, if the ancestral human and Neanderthal populations split long ago, before the rise of most modern human genetic diversity captured by the HapMap data, then Neanderthal sequence would almost never carry the derived allele, relative to the orthologous chimpanzee sequence, for a human SNP (Table 2). Conversely, a more recent population split would result in Neanderthal sequence frequently carrying the derived allele for human SNPs.

Fig. 3. (A) Representation of each Neanderthal chromosome in 43.9 kb of NE1 hominid sequences displaying a statistically unambiguous best BLAST hit to the human genome, relative to the total sequenced length of each human chromosome minus gaps. Chromosomes are ranked by the amount of Neanderthal sequence aligned to each. Chromosomes X and Y are shown at half their total length to correct for their haploid state in males relative to the autosomes. (B) Representation of sequence features in the NE1 hominid sequence shown in (A).

Table 1. Amount of unique Neanderthal sequence obtained from library NE1 by Sanger sequencing of individual clones, as well as Sanger sequencing and pyrosequencing of clones in batch culture. n.a., not applicable.

| | Individual clones | Batch culture | Batch culture |
|---|---|---|---|
| Sequencing chemistry | Sanger | Sanger | Pyrosequencing |
| Reads | 9984 | 19,200 | 1,474,910 |
| Average insert | 134 bp | 196 bp | n.a. |
| Average BLAST hit | 52 bp | 52 bp | 48 bp |
| Unique loci | 131 | 69 | 1126 |
| Total unique hominid sequence | 6845 bp | 4103 bp | 54,302 bp |

To formalize this idea, we considered an explicit population model for the relationship between Neanderthals and each HapMap population (East Asians,
Europeans, and Yoruba) separately (fig. S3) (13). We assumed that Neanderthals and modern humans evolved from a single ancestral population of 10,000 individuals and that the Neanderthal population split away from the human ancestral population instantaneously at a time T in the past, with no subsequent gene flow. In order to model the demographic histories of the HapMap populations, we made use of models and parameters estimated by Voight et al. (20) based on resequencing data from 50 unlinked, noncoding regions. Those demographic models include bottlenecks for East Asians and Europeans and modest exponential growth for Yoruba (13). We then constructed a simulation-based composite likelihood framework to estimate the time of the human-Neanderthal population split (13, 21). At each site in the human-Neanderthal-chimpanzee alignments we constructed, we recorded the Neanderthal and human reference alleles relative to chimpanzee. We also determined, separately for each population, whether each site was a HapMap SNP in that population and, if so, its allele frequency (Table 2). We used simulations to estimate the probability of each possible data configuration at a single site as a function of the human-Neanderthal split time. The simulations used the estimated population demography for each HapMap population and a probabilistic model of SNP ascertainment to match the overall density and frequency spectrum of HapMap Phase II SNPs. Likelihood curves for the split time were computed by multiplying likelihoods across sites as though they were independent. In practice, this is an excellent approximation for our data because the Neanderthal sequence reads are very short and just 1 out of 905 aligned fragments contains more than one human-specific allele or SNP. Bootstrap simulations confirmed that our composite likelihood method yields appropriate CIs for the split time (13).
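The composite likelihood step described above (multiplying per-site likelihoods as though sites were independent, then reading off a maximum and an interval) can be sketched as follows. In the paper, the per-configuration probabilities come from coalescent simulations under the fitted HapMap demographies; here `toy_config_prob` is a hypothetical stand-in in which the chance that Neanderthal carries a human SNP's derived allele decays exponentially with split time, chosen only so the example runs. The 2-unit log-likelihood drop used for the interval follows the Fig. 5 caption, and the grid search assumes a single-peaked curve.

```python
from math import exp, log


def composite_log_likelihood(site_counts: dict, config_prob, t: float) -> float:
    """Log of the product of per-site likelihoods, treating sites as independent:
    the sum over configurations c of count(c) * log P(c | t)."""
    return sum(n * log(config_prob(c, t)) for c, n in site_counts.items() if n > 0)


def mle_and_interval(grid, loglik, drop: float = 2.0):
    """Grid-search maximum plus the range of grid points whose log likelihood
    lies within `drop` units of the maximum (assumes a single peak)."""
    values = [loglik(t) for t in grid]
    best = max(values)
    t_hat = grid[values.index(best)]
    inside = [t for t, v in zip(grid, values) if v >= best - drop]
    return t_hat, (min(inside), max(inside))


def toy_config_prob(config: str, t: float) -> float:
    """Hypothetical toy model, not the paper's simulations: the probability that
    Neanderthal carries the derived allele at a human SNP falls with split time t."""
    p_derived = exp(-t / 200_000.0)
    return p_derived if config == "neanderthal_derived" else 1.0 - p_derived


if __name__ == "__main__":
    # Neanderthal states at the Table 2 "with SNPs" sites, used only to exercise
    # the code; with a toy model the resulting numbers carry no meaning.
    counts = {"neanderthal_derived": 3, "neanderthal_ancestral": 32}
    grid = list(range(10_000, 1_000_001, 10_000))
    t_hat, (lo, hi) = mle_and_interval(
        grid, lambda t: composite_log_likelihood(counts, toy_config_prob, t))
    print(t_hat, lo, hi)
```

Keeping the likelihood computation separate from the interval search mirrors the text: the same `mle_and_interval` helper would apply unchanged to any simulated likelihood curve.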
Using this approach, the maximum likelihood estimates for the split time of the ancestral human and Neanderthal populations are 440,000 years (95% CI of 170,000 to 620,000 years) based on the European data, 390,000 years (170,000 to 670,000 years) for East Asians, and 290,000 years (120,000 to 570,000 years) for Yoruba (Figs. 5B and 6). These values predate the earliest known appearance of anatomically modern humans in Africa ~195,000 years ago (22).

Fig. 4. Frequency distribution of 171 Neanderthal-specific substitutions observed in 37,636 bp of aligned human, Neanderthal, and chimpanzee genomic sequence. Complementary substitutions (such as C to T and G to A) are considered equivalent events.

Fig. 5. (A) Log-likelihood curve of the time to the MRCA of the Neanderthal and human reference sequences. (B) Smoothed relative log-likelihood estimates of the split times between different human populations and the Neanderthal population. (C) Impact of changes in the ancient population size on split time estimates for five models that are consistent with modern polymorphism data. Ky, thousand years. Each curve is the smoothed log likelihood relative to the maximum over all five models. For each model, the text on the plot indicates the degree of expansion or contraction and the time before the present at which the size change occurred. The expansion models are less likely than either the constant population size model or the contraction models. (D) The log-likelihood estimates of the contribution of the Neanderthal population to the ancestry of Europeans. The light blue line is a smoothed version of the estimates. The horizontal dashed maroon line in (A), (B), and (D) represents a 2 log-likelihood drop, and the region bounded by this line represents the 95% CI around the maximum likelihood estimates.

Because these split times are before the migration of modern humans out of Africa, the three population-specific estimates should
all be estimates of the same actual split time. The average of these estimates, ~370,000 years, is thus a sensible point estimate for the split time. Substantial contamination with modern human DNA would cause these estimates to be artificially low, but 2% contamination, the rate suggested by mitochondrial PCR analysis of the primary extract used to construct the library, would have essentially no impact (13). Our estimates of the human-Neanderthal split time might depend heavily on the assumption that the ancestral effective population size of humans was 10,000 individuals. To address this, we explored a set of models in which the ancestral human population expanded or contracted at least 200,000 years ago (13). We found that much of the parameter space, though not the original model, could be excluded on the basis of modern human polymorphism data from Voight et al. (20). We repeated our likelihood analysis of the Neanderthal data using models incorporating ancient expansion or contraction that are consistent with modern data and found that these did not substantially change our population split time estimates (Fig. 5C). Our data include three sites at which Neanderthal carries the derived allele for a polymorphic HapMap SNP. These sites are unlikely to represent modern contamination because for two of the SNPs, the derived allele is found only in Yoruba; also, one of the SNPs lies on a fragment that contains a C-to-T transition in Neanderthals that is characteristic of chemical damage to DNA. These observations indicate that the Neanderthal sequence may often coalesce within the human ancestral tree. Based on simulations of our best-fit model for Yoruba, we estimate that Neanderthal is a true outgroup for approximately 14% (assuming a split time of 290,000 years, the Yoruba estimate) to 26% (assuming a split time of 440,000 years, the European estimate) of the autosomal genome of modern humans, although more data will be required to achieve a precise estimate.
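The counts quoted in the text can be cross-checked against Table 2 with a short Python sketch: the 502 human-chimpanzee differences are the sites where the human reference carries the derived allele, the 27 human-specific changes are the subset where Neanderthal is still ancestral, and the three derived Neanderthal alleles at HapMap SNPs sit in the "with SNPs" partition. The same sketch reproduces the ~370,000-year point estimate as the plain mean of the three population-specific split times. The dictionary layout and variable names are illustrative choices, not part of the study.

```python
from collections import Counter
from statistics import mean

# Table 2 counts keyed by (human reference state, Neanderthal state),
# where "A" = ancestral (matches chimpanzee) and "D" = derived (differs).
WITH_SNPS = {("A", "A"): 24, ("D", "A"): 8, ("A", "D"): 3, ("D", "D"): 0}
WITHOUT_SNPS = {("A", "A"): 35_801, ("D", "A"): 19, ("A", "D"): 161, ("D", "D"): 475}

# Population-specific maximum likelihood split times (years) from the text.
SPLIT_TIMES = {"Europeans": 440_000, "East Asians": 390_000, "Yoruba": 290_000}


def combined(*tables: dict) -> Counter:
    """Sum several Table 2 partitions cell by cell."""
    total = Counter()
    for table in tables:
        total.update(table)
    return total


all_sites = combined(WITH_SNPS, WITHOUT_SNPS)
# Human reference differs from chimpanzee.
human_chimp_diffs = sum(n for (human, _), n in all_sites.items() if human == "D")
# Human derived, Neanderthal ancestral: changes after the human-Neanderthal MRCA.
human_specific = all_sites[("D", "A")]
# Neanderthal carries the derived allele at a HapMap SNP.
neanderthal_derived_at_snps = sum(n for (_, nea), n in WITH_SNPS.items() if nea == "D")
snp_sites = sum(WITH_SNPS.values())
average_split = mean(SPLIT_TIMES.values())

if __name__ == "__main__":
    print(human_chimp_diffs, human_specific, neanderthal_derived_at_snps, snp_sites)  # 502 27 3 35
    print(round(average_split))  # 373333
```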
Lack of evidence for admixture between humans and Neanderthals. Because Neanderthals coexisted with modern humans in Europe, there has long been interest in whether Neanderthals might have contributed to the European gene pool. Previous studies comparing human and Neanderthal mitochondrial sequences did not find evidence of a Neanderthal genetic contribution to modern humans. However, the utility of mitochondrial data in addressing this question is limited in that it is restricted to a single locus and, due to the maternal inheritance of mitochondrial DNA, is informative only about admixture between Neanderthal females and modern human males (3–6). Moreover, it has been argued that some aspects of modern human autosomal data may be the result of modest levels of Neanderthal admixture (23). If Neanderthal admixture did indeed occur, then this could manifest in our data as an abundance of low-frequency derived alleles in Europeans where the derived allele matches Neanderthal. No site in the data set appears to be of this type. In order to formally evaluate this hypothesis, we extended our composite likelihood simulations to include a single admixture event 40,000 years ago in which a fraction p of the European gene pool was derived from Neanderthals. We fixed the human-Neanderthal split at 440,000 years ago (the split time estimate for Europeans). With these assumptions, the maximum likelihood estimate for the Neanderthal contribution to modern genetic diversity is zero. However, the 95% CI for this estimate ranges from 0 to 20%, so a definitive answer to the admixture question will require additional Neanderthal sequence data (Fig. 5D). Targeted recovery of specific Neanderthal sequences by direct genomic selection. Although we have recovered significant amounts of Neanderthal genome sequence using a metagenomic approach, hundreds of gigabases of sequence would be required to achieve reasonable coverage of a single Neanderthal genome by this method. 
Moreover, our results indicate that at least 99.5% of the Neanderthal sequence that would be obtained would be identical to the modern human sequence. The human-Neanderthal sequence differences that would yield great insight into human biology and evolution are thus rare events in an overwhelming background of uninformative sequence. We therefore explored the potential of metagenomic libraries to serve as substrates to recover specific Neanderthal sequences of interest by targeted methods. To this end, we developed a direct genomic selection approach to recover known and unknown sequences from metagenomic ancient DNA libraries (Fig. 7) (24). We first attempted to recover specific sequences from a Pleistocene cave bear metagenomic library we previously constructed. We designed PCR probes corresponding to 96 sequences highly conserved among mammals but not previously shown to be present in the cave bear library. We amplified these sequences from the human genome and hybridized the resulting probes to PCR-amplified cave bear library inserts produced as described above (Fig. 1). Recovered library DNAs were amplified by PCR and sequenced. We successfully recovered five targets consisting of a known enhancer of Sox9 and conserved sequences near Tbx3, Shh, Msx2, and Gdf6 (table S4). In principle, these sequences could be derived from contaminating DNA rather than the cave bear library. Critically, the captured cave bear sequences were flanked by library vector sequence, directly demonstrating that these sequences were derived from a cloned library insert and not from contaminating DNA introduced during direct selection (Fig. 7 and fig. S2). Based on these results, we attempted to recover specific Neanderthal sequences from library NE1. We focused on recovering sequences that we had previously identified by shotgun sequencing because of the low complexity of library NE1, and were able to recover 29 of 35 sequences we targeted (table S4).
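The authenticity criterion applied to the captured sequences (a read must show library vector sequence on both sides of the insert) can be sketched as an exact-match check on either strand. The vector sequences, the 12-base match length, and the minimum insert size are hypothetical placeholders, not values from the study. A real pipeline would tolerate sequencing errors, for example by aligning reads to the vector, rather than requiring exact matches.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string over A/C/G/T/N."""
    return seq.upper().translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def _flanked_one_strand(read: str, left: str, right: str, min_insert: int) -> bool:
    """True if `left` occurs, followed by at least min_insert bases, then `right`."""
    i = read.find(left)
    if i < 0:
        return False
    return read.find(right, i + len(left) + min_insert) >= 0


def is_vector_flanked(read: str, left_vector: str, right_vector: str,
                      k: int = 12, min_insert: int = 1) -> bool:
    """True if, on either strand, the read contains the last k bases of the
    (hypothetical) vector upstream of the cloning site, then at least
    min_insert bases of insert, then the first k bases of the vector
    downstream of it."""
    left = left_vector.upper()[-k:]
    right = right_vector.upper()[:k]
    return any(_flanked_one_strand(r, left, right, min_insert)
               for r in (read.upper(), reverse_complement(read)))


if __name__ == "__main__":
    # Toy vector arms and read, made up for illustration.
    left_arm, right_arm = "GGGAAATTTCCC", "TTTGGGCCCAAA"
    read = "AA" + "TTTCCC" + "ACGTACGT" + "TTTGGG" + "CC"
    print(is_vector_flanked(read, left_arm, right_arm, k=6))
```

Checking both strands matters because a captured clone can be sequenced from either end, so the vector arms may appear in reverse-complemented order.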
Fig. 6. Divergence estimates for human and Neanderthal genomic sequences and ancestral human and Neanderthal populations, shown relative to dates of critical events in modern human and Neanderthal evolution (2, 22, 25). The branch lengths are schematic and not to scale. y.a., years ago.

Table 2. Summary of all autosomal sites sequenced in Neanderthal and uniquely aligned to the human and chimpanzee reference sequences. The designations "ancestral" and "derived" indicate whether each site is, respectively, a match or mismatch with chimpanzee. Sites are partitioned into those that overlap a Phase II HapMap SNP (with SNPs) and those that do not (without SNPs).

| Sites | Sequence state in Neanderthal | Human reference: ancestral | Human reference: derived |
|---|---|---|---|
| With SNPs | Ancestral | 24 | 8 |
| With SNPs | Derived | 3 | 0 |
| Without SNPs | Ancestral | 35,801 | 19 |
| Without SNPs | Derived | 161 | 475 |

Fig. 7. Recovery of Neanderthal genomic sequences from library NE1 by direct genomic selection. [Schematic steps: hybridize in solution; capture heteroduplexes with streptavidin-coated beads; wash and elute captured DNAs; amplify by PCR with vector primers.]

The authenticity of these recovered Neanderthal sequences was confirmed by the presence of library vector sequences in the reads. Our
success in recovering both previously unknown cave bear and known Neanderthal genomic sequences using direct genomic selection indicates that this is a feasible strategy for purifying specific cloned Neanderthal sequences out of a high background of Neanderthal and contaminating microbial DNA. This raises the possibility that, should multiple Neanderthal metagenomic libraries be constructed from independent samples, direct selection could be used to recover Neanderthal sequences from several individuals to obtain and confirm important human-specific and Neanderthal-specific substitutions. Conclusions. The current state of our knowledge concerning Neanderthals and their relationship to modern humans is largely inference and speculation based on archaeological data and a limited number of hominid remains. In this study, we have demonstrated that Neanderthal genomic sequences can be recovered using a metagenomic library-based approach and that specific Neanderthal sequences can be obtained from such libraries by direct selection. Our study thus provides a framework for the rapid recovery of Neanderthal sequences of interest from multiple independent specimens, without the need for whole-genome resequencing. Such a collection of targeted Neanderthal sequences would be of immense value for understanding human and Neanderthal biology and evolution. Future Neanderthal genomic studies, including targeted and whole-genome shotgun sequencing, will provide insight into the profound phenotypic divergence of humans both from the great apes and from our extinct hominid relatives, and will allow us to explore aspects of Neanderthal biology not evident from artifacts and fossils.

References and Notes
1. P. Mellars, Nature 432, 461 (2004).
2. F. H. Smith, E. Trinkaus, P. B. Pettitt, I. Karavanic, M. Paunovic, Proc. Natl. Acad. Sci. U.S.A. 96, 12281 (1999).
3. M. Krings et al., Cell 90, 19 (1997).
4. M. Krings et al., Proc. Natl. Acad. Sci. U.S.A. 96, 5581 (1999).
5. S. Pääbo et al., Annu. Rev. Genet. 38, 645 (2004).
6. D. Serre et al., PLoS Biol. 2, e57 (2004).
7. M. Currat, L. Excoffier, PLoS Biol. 2, e421 (2004).
8. S. G. Tringe et al., Science 308, 554 (2005).
9. S. G. Tringe, E. M. Rubin, Nat. Rev. Genet. 6, 805 (2005).
10. J. P. Noonan et al., Science 309, 597 (2005).
11. M. Margulies et al., Nature 437, 376 (2005).
12. H. N. Poinar et al., Science 311, 392 (2006).
13. Materials and methods are available as supporting material on Science Online.
14. S. F. Altschul et al., Nucleic Acids Res. 25, 3389 (1997).
15. Chimpanzee Sequencing and Analysis Consortium, Nature 437, 69 (2005).
16. M. Hofreiter et al., Nucleic Acids Res. 29, 4793 (2001).
17. S. Kumar, A. Filipski, V. Swarna, A. Walker, S. B. Hedges, Proc. Natl. Acad. Sci. U.S.A. 102, 18842 (2005).
18. N. Patterson, D. Richter, S. Gnerre, E. Lander, D. Reich, Nature, in press; published online 17 May 2006 (10.1038/nature04789).
19. The International HapMap Consortium et al., Nature 437, 1299 (2005).
20. B. F. Voight et al., Proc. Natl. Acad. Sci. U.S.A. 102, 18508 (2005).
21. A. M. Adams, R. R. Hudson, Genetics 168, 1699 (2004).
22. I. McDougall et al., Nature 433, 733 (2005).
23. V. Plagnol, J. D. Wall, PLoS Genet., in press (10.1371/journal.pgen.0020105.eor).
24. S. Bashiardes et al., Nat. Methods 2, 63 (2005).
25. P. Mellars, Nature 439, 931 (2006).
26. Neanderthal sequences reported in this study have been deposited in GenBank under accession numbers DX935178 to DX936503. We thank E. Green, M. Lovett, and members of the Rubin, Pääbo, and Pritchard laboratories for insightful discussions and support. J.P.N. was supported by NIH National Research Service Award fellowship 1-F32-GM074367. G.C. and S.K. were supported by grant R01 HG002772-1 (NIH) to J.K.P.
This work was supported by grant HL066681, NIH Programs for Genomic Applications, funded by the National Heart, Lung and Blood Institute; and by the Director, Office of Science, Office of Basic Energy Sciences, of the U.S. Department of Energy under contract number DE-AC02-05CH11231.

Supporting Online Material
www.sciencemag.org/cgi/content/full/314/5802/1113/DC1
Materials and Methods
Figs. S1 to S6
Tables S1 to S12
References

16 June 2006; accepted 17 August 2006
10.1126/science.1131412

REPORTS

Resilient Machines Through Continuous Self-Modeling

Josh Bongard,1*† Victor Zykov,1 Hod Lipson1,2

Animals sustain the ability to operate after injury by creating qualitatively different compensatory behaviors. Although such robustness would be desirable in engineered systems, most machines fail in the face of unexpected damage. We describe a robot that can recover from such change autonomously, through continuous self-modeling. A four-legged machine uses actuation-sensation relationships to indirectly infer its own structure, and it then uses this self-model to generate forward locomotion. When a leg part is removed, it adapts the self-models, leading to the generation of alternative gaits. This concept may help develop more robust machines and shed light on self-modeling in animals.

Robotic systems are of growing interest because of their many practical applications as well as their ability to help understand human and animal behavior (1–3), cognition (4–6), and physical performance (7). Although industrial robots have long been used for repetitive tasks in structured environments, one of the long-standing challenges is achieving robust performance under uncertainty (8). Most robotic systems use a manually constructed mathematical model that captures the robot’s dynamics and is then used to plan actions (9).
Although some parametric identification methods exist for automatically improving these models (10–12), making accurate models is difficult for complex machines, especially when trying to account for possible topological changes to the body, such as changes resulting from damage.

Fig. 7. Recovery of Neanderthal genomic sequences from library NE1 by direct genomic selection.

1 Mechanical and Aerospace Engineering, Cornell University, Ithaca, NY 14853, USA.
2 Computing and Information Science, Cornell University, Ithaca, NY 14853, USA.
*Present address: Department of Computer Science, University of Vermont, Burlington, VT 05405, USA.
†To whom correspondence should be addressed. E-mail: josh.bongard@uvm.edu

1118 17 NOVEMBER 2006 VOL 314 SCIENCE www.sciencemag.org
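The abstract's central idea, inferring body structure from actuation-sensation pairs and then re-inferring it after damage, can be illustrated with a deliberately tiny sketch. It is not the authors' method. The toy body (four legs whose lower segment is either present or missing), the sensor model `sense`, the exhaustive search over candidate topologies, and every function name (`candidate_models`, `collect`, `infer_self_model`) are hypothetical assumptions made only for illustration. Parametric identification tunes continuous parameters of a fixed model. Here, by contrast, the candidate self-models differ in topology, which is the kind of change the text describes as hard to capture.

```python
import itertools
import math
import random

# Hypothetical toy body: four legs, each with a lower segment that is either
# present (length 1.0) or missing (length 0.0). This tuple is the "structure"
# the robot must infer about itself.
Structure = tuple[float, ...]


def sense(structure: Structure, action: list[float]) -> float:
    """Assumed sensor model: one reading that depends on the commanded joint
    angles (radians) and on which lower leg segments are attached."""
    return sum(length * math.sin(angle) for length, angle in zip(structure, action))


def candidate_models(n_legs: int = 4) -> list[Structure]:
    """Enumerate every topology: each lower leg segment present or absent."""
    return [tuple(bits) for bits in itertools.product((0.0, 1.0), repeat=n_legs)]


def collect(body: Structure, n: int, rng: random.Random):
    """Send n random actions to the physical body and record the
    actuation-sensation pairs it produces."""
    history = []
    for _ in range(n):
        action = [rng.uniform(-math.pi / 2, math.pi / 2) for _ in body]
        history.append((action, sense(body, action)))
    return history


def infer_self_model(history, candidates: list[Structure]) -> Structure:
    """Return the candidate whose predictions best match the recorded pairs
    (smallest total squared error)."""
    def error(model: Structure) -> float:
        return sum((sense(model, a) - s) ** 2 for a, s in history)
    return min(candidates, key=error)


if __name__ == "__main__":
    rng = random.Random(0)
    candidates = candidate_models()

    intact = (1.0, 1.0, 1.0, 1.0)
    print(infer_self_model(collect(intact, 8, rng), candidates))
    # (1.0, 1.0, 1.0, 1.0)

    # Simulated damage: the lower segment of the second leg is removed.
    # The self-model is re-inferred from fresh observations only (a simplification).
    damaged = (1.0, 0.0, 1.0, 1.0)
    print(infer_self_model(collect(damaged, 8, rng), candidates))
    # (1.0, 0.0, 1.0, 1.0)
```

Exhaustive enumeration works here only because the toy has 16 possible bodies. A real robot's space of self-models is far larger and would call for a search or optimization method rather than a full scan.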