Origins, evolution, and phenotypic impact of new genes
Henrik Kaessmann
Genome Res. 2010 20: 1313-1326; originally published online July 22, 2010
Access the most recent version at doi:10.1101/gr.101386.109
References: This article cites 123 articles, 48 of which can be accessed free at: http://genome.cshlp.org/content/20/10/1313.full.html#ref-list-1
Article cited in: http://genome.cshlp.org/content/20/10/1313.full.html#related-urls
To subscribe to Genome Research go to: http://genome.cshlp.org/subscriptions
Copyright © 2010 by Cold Spring Harbor Laboratory Press
Downloaded from genome.cshlp.org on June 20, 2011 - Published by Cold Spring Harbor Laboratory Press
Review

Origins, evolution, and phenotypic impact of new genes

Henrik Kaessmann¹
Center for Integrative Genomics, University of Lausanne, CH-1015 Lausanne, Switzerland

Ever since the pre-molecular era, the birth of new genes with novel functions has been considered to be a major contributor to adaptive evolutionary innovation. Here, I review the origin and evolution of new genes and their functions in eukaryotes, an area of research that has made rapid progress in the past decade thanks to the genomics revolution. Indeed, recent work has provided initial whole-genome views of the different types of new genes for a large number of different organisms. The array of mechanisms underlying the origin of new genes is compelling, extending way beyond the traditionally well-studied source of gene duplication. Thus, it was shown that novel genes also regularly arose from messenger RNAs of ancestral genes, protein-coding genes metamorphosed into new RNA genes, genomic parasites were co-opted as new genes, and that both protein and RNA genes were composed from scratch (i.e., from previously nonfunctional sequences). These mechanisms then also contributed to the formation of numerous novel chimeric gene structures. Detailed functional investigations uncovered different evolutionary pathways that led to the emergence of novel functions from these newly minted sequences and, with respect to animals, attributed a potentially important role to one specific tissue, the testis, in the process of gene birth. Remarkably, these studies also demonstrated that novel genes of the various types significantly impacted the evolution of cellular, physiological, morphological, behavioral, and reproductive phenotypic traits. Consequently, it is now firmly established that new genes have indeed been major contributors to the origin of adaptive evolutionary novelties.

What is the nature of mutations underlying adaptive evolutionary innovations?
In addition to subtle genetic modifications of preexisting ancestral genes that can lead to differences in their (protein or RNA) sequences or activities, new genes with novel functions may have significantly contributed to the evolution of lineage- or species-specific phenotypic traits. Consequently, the process of the "birth" and evolution of novel genes has attracted much attention from biologists in the past. Indeed, quite remarkably, considerations pertaining to the origin and functional fate of new genes trace back to a time when the molecular nature of genes had not yet been established. Based on cytological observations of chromosomal duplications, Haldane (1933) and Muller (1935) already hypothesized in the 1930s that new gene functions may emerge from refashioned copies of old genes, highlighting for the first time the potential importance of gene duplication for the process of new gene origination.

The early notions that gene duplication provides a significant reservoir for the emergence of genes and hence phenotypic adaptation have now been globally confirmed (but also refined) based on numerous large- and small-scale molecular studies that were facilitated by the genomics revolution. New duplicate genes have been shown to be abundant in all eukaryotic genomes sequenced to date and to have evolved pivotal functional roles (Lynch 2007).

However, studies from the genomics era have also accelerated the discovery of fascinating novel mechanisms underlying the emergence of new genes. These include the origin of new protein-coding and RNA genes "from scratch" (that is, from previously nonfunctional genomic sequences), various types of gene fusions, and the formation of new genes from RNA intermediates. It is now well established that all of these mechanisms have significantly contributed to functional genome evolution and phenotypic change, which further underscores the importance of novel genes for organismal evolution.
In this review, I discuss in detail the different genomic sources of new genes in eukaryotes (with a particular emphasis on animals) and assess their relative contributions and functional implications in different species and evolutionary lineages. I also examine how new protein or RNA functions may evolve from newly minted gene structures and discuss the associated selective forces. I then discuss a hypothesis that suggests a key role of one tissue, the testis, in the establishment of new functional genes. Finally, I highlight recent new developments in the field and identify potential future research directions. Notably, I focus on recent developments in this review, while referring to previous reviews and other literature for details pertaining to long-established concepts and earlier findings.

Gene duplication: raw material for the emergence of new genes

Gene duplication is a very common phenomenon in all eukaryotic organisms (but also in prokaryotes; for review, see Romero and Palacios 1997) that may occur in several different ways (Lynch 2007). Traditionally, DNA-mediated duplication mechanisms have been considered and widely studied in this context, although peculiar intronless duplicate gene copies may also arise from RNA sources (see further below). DNA duplication mechanisms include small-scale events, such as the duplication of chromosomal segments containing whole genes or gene fragments (termed segmental duplication), which are essentially outcomes of misguided recombination processes during meiosis (Fig. 1A). However, they also include duplication of whole genomes through various polyploidization mechanisms (Lynch 2007; Conant and Wolfe 2008; Van de Peer et al. 2009). Thus, duplicate gene copies can arise in many different ways. But what is their functional fate and evolutionary relevance?

¹ E-mail: Henrik.Kaessmann@unil.ch. Article published online before print. Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.101386.109.
Genome Research 20:1313–1326 © 2010 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/10; www.genome.org
Gene duplication and new gene functions

At least since a famous monograph, authored by Susumu Ohno, was published over 40 yr ago (Ohno 1970), the word has spread that gene duplication may underlie the origin of many or even most novel genes and hence represents an important process for functional innovation during evolution. Essentially and consistent with earlier ideas (Haldane 1933; Muller 1935), Ohno emphasized that the presence of a second copy of a gene would open up unique new opportunities in evolution by allowing one of the two duplicate gene copies to evolve new functional properties, whereas the other copy is preserved to take care of the ancestral (usually important) function (the concept of neofunctionalization). Ohno also noted that duplicate genes can be preserved by natural selection for gene dosage, thus allowing an increased production of the ancestral gene product (Ohno 1970). Finally, it should be emphasized that it has been widely agreed for a long time that the most probable fate of a duplicate gene copy is pseudogenization (Ohno 1972) and that hence the majority of duplicate gene copies are eventually lost from the genome.

While these fundamental hypotheses have been confirmed by a large body of data, they have since also been significantly extended and refined. In particular, in addition to the process of neofunctionalization (i.e., the emergence of new functions from one copy, Ohno's basic concept), it was proposed that the potentially multiple functions of an ancestral gene may be partitioned between the two daughter copies. This process was dubbed "subfunctionalization" and may be shaped by natural selection or involve purely neutral processes (Force et al. 1999; Conant and Wolfe 2008; Innan and Kondrashov 2010). Global genomic screens combined with detailed experimental scrutiny have uncovered numerous intriguing examples for each of these models in many organisms, solidly supporting their validity.
Detailed analyses of young duplicate genes have been particularly informative, because many of the details associated with the emergence of new genes from gene duplicates become obscured over longer periods of time (Long et al. 2003). A particularly illustrative case of neofunctionalization, arguably the most intriguing fate of a duplicate gene, occurred in the course of the recent duplication of a pancreatic ribonuclease gene in leaf-eating monkeys. Zhang et al. demonstrated that after duplication in an African leaf-eating monkey, the protein encoded by one of the copies of the ancestral RNASE1 gene rapidly adapted at specific sites to derive nutrients from bacteria in the foregut under the influence of strong positive selection (Zhang et al. 2002). Remarkably, both the duplication and subsequent adaptation of this gene were later shown to have occurred independently in a very similar manner in an Asian leaf-eating monkey (Zhang 2006). Thus, these RNASE1 duplications represent striking cases of convergent molecular evolution. They were likely facilitated by the frequent occurrence of segmental duplication, which allows similar duplication events that are highly beneficial to be repeatedly fixed during evolution. More generally, the convergent RNASE1 duplications are in line with several other recent reports that include other cases of new gene formation (see below) and therefore lend further support to the more general idea that adaptive genome evolution is, to some extent, predictable (Stern and Orgogozo 2009). Numerous other classical or recent examples from diverse organisms could be discussed here that illustrate the immense potential that DNA-based gene duplication has held for phenotypic evolution in different organisms (for reviews, see Li 1997; Long et al. 2003; Zhang 2003; Lynch 2007; Conant and Wolfe 2008).

Figure 1. Origin of new gene copies through gene duplication. (A) DNA-based duplication. A common type of segmental duplication, tandem duplication, is shown. It may occur via unequal crossing-over that is mediated by transposable elements (light green). There are different fates of the resulting duplicate genes. For example, one of the duplicates may acquire new functions by evolving new expression patterns and/or novel biochemical protein or RNA functions (see main text for details). (Gold and blue boxes) Exons, (black connecting lines) exon splicing, (red right-angled arrows) transcriptional start sites (TSSs), (gray tubes) nonexonic chromatin. (B) RNA-based duplication (termed retroposition or retroduplication). New retroposed gene copies may arise through the reverse transcription of messenger RNAs (mRNAs) from parental source genes. Functional retrogenes with new functional properties may evolve from these copies after acquisition or evolution of promoters in their 5′ flanking regions that may drive their transcription. (Pink right-angled arrow) TSS, (transparent pink box) additionally transcribed flanking sequence at the insertion site.
Duplication of noncoding RNAs

Suffice it to add in this review that studies pertaining to the origin of novel genes from duplicated DNA segments have begun to be extended beyond the traditionally studied protein-coding genes, thanks to the rapid recent advances in the genomics field. For example, it has become clear that microRNAs (miRNAs), small RNA molecules that have emerged as major post-transcriptional regulators (Carthew and Sontheimer 2009), have expanded and functionally diversified during evolution by gene duplication (Hertel et al. 2006). Interestingly, several individual studies indicate that the X chromosome may provide a particularly fruitful ground for the origination of new lineage-specific miRNAs (Zhang et al. 2007; Devor and Samollow 2008; Murchison et al. 2008; Guo et al. 2009), a pattern that may be explained by the specific sex-related forces that have shaped the X, given that new X-born miRNAs appear to be predominantly expressed in male-reproductive tissues. Segmental gene duplication also seems to play a major role for the expansion of another class of small RNAs, Piwi-interacting RNAs (piRNAs; Malone and Hannon 2009), which are expressed in the germline and are thought to be mainly involved in transposon control. A recent study revealed that piRNA clusters rapidly expanded through segmental duplication in primate and rodent genomes, a process driven by intense positive selection (Assis and Kondrashov 2009). Segmental duplication therefore provides an efficient vehicle for the expansion of piRNA repertoires and hence allows organisms to swiftly evolve protection barriers against the lineage-specific expansion of transposable elements. There is so far little evidence for duplication of sequences transcribed into long noncoding RNAs (lncRNAs), an abundant class of nontranslated RNAs (>200 nucleotides [nt] in length), whose functional impact is only beginning to be understood (Mercer et al. 2009; Ponting et al. 2009).
The paucity of known duplicated lncRNA genes is perhaps mainly due to their rapid sequence divergence, which may render the detection of such events difficult. Future work, which will benefit from the rapidly accumulating genomic and transcriptomic data, will clarify the role of gene duplication in the evolution of new lncRNA genes with altered or novel functions.

Global patterns

In spite of the numerous well-founded examples of functionally important newly minted genes that arose from duplicate gene copies, a more global picture of the functional relevance and adaptive value of the large number of duplicate gene copies scattered in genomes is only beginning to emerge. Only for some whole-genome duplication (WGD) events in model organisms (in particular yeast), global assessments of the relevance of duplicate genes for the emergence of new gene functions have been attempted (Conant and Wolfe 2008). However, WGD represents a special case of gene duplication, which involves specific selective pressures related to dosage balance of gene products that seem to significantly influence the fate of resulting gene duplicates. And even in the case of WGD, it remains largely unclear whether gene duplications often conferred novel functions or not (Conant and Wolfe 2008).

Thus, a more global understanding of the implications of gene duplication for the emergence of new gene functions and its importance relative to other mutational mechanisms that affect preexisting genes will have to await future efforts. However, a closer examination of the reported general distributions and characteristics of gene duplicates in different genomes is nevertheless instructive. For example, analyses of fully sequenced genomes have revealed high rates of origin but also loss of duplicate genes (Lynch and Conery 2003; Demuth and Hahn 2009).
New duplicates are estimated to be "born" at the rate of ~0.001–0.01 per gene per million years in eukaryotes (Lynch and Conery 2003; Lynch 2007), while the death rate of duplicates is at least an order of magnitude higher, consistent with the early notion (see above) that the fate of most duplicates is pseudogenization (Ohno 1972). Notably, not all functional categories of genes are equally prone to expand by duplication. In particular, a relatively small number of gene families (1.6%–3%) with functions in, for example, immunity, host defense, chemosensation, and reproduction, show rapid, selectively driven copy number changes in various eukaryotic lineages, thus significantly contributing to their adaptive evolution (Emes et al. 2003; Demuth and Hahn 2009). However, in addition to these commonalities, detailed whole-genome investigations also suggest intriguing fundamental differences with respect to the generation and functional fate of duplicates in different evolutionary lineages. For example, careful analyses in primates revealed a burst of segmental gene duplication in hominoids (humans and apes), especially in humans and the African apes (Marques-Bonet and Eichler 2009). Notably, many of these duplicates are dispersed and mediate major genomic rearrangements associated with disease. The accelerated fixation rate of segmental duplicons in hominoids could, in principle, be explained by the selective benefit of newly formed genes embedded within these regions, which outweighs deleterious effects in many cases (Marques-Bonet et al. 2009b). New gene formation in hominoids indeed seems to have profited from the substantial raw material provided by massive segmental duplication (Marques-Bonet et al. 2009b; see below). However, the overall accelerated fixation rate of segmental duplicons in humans and apes is probably best explained by the reduction of the effective population size in the hominoid lineage.
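The population-genetic reasoning behind this explanation can be made concrete with Kimura's diffusion approximation for the fixation probability of a new mutation. The sketch below is not taken from the studies cited here. It assumes a diploid population whose census size equals its effective size, additive selection, and purely hypothetical values for the selection coefficient and the two population sizes.

```python
import math


def fixation_probability(s: float, n_e: int) -> float:
    """Kimura's approximation for a new additive mutation.

    Assumes a diploid population with census size equal to the
    effective size n_e, so the initial frequency is 1 / (2 * n_e).
    """
    if s == 0.0:
        return 1.0 / (2 * n_e)  # neutral limit
    # (1 - exp(-2s)) / (1 - exp(-4 * n_e * s)), written with expm1
    # so that very small values of s are handled accurately.
    return math.expm1(-2.0 * s) / math.expm1(-4.0 * n_e * s)


def relative_to_neutral(s: float, n_e: int) -> float:
    """Fixation probability divided by the neutral expectation."""
    return fixation_probability(s, n_e) * 2 * n_e


# Hypothetical values chosen only for illustration.
S_SLIGHTLY_DELETERIOUS = -1e-5
SMALL_NE = 10_000    # stands in for a lineage with reduced Ne
LARGE_NE = 100_000   # stands in for a lineage with larger Ne

rel_small = relative_to_neutral(S_SLIGHTLY_DELETERIOUS, SMALL_NE)
rel_large = relative_to_neutral(S_SLIGHTLY_DELETERIOUS, LARGE_NE)

print(f"Ne = {SMALL_NE}: {rel_small:.3f} of the neutral fixation rate")
print(f"Ne = {LARGE_NE}: {rel_large:.3f} of the neutral fixation rate")
```

Under these assumed values, the same slightly deleterious variant fixes at roughly 80% of the neutral rate in the smaller population but at under 10% of it in the larger one, which is the qualitative effect invoked for segmental duplications in hominoids.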
This reduction increased genetic drift and, at the same time, rendered purifying selection less efficient, thus probably allowing disproportionately high numbers of slightly deleterious segmental duplications to be fixed in hominoids compared with other species with larger long-term effective population sizes (and hence more efficient selection). This hypothesis is consistent with other types of molecular evolutionary data (Keightley et al. 2005; Gherman et al. 2007). In addition to lineage-specific selection intensities, differences pertaining to the mutational basis of gene duplication can lead to different characteristics of segmental duplications between species. A good example is the finding that, in contrast to humans, recently duplicated chromosomal regions in the mouse are depleted in genes and transcripts (She et al. 2008). Detailed analyses suggest that species-specific distributions of retrotransposons, which represent major promoters of segmental duplication events (Marques-Bonet et al. 2009a), account for much of this discrepancy.

RNA-based duplication and the emergence of "stripped-down" new genes

As outlined above, the traditionally studied DNA-mediated gene duplication mechanisms have significantly contributed to functional genome evolution and have provided many fundamental insights regarding new gene origination. However, new gene copies can also arise through an alternative, less well-known duplication mechanism termed retroposition or retroduplication (Brosius 1991; Long et al. 2003; Kaessmann et al. 2009). In this mechanism, a mature messenger RNA (mRNA) that is transcribed
from a "parental" source gene is reverse transcribed into a complementary DNA copy, which is then inserted into the genome (Fig. 1B). The enzymes necessary for retroposition (in particular the reverse transcriptase) are encoded by different retrotransposable elements in different species. In mammals, LINE-1 retrotransposons provide the required enzymatic machinery (Mathias et al. 1991; Feng et al. 1996; Esnault et al. 2000). Given that the resulting intronless retroposed gene copies (retrocopies) only contain the parental exon information (i.e., they usually lack parental introns and core promoter sequences), retrocopies were long thought to be consigned to the scrapheap of genome evolution and were routinely labeled as "processed pseudogenes" (Mighell et al. 2000). However, after anecdotal findings of individual functional retrocopies (so-called retrogenes) in the 1980s and 1990s, a surprising number of retrogenes were discovered with the advent of the genomics era. Notably, detailed analyses of this stripped-down type of new genes have revealed previously unknown mechanisms underlying the appearance of new genes and their functions and demonstrated that new retrogenes have contributed to the appearance of lineage-specific phenotypic innovations (Kaessmann et al. 2009).

Sources of regulatory elements

The observation of numerous functional retrogenes in various genomes (detailed below) immediately raises the question of how retrocopies can obtain regulatory sequences that allow them to become transcribed, a precondition for gene functionality. Studies that sought to address this question uncovered various sources of retrogene promoters and regulators and therefore also provided general insights into how new genes can acquire promoters and evolve new expression patterns (Kaessmann et al. 2009). First, it was shown that the expression of new retrogenes often benefits from preexisting regulatory machinery and expression capacities of genes in their vicinity.
Thus, retrogenes profited from the open chromatin state and accessory regulators (enhancers/silencers) of nearby genes, directly fused to host genes into which they inserted (also see below), or captured bidirectional promoters of genes in their proximity (Vinckenbosch et al. 2006; Fablet et al. 2009; Kaessmann et al. 2009). Second, retrogenes recruited CpG dinucleotide-enriched proto-promoter sequences in their genomic vicinity not previously associated with other genes for their transcription (Fablet et al. 2009). Third, retrotransposons upstream of retrocopy insertion sites were shown to have provided retrogenes with regulatory potential (Zaiss and Kloetzel 1999; Fablet et al. 2009). Fourth, unexpectedly, retrogenes also seem to frequently have directly inherited alternative promoters embedded in parental transcripts that gave rise to them (Okamura and Nakai 2008; Kaessmann et al. 2009). Finally, basic retrogene promoters may sometimes have evolved de novo through small substitutional changes under the influence of natural selection (Betran and Long 2003; Bai et al. 2007). Remarkably, the process of promoter acquisition sometimes involved the evolution of new 5′ untranslated exon–intron structures, which span the often substantial distances between the recruited promoters and retrogene insertion sites (Fablet et al. 2009).

New retrogene functions

Given that retrocopies usually need to acquire regulatory elements for their transcription, retrocopies that eventually do become transcribed (a surprisingly frequent event; Vinckenbosch et al. 2006) are much more prone to evolve novel functions (and less likely to be redundant) than gene copies arising from DNA-based duplication mechanisms. Indeed, a number of new retrogenes with intriguing functions have been identified. Detailed analyses of these retrogenes uncovered novel mechanisms underlying the emergence of new gene functions.
For example, analyses of young retrogenes in primates not only revealed that retrogenes have contributed to hominoid brain evolution, but also identified different molecular levels at which new genes may adapt to new functions. Namely, in addition to evolving new spatial expression patterns relative to the parental source genes, the proteins encoded by these retrogenes evolved new biochemical properties (Burki and Kaessmann 2004) and/or subcellular localization patterns (Burki and Kaessmann 2004; Rosso et al. 2008a,b). The latter process, dubbed subcellular adaptation or relocalization, could be established and generalized as a new trajectory for the evolution of new gene functions after these observations (Marques et al. 2008; Kaessmann et al. 2009). Other interesting retrogenes have recently been unveiled that exemplify the sometimes unexpected and curious pathways of evolutionary change. An example is a mouse retrocopy of a ribosomal protein gene (Rps23). There are hundreds of such retrocopies in mammalian genomes, and they usually represent nonfunctional retropseudogenes, consistent with the idea that duplication of these genes is usually redundant and/or is subject to dosage balance constraints. Yet the Rps23 retrocopy evolved a completely new function, not by changes in the protein-coding sequence, but by being transcribed from the reverse strand and incorporating sequences flanking its insertion site as new (coding and noncoding) exons (Zhang et al. 2009). This gave rise to a new protein (completely unrelated to that encoded by its parental gene), which had profound functional implications in that it conferred increased resistance in mice against the formation of Alzheimer-causing amyloid plaques. Another intriguing recent case of new retrogene formation illustrates the far-reaching and immediate phenotypic consequences a retroduplication event may have. Parker et al.
(2009) found that a retrocopy derived from a growth factor gene (fgf4) is solely responsible for the short-legged phenotype characteristic of several common dog breeds. Remarkably, the phenotypic impact of the fgf4 retrogene seems to be a rather direct consequence of the gene dosage change associated with its emergence (i.e., increased FGF4 expression during bone development), given that its coding sequence is identical to that of its parental gene. The analysis of fgf4 in dogs thus strikingly illustrates that gene duplication can immediately lead to phenotypic innovation (in this case a new morphological trait) merely through gene dosage alterations.

Retrogenes and meiotic sex chromosome inactivation

Numerous other illuminating cases of retrogenes known to have evolved diverse functions in species ranging from primates and flies to plants have recently been described (for review, see Kaessmann et al. 2009). However, global surveys of retroposition conducted in mammals and fruit flies have also identified a common theme uniting a significant subset of new retrogenes in these species: expression and functionality in testes. While these retrogenes seem to have evolved a variety of functional roles (a process that may have a mechanistic basis and was likely influenced by sexual selection, see below), the functions of a disproportionately high number among them are apparently associated with the transcriptional inactivation of the sex chromosomes in the male germline during and (to a lesser extent) after meiosis (Turner 2007).
Thus, it now seems clear that the many mammalian retrogenes that stem from the X have been fixed during evolution and shaped by natural selection to compensate for the transcriptional silencing of their parental (often housekeeping) genes during male germline silencing of the X (Bradley et al. 2004; Rohozinski and Bishop 2004; Potrzebowski et al. 2008). Indeed, systematic analyses of chromosomal positions of parental genes and their daughter retrocopies revealed that a larger than expected number of autosomal retrogenes are derived from parental genes located on the X in various mammals (Emerson et al. 2004; Potrzebowski et al. 2008) and that these retrogenes are specifically expressed during and after meiosis, when their parental genes are silenced (Potrzebowski et al. 2008). Consequently, testis functions of parental genes can be considered to have spread or "moved" to the autosomes, a process that was facilitated by the fact that the retroposition process readily transfers genes between chromosomes (more readily so than segmental duplication, which often occurs on the same chromosome). Notably, recent work (Vibranovski et al. 2009) indicates that meiotic sex chromosome inactivation may also underlie the export of retrogenes from the X in Drosophila (Betran et al. 2002). Knowing the functional basis for this so-called "out of X" movement of genes then also allowed dating of the evolutionary onset of mammalian meiotic sex chromosome silencing through assessments of the age of X-derived retrogenes. This work revealed that not only the mechanisms of meiotic sex chromosome silencing but also the sex chromosomes themselves originated in the common ancestor of placental mammals and marsupials (i.e., after the divergence from the lineage of egg-laying monotremes), and hence are younger than previously thought (Potrzebowski et al. 2008).
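The statement that a "larger than expected" number of autosomal retrogenes derive from X-linked parental genes is at heart a comparison of an observed count with a null expectation. As a minimal sketch of that logic, the code below runs a one-sided binomial test. The counts and the null proportion are hypothetical placeholders, and the simple null used here (parental genes drawn in proportion to an assumed X-linked share of genes) is an assumption for illustration; the cited studies may have derived their expectations differently.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(
        comb(n, i) * p**i * (1.0 - p) ** (n - i)
        for i in range(k, n + 1)
    )


# Hypothetical numbers, not data from the cited studies.
N_RETROGENES = 100    # autosomal retrogenes examined
N_X_DERIVED = 15      # those whose parental gene lies on the X
P_X_EXPECTED = 0.04   # assumed share of parental genes on the X

p_value = binomial_upper_tail(N_X_DERIVED, N_RETROGENES, P_X_EXPECTED)
expected = N_RETROGENES * P_X_EXPECTED

print(f"expected X-derived retrogenes under the null: {expected:.1f}")
print(f"observed: {N_X_DERIVED}, one-sided p-value: {p_value:.2e}")
```

A small p-value under such a null would indicate an excess of X-derived retrogenes. Whether that excess reflects selection for export from the X, as argued in the text, is a separate inference that rests on the expression data.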
Notably, tracing the evolutionary origin of individual X-derived retrogenes also identified striking cases of independent parallel exports of key housekeeping genes in eutherians and marsupials, which illustrates the strong selective pressures that drove genes out of the X upon the emergence of sex chromosomes. Curiously, a recent study revealed that the X chromosome not only exported many genes but also preferentially accumulated new retrogenes upon therian (eutherian and marsupial) sex chromosome differentiation, apparently owing to the emerging sex-related (potentially antagonistic) selective forces (Potrzebowski et al. 2010).

Retroduplication in different evolutionary lineages

Together, these examples illustrate that new retrogenes have been conducive to the evolution of new genome functions and phenotypic innovation. However, it should be noted that retroposition has contributed to the evolution of different eukaryotic lineages to highly varying degrees, because of fundamental differences related to the machinery responsible for this process. For example, the rate of retroduplication has been overall high in therian mammals because of the high activity of L1 retrotransposons, which provide the enzymes (reverse transcriptase and endonuclease) necessary for this process (Kaessmann et al. 2009). Thus, thousands of retrocopies and over 100 functional retrogenes have been identified in the human genome (Vinckenbosch et al. 2006). Fruit fly genomes have also been found to contain many functional retrogenes (Betran et al. 2002; Bai et al. 2007; Zhou et al. 2008). In contrast, genomes from monotreme mammals and birds only contain very few retrocopies and lack functional retrogenes, due to the absence of retrotransposons that could provide the appropriate retroposition machinery (Hillier et al. 2004; Kaessmann et al. 2009).
However, eukaryotic lineages previously thought to be depauperate in terms of retroposition activity, such as plants, have recently been found to harbor a surprisingly large number of apparently selectively constrained retrogenes (Wang et al. 2006; Zhu et al. 2009). Thus, retroduplication has contributed to the phenotypic evolution of many multicellular eukaryotes, ranging from mammals and insects to plants, by giving rise to many functional new genes, although this contribution has been more variable than that of the more common and widespread DNA-mediated duplication mechanisms.

Formation of new gene structures by retrotransposon-mediated transduction

An alternative mode by which retrotransposons could contribute to the formation of new gene structures was identified in the late 1990s (Moran et al. 1999). The investigators showed that, in addition to the process of retroposition, in which the retrotransposon-derived enzymes generate copies of mature mRNAs (see section above), L1 retrotransposon transcripts can also directly carry downstream flanking genomic sequences with them. In this process, termed 3′ transduction, the RNA transcription machinery reads through the weak retrotransposon polyadenylation signal and terminates transcription by using an alternative signal downstream in the 3′ flanking sequence (for review, see Cordaux and Batzer 2009). Subsequent studies showed that many L1 and SVA retrotransposon insertions (~10%) are associated with 3′ transduction events, copying various genic elements into new genomic locations (Cordaux and Batzer 2009 and references therein). An interesting recent study provided initial evidence that 3′ transduction may have led to the formation of new genes in primates (Xing et al. 2006). As part of a genome-wide analysis of SVA-mediated transduction, Xing and colleagues identified 143 events that transduced sequences of various sizes.
Notably, three separate events transduced the entire AMAC1L3 gene into three new genomic locations ~7–14 million yr ago in the human/African ape ancestor. The novel gene copies were shown to be transcribed, but it was unclear whether they have been preserved by natural selection (Xing et al. 2006). Thus, while the functional relevance of this new gene family in African apes remains unclear, this study provides initial evidence that 3′ transduction may represent yet another way by which retrotransposons have contributed to the functional evolution of the genome.

Gene fusion: the origin of new chimeric genes

The process of gene fusion is defined as the fusion of two previously separate source genes into a single transcription unit, the so-called fusion or chimeric gene (Long et al. 2003). Gene fusion is a fascinating mechanism of new gene origination that is almost bound to give rise to new functions given its combinatorial nature (assuming that the fusion gene is beneficial and selectively preserved). In agreement with this notion, a number of chimeric genes with important functions have been described (Long et al. 2003; Zhou and Wang 2008; Kaessmann et al. 2009). The various mechanisms underlying the formation of new chimeric gene structures and their evolutionary relevance are discussed in the following sections using representative examples.

DNA-mediated gene fusions

A common theme underlying several of the different gene fusion mechanisms is gene duplication, which provides the necessary raw
material for the emergence of new fusion genes, allowing ancestral gene functions to be preserved. Thus, chimeric genes often arise from juxtaposed pieces of duplicate gene copies through fission and fusion processes (Fig. 2A). For example, the dispersion and shuffling of numerous segmental gene copies in hominoids through various recombination and translocation events has led to the formation of many mosaic gene structures, some of which have become transcribed (Bailey et al. 2002; She et al. 2004; Marques-Bonet et al. 2009a). Among these transcribed chimeras, there are several genes with known functions (e.g., USP6, also known as Tre2, oncogene with testis expression; Paulding et al. 2003) or genes that have further expanded and show signatures of positive selection (e.g., RANBP2; Ciccarelli et al. 2005), suggesting that they evolved new beneficial functions. Juxtaposition of partial segmental duplicates also seems to rather frequently have led to the emergence of young functional genes in fruit flies, more often so than the apparently often redundant complete gene duplications (Zhou et al. 2008). These observations illustrate the evolutionary potential offered by DNA-based gene fusion events for the more recent evolution of animals. However, a number of highly modular ancient genes, sharing exons encoding specific protein domains, also attest to the functional importance of DNA-based exon shuffling (i.e., the exchange/fusion of individual exons) for early metazoan evolution (Patthy 1999). Retroduplication is a mechanism that could be expected to lend itself well for the process of gene fusion, given that it readily moves gene sequences to new locations in the genome. Indeed, a number of functionally relevant fusion events involving retrogenes have been described. 
For example, retrocopies were shown to frequently have inserted into an intron of a host gene and to have become transcribed in the form of a fusion transcript together with host gene exons (Vinckenbosch et al. 2006; Kaessmann et al. 2009). Often, these retrocopies are transcribed with only 5′-untranslated exons of the host gene, as alternative splice variants, thus profiting from promoters from the host gene (also see above), while leaving host gene functions unaltered. However, functional coding sequence fusions of host genes and retrogenes have occurred as well. A classical example is the testis-expressed jingwei (jgw) gene in Drosophila (Long and Langley 1993), the first young fusion gene described. This gene emerged through a series of events based on the fusion of parts of a segmental duplicate gene copy (ynd, which provided the regulatory elements) with a retrocopy of the Alcohol dehydrogenase gene. Biochemical and evolutionary analysis further revealed that the jgw-encoded protein evolved a new functional role in hormone and pheromone metabolism under the influence of positive Darwinian selection (Zhang et al. 2004). Functionally important retrogene–host gene coding fusions have also occurred in mammals. Retrocopies from the cyclophilin A (CYPA) gene (also known as PPIA), which encodes a protein that potently binds retroviral capsids, were shown to have integrated into the 3′ end of the antiviral defense gene TRIM5 in a New World monkey, replacing and functionally substituting the exons encoding the original capsid-binding domain from TRIM5 (Sayah et al. 2004). Remarkably, a highly similar event was independently fixed in the Old World monkey lineage (Brennan et al. 2008), which illustrates the high selective benefit associated with the creation of this type of chimeric gene.
Thus, the TRIM5-CYPA gene fusions present striking cases of domain shuffling and, taken together, provide yet another fascinating example of convergent evolution in the field of new gene origination. With respect to the fusion of retrogenes with preexisting exons, it is finally noteworthy that this process seems to be rather prevalent in plants (Wang et al. 2006; Zhu et al. 2009).

Figure 2. Origin of new chimeric gene or transcript structures. (A) DNA-based (genomic) gene fusion. Partial duplication (and hence fission) of ancestral source genes precedes juxtaposition of partial duplicates and subsequent fusion (presumably mediated by the evolution of novel splicing signals and/or transcription termination/polyadenylation sites). (B) Transcription-mediated gene fusion. Novel transcript structures may arise from intergenic splicing after evolution of novel splicing signals and transcriptional readthrough from the upstream gene. New chimeric mRNAs may sometimes be reverse transcribed to yield new chimeric retrogenes (see also Fig. 1). (Green, blue, red large boxes) Exons, (red right-angled arrows) transcriptional start sites (TSSs), (black connecting lines) constitutive splicing, (dotted lines) splicing of ancestral gene structures, (green lines) intergenic splicing that results in new chimeric transcripts.

While the functions and phenotypic
implications of the majority of these plant chimeric genes remain to be explored, an interesting class of functional chimeric genes that involve fusions of mitochondrial retroposed gene copies and nuclear genes was identified in flowering plants (Nugent and Palmer 1991; Liu et al. 2009). Specifically, it was shown that mitochondrial genes became relocated to the nuclear genome, probably via RNA intermediates (Nugent and Palmer 1991), forming chimeras with preexisting nuclear genes. Notably, in many cases the ancestral nuclear genes provided targeting signals for import of the mitochondrion-derived protein back into mitochondria (Liu et al. 2009). Thus, this type of gene fusion readily allowed for transfer of mitochondrial genes into the nucleus while mitochondrial functions could be maintained.

Transcription-mediated gene fusions

In addition to the genome-based juxtapositions and "permanent" fusions of genes or gene fragments described above, recent work uncovered an alternative gene fusion mechanism that combines exons from independent consecutive genes in the genome at the transcription level by intergenic splicing (Fig. 2B). Given that this mechanism draws from exons of preexisting genes, it does not represent a true process of new gene formation, but is nevertheless interesting to discuss here, given that it gives rise to new transcription units with potentially novel functions that may sometimes be fixed as new genes in the genome through secondary events (see below). Transcription-mediated gene fusion was long thought to be exceedingly rare, but after the discovery of individual cases early in the past decade (e.g., Thomson et al. 2000), genome-wide surveys unearthed large numbers of transcription-induced chimeras (Akiva et al. 2006; Parra et al. 2006; Denoeud et al. 2007). Notably, many of these chimeras involve fusions of protein-coding exons from adjacent genes. But although their expression levels are sometimes relatively high (Denoeud et al.
2007) and individual characterizations suggest specific subcellular localizations of encoded products with respect to the proteins encoded by the involved partner genes (Thomson et al. 2000; Pradet-Balade et al. 2002), the functional and evolutionary potential of these fused transcripts remains to be explored. Also, their evolutionary origin (presumably through the emergence and fixation of intergenic splice sites) and level of selective preservation between species have yet to be documented. Interestingly, however, at least one of the transcription-induced chimeric mRNAs was shown to have become fixed in the genome during evolution as a separate new gene through the process of retroposition (Fig. 2B; Akiva et al. 2006). Babushok et al. (2007) showed that this new retrogene (termed PIP5K1A) emerged in the common hominoid ancestor, became specifically expressed in testes, experienced a phase of intense positive selection, and shows significant affinity for cellular ubiquitinated proteins (reflecting a modified activity of one of the parental proteins), which suggests a new and beneficial functional role of the encoded protein in apes.

Gene origination from scratch

As noted above, the origin of new genes was long believed to be intimately linked to the process of gene duplication (Ohno 1970). Consistent with this notion (and as discussed in this review), new genes were usually found to be associated with duplicated genomic raw material in one way or another. Yet, what one would probably intuitively associate with true gene "birth" and what could, arguably, be considered the most intriguing mode (also because it is almost bound to provide a new function), is the emergence of new genes "from scratch." In other words, new genes arise from previously nonfunctional genomic sequence, unrelated to any preexisting genic material (Fig. 3).

De novo emergence of protein-coding genes

The de novo origin of entire protein-coding genes was long considered to be highly unlikely.
For instance, in agreement with his contemporary gene duplication advocates, François Jacob noted in an influential essay that the "probability that a functional protein would appear de novo by random association of amino acids is practically zero" and that therefore the "creation of entirely new nucleotide sequence could not be of any importance in the production of new information" (Jacob 1977).

In spite of these notions, recent work has uncovered a number of new protein-coding genes that apparently arose from previously noncoding (and nonrepetitive) DNA sequences. Probably the first such case described in the literature is presented by the morpheus gene family that emerged in an Old World primate ancestor (Johnson et al. 2001). Although the details regarding the emergence of the original coding sequence remain unclear, the lack of any corresponding orthologous sequences outside of Old World primates suggests a de novo origin for this gene family. Notably, Johnson et al. (2001) revealed that the ancestor of this gene family massively expanded by segmental duplication in hominoids, and that the various morpheus gene copies show spectacular signatures of positive selection in their coding sequences, suggestive of exceedingly high rates of adaptive protein evolution. Although the precise functional roles of the morpheus genes have not yet been determined, the strong selective pressures associated with their evolution suggest important and rapidly evolving functions of the encoded proteins in humans and apes.

Figure 3. Origin of protein-coding genes from scratch. New coding regions may emerge de novo from noncoding genomic sequences. First, proto-open reading frames (proto-ORFs; thin blue bars) acquire mutations (point substitutions, insertions/deletions; yellow stars) that remove, bit by bit, frame-disrupting nucleotides (red wedges). Transcriptional activation of ORFs (through acquisition of promoters located in the 5′ flanking region) encoding proteins with potentially useful functions may allow for the evolution of novel protein-coding genes. (Large blue box) Functional exon, (pink right-angled arrow) TSS, (transparent pink box) untranslated 5′ sequence. Note that the transcriptional activation step may, alternatively, also precede the formation of complete functionally relevant ORFs.

Other studies have followed suit and have provided a more detailed picture of de novo gene origination. For example, 14 de novo-originated genes have been identified in Drosophila (Levine et al. 2006; Zhou et al. 2008), the majority of which are specifically
expressed in testes. This suggests that de novo gene formation may have contributed an unexpectedly large proportion of new genes in this genus. Other studies have reported new genes that evolved de novo in yeast and primates (Cai et al. 2008; Knowles and McLysaght 2009; Toll-Riera et al. 2009). For example, Knowles and McLysaght (2009) recently identified three genes that seem to have arisen from scratch on the human lineage. Detailed analyses of these human-specific genes, which involved comparisons with corresponding noncoding sequences from closely related primate relatives, revealed that a few mutational events after the separation of the human and chimpanzee lineages abolished "disabling" nucleotides in the ancestral open reading frame precursors (Fig. 3), allowing relatively long coding sequences to emanate in humans. Importantly, the functionality of these new human genes is supported by evidence for translation of their coding sequences.

Together, these studies suggest that the de novo emergence of new protein-coding genes is more likely than previously thought, although more work is required to elucidate the functional relevance and potential phenotypic implications of the reported cases. More generally, the available studies illustrate the two key events that must precede the birth and fixation of a new protein-coding gene from an ancestrally noncoding DNA region (Fig. 3): (1) The DNA must become transcriptionally active, and (2) it must also evolve a translatable open reading frame that encodes a potentially beneficial protein. The former may be readily achieved, given the high transcriptional activity of the genome and the various mechanisms that allow new genes to recruit regulatory sequences (see above). A more global assessment of the probability for the latter will have to await future studies.
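The removal of a frame-disrupting nucleotide from a proto-ORF (Fig. 3) can be illustrated with a minimal Python sketch. The sequences are invented toy examples rather than data from any of the cited studies, and the helper names (`longest_orf`, `substitute`) are hypothetical. As a simplifying assumption, the sketch scans only the forward strand, uses the standard stop codons, and counts only ORFs that run from an ATG to an in-frame stop.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumed)


def longest_orf(seq: str) -> str:
    """Return the longest ATG-to-stop ORF (stop codon excluded), forward strand only."""
    seq = seq.upper()
    best = ""
    for frame in range(3):
        start = None  # position of the currently open ATG in this frame, if any
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if i - start > len(best):
                    best = seq[start:i]
                start = None
    return best


def substitute(seq: str, pos: int, base: str) -> str:
    """Return a copy of seq with a point substitution at 0-based position pos."""
    return seq[:pos] + base + seq[pos + 1:]


# Toy "ancestral" proto-ORF: the third codon (TGA) is a premature in-frame stop.
ancestral = "ATGGCTTGAGCTGGTAAACCCTAA"

# Toy "derived" sequence: a single A->G substitution turns TGA into TGG (Trp),
# abolishing the disabling stop codon.
derived = substitute(ancestral, 8, "G")

print(len(longest_orf(ancestral)))  # 6  (ATG GCT, then stop)
print(len(longest_orf(derived)))    # 21 (seven codons before the terminal TAA)
```

In this toy case a single substitution lengthens the open reading frame from two to seven codons. Real analyses would additionally have to consider both strands, insertions/deletions, splicing, and evidence for transcription and translation.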
Such studies will also further our understanding of the evolutionary importance of de novo protein-coding gene birth relative to other mechanisms of new gene formation.

Origins of noncoding RNA genes

Recent transcriptome studies have unveiled an unexpectedly rich repertoire of noncoding RNA species, which, in mammals, are derived from hundreds of small and thousands of lncRNA loci (Carthew and Sontheimer 2009; Ponting et al. 2009). As already noted above, it is known that at least miRNA and piRNA genes proliferated and diversified via gene duplication (for lncRNAs, there is so far little evidence). But how did the original noncoding RNA genes arise? What are their ancestral precursors? Could they also have evolved de novo from previously nonfunctional genomic sequence, akin to the protein-coding genes described above? Recent work has started to provide some pertinent answers to these questions.

Long noncoding RNA origination from scratch

A recent pioneering study dissected the origin and functional implications of a multi-exonic lncRNA in mice (Heinen et al. 2009). The gene expressing this RNA, Pldi, seems to have arisen through the transcriptional activation of a region containing preexisting cryptic splice sites in post-meiotic testis cells (spermatids) and was fixed by a selective sweep in Mus musculus musculus populations. Remarkably, knocking out Pldi led to reduced sperm motility and reduced testis weight, suggesting that Pldi contributed to enhanced fertility of the mice carrying it. Gene expression analyses indicate that the molecular basis of this phenotype is related to regulatory changes at the chromatin level induced by this new RNA gene, in line with the notion that lncRNAs often exert regulatory functions (Ponting et al. 2009).
Given the pervasive transcription of the genome and the fact that useable proto-promoters (or promoters that can be co-opted from other genes) and cryptic splice sites abound in the genome (as also evidenced by the emergence of multi-exonic retrogenes; Kaessmann et al. 2009; see above), de novo emergence of noncoding RNA genes (Fig. 4A), as exemplified by Pldi, might turn out to be a rather frequent phenomenon. However, the regulatory, sequence, and structural requirements for the functionality of long noncoding RNAs are so far poorly understood, and hence the probability of such gene formation events is hard to predict.

Protein-coding genes transformed into RNA genes

The origin of a classic lncRNA gene suggests an important alternative trajectory for the origin of new lncRNAs (Fig. 4B). The Xist gene, well known for its crucial role in X chromosome dosage compensation in eutherian mammals (where it triggers transcriptional inactivation of one female X chromosome), emanated from the remnants of a former protein-coding gene (Duret et al. 2006). This metamorphosis involved the loss of protein-coding capacity of the precursor gene's exons and subsequent reuse of several of these exons and original promoter elements in the newly minted Xist RNA gene. But the origin of lncRNA genes from protein-coding antecedents is not confined to mammals. An intriguing example from Drosophila is the spx gene, which represents a fusion of an ATP synthase gene to functionally uncharacterized exons near the insertion site (Wang et al. 2002). Remarkably, the spx ancestor lost its coding capacity and evolved into an RNA gene with a function in male courtship behavior, a process that was shaped by positive selection (Dai et al. 2008). These cases illustrate that the formation of new lncRNA genes may directly draw from previous gene structure information and regulatory capacity.
Given the constant generation of new protein-coding gene copies through gene duplication and the frequent (often associated) gene death processes during evolution, the origin of Xist and spx might exemplify a potentially common mechanism.

Small RNAs

The birth of small RNAs also seems to have benefited from erstwhile protein-coding gene material. For example, two primate miRNA genes were shown to have arisen from retropseudogenes, a process that apparently profited from the fact that the pseudogenes provided sequences of the potential target genes (the retropseudogenes' parental genes) and regulatory elements (Devor 2006). Similarly, but on a larger scale, it was found that mammalian retropseudogenes seem to frequently encode small interfering RNAs that may play important roles in the regulation of their parental source genes in the germline (Tam et al. 2008; Watanabe et al. 2008).

New genes from domesticated genomic parasites

Parasitic elements of the genome, such as transposons and endogenous retroviruses, have indirectly contributed to the functional evolution of genomes in many ways. For example, given that transposable elements are key mediators of segmental duplication (by stimulating various recombination events; Fig. 1A; Marques-Bonet et al. 2009a) and provide the core machinery underlying retroduplication (see above), they represent primary promoters of new gene birth. But, interestingly, genomic parasites have also more directly contributed to the evolution of new genes in their host genomes, as summarized in the following sections.
Protein-coding genes from genome parasites

It has been known for quite some time that transposable elements have frequently been incorporated into genes as new exons, a process frequently associated with alternative splicing (Sorek 2007). However, the functional significance of these "exonization" events has remained elusive. More strikingly, a number of new genes that were, by and large, entirely derived from genome "parasites" and evolved beneficial functions for the host organism have been identified in recent years (Volff 2006; Feschotte and Pritham 2007). Examples of such "domesticated" parasites are the syncytin genes, which stem from envelope genes of endogenous retroviruses and originated independently in primates, rodents, and lagomorphs (Fig. 5; Mi et al. 2000; Dupressoir et al. 2009; Heidmann et al. 2009). Remarkably, in all of these mammalian lineages, the syncytin-encoded proteins were co-opted to mediate crucial functions in placentation. That is, they are essential for the development of the "syncytium," an exterior structure of the placenta that is essential for proper nutrient and waste exchange between mother and fetus. Thus, the eutherian placenta, a recent evolutionary innovation, appears to have provided a particularly fruitful ground for the emergence of new domesticated genes with beneficial functions, a view that is further supported by the observation that two retrotransposon-derived genes (Peg10 and Rtl1 [also known as Peg11]) have similarly adopted key functional roles in the murine placenta (Ono et al. 2006; Sekita et al. 2008). However, other functional roles have been assigned to "tamed" genomic parasites as well. For instance, a recent study traced the birth of a new transcription factor gene (Zbed6) back to the domestication of a DNA transposon in the common ancestor of eutherians (Markljung et al. 2009).
ZBED6 has evolved key regulatory roles in muscle growth, but, interestingly, may affect the expression of thousands of other genes that control fundamental biological processes and therefore could underlie the evolution of a completely new regulatory network in placental mammals.

Noncoding RNAs from transposable elements

In addition to various other protein-coding genes that arose on the basis of transposable element sequences in diverse taxa (i.e., vertebrates, fruit flies, and plants; Volff 2006), several long and small RNA genes were shown to represent "reincarnated" retrotransposons. This process is exemplified by the origin of the brain cytoplasmic lncRNA genes (BC1 and BC200). Although these genes evolved independently from retrotransposons in rodents and anthropoid primates (Brosius 1999), they adapted to similar roles in translational regulation in the brain (Cao et al. 2006). While cases of lncRNAs that were derived from transposon ancestors are so far scarce, new small RNA genes seem to rather frequently have emerged from transposable elements. For example, retrotransposon conversions have given rise to dozens of known lineage-specific miRNAs in mammals (Smalheiser and Torvik 2005; Piriyapongsa et al. 2007). Finally, the germline-expressed piRNAs and endo-siRNAs should also be mentioned in this context, because they are frequently derived from the various lineage-specific transposable elements that they then control (Malone and Hannon 2009).

Figure 4. Evolutionary origins of long noncoding RNA genes. (A) De novo emergence. In this scenario, previously nonfunctional genomic sequence becomes transcribed (thin red box) through the acquisition/activation of a proto-promoter sequence (right-angled arrows). The transcriptional activation may be followed or preceded by the evolution of (proto-) splice sites (light blue stars). Together, these events allow for the formation of potentially functional and selectively beneficial multi-exonic noncoding RNA genes. (Large red boxes) Exons, (thin black lines) splicing, (red right-angled arrows) TSSs. (B) Origin of noncoding RNA gene from ancestral protein-coding gene. In this process, the original (functionally redundant) protein-coding gene loses its function and becomes a pseudogene. After or during loss of protein function and coding exon decay, a new functional noncoding RNA gene may arise, a process that may draw from regulatory elements and other sequences (splicing signals, exon sequences, polyadenylation sequences, etc.) from the ancestral protein-coding gene. (Blue boxes) Protein-coding exons, (red boxes) RNA exons, (transparent boxes) pseudogenized exons, (thin black lines) splicing, (dotted lines) lost ancestral splicing capacity, (red right-angled arrows) TSSs.

Horizontal gene transfer

Horizontal gene transfer (HGT; also known as lateral gene transfer) is the process by which an organism incorporates genetic material from another organism without being a direct descendant of that