Abstract
Transposable elements (TEs) constitute one of the most variable genomic features among vertebrates, impacting genome size, structure, and composition. Despite their important role in shaping genomic diversity, they have mostly been studied in mammals, whose genomes are among the least diverse in terms of TE content. New resources in reptilian genomics have opened a broader perspective on TE evolution in amniotes. We discuss these recent results, showing that TE diversity is high in reptiles, particularly in squamates, with strong heterogeneity in the number of TE classes retained in each lineage, even at short evolutionary scales. More research is needed to uncover the exact mechanisms that regulate TE proliferation in reptiles and to determine to what extent these selfish elements can play a role in local adaptation or in the emergence of barriers to gene flow.
Vertebrate genomes differ considerably in size, structure, and composition, although the number of protein-coding genes is similar across vertebrate lineages. One of the genomic features that varies most among vertebrates is the abundance and diversity of transposable elements (TEs) [Tollis and Boissinot, 2012; Sotero-Caio et al., 2017]. TE is an umbrella term covering a large diversity of mobile DNA sequences that can replicate and multiply in the genome of their host. TEs have had a profound impact on the evolution of genome size and structure in vertebrates. The abundance of TEs is indeed one of the main determinants of variation in haploid genome size [Kidwell, 2002], and the differential abundance of TEs across genomes contributes indirectly to other genomic features, such as regional variation in base composition. In addition, TEs can be a source of genetic novelties that are advantageous to the host [Warren et al., 2015], but they can also cause genetic defects, either directly (e.g., by inserting into genes) or indirectly (e.g., by mediating ectopic recombination events), and impose a substantial fitness cost on their host [Pasyukova et al., 2004; Boissinot et al., 2006; Ruggiero et al., 2017].
Although it has been known since the 1960s that a large fraction of every vertebrate genome consists of repetitive and mobile sequences [Britten and Kohne, 1968], our knowledge of TE diversity remained heavily biased towards mammals until very recently. With the exception of the fugu [Aparicio et al., 2002], which was chosen because of its inordinately small genome, the first sequenced vertebrate genomes were mammals, including human and mouse [Lander et al., 2001; Mouse Genome Sequencing Consortium et al., 2002], soon followed by a number of model species and species of economic interest, including the chicken in 2004 [International Chicken Genome Sequencing Consortium, 2004]. It took another 10 years after the sequencing of the first mammalian genome for the first nonavian reptile genome, that of the green anole Anolis carolinensis, to become available [Alföldi et al., 2011]. Even today, the number of publicly available mammalian genomes far exceeds the number of reptile genomes, although there are about 4 times as many reptile species as mammalian species. The current sampling of amphibians is even less representative, with only 3 genomes available for a group that includes approximately 7,000 species. In terms of TE abundance and diversity, mammals have extremely derived genomes, and among all vertebrates, the mammalian genome probably differs most from the ancestral amniote genome [Tollis and Boissinot, 2012; Sotero-Caio et al., 2017]. Mammalian genomes are dominated by a single type of TE, called LINE-1 or L1, which has evolved as a single lineage since the origin of placental mammals.
This extremely low TE diversity is in fact unique among vertebrates: the investigation of fish, amphibian, and reptile genomes has revealed a remarkable diversity of mobile elements [Chalopin et al., 2015], which seems to be the rule in vertebrates and most likely represents the ancestral state of vertebrates in general and of amniotes in particular. It is thus crucial to expand our understanding of TE evolutionary dynamics beyond mammals, and in particular to reptiles, the sister group of mammals, in order to understand the evolutionary forces that have shaped genomic structure and function, to reconstruct the repetitive landscape of the ancestral amniote genome, and finally to decipher the mechanisms responsible for the evolution of genome size.
It has been 10 years since the remarkable review of Kordis in Cytogenetics and Genome Research [Kordis, 2009], to our knowledge the only review solely dedicated to TEs in reptiles. When this review was published, a single bird and no nonavian reptile genome had been fully sequenced. Since then, many reptile genomes, representative of the main extant lineages, have been sequenced, and our understanding of TE evolution in reptiles has considerably improved. Here, we review the recent literature on reptile TEs with a focus on the contribution of complete genome sequences, and in particular the contribution of the A. carolinensis genome, the first nonavian reptile genome to have been sequenced [Alföldi et al., 2011]. We will first introduce the mobilome (the fraction of a genome made of mobile sequences), and then we will describe the patterns of abundance and diversity of TEs in the main groups of reptiles. In the third section of the review, we will discuss the impact TEs have had on reptile genomes in terms of cost and benefit and how, in the long term, TEs have affected genome size evolution. We will conclude by discussing the current limits to our knowledge, and we will propose research directions for the future.
What Constitutes the Mobilome?
Classification of Transposable Elements
Typically, TEs are classified into 2 groups based on their mode of transposition: type I, which includes retrotransposons and endogenous retroviruses, and type II, which includes the DNA transposons. Type I elements have an RNA intermediate in their life cycle: following transcription of the genomic copy, the RNA is reverse transcribed into DNA by an enzyme called reverse transcriptase. Within type I, there are 2 classes, distinguished by the presence or absence of long terminal repeats (LTRs). Transposition of LTR retrotransposons is similar to that of retroviruses: reverse transcription occurs in the cytoplasm, and the resulting cDNA is subsequently inserted into the host genome. The most abundant LTR retrotransposons in reptile genomes are the Gypsy, Copia, and DIRS elements, all of which have been detected in A. carolinensis [Alföldi et al., 2011]. The non-LTR retrotransposons, which include the LINEs and the Penelope elements, transpose by a process called target-primed reverse transcription, in which reverse transcription takes place at the site of insertion [Luan et al., 1993; Cost et al., 2002; Ichiyanagi and Okada, 2008]. The LINE retrotransposons constitute an ancient and diverse group of elements that are widespread across the eukaryotic tree of life and are classified into 25 major clades [Kapitonov et al., 2009]. The L1, L2, CR1, and RTE clades are the most significant for reptile genome evolution, and they can all be active simultaneously in the same genome, as shown in A. carolinensis [Novick et al., 2009]. Despite a strong cis preference, the biochemical machinery encoded by non-LTR retrotransposons can act on other transcripts and is responsible for the amplification of nonautonomous SINEs [Kajikawa and Okada, 2002; Dewannieux et al., 2003].
SINEs are short elements derived from either tRNA or 7SL RNA that can recruit the reverse transcriptase of LINEs, sometimes, but not always, owing to similarity between the 3′ end of the SINE and the 3′ end of its LINE partner [Ohshima et al., 1996]. SINEs can outcompete their LINE partners and amplify to extremely large numbers. Such partnerships are exemplified in the lizard A. carolinensis, which contains 2 LINE/SINE pairs: the Sauria SINE, mobilized by the Bov-B RTE element, and the SINE2 element, mobilized by L2 [Piskurek et al., 2009].
Type II elements share the property of not using an RNA intermediate during transposition, but otherwise they constitute a very heterogeneous group comprising cut-and-paste transposons, helitrons, and polintons. The cut-and-paste DNA transposons are the most abundant and diverse category of DNA transposons and include 15 superfamilies, with Tc1/Mariner and hAT elements being the most significant in vertebrate genome evolution [Feschotte and Pritham, 2007]. These elements have a simple structure: a single open reading frame encoding a transposase, flanked by inverted terminal repeats. Transposition requires recognition of the terminal repeats by the transposase, which then excises the element and inserts it elsewhere in the genome. Although DNA transposons move through a cut-and-paste process (vs. a copy-and-paste process in retrotransposons), they can increase in copy number if transposition takes place during replication or if the empty donor site is repaired by homologous recombination with a chromosome that still contains the insertion [Feschotte and Pritham, 2007]. Autonomous type II elements can also mediate the transposition of nonautonomous elements, which are typically, but not always, deleted versions of autonomous copies that have retained the terminal repeats [Hartl et al., 1992]. This is exemplified in A. carolinensis, where 4 different types of type II transposons (hAT, Tc1/Mariner, helitron, and Chapaev) have mobilized 56 identifiable nonautonomous families that are several orders of magnitude more abundant than their autonomous counterparts [Novick et al., 2011].
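The two-type scheme described above can be captured as a small lookup table. The sketch below is a hypothetical Python encoding: the `TEGroup` record, the `CLASSIFICATION` table, and both helper functions are illustrative names, not an existing tool or standard. It lists only the groups named in this section and, following the text, places DIRS with the LTR retrotransposons.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TEGroup:
    """One TE group, placed in the two-type scheme described in the text."""
    name: str
    te_type: str                        # "I" (RNA intermediate) or "II" (no RNA intermediate)
    subclass: str                       # e.g. "LTR", "non-LTR", "cut-and-paste"
    autonomous: bool                    # encodes its own transposition machinery
    mobilized_by: Optional[str] = None  # partner that mobilizes a nonautonomous element


# Hypothetical, deliberately incomplete table covering only the groups named above.
CLASSIFICATION = {
    g.name: g
    for g in [
        TEGroup("Gypsy", "I", "LTR", True),
        TEGroup("Copia", "I", "LTR", True),
        TEGroup("DIRS", "I", "LTR", True),  # grouped with LTR elements, as in the text
        TEGroup("L1", "I", "non-LTR (LINE)", True),
        TEGroup("L2", "I", "non-LTR (LINE)", True),
        TEGroup("CR1", "I", "non-LTR (LINE)", True),
        TEGroup("RTE", "I", "non-LTR (LINE)", True),
        TEGroup("Penelope", "I", "non-LTR", True),
        TEGroup("Sauria SINE", "I", "non-LTR (SINE)", False, mobilized_by="RTE (Bov-B)"),
        TEGroup("SINE2", "I", "non-LTR (SINE)", False, mobilized_by="L2"),
        TEGroup("Tc1/Mariner", "II", "cut-and-paste", True),
        TEGroup("hAT", "II", "cut-and-paste", True),
        TEGroup("Helitron", "II", "helitron", True),
        TEGroup("Polinton", "II", "polinton", True),
    ]
}


def uses_rna_intermediate(name: str) -> bool:
    """Type I elements transpose through an RNA copy; type II elements do not."""
    return CLASSIFICATION[name].te_type == "I"


def nonautonomous_partners(partner_prefix: str) -> List[str]:
    """Names of nonautonomous groups whose mobilizing partner starts with the prefix."""
    return sorted(
        g.name
        for g in CLASSIFICATION.values()
        if g.mobilized_by is not None and g.mobilized_by.startswith(partner_prefix)
    )
```

For example, `uses_rna_intermediate("hAT")` returns `False`, and `nonautonomous_partners("L2")` returns `["SINE2"]`. A flat table keyed by name keeps lookups simple; a fuller classification would need an explicit hierarchy of clades, superfamilies, families, and subfamilies.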
Modes of Transmission: Vertical versus Horizontal
As obligate genomic parasites, TEs have a fate intrinsically tied to the reproduction of their host. A relationship between transposition and reproduction is implicit, since the spread of TEs and the increase in their copy number require transposition to occur in the germline or in early embryogenesis. In non-LTR retrotransposons, the phylogeny of the elements tends to recapitulate the phylogeny of the host, which is consistent with a strict model of vertical transmission [Malik et al., 1999; Kordis et al., 2006; Waters et al., 2007; Boissinot and Sookdeo, 2016]. This appears to be the case for the L1 and CR1 clades of LINEs, which have persisted and amplified in reptile genomes since the origin of amniotes with no evidence of lateral transfer [Suh et al., 2014]. However, another clade of LINEs, called RTE, seems to be prone to lateral transfer [Zupunski et al., 2001]. The first lateral transfer of RTE to be detected involved reptiles and ruminants; consequently, this element was named Bov-B [Kordis and Gubensek, 1998]. This discovery initially raised some doubts [Malik et al., 1999], but the evidence was strong, and multiple cases of RTE transfer have since been identified in mammals and in reptiles [Gentles et al., 2007; Walsh et al., 2013; Suh et al., 2016].
It has been known since the 1970s that type II transposons can jump from host to host in insects [Bartolomé et al., 2009]. Because the germline is sequestered in amniotes, it was thought unlikely that amniote genomes could be colonized de novo by DNA transposons, although remnants of past type II transposon activity had been detected in the human genome [Pace and Feschotte, 2007]. It thus came as a surprise that a type of DNA transposon, named space invaders (SPIN), had repeatedly colonized vertebrate genomes, in particular reptilian genomes [Pace et al., 2008]. It was subsequently found that SPIN transposons had been horizontally transferred into squamates independently on more than 12 occasions over the past 50 million years [Gilbert et al., 2012]. Soon after this discovery, several other instances of horizontal transfer involving other types of transposons were identified in A. carolinensis [Novick et al., 2010]. This process is not limited to A. carolinensis: lateral transfer of DNA transposons appears to be quite common in reptiles, and different categories of type II elements have been transferred, including hAT, Mariner, and helitrons [Schaack et al., 2010; Thomas et al., 2010]. Typically, however, invading DNA transposons do not establish persistent populations of active progenitors but are instead short-lived: they usually amplify soon after transfer into the new host, sometimes to extremely large numbers, but rapidly become extinct.
The exact mechanism by which TEs are transferred to the germline is unknown at this point, but recent data suggest a role for parasites and pathogens in TE transmission [Gilbert et al., 2010; Suh et al., 2016]. Indeed, some parasites have been shown to share TEs with their hosts, suggesting that such intimate interactions can facilitate transmission and that parasites may act as vehicles for laterally transferred TEs. However, this is only one piece of the puzzle: for a lateral transfer to be successful, the TE must colonize the germline, either in the gonads, in gametes released into the environment, or during early embryogenesis. This poses a problem because amniotes do not release their gametes into the environment, and the embryo develops protected within the amniotic egg. An additional process must therefore exist to move the TE into the germline. It was proposed that viruses, in particular poxviruses, which have a broad host range, could play a role in this process [Piskurek and Okada, 2007]. The discovery of a snake SINE in a poxvirus known to infect rodents supports this hypothesis [Piskurek and Okada, 2007], but clearly more research is needed to fully understand the mechanism of lateral transfer in amniotes.
The Evolution of Transposable Elements in Reptiles
The long-term evolution of TEs within their hosts' genomes is best studied in non-LTR retrotransposons because these elements are almost always transmitted vertically. Such analyses have revealed striking differences among vertebrate lineages and among clades of non-LTR retrotransposons within the same genome. In mammals, TE diversity is generally limited to a single active clade of non-LTR retrotransposons, namely L1. In most species analyzed, L1 evolves as a single lineage of families: one family is active at a time until it is replaced by a novel family that achieves replicative supremacy concomitantly with the extinction of its predecessor [Smit et al., 1995; Boissinot et al., 2000; Khan et al., 2006; Sookdeo et al., 2013]. Consequently, most mammalian genomes contain a single active L1 family at any given time. In contrast, L1 in A. carolinensis is represented by at least 20 concurrently active families, which diverged from each other before the split between mammals and reptiles [Novick et al., 2009; Boissinot and Sookdeo, 2016]. This situation is comparable to that of fish and frog genomes, where L1 is represented by multiple active families, indicating that the reduction in L1 diversity is a mammal-specific feature [Duvernell et al., 2004; Furano et al., 2004; Boissinot and Sookdeo, 2016; Ivancevic et al., 2016]. This pattern is not limited to L1: the L2 clade is represented in the anole by 17 simultaneously active families, although L2 families are not as divergent from each other as L1 families are [Novick et al., 2009]. Other clades, however, harbor much lower diversity. For example, the R4 clade is represented by a single family and the CR1 clade by 4 very similar subsets [Novick et al., 2009]. The relatively low diversity of CR1 in the anole is also found in birds, crocodiles, and turtles, although the common ancestor of all amniotes contained 7 distinct CR1 lineages [Suh et al., 2018].
Ultimately, the evolution of TEs is driven and constrained by their environment, the host genome. Since TEs are potentially harmful to their host, mechanisms repressing transposition have evolved in vertebrate genomes, which in turn favor elements that can escape repression. This process is exemplified in mammals, where L1 has repeatedly recruited novel 5′ untranslated regions to escape transcriptional repression by host factors [Adey et al., 1994; Khan et al., 2006; Sookdeo et al., 2013]. Such an arms race has been demonstrated experimentally in primates, where L1 is engaged in a conflict with the ZNF91/93 transcriptional repressors [Jacobs et al., 2014]. In addition, the first open reading frame of L1 evolves adaptively and is structurally unstable, suggesting that this protein may also be the site of an antagonistic interaction with a host factor [Boissinot and Furano, 2001; Khan et al., 2006; Sookdeo et al., 2013]. The evolution of L1 in A. carolinensis does not fit this pattern. In fact, the 5′ untranslated region of the lizard L1 is extremely conserved, even across highly divergent L1 families, and the first open reading frame shows no evidence of adaptive evolution or structural instability [Boissinot and Sookdeo, 2016]. This suggests that the mechanisms controlling L1 in A. carolinensis do not act in a family-specific manner. We proposed that family-specific transposition repressors are unlikely to have evolved in an organism like the green anole, which harbors an extremely rich and diverse TE landscape, and that a more general repression mechanism would be a more effective way to control the large number of TE families simultaneously active in this genome [Boissinot and Sookdeo, 2016]. This aspect of reptile biology is, however, in dire need of investigation.
Since reptile genomes, and squamate genomes in particular, contain a large diversity of TEs (see below), they also contain a larger diversity of sequence motifs that TEs can exchange. In mammals, L1 elements have been shown to exchange genetic information by recombination, but only between elements of the same type [Hayward et al., 1997; Sookdeo et al., 2013]. In contrast, the diverse repetitive landscape of reptiles offers many opportunities for sequence exchange among TEs and for the generation of novel mosaic TEs. This is exemplified by the evolution of DNA transposons in the green anole [Novick et al., 2011]. Some hAT elements contain fragments of other categories of TEs, including SINEs and retrotransposons, and related hAT elements have repeatedly exchanged some motifs, probably by recombination. In an extreme case, some autonomous elements carry up to 5 fragments derived from other TEs, accounting for as much as a third of their length. Thus, TE evolution in the anole resembles a positive feedback loop, in which the high diversity of TEs in this genome generates an even greater diversity of mosaic TEs that exchange sequence fragments among themselves.
The Diversity of Reptilian Mobilome
The Mobilome of Squamates
Squamate reptiles are one of the most species-rich orders of vertebrates, with over 10,000 species (www.reptile-database.org) comprising all lizards and snakes. Together they encompass over 200 million years of terrestrial evolution [Jones et al., 2013] and exhibit an incredible diversity of form, physiology, ecology, and behavior. Squamates provide many classic models of extreme adaptation, e.g., the dramatic growth and metabolic shifts observed in pythons [Andrew et al., 2015], multiple systems of venom evolution [Reyes-Velasco et al., 2015], repeated parallel instances of limb loss [Brandley et al., 2008] and sensory organ loss, and the remarkable adaptive radiations and convergent evolution documented in anole lizards, one of the most studied model systems in evolutionary biology [Losos, 2009]. Despite these fascinating aspects of squamate biology and their obvious utility as research models, only a limited number of high-quality squamate genome assemblies are available and have been analyzed for their repeat content. High-quality genome assemblies exist for the green anole, A. carolinensis [Alföldi et al., 2011]; the Burmese python, Python molurus bivittatus [Castoe et al., 2013]; the king cobra, Ophiophagus hannah [Vonk et al., 2013]; and the five-pacer viper, Deinagkistrodon acutus [Yin et al., 2016]. In addition, there are high-coverage assemblies of the Australian dragon lizard, Pogona vitticeps [Georges et al., 2015], and the Chinese crocodile lizard, Shinisaurus crocodilurus [Gao J et al., 2017], as well as draft assemblies of the Burmese glass lizard, Ophisaurus gracilis, a legless anguid lizard [Song et al., 2015]; the leopard gecko, Eublepharis macularius [Xiong et al., 2016]; and the corn snake, Pantherophis guttatus [Ullate-Agote et al., 2014]. Considered together, this collection of genomes reveals that squamates are as diverse in TE content as they are morphologically, physiologically, and ecologically.
First, there is extensive variation in TE content, ranging from 31.82% readily identifiable TEs in the Burmese python to 49.6% in the legless anguid lizard. While there have been extensive gene family expansions and contractions in squamate lineages, no clear correlation has been found linking these to fluctuations in TE content [Yin et al., 2016]. The diversity of genomic TE landscapes is best exemplified in snakes. A detailed comparison of the copperhead and Burmese python genomes showed that these 2 species differ considerably in their repeat content and in the age of their repeats, although the 2 genomes are of relatively similar size [Castoe et al., 2011]. An analysis that included data from 10 snake genomes showed that TE families occur at similar relative proportions in these genomes, even though total TE content varies substantially [Castoe et al., 2013]. Taken together, these results suggest that common mechanisms, which we do not currently understand, could be constraining and shaping squamate genomes [Pasquesi et al., 2018], and that these mechanisms will likely be elucidated only by analyzing the evolutionary dynamics of TE families at the intraspecific level.
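Figures such as "31.82% readily identifiable TEs" are genome proportions: the number of assembly bases covered by TE annotations divided by the assembly length. The minimal Python sketch below assumes annotations are given as half-open (start, end) intervals per sequence, as could be derived from a repeat-annotation tool's output; overlapping or nested annotations are merged so that no base is counted twice. The function names and toy coordinates are hypothetical. The choice of denominator (for example, assembly length with or without gaps) can also shift the result.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) coordinates on one sequence


def covered_bases(intervals: List[Interval]) -> int:
    """Count bases covered by at least one interval (overlaps counted once)."""
    total = 0
    cur_start, cur_end = None, None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Disjoint from the current block: close it and open a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or touching: extend the current block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def te_percent(annotations: Dict[str, List[Interval]], assembly_length: int) -> float:
    """Percentage of the assembly covered by TE annotations."""
    if assembly_length <= 0:
        raise ValueError("assembly_length must be positive")
    covered = sum(covered_bases(ivs) for ivs in annotations.values())
    return 100.0 * covered / assembly_length


# Toy example: 2 overlapping annotations on seq1 (merged to 150 bp) and 1 on seq2.
toy = {"seq1": [(0, 100), (50, 150)], "seq2": [(300, 400)]}
print(te_percent(toy, 1000))  # 25.0
```

Sorting the intervals first lets a single pass merge overlaps, so the calculation scales with the number of annotations rather than with genome length.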
The most significant feature of squamate genomes is their high diversity of TE families [Pasquesi et al., 2018] and, in some species, a large fraction of young, recently active elements. This TE diversity is most pronounced in the green anole genome, which has the most diverse representation of TE families observed in amniotes [Alföldi et al., 2011; Tollis and Boissinot, 2011; Chalopin et al., 2015]. In this species, all major groups of TEs are represented and many are active, including multiple non-LTR retrotransposons, LTR retrotransposons, endogenous retroviruses, DNA transposons, and helitrons. A large proportion of the identified elements are young, showing little divergence from their progenitor sequences, with a relative paucity of the older, degenerate sequences that predominate in mammalian and bird genomes. These features make the green anole genome more similar to fish genomes than to mammalian genomes [Novick et al., 2009] and support the hypothesis that the dynamic and diverse TE landscapes seen in squamate genomes are likely more representative of the ancestral amniote state, and that the limited TE activity seen in mammals and birds was derived separately and independently in each of these lineages.
Snake genomes appear to be intermediate with regard to TE diversity and activity. All major TE groups (LINEs, SINEs, LTR retrotransposons, and DNA transposons) contribute to snake genome size; however, in all cases LINEs predominate and LTR elements make the smallest contribution. Snake genomes contain a more mixed palette of younger and older sequences than the green anole genome. Older (divergent) copies of L1 and L2 occur in both basal and advanced groups of snakes, suggesting that these elements were more active in ancestral snakes [Yin et al., 2016]. Recent expansions include the snake1 CR1 LINEs, which have primarily expanded in colubroid snakes [Castoe et al., 2013], DNA transposons (hAT-Charlie, Tc1/Mariner) and LTR retrotransposons (Gypsy) in viper genomes, and L2 and CR1 elements in boas and pythons [Yin et al., 2016]. Such recent expansions suggest that the repetitive fraction of squamate genomes remains highly dynamic. This is further illustrated by the multiple horizontal transfer events described in squamate genomes, including the independent horizontal transfer of SPIN elements into squamate genomes on more than 12 occasions over the past 50 million years [Gilbert et al., 2012]. Given that such events are primarily identifiable because the elements involved are also found in mammals, it is intriguing to imagine how many more might be identified as a greater diversity of genomes becomes available.
The Mobilome of Archosaurs
The archosaur clade constitutes a major radiation within reptiles, including the extant clades of birds and crocodilians as well as major extinct lineages, including dinosaurs, extinct crocodilian lineages, and pterosaurs [Nesbitt, 2011]. The basal split in archosaurs separates crocodilians (extant and extinct) from the clade including birds, dinosaurs, and pterosaurs. Birds are the most speciose group of archosaurs, with between 9,000 and 20,000 species depending on the species concept used [Barrowclough et al., 2016]. In contrast, nonavian dinosaurs may have numbered around 2,000 species at any given period [Starrfelt and Liow, 2016], and crocodilians currently comprise only 20-30 species [Oaks, 2011].
The mobilomes of archosaurs vary nearly as much as those of reptiles as a whole (Fig. 1). Crocodilians exhibit relatively high genomic TE content, suggesting that their genomes are closer to the ancestral TE landscape of reptiles [Wan et al., 2013; Green et al., 2014; Rice et al., 2017]. In contrast, at some point during the evolution of the dinosaur lineage, genomic TE content declined [Organ et al., 2007]. This reduction likely occurred in the lineage leading to the saurischian dinosaurs, which include extant birds and exclude pterosaurs and ornithischian dinosaurs. The reduction of TEs in the saurischian clade is exemplified by its only extant representatives, birds, whose genomes have relatively low TE diversity and abundance [Zhang et al., 2014; Chalopin et al., 2015; Gao B et al., 2017; Kapusta and Suh, 2017; Sotero-Caio et al., 2017] (Fig. 1).
Fig. 1. Relationships among assembly genome size, genomic TE content, and genome size estimates. Left: Relationship between sequenced genome assembly size and the genomic proportion of TEs in reptiles. Right: Discrepancies between genome size estimates from flow cytometry [Gregory et al., 2007] and sequenced genome assemblies in reptiles, for species in which both are available.
Despite the large differences in total genomic TE content between crocodilians and birds, the CR1 clade of retrotransposons makes up the largest proportion of genomic TE content in both [Green et al., 2014; Zhang et al., 2014]. CR1 elements make up approximately 2-7% of most bird genomes [Zhang et al., 2014] and approximately 10% of crocodilian genomes [Green et al., 2014]. Crocodilian genomes also contain large proportions of DNA transposons, including hAT and PIF-Harbinger elements (∼7% of the genome each), as well as LTR Gypsy elements (∼3%) [Green et al., 2014]. While we cannot directly infer the genomic TE content and diversity of extinct archosaurs, we might expect CR1 retrotransposons to have made up a large proportion of dinosaur and pterosaur genomes, given their prominence in extant archosaurs and their activity and prevalence in reptiles in general.
While birds have a modest genomic CR1 content, their overall TE content is an outlier relative to other reptiles (Fig. 1). Birds have small and conserved genome sizes compared with other reptiles [Gregory et al., 2007]. The small genome sizes of birds relative to closely related lineages are likely due to the metabolic constraints of powered flight [Gregory, 2001, 2002; Zhang and Edwards, 2012; Wright et al., 2014] and appear to be a convergently evolved trait among flying organisms, including bats and pterosaurs [Organ and Shedlock, 2009]. The paucity of TE activity and amplification in bird genomes is a likely consequence of this genome size constraint. Recent work has hypothesized that TE activity in birds is accompanied by mid-scale deletions (tens of kilobases), resulting in largely stable genome sizes despite TE turnover [Kapusta et al., 2017].
Crocodilians exhibit a relative paucity of TE activity similar to that seen in birds. The majority of CR1 element activity predated the diversification of modern crocodilians [Suh et al., 2014]. However, in-depth analyses of crocodilian and bird genomes have uncovered interesting patterns of recent, but low, levels of activity. In crocodilians, there has been recent activity of endogenous retroviruses [Chong et al., 2014] and a novel class of SINEs [Kojima, 2015]. Similarly, recent activity and emergence of several TEs have been identified in multiple bird clades. Novel classes of SINEs are found in several avian lineages [Kapusta and Suh, 2017], and recent expansions of LTR and CR1 retrotransposons have been identified in songbirds and woodpeckers, respectively [Zhang et al., 2014; Manthey et al., 2018; Suh et al., 2018]. These final 2 examples are notable: the downy woodpecker has about double the genomic TE content (>20% of the genome) relative to other birds [Zhang et al., 2014], and several flycatcher species (genus Ficedula) have thousands of LTR retrotransposon polymorphisms [Suh et al., 2018], indicating a moderate level of recent TE activity.
As more genomes are sequenced, and as our ability to sequence, assemble, and analyze genomes improves, we will undoubtedly find that we are currently underestimating the genomic TE content of archosaurs. While these genomes will likely still have lower TE content and activity than those of other reptiles, much of them remains to be explored and annotated. The future of genomic TE discovery in archosaurs is exemplified by 2 recent in-depth studies combining manual repetitive element identification with better genome assemblies. Guizard et al. [2016] showed that the chicken genome consists of at least 19% simple sequence repeats and TEs, nearly doubling previous estimates of its repeat content. Similarly, improved annotation and assembly of a crow (genus Corvus) genome [Weissensteiner et al., 2017] identified large (>10 kb) satellite arrays that were missing from previous Illumina-based assemblies.
The Mobilome of Turtles
Interest in the mobilome of turtles emerged some 30 years ago [Endoh and Okada, 1986] and led to major contributions, such as showing how LINEs and SINEs share retropositional machinery through the exchange of sequence fragments [Kajikawa et al., 1997]. Despite these early discoveries, little is known about the dynamics of TEs in this clade, likely because genomic resources for turtles became available only recently, and extensive annotation even more recently. The first studies of turtle genomes concluded that 10% of the genome could be assigned to TEs [Shaffer et al., 2013; Wang et al., 2013]. This rather low proportion would make turtles outliers among reptiles (Fig. 1), given the observed association between TE content and genome size in vertebrates [Chalopin et al., 2015; Elliott and Gregory, 2015]. These first reports, however, likely underestimated the TE content of turtle genomes, possibly owing to a lack of de novo annotation of reptile-specific element families. More recent annotations suggest that the 4 genomes currently sequenced, those of the green sea turtle (Chelonia mydas), western painted turtle (Chrysemys picta bellii), Chinese soft-shell turtle (Pelodiscus sinensis), and Agassiz's desert tortoise (Gopherus agassizii), have TE contents of around 30% [Tollis et al., 2017]. This latter estimate is more consistent with the general correlation between TE content and genome size observed in reptiles (Fig. 1).
LINEs and DNA transposons dominate turtle genomes and display a relatively uniform age distribution, suggesting slow but steady activity [Chalopin et al., 2015]. Among LINEs, CR1/L3 elements are the most abundant in turtles [Chalopin et al., 2015; Tollis et al., 2017], as in other reptiles. The high number of CR1 lineages in turtles likely reflects the retention of much of the ancestral CR1 diversity of amniotes [Suh et al., 2014]. Many CR1 subfamilies show low divergence (<5%) from their consensus sequences, suggesting a recent burst of activity [Tollis et al., 2017]. Overall, turtle genomes tend to evolve slowly [Shaffer et al., 2013] and do not display strong variation in size. Because functional assays seem especially prohibitive in these slow-growing, slowly reproducing, and often endangered species, extensive sequencing of this small group may be particularly useful for gaining a mechanistic perspective on turtle TE dynamics.
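Divergence of TE copies from their consensus, as used above for the CR1 subfamilies, serves as a proxy for insertion age. The Python sketch below shows one common way to turn such divergences into approximate ages, assuming a Jukes-Cantor correction for multiple substitutions and treating the consensus as the ancestral source element, so that substitutions accumulate along a single lineage (t = d / r). The neutral substitution rate `HYPOTHETICAL_RATE` is an illustrative placeholder, not a value from the studies cited here, and the function names are hypothetical.

```python
import math

# Illustrative placeholder only: substitutions per site per year.
# NOT an estimate for turtles; replace with a calibrated neutral rate.
HYPOTHETICAL_RATE = 1e-9


def jukes_cantor_distance(p: float) -> float:
    """Correct a raw mismatch proportion p (TE copy vs. consensus) for
    multiple substitutions at the same site (Jukes-Cantor model)."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def approximate_age_years(p: float, rate: float = HYPOTHETICAL_RATE) -> float:
    """Approximate insertion age, assuming the consensus approximates the
    ancestral source element so divergence accrues on one lineage: t = d / r."""
    if rate <= 0.0:
        raise ValueError("rate must be positive")
    return jukes_cantor_distance(p) / rate


if __name__ == "__main__":
    for raw in (0.01, 0.05, 0.10):
        d = jukes_cantor_distance(raw)
        age_my = approximate_age_years(raw) / 1e6
        print(f"raw divergence {raw:.0%}: corrected d = {d:.4f}, ~{age_my:.1f} My")
```

Absolute ages scale inversely with the assumed rate, so without a lineage-specific calibration only the relative ordering of copies (younger vs. older) is meaningful; this caveat matters for slowly evolving genomes such as those of turtles.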
The Impact of Transposable Elements on the Genome of Their Host
TEs are natural components of genetic diversity, and their activity and accumulation have profoundly affected the evolution of genome size, structure, and function in vertebrates. Because transposition is by definition mutagenic, host genomes have in turn evolved mechanisms of defense against rampant TE amplification, including novel regulatory proteins, epigenetic silencing, and recombination-driven deletion [Nam and Ellegren, 2012; Goodier, 2016; Kapusta and Suh, 2017; Kapusta et al., 2017]. Like any other mutation, a TE insertion may have negative, neutral or nearly neutral, or positive consequences. The relationships between newly transposed TEs and host genomes are essentially symbioses, ranging from parasitism to commensalism and mutualism [Kidwell and Lisch, 2001]. When TE insertions are neutral, we expect genetic drift to be the main factor determining their fate, as for any neutral genetic variant. In contrast, TE insertions with negative or positive effects on the host will be subject to selection. Purifying selection will act against insertions with even minor adverse effects on the host; at the extreme, as during rampant TE activity in cancerous cells, widespread TE insertions can be severely detrimental to the host organism and represent a dead end for TE amplification [Beck et al., 2011].
The relative importance of drift and selection in determining the fate of TE insertions has been examined in only one reptile, A. carolinensis. Because TE families in this lizard tend to exhibit low divergence and ancient copies are conspicuously rare compared with mammalian genomes, it was proposed that anole TEs are subject to a high rate of turnover [Novick et al., 2009; Tollis and Boissinot, 2011]. This model, first developed to explain the frequency distribution of TEs in Drosophila and fish [Charlesworth and Charlesworth, 1983; Montgomery et al., 1987; Furano et al., 2004], posits that new insertions are so deleterious that they rarely reach fixation, so that populations should mostly contain recent, low-frequency insertions. We tested this hypothesis by assessing LINE frequencies in natural populations of the green anole [Tollis and Boissinot, 2013; Ruggiero et al., 2017]. Contrary to expectation, we found that LINEs do reach fixation in anoles, but only short, truncated elements do so. Full-length copies, the only ones capable of further transposition, almost never reach fixation, and for these the turnover model does apply. Thus, LINE transposition imposes a substantial fitness cost in anoles; in fact, selection against LINEs may be stronger in anoles than in other organisms such as humans and sticklebacks [Xue et al., 2018]. We also found that the fraction of elements that reached high frequency or fixation differed among populations [Tollis and Boissinot, 2013; Ruggiero et al., 2017]. For example, populations from North Carolina had significantly more fixed insertions in their genomes than the population from Florida. This pattern can be explained by the demographic history of these populations [Tollis et al., 2012; Tollis and Boissinot, 2014; Manthey et al., 2016]. Northern anole populations have smaller effective population sizes than Floridian ones; in these populations, drift was stronger and counteracted purifying selection, which was the dominant force in the south.
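The interplay between population size, drift, and purifying selection described above can be illustrated with Kimura's classic diffusion approximation for the fixation probability of a new mutation. The Python sketch below is a textbook illustration under simplifying assumptions (a diploid Wright-Fisher population whose census size equals its effective size N, additive selection with coefficient s, initial frequency 1/(2N)); it is not the analysis performed in the anole studies, and the selection coefficient in the usage example is hypothetical. For a new allele, u = (1 - e^(-2s)) / (1 - e^(-4Ns)), which tends to the neutral value 1/(2N) as s approaches 0.

```python
import math


def fixation_probability(s: float, n: int) -> float:
    """Kimura's diffusion approximation for the fixation probability of a
    new allele (initial frequency 1/(2N)) in a diploid population of size N.
    Assumes N equals the effective size; s < 0 means a deleterious insertion."""
    if n <= 0:
        raise ValueError("population size must be positive")
    if s == 0.0:
        return 1.0 / (2 * n)  # neutral case: fate set by drift alone
    if s > 0:
        # (1 - e^{-2s}) / (1 - e^{-4Ns}), written with expm1 for precision
        return math.expm1(-2 * s) / math.expm1(-4 * n * s)
    a, b = -2 * s, -4 * n * s  # both positive when s < 0
    if b > 700:
        # expm1(b) would overflow; for large b, expm1(b) is ~e^b
        return math.expm1(a) * math.exp(-b)
    return math.expm1(a) / math.expm1(b)


if __name__ == "__main__":
    s = -1e-4  # hypothetical, mildly deleterious insertion
    for n in (1_000, 10_000, 100_000):
        u = fixation_probability(s, n)
        print(f"N = {n:>7,}: u = {u:.3e}, u relative to neutral = {u * 2 * n:.3g}")
```

For a fixed, mildly deleterious s, the fixation probability relative to the neutral expectation falls steeply as N grows, which is the sense in which stronger drift in smaller populations can offset purifying selection.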
Although the majority of TE insertions will be either neutral or deleterious, TEs can occasionally be recruited by the host for its own benefit and thus be positively selected [Warren et al., 2015]. There are now numerous examples of such exaptation in vertebrates: TEs have been shown to be a source of novel regulatory sequences (promoters or enhancers), coding sequences (new genes or exons), binding sites, and noncoding RNAs (miRNAs and lncRNAs) [reviewed in Warren et al., 2015]. In fact, the exaptation of TEs has had profound impacts on the evolution of gene regulatory networks as well as of some important functions, such as immunity and neurotransmission [Feschotte, 2008; Jangam et al., 2017; Pastuzyn et al., 2018]. However, the vast majority of examples of exaptation in vertebrates come from mammals [Warren et al., 2015], arguably the best-studied vertebrate group in comparative genomics. There is no reason to believe that close examination of reptile genomes would not yield a trove of domesticated TEs. Indeed, the accumulation of TEs in the Hox clusters of reptiles [Di-Poï et al., 2009, 2010], which contrasts with the extreme structural stability of these clusters in other vertebrates [Deschamps and Duboule, 2017], suggests that TE insertions could have influenced the morphological diversity of squamates [Di-Poï et al., 2010] as well as the rate of speciation in anoles [Feiner, 2016]. Much work remains to be done to confirm a functional role of these insertions in the evolution of phenotypes.
While the above examples all result from transposition-induced mutations, the presence of TEs after insertion also has the potential to affect the structure and size of host genomes. When many TE copies with high sequence similarity are spread throughout the genome, they may induce ectopic recombination, resulting in insertions, deletions, and chromosomal rearrangements such as inversions [Canapa et al., 2015]. Thus, not only does transposition provide a source of structural diversity through TE insertions, but the inserted TEs themselves may help create additional heritable structural variation that affects evolutionary processes across populations [Piacentini et al., 2014]. For example, TE-induced chromosomal changes may result in novel phenotypes [Saenko et al., 2015] or in reproductive incompatibilities between isolated populations (e.g., Dobzhansky-Muller incompatibilities), suggesting a putative role for TEs in the speciation process [Brown and O'Neill, 2010; Rebollo et al., 2010].
A combination of TE-induced chromosomal changes and heightened TE transposition would have profound consequences for populations and species. Such a situation could be induced by hypomethylation, as associated with cancers [Daskalos et al., 2009], or by chromosomal rearrangements that disrupt normal methylation patterns and thereby release TEs from silencing [O'Neill et al., 1998; Dobigny et al., 2004]. In such cases, a cascade of genomic changes could promote evolutionary diversification, a scenario consistent with several studies that identified coincident bursts of diversification and of TE activity and accumulation [Pascale et al., 1990; de Boer et al., 2007; Ray et al., 2008]. While genomic changes induced by TEs may promote speciation, the overall accumulation of TEs in the genome may have the opposite effect: lineages with larger genomes appear to have slower speciation rates than lineages with smaller genomes [Kraaijeveld, 2010].
As TEs accumulate in a genome, they can increase its size if transposition outpaces excision or deletion [Petrov, 2001]. TEs can alter genome size through copy-number increases via transposition, through TE-induced ectopic recombination, and through TE-mediated satellite expansions [Canapa et al., 2015; Klein and O'Neill, 2018]. While genomes may grow with TE activity, recombination-driven deletions counteract this effect, in some cases maintaining genome size despite ongoing TE activity [Kapusta et al., 2017]. Although there is no obvious relationship between the complexity of organisms and the size of their genomes, the accumulation of TEs can have significant structuring effects, which may be adaptive, for instance by promoting the establishment and maintenance of heterochromatin domains [Lyon, 1998; Slotkin and Martienssen, 2007; Kejnovsky et al., 2015]. In addition, differences in genome size can indirectly affect the adaptability of organisms by influencing the location and number of functionally significant mutations, as proposed by the “functional space hypothesis” [Mei et al., 2018].
One might expect a simple relationship between genome size and genomic TE content. However, TE content varies both across lineages and within genomes. Within genomes, TE content varies across the genomic landscape, especially in species with microchromosomes, such as squamates and birds; microchromosomes are typically gene-rich and contain relatively few TEs [Ellegren, 2010]. Across the tree of life, clade-specific factors obscure any clear association between genome size and TE content, although strong associations can be found within individual clades [Pagel and Johnstone, 1992; Canapa et al., 2015]. Based on published genomes, reptiles show a positive relationship between genome size and overall TE content, with additional clade-specific patterns (Fig. 1). This pattern may, however, be obscured by incomplete genome sequencing projects whose assemblies underrepresent the true size of genomes (Fig. 1).
Conclusions
The sequencing of several genomes representative of the major reptilian lineages has considerably improved our understanding of amniote genome evolution and, in particular, revealed an extraordinary diversity of repetitive landscapes across reptiles. However, this field is still emerging, and most studies performed so far have been largely descriptive. Major gaps remain in our understanding: To what extent do TEs contribute to adaptation and speciation? Which mechanisms of repression have evolved in reptiles? How has this diverse repetitive landscape affected other genomic features, such as recombination and mutation rates? How do different types of TEs interact in their genomic environment?
Answering these questions will require genome sequences of better quality, in particular chromosome-scale assemblies. Poor assemblies often omit highly repetitive regions, where TEs are more likely to accumulate, and without proper assembly and annotation an exhaustive assessment of TE insertions is impossible. Many of the unassembled regions of genomes can be assumed to be composed largely, if not entirely, of repetitive sequences. Chromosome-scale assemblies will help resolve this problem and possibly clarify many aspects of TE dynamics. They will improve our estimates of TE content and may reveal important trends in how mobile elements aggregate, are deleted, are domesticated, spawn simple repeats, and recombine into novel compound sequences. For similar reasons, further studies of TEs at the population level may be particularly useful. The breadth of reptile genomes and their TE content provides an ideal system for understanding the evolutionary forces that shape structural variation in natural populations and, consequently, for identifying general mechanisms that have shaped vertebrate genomic diversity.