Abstract
Background: Tandemly repeated satellite DNA (satDNA) sequences are an important part of animal genomes. They are involved in chromosome interactions and the maintenance of the integral structure of the nucleus, regulation of chromatin conformation and gene expression, and chromosome condensation and movement during cell division. Satellite DNAs located in the centromeric heterochromatin evolve rapidly and likely affect hybrid fertility and fitness. However, studies of satDNA are taxonomically highly biased. In lacertid lizards, satDNA has been extensively studied in the subfamily Lacertinae, but the subfamily Eremiadinae has been largely overlooked. Results: In this work, we describe a novel 177-bp-long centromeric satDNA family, EremSat177, which is present in all studied species of the genus Eremias but not in related genera. EremSat177 is not homologous to any previously identified centromeric satellites. Using fluorescence in situ hybridization, we demonstrate its centromeric localization in E. velox and E. arguta. We also show its tandem organization and intra-genomic homogenization by in silico analysis of the E. argus genome. Phylogenetic analysis of consensus EremSat177 sequences from 12 Eremias species demonstrates that the same monomer subfamily is the most abundant in all these species, and that its evolution mainly follows the species phylogeny as revealed by mtDNA sequences. Conclusion: EremSat177 represents a novel, lineage-specific centromeric satellite DNA, and its role in centromere functioning should be addressed in further research.
Introduction
Repetitive DNA makes up a significant proportion of eukaryotic genomes and is essential for maintaining their structure and functions [1]. Depending on their structure and distribution in the genome, two main classes of repetitive DNA can be distinguished: interspersed repetitive DNA, e.g., transposable elements, is dispersed throughout the genome, whereas tandemly repeated or satellite DNA (satDNA) consists of arrays of monomers arranged head to tail. Tandem DNA repeats are often localized in specific regions of chromosomes, such as constitutive heterochromatin and telomeric, centromeric, and pericentromeric regions [2]. Two main models explain the variability of satDNA in genomes over evolutionary time [2]. The library model emphasizes long-term conservation and divergence of satDNAs from a shared ancestral repertoire. According to this model, different species retain subsets of a shared library of satDNA families over evolutionary time, and differences between species arise mainly from expansions and contractions in the copy number of particular repeats. The concerted evolution model explains the rapid homogenization of satDNA within a species: mechanisms such as unequal crossing over, gene conversion, and replication slippage propagate sequence variants across the genome, resulting in satDNA arrays that are highly similar within a species but diverge between species. Both models apply, to different satDNAs and in different organisms.
Centromeric satDNAs are present in all eukaryotes and are part of the structure of all centromeres, indicating their importance in the genome. The functions of centromeric satDNA include regulating the binding of centromeric proteins (CENPs) and the kinetochore, and it is thus involved in chromosome segregation during cell division [3]. However, the sequence similarity of centromeric satellites between different taxa is usually very low, because they evolve at a high rate. Closely related species, or even different populations of the same species, may differ significantly in the sequences of their centromeric satellites [4, 5]. On the other hand, in some cases centromeric satellites are conserved at relatively high taxonomic levels, such as in bovid ungulates [6], sturgeons [7], and snakes [8]. The divergence of centromeric satDNA sequences is hypothesized to impair the viability and fertility of interspecific hybrids by disrupting their mitosis and meiosis [9]. Therefore, studying the evolution of centromeric satDNAs helps to understand species formation and lineage diversification.
In reptiles, the lizard family Lacertidae is one of the classical model groups for the studies of satDNAs. This family is widespread in the Old World, including 378 species [10], and it is divided into three subfamilies: Gallotiinae, Eremiadinae, and Lacertinae. The satDNAs in lacertids have been studied since the 1980s. They include centromeric satellites, located in the centromere areas, and pericentromeric satellites. Pericentromeric satellites are usually located in C-positive bands on the chromosome arms near centromeres and are present only in a subset of chromosomes [11, 12]. They are usually more phylogenetically conserved than the centromeric satellites [12].
Within the subfamily Lacertinae, various satDNA families have been identified. The centromeric satellites isolated in different genera include pGPS (genera Hellenolacerta, Podarcis) [13], pLCS and pLHS (Podarcis, Teira) [14], CLsat (Darevskia), Agi160 (Lacerta) [15‒17], and IMO-HindIII (Iberolacerta) [11, 18]. A conservative pericentromeric satellite, IMO-TaqI, was detected in the species of the genera Iberolacerta, Lacerta, and Timon, and even in amphisbaenians, highly specialized legless burrowing lizards, which are a sister lineage to the family Lacertidae [19, 20].
Within the subfamily Eremiadinae, only one species has been investigated so far: Atlantolacerta andreanskyi, whose genome was found to contain the IMO-TaqI pericentromeric satellite common to all lacertids, and a specific centromeric satellite, AAN-TaqI, which is not similar to any sequence known in other lizards [21]. The genus Atlantolacerta is monotypic and sister to all other lineages of the subfamily Eremiadinae. The more speciose and widespread genera of this subfamily have not yet been explored for satDNA.
In this work, we studied previously deposited and newly generated low-coverage genomic DNA sequences of 12 species of the genus Eremias, with Acanthodactylus guineensis and Mesalina guttulata as related outgroup species from the same subfamily. Tandem repeats were identified with the TAREAN tool, which uses de novo graph-based prediction from raw short reads [22]. The identified satDNAs were physically mapped to the chromosomes of representatives of the genus Eremias using fluorescence in situ hybridization (FISH) with PCR-amplified and synthesized oligonucleotide probes, and mapped in silico to the assembled genome of E. argus [23] using the satXplor pipeline [24]. To compare the evolution of the identified satDNAs with the evolution of other parts of the genome, we assembled mitochondrial genomes of the studied species from the same data and reconstructed their phylogeny based on the previously deposited and newly obtained mitochondrial sequences.
Material and Methods
DNA Extraction and Sequencing
For molecular genetic analyses, we used previously obtained whole-genomic DNA sequences and DNA sequences of the microdissected W chromosome of E. velox [25] and generated new genomic sequences for additional species. DNA of representatives of the other Eremias species was extracted using the phenol-chloroform method [26] from muscle tissues of the specimens deposited in the Zoological Museum of Moscow State University (online suppl. Fig. 1; for all online suppl. material, see https://doi.org/10.1159/000543883). DNA of M. guttulata was extracted from the cell culture suspension using the ExtractDNA Blood & Cells kit (Evrogen, Moscow, Russia). The libraries for low-coverage WGS were prepared using the MGIEasy Universal DNA Library Prep Kit (MGI, Shenzhen, China), following the manufacturer’s protocol as described previously [27]. DNA libraries were sequenced on the MGISEQ-2000 instrument using the DNBSEQ-G400RS High-throughput Sequencing Set (FCL PE100) in the SB RAS Genomic Core Facility (ICBFM SB RAS, Novosibirsk, Russia).
Bioinformatic Analyses
Illumina genomic sequences of additional Eremias species and of A. guineensis were downloaded from the NCBI Sequence Read Archive (SRA; online suppl. Table 1) [23, 28‒32] and analyzed together with the sequences obtained in the current work. To extract satellite DNA sequences, we used the TAREAN pipeline, which identifies tandem repeats in raw sequencing data through graph analysis [22]. Then, all raw sequences were realigned to the complete satellite DNA library using bowtie2 [33] to check the TAREAN output. To obtain the mitochondrial DNA sequences, the Illumina and MGI reads from the sequenced Eremias specimens were assembled using SPAdes [34], and the reads of M. guttulata were assembled using MEGAHIT [35]. The mtDNA contigs were extracted from the assemblies using BLAST and annotated using MITOS2 [36] on the Galaxy web server (https://usegalaxy.org). The standard mtDNA phylogenetic marker sequences (COI, CYTB) obtained from the mtDNA contigs were used for DNA barcoding, and complete sets of mitochondrial protein-coding genes were used for phylogenetic analysis. Additional mtDNA sequences were downloaded from GenBank. DNA barcoding was performed with NCBI BLAST and with the BOLD DNA barcoding service (https://boldsystems.org [37]). For phylogenetic analysis, the protein-coding mtDNA sequences were aligned using the ClustalW algorithm in MEGA 11 [38]. The consensus sequences of the satellite DNA were aligned manually. Phylogenies were reconstructed by the maximum likelihood approach with IQ-TREE 2.2.0 [39], using the TIM2+F+I+I+R4 substitution model for the mitochondrial genes and the TPM3+I substitution model for the satellite DNA, both selected by ModelFinder [40], with 1,000 nonparametric bootstrap replicates. The diversity and spatial distribution of the satDNA in the assembled genome of E. argus [23] were explored with the SatXplor pipeline [24], and the results were visualized using the default output figures and the Integrative Genomics Viewer (IGV, [41]).
Cell Cultures and Chromosome Preparations
We used chromosome suspensions of E. velox [25, 42], Acanthodactylus lineomaculatus [43], Gallotia galloti, E. arguta, Acanthodactylus schreiberi, Gastropholis prasina, Latastia longicaudata, Phoenicolacerta troodica, Takydromus dorsalis, Podarcis siculus, Timon lepidus, and Lacerta media [44] prepared in previous studies. The cell culture of M. guttulata was obtained from embryonic tissues of an egg taken from a captive colony originating from Israel and kept in the biology department of the high school “Intellectual” (Moscow, Russia). The cultures were established in the Laboratory of Animal Cytogenetics, Institute of Molecular and Cellular Biology, Russia, using enzymatic treatment of tissues as described previously [45‒47]. The cell culture lines of this species were deposited in the large-scale research facilities “Cryobank of cell cultures” IMCB SB RAS (Novosibirsk, Russia). Metaphase chromosome spreads were prepared from chromosome suspensions obtained from early passages of primary fibroblast cultures as described previously [48, 49]. For primary karyotyping, the chromosome slides of M. guttulata were stained with 4′,6-diamidino-2-phenylindole (DAPI; Vector Laboratories, Burlingame, CA, USA) after barium hydroxide treatment, as described previously [50].
Satellite DNA Probes and Fluorescence in situ Hybridization
Primers EV177F (5′-TCTCATTTTGACTGGAACAAGCA-3′) and EV177R (5′-CGTGTTTTGAGTCGTTCCTACC-3′) for amplification of the satellite DNA were designed based on the satellite DNA sequence of the E. velox specimen used for cell culturing, described previously [25, 42]. PCR was carried out under the following conditions: 5 min at 95°C; then 35 cycles of 30 s at 95°C, 30 s at 57°C, and 30 s at 72°C; then 5 min at 72°C. The amplicons were checked by agarose gel electrophoresis for the typical satDNA ladder-like pattern, and a second round of PCR was carried out to label the amplicons with biotin-dUTP (Sigma). This probe was used for FISH in E. velox, A. lineomaculatus, and M. guttulata.
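The ladder-like pattern mentioned above arises because both primers can anneal in every monomer of a tandem array, so each additional monomer spanned by a product adds one monomer length to its size. The following minimal Python sketch illustrates this arithmetic. The function name and the 120-bp shortest product are hypothetical and chosen only for illustration; the actual primer positions within the monomer are not stated here.

```python
MONOMER_LEN = 177  # EremSat177 monomer length reported in the text


def ladder_band_sizes(shortest_product: int, n_bands: int,
                      monomer_len: int = MONOMER_LEN) -> list[int]:
    """Expected amplicon sizes when both primers anneal in every monomer
    of a tandem array: each extra monomer spanned adds one monomer length."""
    if shortest_product <= 0 or n_bands <= 0:
        raise ValueError("shortest_product and n_bands must be positive")
    return [shortest_product + k * monomer_len for k in range(n_bands)]


# Hypothetical shortest product of 120 bp (not a value from the study).
bands = ladder_band_sizes(shortest_product=120, n_bands=4)
print(bands)  # [120, 297, 474, 651]
```

Whatever the size of the shortest product, adjacent bands are expected to differ by one monomer length (177 bp), which is the feature checked on the gel.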
FISH with the PCR-derived probe was carried out according to the standard technique [51]. In brief, before hybridization, the slides with mitotic chromosome spreads were dried at 60°C for 1 h and then incubated in 0.12% trypsin solution for 55 s. They were then washed in 2× saline-sodium citrate (SSC) for 5 min and in phosphate-buffered saline (PBS) for 5 min. The hybridization mixture contained 100–200 ng of DNA probe per slide with 50% formamide in 2× SSC. The probes were denatured at 95°C for 5 min, immediately placed on ice, and then centrifuged in a microcentrifuge at 13,000 rpm for 5 min. The chromosomes on the slides were denatured in 70% formamide in 2× SSC at 70°C for 70 s and then dehydrated in a cold 70–80–96% ethanol series. The probes were applied onto the slides, and the slides were incubated in a humid chamber at 40°C overnight before washing. After washing and incubation with avidin-FITC (Vector Laboratories, Inc., Burlingame, CA, USA), the slides were mounted in Vectashield Antifade mounting medium with DAPI (Vector Laboratories, Burlingame, CA, USA).
Alternatively, a universal oligonucleotide probe was developed based on the alignment of all satellite DNA variants from different Eremias species and was directly labeled commercially by Generi Biotech (Hradec Králové, Czech Republic): 3′-AATAAYWGCTCAAAACAWGCTTCTTTMGCWAAGTTTTGCTGCTTTTCTTGTGC-5′-Cy3. The probe was used for FISH in E. arguta and in species from the other genera.
Briefly, mitotic chromosome spreads were dried at 60°C for 1 h and then treated with RNase A and pepsin (Sigma-Aldrich, St. Louis, MO, USA) with intermediate washes in 2× SSC and PBS. The slides were dehydrated through a 70–85–96% ethanol series and air-dried. They were then denatured in 70% formamide in 2× SSC at 73°C for 3 min, followed by another round of dehydration in cold 70–85–96% ethanol. For the details of the pre-hybridization treatment, see [52]. For each slide, the probe contained 40 pmol of commercially prepared Cy3-labeled oligonucleotides in 15 μL of hybridization mixture (pH 7.0), which included 50% formamide, 10% dextran sulfate (Sigma-Aldrich, St. Louis, MO, USA), 10% sodium dodecyl sulfate, 2× SSC, and 1× Denhardt’s solution (Thermo Fisher Scientific, Carlsbad, CA, USA). The probe was denatured at 86°C for 6 min and immediately placed on ice. It was then applied to the slide with chromosome spreads and hybridized in a dark, humid chamber at 37°C overnight. Post-hybridization washes were performed following the protocol of Voleníková et al. [53] with slight modifications: twice in 1× SSC, at 65°C and at 42°C, for 3 min each, and once in 4× SSC with 0.01% Tween 20 (Sigma-Aldrich, St. Louis, MO, USA) at 42°C for 5 min. Finally, the slides were mounted in Vectashield Antifade Mounting Medium with DAPI (Vector Laboratories, Burlingame, CA, USA).
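The universal oligonucleotide probe given above contains IUPAC ambiguity codes (Y, W, M), so it represents a pool of related sequences rather than a single oligonucleotide. The following Python sketch, offered only as an illustration, counts and enumerates the sequences encoded by the probe string exactly as it is written in the text. The function names are hypothetical, and the orientation of the string does not affect the count.

```python
from itertools import product

# Standard IUPAC nucleotide codes needed for this probe:
# the four bases plus Y (C/T), W (A/T), and M (A/C).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "Y": "CT", "W": "AT", "M": "AC",
}

# Probe sequence copied as written in the text.
PROBE = "AATAAYWGCTCAAAACAWGCTTCTTTMGCWAAGTTTTGCTGCTTTTCTTGTGC"


def degeneracy(seq: str) -> int:
    """Number of distinct sequences encoded by a degenerate oligonucleotide."""
    n = 1
    for base in seq:
        n *= len(IUPAC[base])
    return n


def expand(seq: str) -> list[str]:
    """Enumerate every non-degenerate sequence encoded by seq."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in seq))]


print(degeneracy(PROBE))   # 32 (five two-fold degenerate positions)
print(len(expand(PROBE)))  # 32
```

Five two-fold degenerate positions give 2^5 = 32 variants, which is small enough to enumerate directly if individual variants need to be compared with the consensus sequences of particular species.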
Results
Mitochondrial Genomes, Phylogeny, and Barcoding
Mitochondrial genomes were assembled as single contigs from all newly sequenced specimens except the E. intermedia female and the E. persica female, due to low amounts of generated sequence reads in these specimens. In these two specimens, only isolated short mtDNA contigs were retrieved, which were used only for DNA barcoding. Only male genomes, which were assembled completely, were used in the phylogenetic analysis for these two species. The complete mitochondrial genome contigs were deposited in GenBank under the accession numbers PQ390700-PQ390711 and PQ390636. The extracted COI and CYTB sequences of the E. intermedia and E. persica females were deposited in GenBank under the accession numbers PQ394787, PQ394788, PQ412934, and PQ412935. The DNA barcoding showed that all specimens corresponded to their museum labels and morphological identifications, except the “E. regeli” female, which was reidentified as E. velox. The E. lineolata specimens were confirmed as E. lineolata based on numerous BOLD and GenBank records. Closely related complete mitochondrial genomes from GenBank were labeled as “E. scripta,” probably due to wrong species identification [31]. Based on the morphology and the confirmation by the majority of BOLD and GenBank matches, we concluded that the specimens studied in the current work were E. lineolata. We complement the previously published phylogeny by adding new species, sequenced for the first time (Fig. 1). The results of the phylogenetic analysis of mitochondrial protein-coding genes for formerly studied species correspond to the previously published analysis [31].
Phylogenetic tree of mitochondrial protein-coding genes of the genus Eremias and outgroups. Bootstrap support values (%) other than 100 are indicated. Red indicates sequences obtained in the current work. Green bar, indicating the viviparous clade, and orange bar, indicating E. intermedia, show contrasting positions of the E. intermedia sequences on the mitochondrial and EremSat177 trees.
TAREAN Analysis of Satellite DNA
The analysis of all Eremias specimens except the E. lineolata female from Uzbekistan revealed a major satellite with a monomer length of 177 bp (Table 1). The average GC content of the revealed variants is 38%. Based on the taxon and the consensus length, we call this satellite EremSat177. No other universal satellites present in all studied species were found. In E. lineolata from Uzbekistan, the presence of this satellite was revealed by realignment of the raw reads to the satellite DNA variants obtained from other specimens. Interestingly, two other E. lineolata samples from a different population in Tajikistan are phylogenetically separated from the Uzbekistan specimen in the mtDNA tree. The TAREAN analysis of A. guineensis and M. guttulata did not reveal any similar sequence. No similar sequences were found in GenBank using BLAST either. In E. velox from Iran, additional minor divergent variants of EremSat177 were found, and in the E. stummeri male, a 353-bp sequence consisting of two divergent EremSat177 monomers was found (Table 1; Fig. 2). The phylogenetic analysis of the EremSat177 variants showed that the tree of the major variants resembles the mitochondrial DNA tree; in particular, the variants of the viviparous clade (E. multiocellata, E. stummeri, E. yarkandensis, etc.) were closely related, and the variants of other species were more basal. The analysis of the “E. regeli” female’s variant confirmed its reidentification as E. velox. The major contradiction is the placement of the EremSat177 variants of E. intermedia inside the viviparous clade, whereas its mtDNA lineage is outside this clade. Other differences include the branching order of E. velox and E. lineolata, and the presence (in mtDNA) versus absence (in EremSat177) of the “E. persica + E. regeli” clade. However, these nodes have low support in the mtDNA tree. The minor EremSat177 clusters found in E. velox from Iran and in the E. stummeri male occupied the most basal position in the tree (Fig. 2).
The mean pairwise p-distance between the TAREAN cluster consensus sequences from different specimens was 0.16 (0–0.45) for all variants, and 0.09 (0–0.17) within the crown group. The consensus sequences of the EremSat177 repeat were deposited in GenBank under the accession numbers PQ390381-PQ390403.
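For readers unfamiliar with the statistic, the p-distance between two aligned sequences is the proportion of compared sites at which they differ. The Python sketch below shows one way to compute it, together with the mean, minimum, and maximum over all pairs. It is not the procedure used in this study: the toy sequences are invented, and skipping gapped sites (pairwise deletion) is an assumption made for the example.

```python
from itertools import combinations


def p_distance(a: str, b: str, gap: str = "-") -> float:
    """Proportion of differing sites between two aligned sequences.

    Sites with a gap in either sequence are skipped (pairwise deletion;
    an assumption of this sketch).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            continue
        compared += 1
        if x != y:
            diffs += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


def pairwise_summary(seqs: list[str]) -> tuple[float, float, float]:
    """Mean, minimum, and maximum p-distance over all sequence pairs."""
    d = [p_distance(a, b) for a, b in combinations(seqs, 2)]
    return sum(d) / len(d), min(d), max(d)


# Invented toy alignment, not data from the study.
toy = ["ACGTACGTAC", "ACGTACGTAC", "ACGTTCGTAC", "ACG-ACGAAC"]
mean_d, min_d, max_d = pairwise_summary(toy)
print(f"{mean_d:.3f} ({min_d:.2f}-{max_d:.2f})")
```

The printed format mirrors the "mean (min–max)" notation used for the values reported above.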
TAREAN clusters representing the EremSat177 satellite in different samples of Eremias and their genome proportions
Sample | TAREAN cluster number | % of genome |
---|---|---|
E. velox EVM | CL11 | 0.72 |
E. velox EVM | CL107 | 0.019 |
E. velox EVF | CL15 | 0.59 |
E. velox EVF | CL102 | 0.023 |
E. velox ERF | CL5 | 1.1 |
E. velox EV_LBC | CL5 | 0.75 |
E. regeli ERM | CL5 | 1 |
E. lineolata ELM | CL14 | 0.46 |
E. lineolata ELF1 | CL13 | 0.42 |
E. intermedia EIM | CL25 | 0.24 |
E. intermedia EIF | CL13 | 0.37 |
E. stummeri ESM | CL10 | 0.56 |
E. stummeri ESM | CL21 | 0.33 |
E. persica EPEM(L4) | CL4 | 1.4 |
E. persica EPEF(L7) | CL9 | 0.68 |
E. multiocellata EMM(L5) | CL6 | 0.9 |
E. multiocellata EMF(L6) | CL4 | 1.2 |
E. yarkandensis EYA | CL7 | 0.82 |
E. dzungarica EDZ | CL26 | 0.25 |
E. nikolskii ENI | CL17 | 0.3 |
E. argus EAGS | CL39 | 0.14 |
E. szczerbaki ESZ | CL6 | 0.72 |
Phylogenetic tree of EremSat177 consensus sequences isolated from different specimens of Eremias. Bootstrap support values more than 60 are indicated. Red indicates specimens sequenced in the current work. Green bar, indicating the viviparous clade, and orange bar, indicating E. intermedia, show contrasting positions of the E. intermedia sequences on the mitochondrial and EremSat177 trees.
Karyotype and Sex Chromosomes of Mesalina guttulata
M. guttulata has not been karyotyped before. The studied specimen had 19 pairs of acrocentric chromosomes gradually decreasing in size, which corresponds to the most common lacertid karyotype (Fig. 3a) [44, 54]. The W chromosome was identified as a small heterochromatic chromosome revealed by DAPI staining after the barium hydroxide treatment (Fig. 3b).
Chromosomes of Mesalina guttulata: Giemsa staining (a) and C-like DAPI staining (b). Scale bars: 10 μm.
FISH with Satellite DNA Probes
FISH with EremSat177 probes in E. velox and E. arguta revealed centromeric signals in all chromosomes (Fig. 4). The W chromosomes showed the same centromeric accumulation as the autosomes, with no specific interstitial signals. Neither the PCR probe tested in M. guttulata and A. lineomaculatus nor the oligonucleotide probe tested in the other species revealed any signals on their chromosomes (online suppl. Fig. 1).
Diversity and Distribution of EremSat177 in the Assembled Genome of E. argus
The SatXplor pipeline revealed 794 monomers of EremSat177 in the assembly. The mean inter-monomer p-distance was 0.0497. The 2D variability plot revealed high conservation of the monomer sequences, with the majority belonging to one cluster, although minor divergent clusters are visible (Fig. 5a). The UMAP plot, showing finer monomer variability, revealed both chromosome-specific and universally distributed monomer subgroups (Fig. 5b). EremSat177 arrays were revealed on chromosomes 1, 2, 3, 5, 6, 14, and 16, and on unmapped scaffolds ctg1106, ctg396, ctg690, ctg941, and ctg435. The array lengths varied from 4.1 kb to 22.5 kb (mean 11.2 kb), and there was one array per chromosome or scaffold, located terminally or subterminally. The arrays had a regular structure with a head-to-tail orientation of the monomers, as shown by the inter-monomer distance plot (Fig. 5c) and by graphical mapping (Fig. 5d).
Results of satXplor analysis (a–c) and IGV visualization (d) of EremSat177 diversity and spatial organization in Eremias argus. a Similarity plot between the consensus sequence and individual monomers. b UMAP plot showing similarity between monomer groups. c Plot of inter-monomer distances (bp) in the chromosomes, showing clusterization and tandem organization. d Spatial organization of the EremSat177 cluster in chromosome 3.
Discussion
In the current work, we identified a novel lacertid centromeric satDNA element, universal for the examined species of the genus Eremias. Its monomer length is typical for centromeric satellites, e.g., the primate alpha-satellite monomer is 171 bp long [55, 56]. The high AT content of 62% is also typical for centromeric satellites: Agi160 has 60% [57] and pLCS has 57% [58] of AT bases. EremSat177 is absent in the examined related genera of the subfamily Eremiadinae, such as Mesalina and Acanthodactylus. Low conservation of centromeric satellites is also characteristic of other lacertids: IMO-HindIII is specific to the genus Iberolacerta [11, 12], and Agi160 is specific to the genus Lacerta [15]. However, some groups of reptiles show contrasting patterns: the satellite PFL-MspI is present in the centromeres of species from the distantly related snake families Colubridae and Viperidae [8, 59]. Interestingly, the pericentromeric satellite IMO-TaqI was not found in any of the species of the subfamily Eremiadinae studied here, although it was documented in several species of the subfamily Lacertinae, in the genus Atlantolacerta (sister to all other lineages of the subfamily Eremiadinae), and even in amphisbaenians, the sister lineage to lacertids [20, 21]. In Eremias, it was presumably either lost or decreased in abundance to a level undetectable by TAREAN. The centromeric position of EremSat177 is shown both by FISH in E. velox and E. arguta and by SatXplor analysis of the available E. argus genome. Therefore, it can be used to correctly orient chromosome-level scaffolds in future Eremias genome assemblies. The low number of monomers found in the E. argus assembly and the lack of arrays in some chromosomes, in contrast with the results of FISH in the other two species, are probably due to the low quality of assembly of repeat-rich regions in the genome of E. argus.
Interestingly, the proportion of EremSat177 in the genomes is highly variable not only between species but also within species (Table 1). This may be due to variation in either copy number or genome size. Analyzing more specimens of different species and sexes is required to separate interspecific, intersexual, and individual variation.
The fact that EremSat177 was found in the E. lineolata specimen from Uzbekistan only by read realignment, and not by the TAREAN analysis itself, probably indicates a more complex, not purely tandem organization of the repeat in this specimen. Interestingly, this specimen is both geographically and phylogenetically distant from the two other E. lineolata samples, in which EremSat177 was successfully isolated by TAREAN (Fig. 1; online suppl. Table 1). Since basal variants of EremSat177 were found at low abundance in E. velox and E. stummeri, we hypothesize that EremSat177 emerged before the basal split of Eremias, and that one variant then became dominant, in accordance with the library model of satDNA evolution. Subsequently, it preserved its chromosomal position and probably its functional role, and evolved together with the species themselves, in accordance with the model of concerted evolution, as shown by the similarity between the satDNA and mtDNA phylogenetic trees and by the low within-genome variability of EremSat177 in E. argus. The homogenization of EremSat177 within genomes, together with its interspecific divergence, implies that it may contribute to reproductive isolation and speciation. The position of the E. intermedia variants of EremSat177 is the only strict contrast between the mtDNA and satDNA trees. We hypothesize that it is either a phylogenetic error caused by the short length of the fragment and the low number of species analyzed, especially given the low bootstrap support, or a consequence of ancient hybridization. The phylogeny of Eremias is poorly studied in general, since previous works were mostly focused on particular species groups and used small amounts of data, such as the 16S rRNA and COI genes only. Although our analysis uses complete sets of protein-coding mitochondrial genes, such large datasets are available only for a small number of species, and no comprehensive nuclear gene analyses have been made.
Thus, to analyze the evolution of EremSat177 in more detail, it should be isolated from more species, and a robust mito-nuclear phylogeny of these species should be available. To establish the functional role of EremSat177 and directly test its association with the centromeric proteins, ChIP-Seq experiments should be performed.
Acknowledgments
Computational resources for RepeatExplorer (TAREAN) analysis were provided by the ELIXIR-CZ project (LM2023055), part of the international ELIXIR infrastructure. The research was completed using equipment (materials) of the large-scale research facilities “Cryobank of cell cultures” Institute of Molecular and Cellular Biology SB RAS (Novosibirsk, Russia). We thank Dr. Stanislav Dryomov for his help in running satXplor and Dr. Valentina Orlova for granting access to the collections of the Moscow State University.
Statement of Ethics
This study protocol was reviewed and approved by the Institute of Molecular and Cellular Biology Ethics Committee (Statement No. 1 from August 30, 2024).
Conflict of Interest Statement
The authors have no conflicts of interest to declare.
Funding Sources
This work was supported by the Ministry of Science and Higher Education of the Russian Federation (Grant No. FWNR-2022-0015), awarded to A.L. M.A. was supported by the Charles University Research Centre (Program No. UNCE/24/SCI/006).
Author Contributions
A.L.: study design, bioinformatics, and writing initial draft of the manuscript; L.L.: cell culturing and FISH; S.R.: cell culturing; G.D.: preparation of NGS libraries; M.A.: FISH; M.R., L.K., M.G., R.N., and I.O.: sampling; and V.T.: project management and curation. All authors participated in writing and editing of the final manuscript.
Data Availability Statement
All the sequencing data were deposited in the SRA archive under the BioProject PRJNA1165233.