Abstract
Mimiviruses, which were discovered during the past decade, are giant viruses that infect phagocytic protists, including Acanthamoebae spp. They are the current record holders among viruses for their large particle and genome sizes. One group is composed of three lineages, referred to as A, B and C, which include the vast majority of the Mimiviridae members. Cafeteria roenbergensis virus represents a second group, though the Mimiviridae family is still expanding. We analyzed the codon and amino acid usages in mimiviruses, as well as both the transfer RNAs (tRNAs) and amino acyl-tRNA synthetases. We confirmed that the codon and amino acid usages of these giant viruses are highly dissimilar to those in their amoebal host Acanthamoeba castellanii and are instead correlated with the high adenine and thymine (AT) content of Mimivirus genomes. We further describe that the set of tRNAs and amino acyl-tRNA synthetases in mimiviruses is globally not adapted to the codon and amino acid usages of these viruses. Notwithstanding, Leu(TAA)-tRNA, present in several Mimivirus genomes and in multiple copies in some viral genomes, may complement the amoebal tRNA pool and may contribute to accommodating the viral AT-rich codons. In addition, we found that the genes most highly expressed at the beginning of the Mimivirus replicative cycle have a nucleotide content more adapted to the codon usage in A. castellanii.
Introduction
Acanthamoeba polyphaga mimivirus, discovered in 2003, is the first described giant virus to infect amoebae and is the founder of a new viral family named the Mimiviridae [1,2]. Since its discovery, other mimiviruses have been described that infect Acanthamoeba spp. [3,4,5,6], and others, such as Cafeteria roenbergensis virus, have been described that infect green algae as well as heterokonts and haptophyta [7], which are two groups of phagotrophic protists known to feed on bacteria [8]. Phylogeny reconstruction based on conserved genes has delineated two major groups (named I and II) within the family Mimiviridae [9,10]. The first group is composed of three lineages, referred to as A, B and C, that include the vast majority of the Mimiviridae members, while C. roenbergensis virus (Crov) represents a second group [7], but the family Mimiviridae is still expanding [9,10]. In addition, other giant, though smaller, amoebal viruses, including Marseillevirus and closely related viruses, have been discovered since 2008 and compose a second proposed family, the ‘Marseilleviridae’ [11,12,13].
Overall, mimiviruses and marseilleviruses appear to be common inhabitants of our biosphere, especially in water, and a growing body of evidence indicates that amoebal mimiviruses can be present in humans and can cause pneumonia [14,15,16,17,18,19]. Acanthamoebae spp., the hosts for these viruses, are free-living amoebae that are present worldwide in fresh and marine water and in soil [20,21]. They can ingest any particle larger than 0.5 µm, including bacteria, fungi and giant viruses, some of which can survive and multiply within their amoebal host and are thus called amoeba-resisting microorganisms (ARMs) [22,23,24]. Multiple ARMs can live sympatrically within the same amoeba [11,25], and it is therefore strongly suspected that these protists are hot spots for gene exchange between ARM genomes, including the mimiviruses and marseilleviruses, and are therefore genitors of mosaic genomes [23,26].
Mimiviruses and marseilleviruses have been linked to the nucleocytoplasmic large DNA viruses, a monophyletic group of large DNA viruses that infect animals and unicellular eukaryotes and include poxviruses, asfarviruses, iridoviruses, ascoviruses and phycodnaviruses [2,27,28,29]. These viruses were recently proposed to be reclassified within a new order named the ‘Megavirales’ [9]. One of the most remarkable and intriguing features revealed by the analyses of the gene content of mimiviruses was the presence of several genes encoding proteins involved in the translation apparatus, including amino acyl-transfer (t)RNA synthetases and elongation factors [2], which suggested that these viruses may not completely rely on the host translation machinery for their replication. The presence of these genes and of others that encode proteins involved in nucleotide biosynthesis (e.g. ribonucleotide reductase), DNA replication and repair (e.g. DNA polymerase B-family) and transcription (e.g. DNA-dependent RNA polymerase II) enabled phylogenetic and phyletic studies encompassing proteins from bacterial, archaeal, eukaryotic and viral lineages. Strikingly, these analyses suggested that mimiviruses and the other ‘Megavirales’ members compose a fourth domain of life [30], an issue that is debated but supported by recent findings [31,32,33].
Moreover, the genomes of mimiviruses were found to encode tRNAs, as previously observed in a few other large DNA viruses, including phycodnaviruses, baculoviruses, herpesviruses, myoviruses and siphoviruses [34,35]. Remarkably, a study of evolutionary patterns in the sequences and structures of tRNAs from archaea, bacteria, eukaryotes and viruses indicated that viruses compose an ancient lineage that originated early in the archaeal lineage, before eukaryotes and bacteria [36]. The genomes of mimiviruses are also particularly rich in adenine and thymine (AT; approximately 75% of the nucleotide content) [2,9]. In addition, it was reported that Mimivirus and other large DNA viruses display different amino acid compositions in their proteome compared with cellular organisms. Notably, in the proteomes of these viruses, it was shown that Asp is preferred over Glu, and Tyr is preferred over Phe, while the opposite is observed in most eukaryotic genomes [37]. Likewise, codon usage has been noted to be highly dissimilar for Mimivirus and its amoebal host [2]. In addition, in Mimivirus, codon usage bias, which corresponds to the unequal frequency of synonymous codons over coding DNA within genomes [38], was described as influenced by mutational pressure and translational selection, while amino acid usage was proposed to be guided by factors including aromaticity and cysteine content [39].
We analyzed codon and amino acid usages as well as tRNAs and amino acyl-tRNA synthetase distributions in the mimiviruses and in Acanthamoeba castellanii, their eukaryotic host.
Materials and Methods
Nucleotide sequences of the genes predicted to encode proteins from the genomes of giant viruses that infect amoebae were either available in our laboratory or were downloaded from the NCBI GenBank nucleotide sequence database. One or two representative viruses were chosen for the three lineages, A (Mimivirus; NC_014649 [2,40]), B (Moumouvirus; NC_020104 [41]) and C (Megavirus chiliensis; JN258408 [5] and LBA111; NC_020232 [16]), of mimiviruses associated with amoebae [9,42]. More distantly related mimiviruses for which the genome was analyzed included Crov (NC_014637) [7] as well as Organic Lake phycodnavirus 2 (HQ704803) [43] and Phaeocystis globosa virus 12T (HQ634147) [44], which were recently reclassified in the family Mimiviridae [10].
Nucleotide composition, codon usage, amino acid usage, effective number of codons, relative synonymous codon usage (RSCU), codon adaptation index (CAI) and the relative deoptimization codon index (RDCI) were calculated using the CAIcal server (http://genomes.urv.es/CAIcal/) [45,46]. Codon and amino acid usages are expressed as percentages and reflect the contribution made by each codon or amino acid, respectively.
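As an illustration of how codon and amino acid usages can be expressed as percentages, the following minimal Python sketch counts in-frame codons over a set of coding sequences. It is not the CAIcal implementation: the function names, the use of the standard genetic code and the exclusion of stop codons from the amino acid usage are assumptions made for this example.

```python
from collections import Counter
from itertools import product

# Standard genetic code in the usual TCAG ordering (assumption: no alternative code).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def count_codons(cds_list):
    """Count in-frame codons over all sequences; incomplete or ambiguous codons are skipped."""
    counts = Counter()
    for cds in cds_list:
        seq = cds.upper()
        for i in range(0, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon in CODON_TABLE:
                counts[codon] += 1
    return counts

def codon_usage_percent(cds_list):
    """Contribution of each codon to all counted codons, in percent."""
    counts = count_codons(cds_list)
    total = sum(counts.values())
    return {codon: 100.0 * n / total for codon, n in counts.items()}

def amino_acid_usage_percent(cds_list):
    """Contribution of each amino acid to all encoded residues (stops excluded), in percent."""
    aa_counts = Counter()
    for codon, n in count_codons(cds_list).items():
        aa = CODON_TABLE[codon]
        if aa != "*":
            aa_counts[aa] += n
    total = sum(aa_counts.values())
    return {aa: 100.0 * n / total for aa, n in aa_counts.items()}

# Toy input (hypothetical sequence, not a Mimivirus gene): ATG AAA AAT TAA.
usage = codon_usage_percent(["ATGAAAAATTAA"])
aa_usage = amino_acid_usage_percent(["ATGAAAAATTAA"])
```

In this toy input each of the four codons contributes 25% of the codon usage, and Met, Lys and Asn each contribute one third of the amino acid usage.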
RSCU assesses the frequency of use of a particular codon relative to the expected frequency of usage of this codon in the absence of codon usage bias [45]. RSCU values range from 0, if the codon is absent, through 1, if there is no codon usage bias, to 6, if a single codon is used from a six-codon family. CAI measures the synonymous codon usage bias by determining the level of similarity between the synonymous codon usage of a gene and the synonymous codon frequencies of a reference set [47,48]. CAI values range from 0 to 1, the value of 1 being reached when a gene encodes each amino acid using the most frequently used synonymous codon in the reference set of codon usage. This parameter has been used to compare codon usages, to assess viral gene adaptation to hosts and to predict levels of gene expression [47,48,49,50]. RDCI measures codon deoptimization by comparing the similarity in codon usage of a given gene to that of a reference genome [51,52]. Furthermore, from a genetic point of view, estimation of RDCI may provide insight into the degree of coevolution between hosts and viruses. A low RDCI might indicate a high adaptation to a host. Conversely, a high RDCI might indicate that some genes are expressed in latency phases or even that the virus might have a low replication rate.
Transfer RNA sequences were detected in amoebae-associated giant viruses and in A. castellanii using the tRNAscan-SE website [53] and the Aragorn software [54], and manually checked using nucleotide BLAST searches against the viral genomes with formerly identified tRNAs [55]. The codon usage table for the A. castellanii genome was recalculated from the 14,012 mRNA coding DNA sequences (CDS) available from the NCBI GenBank nucleotide sequence database using the CAIcal server (http://genomes.urv.es/CAIcal/) [45,46,56]. The codon usages for the A. castellanii mitochondrion and for Homo sapiens were taken from the codon usage database (available at http://www.kazuka.or.jp/codon/).
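The three codon indices defined in this section can be sketched in a few lines of Python. This is an illustrative sketch, not the CAIcal code. It assumes the usual definitions: RSCU as the observed count divided by the mean count of the synonymous family, CAI as the geometric mean of the relative adaptiveness of each codon with respect to the reference set, and RDCI as the sum over codons of (CiFa/CiFh)·Ni divided by N, where CiFa and CiFh are the within-family relative frequencies of codon i in the gene and in the reference, Ni is its count in the gene and N is the number of codons considered. Skipping codons that are absent from the reference is a simplification chosen for this example.

```python
from collections import Counter, defaultdict
from itertools import product
from math import exp, log

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Group synonymous codons by amino acid (stop codons are grouped under "*").
FAMILIES = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    FAMILIES[aa].append(codon)

def rscu(counts):
    """Observed codon count divided by the mean count of its synonymous family."""
    values = {}
    for codons in FAMILIES.values():
        total = sum(counts.get(c, 0) for c in codons)
        for c in codons:
            values[c] = counts.get(c, 0) * len(codons) / total if total else 0.0
    return values

def cai(gene_counts, ref_counts):
    """Geometric mean of w = (reference count) / (largest reference count in the family)."""
    log_sum, n = 0.0, 0
    for aa, codons in FAMILIES.items():
        if aa == "*" or len(codons) == 1:  # stops, Met and Trp carry no synonymous choice
            continue
        best = max(ref_counts.get(c, 0) for c in codons)
        for c in codons:
            k, ref = gene_counts.get(c, 0), ref_counts.get(c, 0)
            if k and ref:  # simplification: codons absent from the reference are skipped
                log_sum += k * log(ref / best)
                n += k
    return exp(log_sum / n) if n else float("nan")

def rdci(gene_counts, ref_counts):
    """Assumed definition: sum_i (CiFa / CiFh) * Ni, divided by the number of codons used."""
    total, n = 0.0, 0
    for aa, codons in FAMILIES.items():
        if aa == "*":
            continue
        gene_sum = sum(gene_counts.get(c, 0) for c in codons)
        ref_sum = sum(ref_counts.get(c, 0) for c in codons)
        for c in codons:
            k, ref = gene_counts.get(c, 0), ref_counts.get(c, 0)
            if k and ref:  # same simplification as in cai()
                total += (k / gene_sum) / (ref / ref_sum) * k
                n += k
    return total / n if n else float("nan")

# Toy reference and genes (hypothetical counts, not A. castellanii data).
reference = Counter({"AAG": 3, "AAA": 1, "GAG": 2, "GAA": 1})
adapted_gene = Counter({"AAG": 1, "GAG": 1})
at_rich_gene = Counter({"AAA": 1, "GAA": 1})
```

With these toy counts, the gene that uses only the preferred reference codons has a CAI of 1, whereas the AT-rich gene has a lower CAI and an RDCI above 1. A gene whose codon usage is identical to that of the reference has an RDCI of 1.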
The transcription profile of the first generation of Mimivirus corresponded to the transcript abundances previously determined by Legendre et al. [40]. For each gene, the total number of normalized read counts encompassing the different time points of the replication cycle and those determined at T = 0 h (T0) and 12 h (T12) of the viral replication cycle were used [40]. Phylogeny reconstruction of tRNA was performed by sequence alignment using the MUSCLE software [57] and curation using Gblocks [58]; then, the maximum likelihood method was used in the MEGA v.5 software [59].
Plots were built in Microsoft Excel. Statistical analysis of the data was performed using a corrected χ2 test or Fisher's exact test for comparison of proportions, while comparisons of means were performed with OpenEpi Epidemiologic Calculators v.2.3.1 (www.OpenEpi.com). p values <0.05 were considered to be statistically significant.
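For readers who wish to reproduce a comparison of two proportions, a corrected χ2 test on a 2 × 2 table can be sketched as follows. The sketch assumes that 'corrected' refers to Yates' continuity correction, which is not stated explicitly, and the counts in the example are hypothetical.

```python
from math import erfc, sqrt

def yates_chi2(a, b, c, d):
    """Yates-corrected chi-square statistic and p value for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    # Continuity correction: shrink |ad - bc| by n/2, never below zero.
    diff = max(0.0, abs(a * d - b * c) - n / 2)
    chi2 = n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical example: 30/50 versus 15/50.
chi2, p_value = yates_chi2(30, 20, 15, 35)
```

The p value is obtained from the closed form of the χ2 distribution with one degree of freedom, so the sketch needs only the standard library.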
Results
The mean guanine and cytosine (GC)% for Mimivirus genes ranges from 23.4% for Crov to 33.5% for P. globosa virus 12T (fig. 1). When combining all 3,571 genes from Mimivirus, Moumouvirus, M. chiliensis and Crov, the mean GC% ± standard deviation (SD) was 26.3% ± 5.0. A set of 150 genes (4.2%) displayed a GC% more than 2 SD above the mean (>36.3%), whereas only 27 genes (0.7%) displayed a GC% more than 2 SD below the mean (<16.3%) (online suppl. fig. S1; for all online suppl. material, see www.karger.com/doi/10.1159/000354557). In the first set of 150 genes, 115 were predicted to encode hypothetical proteins, whereas genes encoding several collagen-like proteins as well as capsid proteins and a putative regulator of chromosome condensation were among the 35 remaining genes. Within this set, a total of 15 Mimivirus genes were among the top 20 most expressed as determined previously by transcriptomics [40], and 5 of these 15 genes have products that were identified in an earlier proteomic study as being incorporated into Mimivirus particles [60]. In the second set of 27 genes, all belong to Crov and all but 1 (encoding a putative N-acetyl transferase) were predicted to encode hypothetical proteins.
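The selection of genes with an atypical GC% amounts to flagging values more than 2 SD away from the mean, i.e. outside 26.3 ± 2 × 5.0 = 16.3-36.3% for the pooled gene set. A minimal sketch of this filter is given below. The function names and the toy sequences are hypothetical, and the use of the sample SD (rather than the population SD) is an assumption.

```python
from statistics import mean, stdev

def gc_percent(seq):
    """GC content of a nucleotide sequence, in percent."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)

def gc_outliers(genes, n_sd=2.0):
    """Return mean, SD and the genes whose GC% lies more than n_sd SD below or above the mean."""
    values = {name: gc_percent(seq) for name, seq in genes.items()}
    m = mean(list(values.values()))
    sd = stdev(list(values.values()))  # sample SD (assumption)
    low = [name for name, v in values.items() if v < m - n_sd * sd]
    high = [name for name, v in values.items() if v > m + n_sd * sd]
    return m, sd, low, high

# Toy data: nine AT-rich genes (25% GC) and one GC-rich gene (75% GC).
toy_genes = {f"gene{i}": "AATG" for i in range(9)}
toy_genes["gc_rich"] = "GCGA"
m, sd, low, high = gc_outliers(toy_genes)
```

In the toy data the mean is 30% and only the GC-rich gene exceeds the upper threshold.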
Codon usages are very similar between studied mimiviruses and very dissimilar between mimiviruses and A. castellanii (fig. 2a, 3; online suppl. fig. S2-S3). The two most used codons in the mimiviruses are the AT-rich codons AAA and AAT, which encode lysine and asparagine, respectively (fig. 2a, 3; online suppl. fig. S2). In the set of mimiviruses studied here, each of these codons composes between 6.3 and 9.6% of all codons, compared with a mean value ± SD of 1.6% ± 1.8 for all codons (range 0-9.6; table 1). As a comparison, AAA and AAT are the 2nd (4.9%) and 30th (1.4%) most frequently used codons in Marseillevirus, and the 46th (0.6%) and 49th (0.5%) most frequently used codons in A. castellanii (fig. 2a, 3). Conversely, the most frequently used codons in A. castellanii are GAG (5.2%), GCC (4.9%) and AAG (4.7%), which encode glutamic acid, alanine and lysine, and are the 41st (0.6%), 49th (0.4%) and 34th (1.1%) most frequently used codons in Mimivirus. Amino acid usages of predicted proteins in mimiviruses correlate with the AT-rich genomes of these viruses, and they are very similar among studied mimiviruses (fig. 2b, 4a, b). Indeed, the most frequently encountered amino acids in Mimivirus proteins are those encoded by AT-rich codons, namely Ile (ATT), Lys (AAA), Asn (AAT) and Leu (TTA), in decreasing order of frequency in Mimivirus. The Mimivirus amino acid usage is considerably different from the usage in A. castellanii, the host for these largest viruses, whereas the amino acid usages of Marseillevirus, another amoebae-associated giant virus, and A. castellanii tend to be less dissimilar (fig. 4a, b).
Amino acid usage and presence/absence of transfer RNA and amino acyl-tRNA synthetase in giant viruses of Acanthamoeba spp. and their host

Codon (a) and amino acid (b) usages for genes of mimiviruses including Mimivirus, Moumouvirus, M. chiliensis, LBA111, Crov, Organic Lake phycodnavirus 2 and P. globosa 12T.
Amino acid usage of Mimivirus (a, left) and Marseillevirus (a, right) compared to that of A. castellanii, and of Mimivirus (b, left), Moumouvirus (b, middle) and M. chiliensis (b, right). Amino acyl-tRNA synthetase and tRNA encoded by the genomes of Mimivirus (b, left), Moumouvirus (b, middle) and M. chiliensis (b, right) are indicated. Arrows indicate amino acyl-tRNA synthetases present only in some lineages of amoebae-associated mimiviruses.
With regard to amino acyl-tRNA synthetases, asparaginyl-tRNA synthetases are present in amoebae-associated mimiviruses of lineages B and C but are absent from genomes of amoebae-associated mimiviruses of lineage A and of some other distant mimiviruses including Crov (table 1). In addition, a tryptophanyl-tRNA synthetase is only present in amoebae-associated mimiviruses of lineage C. Overall, these amino acyl-tRNA synthetases do not match the amino acids most frequently found in Mimivirus-predicted proteins (fig. 4b). In addition, phylogeny reconstructions based on these amino acyl-tRNA synthetases show topologies that are similar to those based on core genes of mimiviruses and other members of the ‘Megavirales’, which delineate the three lineages A-C for mimiviruses of amoebae (fig. 5a; online suppl. fig. S4-S6).
Phylogeny reconstruction generated using MEGA 5 with the maximum likelihood method based on tyrosyl-tRNA synthetases (a) and Leu-tRNA (b) of mimiviruses. Probabilities are noted near branches as a percentage and are used as confidence values of tree branches. Only probabilities at major nodes are shown. The scale bar represents the number of estimated changes per position for a unit of branch length. Anticodons are indicated in parentheses after the type of tRNA. PBCV = Paramecium bursaria Chlorella virus.
As for tRNAs, overall, they are not necessarily cognate to the codons and amino acids that are the most abundant in mimiviruses (table 1). Amongst the mimiviruses, only Crov has a genome that encodes Lys(AAA)-tRNA, present in three copies amongst the 22 tRNA genes, and no Mimivirus genome encodes Asn(AAT)-tRNA, while Lys(AAA)- and Asn(AAT)-tRNA represent 0.8 and 0.2% of the A. castellanii tRNA pool, respectively. In contrast, the most frequently encountered tRNA in Mimivirus genomes is the Leu(TTA)-tRNA (anticodon TAA), which accounts for 17 of the 50 Mimivirus tRNAs (34%), and TTA is between the 3rd (in Crov) and the 8th (in Mimivirus) most frequently used codon in proteins of mimiviruses. Leu(TTA)-tRNA genes are particularly abundant in the Crov genome, where 9 copies represent 45% of the Crov tRNA pool. In contrast, Leu(TTA)-tRNA is one of the least frequent tRNAs amongst those detected in the A. castellanii genome. Leucine is frequently found in both Mimivirus and A. castellanii proteins, but, in the first case, it is mostly encoded by TTA (anticodon TAA) and, in the second case, it is mostly encoded by CTG and CTC (fig. 2, 3). In addition, 24 Mimivirus tRNAs (48% of the total number) correspond to 1 of the 10 most frequently used codons in mimiviruses, and 20 of these 24 tRNAs correspond to 1 of the 10 least frequently used codons in A. castellanii. Besides, the Mimivirus tRNA pool generally does not complement amino acyl-tRNA synthetases present in mimiviruses, with the exception of Trp-tRNA in M. chiliensis and Cys-tRNA in Mimivirus and Moumouvirus (fig. 4b). Notably, phylogeny reconstructions based on these Mimivirus tRNAs show in almost all cases topologies that delineate the previously defined lineages of mimiviruses of amoebae (fig. 5b; online suppl. fig. S7-S11).
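Because tRNAs are labelled here by the codon they read (e.g. TTA) and elsewhere by their anticodon (e.g. TAA), it may help to recall that the anticodon is the reverse complement of the codon when both are written 5' to 3'. The short sketch below illustrates this relationship; the function name is hypothetical and wobble pairing is deliberately ignored.

```python
# Complement table for DNA-alphabet codons (T is used instead of U, as in the text).
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def anticodon_for(codon):
    """Return the exactly matching anticodon (5'->3') for a codon (5'->3'), ignoring wobble."""
    return codon.upper().translate(COMPLEMENT)[::-1]

leu = anticodon_for("TTA")  # leucine codon discussed in the text
lys = anticodon_for("AAA")
asn = anticodon_for("AAT")
```

Under this convention, the leucine codon TTA pairs with the anticodon TAA, which is why the same tRNA can be written as Leu(TTA)-tRNA or Leu(TAA)-tRNA depending on the label used.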
We correlated the values of transcript abundance obtained for Mimivirus genes by transcriptomics [40] to those of CAI and RDCI determined with the A. castellanii codon usage as reference (fig. 6a, b). A total of 6 Mimivirus genes (38%) were identified as the most transcribed at T0 amongst the 16 genes with a CAI more than 2 SD above the mean value for all genes (>0.30). All of these 16 genes encode hypothetical proteins, 2 of which (R705, L725) were previously detected by proteomics inside Mimivirus particles [60], and 5 had been identified as being amongst the 20 most expressed genes. In addition, of the 47 genes with an RDCI in the lowest 5th percentile (<4.33), 30 (64%) were identified as the most transcribed at T0. These 30 gene products include an ankyrin-containing protein (L675), a VV A18-like helicase (L173), a choline dehydrogenase (L128) and two serine/threonine protein kinases (R826 and R831). Of note, the three latter proteins were predicted to be involved in lateral gene transfer with eukaryotes, the two serine/threonine protein kinases being predicted to be of amoebal origin [61,62]. In addition, 5 of these 30 genes were among the 20 most expressed genes. Next, we considered the proportions of Mimivirus genes with a transcript abundance above the median value (69) calculated for all genes, within a set of 50 genes with the lowest RDCI (mean 3.52 ± 0.77; range 1.53-4.35), i.e. the genes most adapted to A. castellanii. Among these genes with a low RDCI, the proportion of highly transcribed genes was statistically significantly higher at T0 than at T12 in the Mimivirus replicative cycle (28/50 vs. 4/50; p < 1e-3). In addition, the proportion of genes with an RDCI below the median value (7.1) was significantly higher amongst the 50 most transcribed Mimivirus genes at T0 than amongst the 50 most transcribed genes at T12 of the replicative cycle [48/50 (96%) vs. 23/50 (46%); p < 1e-3].
Additionally, the mean RDCI values were significantly lower for the most expressed genes at T0 than for the most expressed genes at T12 [4.60 ± 1.49 (2.15-8.94) vs. 7.62 ± 2.65 (3.18-16.85); p < 1e-3]. Finally, the CAI determined for Mimivirus genes in reference to the codon usage tables of the A. castellanii genome, the A. castellanii mitochondrial genome, the Dictyostelium discoideum genome (a soil amoeba) and the H. sapiens genome indicates that the Mimivirus codon usage is less adapted to that inferred from the A. castellanii genome (online suppl. fig. S12).
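The comparisons of proportions reported above (28/50 vs. 4/50 and 48/50 vs. 23/50) can be checked with a two-sided Fisher's exact test. It is not stated which of the two tests listed in the Methods was applied to each comparison, so the sketch below is an independent illustration using only the Python standard library, with a hypothetical function name.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of observing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    p_observed = prob(a)
    probs = [prob(x) for x in range(lowest, highest + 1)]
    # Sum the probabilities of all tables at least as unlikely as the observed one.
    return min(1.0, sum(p for p in probs if p <= p_observed * (1 + 1e-9)))

# Highly transcribed genes among the 50 genes with the lowest RDCI: T0 vs. T12.
p_low_rdci = fisher_exact_two_sided(28, 22, 4, 46)
# Genes with an RDCI below the median among the 50 most transcribed genes: T0 vs. T12.
p_top_transcribed = fisher_exact_two_sided(48, 2, 23, 27)
```

Both p values obtained in this way fall below 1e-3, in line with the significance levels reported in the text.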
Correlation of Mimivirus transcript abundance with RDCI (a) and CAI (b) for Mimivirus genes in reference to the A. castellanii codon usage table.
Discussion
The codon usage in mimiviruses is biased by the high AT content of the genomes of these viruses. In addition, the most frequently encoded amino acids in Mimivirus proteins generally correspond to the AT-rich composition of the Mimivirus genomes. There are some exceptions, such as leucine, which is encoded by both AT-rich and GC-rich codons. Moreover, the amino acid composition of Mimivirus-predicted proteins appears not to be one that is favored by the tRNA pool of mimiviruses, but rather corresponds to the AT-rich composition of the genomes of mimiviruses. Thus, in a majority of cases, the tRNAs of mimiviruses do not predispose these viruses to a better adaptation to their high AT content, nor do they correspond to the codons frequently used by these viruses but rarely used by the amoebal genes. An exception is Leu(TAA)-tRNA encoded by the genome of mimiviruses, which is present in up to 9 copies in the Crov genome and is concurrently very rare in A. castellanii. The latter pattern observed for Leu(TAA)-tRNA distribution among the mimiviruses and A. castellanii is similar to that described in phages and their bacterial hosts in a previous study [63]. It was observed that tRNAs present in phages tended to simultaneously fit codons that are highly used in phage genes and that are rarely used in host genes, which led to the hypothesis that a selective recruitment of tRNAs could compensate for differences in codon usages in phages and their hosts. Such a pattern has also been recently shown for cyanophages of the T4-like Myoviridae family that infect oceanic Prochlorococcus and Synechococcus hosts [64]. Thus, these cyanophages have genomes that can be more AT-rich than that of their host, but they concurrently have their own specific set of tRNAs that complements that of their host to accommodate the viral AT-rich codons. Overall, previous findings suggest that Acanthamoebae spp. may not be the natural hosts for mimiviruses.
The phylogeny based on the tRNAs encoded by the genomes of mimiviruses shows the same topology as that observed for the phylogeny based on Megavirales core genes. These observations are congruent with previous results obtained by Sun and Caetano-Anolles [36] using a comprehensive database of tRNAs, which suggested that the sequence and structure of these tRNA molecules are considerably conserved and may carry evolutionary signatures. Moreover, these data are congruent with the previous description of 3 lineages amongst amoebae-associated mimiviruses [9,42]. In addition, we detected a set of Mimivirus genes for which the nucleotide composition and codon usage bias differ significantly from that of the majority of genes, which suggests they might have been involved in lateral gene transfer events. An interesting finding was the correlation revealed between the Mimivirus gene transcription levels measured at T0 of the replicative cycle in a previous experiment and the adaptation of codon usage of these genes to the amoebal host. This observation suggests that the mechanisms of gene expression in the beginning phase of the Mimivirus replicative cycle may differ from the mechanisms for gene expression in later phases. Thus, apart from viral RNA transcripts incorporated into Mimivirus particles [65], Mimivirus gene expression may first rely primarily on the amoebal machinery and, then, may become increasingly adapted to Mimivirus codon and amino acid usages.
Disclosure Statement
The authors declare no potential conflict of interest and no financial disclosure.