Abstract
Introduction: Pseudomonas stutzeri KC can rapidly degrade carbon tetrachloride (CCl4) to CO2 by a fortuitous reaction with pyridine-2,6-bis(thiocarboxylic acid), a metal chelator whose biosynthesis is encoded by pdt genes. These genes were first identified after a spontaneous mutant, strain CTN1, lost the ability to degrade CCl4. Methods: Here we generated the complete genome of strain KC and carried out comparative genomic analyses to illuminate the evolutionary history of the pdt genes. Results: The pdt genes are located on an integrative and conjugative element (ICE), designated ICEPsstKC. Homologs of pdt genes were found in other genomes of members of gammaproteobacterial orders. Discrepancies between the tree topologies of the deduced pdt gene products and the host phylogeny based on the 16S rRNA gene sequence provided evidence for horizontal gene transfer (HGT) in several sequenced strains of these orders. In addition to ICEPsstKC, HGT may have been facilitated by other mobile genetic elements, as indicated by the location of the pdt gene cluster adjacent to fragments of other ICEs and prophages in several genome assemblies. Furthermore, we show that the majority of cells in the stock culture from the DSMZ culture collection had lost the ICE. Conclusion: The presence of the pdt gene cluster on mobile genetic elements has important implications for the bioremediation of CCl4 and needs consideration when selecting suitable strains.
Introduction
Carbon tetrachloride (tetrachloromethane, CCl4) is a volatile chlorinated solvent used for decades as a fire extinguishing and degreasing agent, solvent for dry cleaning and plutonium recovery, pesticide, and grain fumigant [1]. It is also a suspected carcinogen, ozone-depleting agent, and common legacy contaminant [2]. Many redox-active biomolecules can dechlorinate CCl4 at appreciable rates [3‒6]. However, the reactions typically yield chloroform (CHCl3), a known human carcinogen that can be even more persistent in the environment than CCl4 [3, 4, 7]. Of particular interest in this regard is the metal chelator pyridine-2,6-bis(thiocarboxylic acid) (PDTC) [8, 9], which upon binding copper(I) rapidly dechlorinates CCl4 without generating CHCl3 [10, 11]. Its microbial function appears to be that of a secondary siderophore [12], but it also binds various other transition metals in addition to Fe and Cu, including lanthanides, actinides, and some toxic metalloids [13]. PDTC production and secretion have been experimentally confirmed for the bacteria Pseudomonas stutzeri KC (ATCC 55595, DSM 7136), Pseudomonas putida DSM 3601, and Pseudomonas sp. strain DSM 3602 [14‒16]. It has been proposed that the species-rich genus Pseudomonas should be divided into several genera, with P. stutzeri being reclassified as Stutzerimonas stutzeri [17]. For the sake of familiarity and because there is not yet corresponding uniformity in taxonomic databases, we use the old classification name here, which is permissible according to the International Code of Nomenclature of Prokaryotes [18].
Laboratory and field-scale CCl4 remediation has been demonstrated with P. stutzeri KC [19‒22]. The identification of genes responsible for the PDTC biosynthesis phenotype was facilitated by the observation of a spontaneous loss of CCl4-transforming activity in a laboratory culture of strain KC [23]. This loss was traced to a chromosomal deletion of approximately 170 kb estimated via pulsed-field gel electrophoresis in a PDTC biosynthesis-negative mutant strain designated CTN1 [23]. Complementation of strain CTN1 identified a ∼25 kb pdt locus within the large chromosomal deletion, and subsequent saturation mutagenesis enabled identification of genes for PDTC synthesis [23]. The results were supported by transposon mutagenesis of strain KC followed by screening for a CCl4 degradation-negative phenotype [24]. Currently, the pdtFGHIJ genes are hypothesized to encode the full suite of proteins necessary to catalyze the formation of PDTC from dihydrodipicolinic acid [25, 26].
Prior to this study, two draft genomes of P. stutzeri KC of different sizes were available in GenBank. The first genome assembly was generated at the Norwegian University of Life Sciences (NMBU) in 2017 and comprises 4,615,749 bp in 18 contigs (GenBank accession number GCA_002890795.1). The second assembly is from the University of the Balearic Islands (UIB) and is about 170 kb larger with 4,785,124 bp in 28 contigs (GCA_024448415.1) [17]. Here, we investigated the reason for the size difference by re-sequencing the genome of strain KC from two culture collections (Deutsche Sammlung für Mikroorganismen und Zellkulturen, DSMZ, and American Type Culture Collection, ATCC) and discovered that the pdt gene cluster is located on an integrative and conjugative element (ICE) that is present only in the larger genome assembly.
ICEs are self-transmissible, chromosomally integrated mobile genetic elements involved in horizontal gene transfer (HGT) [27‒30]. They consist of a modular backbone of core genes encoding excision, conjugative transfer and site-specific integration, and typically a suite of accessory genes encoding functions that affect ICE stability and host niche adaptation. Examples of accessory functions are antibiotic resistance [31], resistance to heavy metals [32], rhizobial nodulation [33], and degradation of hazardous chemicals [34, 35]. In the context of this study, it is important to note that some ICEs play a prominent role in the evolution of metabolic pathways and the capability for synthesis of secondary metabolites that enable biotransformation of xenobiotic compounds [36‒39]. Here, we describe the ICE of strain KC, including its accessory gene content with the pdt gene locus, and provide evidence for other associations of pdt genes with mobile genetic elements in various gammaproteobacterial genomes. Furthermore, we show that the culture from the DSMZ collection includes cells that do not harbor the ICE, leading to the conclusion that it should be replaced.
Materials and Methods
Strains, Media, and General Cultivation Conditions
P. stutzeri KC (ATCC 55595, DSM 7136) was obtained from the ATCC (Manassas, VA, USA) and the DSMZ (Braunschweig, Germany). Cultivation was performed at 30°C under oxic conditions using DSMZ medium 1 as nutrient agar (1.5%) or broth (5 mL in 100-mL Erlenmeyer flasks, 100 rpm).
Genome Sequencing and Assembly
DNA was extracted using the QIAamp® Mini Kit. The presence of the pdt locus and the ICE in the genome was tested by PCR (40 cycles) with previously published primers [25] and with those listed in online supplementary Table S1 (for all online suppl. material, see https://doi.org/10.1159/000538783). A NEBNext Ultra DNA sequencing library was prepared using 50 ng of the extracted genomic DNA. The insert size was estimated to be 650 bp based on measurements on an Agilent Genomics Bioanalyzer. Sequencing was performed on an Illumina MiSeq (2 × 150 bp paired end reads), resulting in 1.8–5.2 million paired reads for the cultures from the ATCC (sequenced 8 times) and DSMZ (sequenced twice). Additional sequencing of the ATCC culture was carried out using Oxford Nanopore Technologies (ONT; Oxford, UK). The ONT libraries were prepared from 200 ng DNA using the Rapid Barcoding kit (SQK-RBK004; ONT) with SPRI bead clean-up (AMPure XT beads; Beckman Coulter) according to the manufacturer’s protocol. Two Flongle flow cells (FLO-FLG001) were primed with the Flongle sequencing expansion kits (EXP-FSE001, EXP-FLP002) following the manufacturer’s protocols, and 15 fmol of the sequencing library was loaded into each flow cell. ONT sequencing was performed on a MinION MK1b with MinKNOW software (19.06.8) for 29 h with default settings, generating 23.3k usable reads with a total size of 54.7 Mb (×11.4). The longest read generated was 24,164 bp.
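The reported ONT depth can be retraced as total sequenced bases divided by genome length. The following minimal sketch assumes that the depth refers to the closed chromosome of 4,813,886 bp reported in the Results; the function name is illustrative and not part of any tool used in this study.

```python
def mean_coverage(total_bases: float, genome_length_bp: int) -> float:
    """Approximate mean sequencing depth as total bases / genome length."""
    if genome_length_bp <= 0:
        raise ValueError("genome_length_bp must be positive")
    return total_bases / genome_length_bp


# 54.7 Mb of usable ONT reads; chromosome length assumed from the closed assembly.
ont_depth = mean_coverage(54.7e6, 4_813_886)
print(f"ONT depth: x{ont_depth:.1f}")
```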
Illumina raw reads were quality filtered and trimmed using Trimmomatic-0.32 (parameters: LEADING: 5; TRAILING: 5; SLIDINGWINDOW: 4:15; MINLEN: 80) [40]. FLASH (parameters: m 50; r 220; f 450; s 100; x 0.1) [41] was used to merge reads. Assembly of the quality-filtered reads was performed using SPAdes 3.15.5 [42]. Contigs with a coverage of less than ×50 and a size less than 1,000 bp were removed. Hybrid assemblies were generated with Unicycler v0.4.4 [43]. Computational gap closure was carried out using Mauve and Geneious Prime 2022.1.1. Annotations were carried out with Prokka [44], the NCBI annotation pipeline, and manual curation. The chromosome map of strain KC was generated with GView 1.7 [45].
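The contig filtering step can be sketched as follows. This is an illustration under stated assumptions, not the script used in this study: it relies on the SPAdes contig naming scheme (NODE_<n>_length_<bp>_cov_<depth>), it assumes that a contig was discarded when it failed either threshold (the wording above could also be read as requiring both), and the example headers are hypothetical.

```python
import re

# SPAdes names contigs like "NODE_12_length_1534_cov_61.8".
HEADER_RE = re.compile(r"NODE_\d+_length_(\d+)_cov_([\d.]+)")


def keep_contig(header: str, min_cov: float = 50.0, min_len: int = 1000) -> bool:
    """Return True if a contig meets both the coverage and the length cutoff.

    Assumption: a contig is removed when it fails EITHER threshold.
    """
    match = HEADER_RE.search(header)
    if match is None:
        raise ValueError(f"not a SPAdes-style header: {header!r}")
    length, coverage = int(match.group(1)), float(match.group(2))
    return coverage >= min_cov and length >= min_len


# Hypothetical headers, for illustration only.
headers = [
    "NODE_1_length_182310_cov_131.2",  # long and well covered: kept
    "NODE_40_length_812_cov_240.0",    # shorter than 1,000 bp: removed
    "NODE_41_length_2500_cov_12.4",    # coverage below x50: removed
]
kept = [h for h in headers if keep_contig(h)]
print(kept)
```

If the stricter reading is intended (removal only when both criteria fail), the final line of `keep_contig` would use `or` instead of `and`.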
Further Genome Analyses Including ICE Search
Average nucleotide identity (ANI) was calculated using the Pyani package [46] and digital DNA-DNA hybridization using GGDC 2.1 (Genome to Genome Distance Calculations) [47]. Whole genome alignments were carried out with progressiveMauve [48]. Alignments were visualized using the Bioperl module, AliTV (Ankenbrand) [49]. The dbCAN2 server was used for automated carbohydrate-active enzyme annotation [50]. Prophage sequences in genomes with pdt genes were searched for with PHASTER [51]. The plasmid in Burkholderia gladioli BSR3 was tentatively typed by MOB-recon analysis [52]. To find additional MPFT ICE sequences, a BLASTP search of the NCBI database (last accessed June 2023) was performed using the type 4 secretion system (T4SS) VirB4 domain protein from P. stutzeri KC as a reference. Genomes were downloaded and searched for complete MPF systems using the MacSyFinder [53] modules CONJScan [54] and TXSS [55]. A genomic region was considered a complete ICE if both the attL and attR direct repeat boundaries could be delineated, the boundaries were located on the same contig, and if it harbored genes encoding an integrase, a relaxase, and a complete MPF system. If only one boundary was found and the host genome was closed, the ICE was considered degraded. ICEs were considered fragmented if the host genome was not closed and either only one boundary was found or the boundaries were on different contigs. If an ICE sequence was found, it was annotated with Prokka 1.11 if necessary [44]. Annotated sequences were manually curated to correct for missing small open reading frames.
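The classification criteria for ICE regions described above can be written as a small decision function. The sketch below is only one possible encoding of those rules; the class and field names are hypothetical, and the "unclassified" label is an assumption added here for cases that the stated criteria do not cover.

```python
from dataclasses import dataclass


@dataclass
class IceCandidate:
    n_boundaries: int       # number of att boundaries found (0, 1, or 2)
    same_contig: bool       # both boundaries on one contig (relevant if n_boundaries == 2)
    genome_closed: bool     # host genome assembly is closed
    has_integrase: bool
    has_relaxase: bool
    has_complete_mpf: bool  # complete MPF system, e.g., as reported by MacSyFinder


def classify_ice(c: IceCandidate) -> str:
    """Apply the complete / degraded / fragmented criteria from the text."""
    core = c.has_integrase and c.has_relaxase and c.has_complete_mpf
    if c.n_boundaries == 2 and c.same_contig and core:
        return "complete"
    if c.n_boundaries == 1 and c.genome_closed:
        return "degraded"
    split_boundaries = c.n_boundaries == 2 and not c.same_contig
    if not c.genome_closed and (c.n_boundaries == 1 or split_boundaries):
        return "fragmented"
    # Not covered by the stated rules (assumption: report separately).
    return "unclassified"


print(classify_ice(IceCandidate(2, True, True, True, True, True)))
```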
Results and Discussion
Full Genome Assembly of Strain KC
For brevity, the two draft assemblies of P. stutzeri KC available in GenBank prior to this study will be referred to by the acronyms of the respective institutions, NMBU and UIB, where they were generated. Both assemblies were constructed using Illumina MiSeq sequence reads. Genome coverage of the UIB assembly was ×544 compared to ×23 for the NMBU assembly, but it is unlikely that the coverage difference resulted in a substantial size difference given the low number of contigs in both assemblies. We compared the two assemblies by whole genome alignment using progressiveMauve [48] and found that the UIB assembly contained a contig of 182 kb (contig 17, JAMOHP010000017) with no homologous sequence in the NMBU assembly. The 12 kb discrepancy between the 170 kb difference (4.615 Mb vs. 4.785 Mb) in total assembly size and the length of contig 17 was due to several slightly shorter contigs in the UIB assembly. Further alignment with Mauve and manual inspection revealed that contig 17 contained the previously discovered pdt gene cluster (AF196567 and AF149851 [23, 25]) and several ICE-related genes (described in detail in the next section). These genes were not present in the NMBU assembly. The bacterial source for the NMBU assembly was the DSMZ, and for the UIB assembly it was from a personal collection (Tiedje lab, personal communication).
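The rounded values given above (a difference of about 170 kb and a discrepancy of about 12 kb) can be retraced from the assembly sizes. The sketch below uses only numbers stated in the text; the length of contig 17 is taken as the rounded 182 kb, so the result is approximate.

```python
nmbu_bp = 4_615_749   # NMBU assembly (GCA_002890795.1)
uib_bp = 4_785_124    # UIB assembly (GCA_024448415.1)
contig17_kb = 182     # approximate length of UIB contig 17, as stated in the text

# Difference in total assembly size, in kb.
size_diff_kb = (uib_bp - nmbu_bp) / 1000

# Part of contig 17 that the size difference does not account for.
unexplained_kb = contig17_kb - size_diff_kb

print(f"assembly size difference: {size_diff_kb:.1f} kb")
print(f"contig 17 minus size difference: {unexplained_kb:.1f} kb")
```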
To further elucidate the reason for the size difference between the two assemblies, we re-sequenced the genome of P. stutzeri KC obtained from two strain collections, the DSMZ and the ATCC. Prior to sequencing, the presence of the pdt locus was verified by PCR with previously published primers [25]. Both of our individually assembled draft genomes contained a contig of approximately 182 kb with the pdt gene cluster that was essentially the same as contig 17 from the UIB assembly. However, during quality control by read mapping, we noticed that in the DSMZ assembly the coverage of the 182 kb contig was only about one-third of the coverage of the remainder of the genome (×131 vs. ×416). In contrast, read coverage was evenly distributed in the ATCC assembly. DNA was prepared for the DSMZ and ATCC assemblies from cultures that were inoculated with a portion of the lyophilized pellet from the respective culture collection. The results indicated that approximately two-thirds of the cells in the lyophilized culture from the DSMZ did not contain the DNA segment that yielded the 182 kb contig. We tested this hypothesis by two approaches: (i) DNA was isolated directly from portions of each freeze-dried stock from the ATCC and DSMZ without culturing and individually sequenced, and (ii) single colonies obtained by direct plating from dilution series of the freeze-dried cultures were screened by PCR for three regions of the contig, including the pdt locus, using the primers listed in online supplementary Table S1. With the DSMZ stock, direct sequencing resulted in an assembly with a 182 kb contig with approximately one-third of the coverage of the remainder, and 38 of the 96 tested colonies were PCR positive for the queried regions. In contrast, sequenced DNA from the ATCC stock showed no evidence of ICE loss based on read coverage, and all 96 tested colonies were PCR-positive for the pdt locus.
Therefore, we conclude that the DSMZ stock culture contained a mixture of cells of strain KC and another P. stutzeri strain that lacked the pdt locus but was otherwise identical to strain KC, i.e., similar or identical to strain CTN1. Given the substantial proportion of the latter cells in the stock culture, it is easy to imagine that a culture derived from one of these cells was used to generate the NMBU assembly. We therefore initiated the process of replacing the culture at the DSMZ.
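The two independent estimates of the ICE-bearing fraction in the DSMZ stock (relative read depth and colony PCR) can be compared with a short calculation. The sketch below is not part of the original analysis. It assumes one ICE copy per ICE-bearing cell and read depth proportional to copy number, and it uses a Wilson score interval for the colony counts, which is a choice made here for illustration.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


# Read-depth estimate: depth of the 182 kb contig relative to the rest of the genome.
coverage_fraction = 131 / 416

# Colony PCR estimate: 38 of 96 colonies positive for the queried regions.
pcr_fraction = 38 / 96
low, high = wilson_interval(38, 96)

print(f"coverage-based fraction: {coverage_fraction:.2f}")
print(f"PCR-based fraction: {pcr_fraction:.2f} (interval {low:.2f} to {high:.2f})")
```

Under these assumptions, the coverage-based value lies within the interval around the PCR-based value, so the two approaches are compatible with roughly one-third to two-fifths of the cells carrying the ICE.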
Next, we closed the P. stutzeri KC genome by hybrid assembly of Illumina paired-end reads and long reads obtained from ONT sequencing using Unicycler v0.4.4 [43]. The initial draft assembly consisted of four contigs (0.4–1.7 Mbp) with breaks at copies of the rRNA operon and the elongation factor Tu gene. Then, contig ordering was carried out in Mauve using the genome of P. stutzeri strain R2A2 (CP029772) as reference, and computational gap closure was done in Geneious Prime 2022.1.1. The final circular chromosome with a GC content of 61.7% has 4,813,886 bp and harbors 4,381 putative protein-coding sequences (CDS), 60 tRNA genes, and four rRNA operons (CP139348; Fig. 1). Based on ANI, the genome of strain KC is most similar to that of strain R2A2 and strain DW2-1 (CP027543) (online suppl. Fig. S1). As expected from previous experimental observations, the genome of strain KC contains genes necessary for denitrification (narG, nirS, norB, nosZ) and maltose metabolism [7, 24, 56‒58]. The genes pdtCDEFGHIJON involved in the biosynthesis of PDTC from dihydrodipicolinic acid and its transport across the membrane [25, 26] are present on a 182.3 kb ICE, designated ICEPsstKC, which is described in the following.
P. stutzeri KC genome (a) with sketches of ICEPsstKC (b) and the pdt gene cluster (c). In panel (a), the circles from the outward to the inside represent the nucleotide sequence in million bp with absolute numbering starting at the predicted ori, the CDSs on the forward and lagging DNA strands in dark purple, the GC% content in black, and the GC skew in navy and olive. In panel (c), the cluster is shown in reverse orientation for readability. Annotations are as follows: pdtC, AraC family transcriptional regulator; pdtD, hypothetical protein; pdtE, membrane transport protein; pdtF, putative sulfurylase, MoeB family; pdtG, M67 family metallopeptidase; pdtH, MoaD/ThiS family protein; pdtI, CoA transferase family protein; pdtJ, class I adenylate-forming enzyme family protein; pdtK, TonB-dependent receptor; pdtL, thiamine pyrophosphate-binding enzyme, putative decarboxylase; pdtM, pyridoxal phosphate-containing aminotransferase; pdtN, transmembrane transporter, major facilitator superfamily; pdtQ, β-glucuronyl hydrolase; pdtO, acyl-CoA dehydrogenase family protein; pdtP, class I SAM-dependent methyltransferase.
CCl4 Transformation Capacity Is Encoded on an ICE
ICEPsstKC is located on the strain KC chromosome between the genes for tRNA-ProGGG and the Crp/Fnr family transcriptional regulator (absolute position 2,012,489–2,194,811; Fig. 1). It is flanked on the left (attL) and right (attR) by the last 42 bp of the 3′ end of the tRNA-ProGGG gene. It is known that other ICEs mediate chromosomal integration into tRNA gene loci [28]. ICEPsstKC consists of 142 putative CDSs including the pdt locus and has a GC content of 60.5% (online suppl. Table S2). Queries of the CDSs against Pfam [59] and NCBI databases identified a putative integrase, genes associated with conjugal transfer functions including a relaxase (MOB), a type 4 secretion system (T4SS), and a mating-pair formation (MPF) system. Both MacSyFinder [53] modules CONJScan [54] and TXSS [55] classified the MPF system as type MPFT [60] and designated it as complete (online suppl. Table S3). ICEPsstKC is therefore the first identified ICE carrying the genes for PDTC biosynthesis.
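The size of ICEPsstKC follows directly from the absolute positions given above. The sketch below assumes inclusive, 1-based coordinates (the usual GenBank convention, but an assumption here) and also expresses the ICE as a share of the closed chromosome; the function name is illustrative.

```python
def interval_length(start: int, end: int) -> int:
    """Length of an interval given inclusive, 1-based coordinates."""
    if end < start:
        raise ValueError("end must not be smaller than start")
    return end - start + 1


ice_start = 2_012_489      # absolute start position of ICEPsstKC
ice_end = 2_194_811        # absolute end position of ICEPsstKC
chromosome_bp = 4_813_886  # length of the closed chromosome

ice_length_bp = interval_length(ice_start, ice_end)
ice_share = ice_length_bp / chromosome_bp

print(f"ICE length: {ice_length_bp / 1000:.1f} kb")
print(f"share of chromosome: {ice_share:.1%}")
```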
The remaining approximately 53 kb of the ICE after the pdt locus is composed of various genes with closest homologs in other Pseudomonas spp. genomes including several P. stutzeri strains. About 2.2 kb downstream of the pdt genes is a putative toblerol polyketide synthase (tob) gene cluster. Toblerols are epoxide- and cyclopropanol-containing bioactive polyketides synthesized by some methylotrophic Alphaproteobacteria such as Methylorubrum extorquens AM1 and are hypothesized to modulate antibiotic activity [61]. In ICEPsstKC, the tob cluster consists of 10 genes (online suppl. Fig. S2). It was not detected by the secondary metabolite prediction tool antiSMASH [62], but BLASTP searches revealed that the deduced gene products have amino acid sequence similarity of 35–58% to the products of the tob genes located on the megaplasmid of M. extorquens AM1 (NC_012811). Homologous PKS genes were found in the closed genomes of seven Pseudomonas spp., mostly P. qingdaonensis, having perfect synteny over 73–91% coverage and approximately 83% nt similarity (online suppl. Fig. S2). Further downstream are a putative toxin-antitoxin system that may be involved in ICE stability, several genes that are predicted to be involved in the regulation of various cellular processes (a gene encoding a GNAT-family N-acetyltransferase with GreA_B domain, fecR involved in the regulation of iron dicitrate transport, and a gene encoding a sigma-70 family RNA polymerase sigma factor), a hipBA module predicted to be involved in persistence, i.e., cell survival under unfavorable conditions [63], and a 12.7 kb gene cluster for the degradation of ethanolamine(s). Together with the pdt gene cluster, these ICE components may allow for niche adaptation and enhance the competitiveness of strain KC in its natural habitat [64].
ICE Stability in P. stutzeri KC
It is conceivable that ICEPsstKC was the previously deleted chromosomal fragment from strain KC, then estimated to be 170 kb in size by pulsed‐field gel electrophoresis, that resulted in strain CTN1 lacking CCl4 transformation activity. The deletion had occurred spontaneously during long-term cultivation on rich media and was detected when an isolated colony was picked and tested for CCl4 transformation [23] (Lewis, personal communication). Loss of an ICE during culture maintenance is obviously of concern. ICE excision followed by cell division generates ICE-free cell lineages [65], and, in the absence of selective pressure, these lineages have a competitive advantage [66]. Loss of ICEPsstKC from strain KC would negate the bioremediation of CCl4-contaminated sites. To gain further knowledge of the frequency of ICEPsstKC loss from P. stutzeri KC, we grew the strain from the ATCC stock in rich nutrient broth (DSM medium 1) over 25 transfers of approximately five to ten generations each. Genomic DNA was isolated from aliquots of the first, fifth, tenth, fifteenth, twentieth, and twenty-fifth passages and sequenced. Whole genome assembly and read mapping showed that read coverage was evenly distributed across each assembly, and thus there was no evidence that a subpopulation had lost the ICE. Therefore, we cannot provide a frequency of spontaneous excision of ICEPsstKC. Future quantitative investigations of ICEPsstKC stability would require the establishment of a facile screening system. Furthermore, we cannot retrospectively determine where and when the loss of ICEPsstKC had occurred in a fraction of the cells of the DSMZ stock culture.
The application of stressors during long-term cultivation is a possible means of stabilizing ICE-encoded traits. In the case of strain KC, the stress of low trace metal bioavailability induces an ICEPsstKC-encoded fur response, resulting in the production and secretion of PDTC [9, 23]. The stressor is easily applied by adjusting the pH of defined growth media to 8.0 to 8.3, a pH range over which iron solubility is minimal. Sustained low trace metal bioavailability likely contributed to the long-term and stable CCl4 degradation achieved by P. stutzeri KC bioaugmentation of a CCl4-contaminated aquifer at Schoolcraft, MI, USA, from 1998 to 2002 [22]. After first adjusting the pH of an inoculation zone to 8–8.3 to create trace metal-limiting conditions, an inoculum of strain KC was scaled up from a single colony (tested to confirm its ability to degrade CCl4) and then injected into a series of wells intercepting the CCl4 plume. Efficient CCl4 degradation activity was observed following injection of strain KC, and this activity was maintained over a 3-year period by weekly injections of acetate and low levels of alkalinity to stimulate growth while ensuring low trace metal bioavailability. Analogous approaches may be applicable to the preservation of other ICE-encoded phenotypes that alleviate stress.
ICEs in Other Pseudomonadaceae
Previously, 277 MPFT ICEs were computationally delineated from complete genomes [67], and two were experimentally determined [68, 69]. Here, we expanded this pool of sequences by first selecting potential ICE host genomes from GenBank via a BLASTP search with the type 4 secretion system (T4SS) VirB4 domain protein from P. stutzeri KC as query sequence, followed by searching for MPF systems together with attL and attR as direct repeat boundaries in these genomes. In total, we identified 34 novel complete ICE sequences and 30 degraded or incomplete ICEs (missing at least one att boundary) in draft genomes (online suppl. Tables S4, S5). Pairwise ANI was then calculated for 313 ICE sequences, i.e., the 279 previously described and the 34 complete ICEs identified in this study. The 88 ICEs with the highest sequence similarity to ICEPsstKC are shown in Figure 2. One clade consists of seven ICE sequences with an ANI similarity >75% to ICEPsstKC. This clade includes two previously delineated ICE (Cury et al., 2017; PSST001.B.00008.C001 from P. stutzeri 19SMN4 and PSST001.B.00005.C001 from P. stutzeri DSM 10701) and five ICE sequences identified in this study: ICEPsstKC, ICEPsstODKF13, ICEPsstSLG10A3_8-1, ICEPsspCholine-3u-10, and ICEPsspR2A2.
Average nucleotide identity (ANI) of MPFT ICE sequences. Names in bold are complete ICE identified in this study; others are from reference [54]. ICE sequence names in red are those most similar to ICEPsstKC. Two different phylogenetic dendrograms to the left and on top of the heatmaps are displayed, showing varied perspectives of evolutionary histories.
Among the complete ICEs, only ICEPsstKC carries the pdt genes. However, we found 6 draft genome assemblies with ICE fragments and pdt homologs (online suppl. Fig. S3). In the assemblies of P. stutzeri DCP-Ps1 and Pseudomonas sp. HMP271 (both are currently listed as members of Stutzerimonas degradans in the NCBI taxonomy database), one incomplete ICE each is found, which could encompass the pdt locus, based on alignments with the complete genome of S. degradans PheN2 using Mauve. The ICE core genes are not closely related to those of ICEPsstKC; for example, the relaxases are of the MPFP-1 type rather than MPFT. The genome alignments indicate that the ICE in strain DCP-Ps1 is about 130 kb and in strain HMP271 about 90 kb in size. However, no att sites were identified. Furthermore, the genome assemblies of the three strains DCP-Ps1, HMP271, and PheN2 have been tagged as contaminated by RefSeq staff (GenBank entries last accessed January 16, 2024). Given the uncertainty about the assemblies, we did not analyze these ICEs further, apart from their pdt locus (see next section). Nevertheless, the present assemblies indicate that ICE-mediated HGT of pdt genes is not restricted to ICEPsstKC. There are also fragmented ICE sequences and pdt genes in the genomes of Pseudomonas sp. NBRC11128, P. indica PIC105 (online suppl. Fig. S3), and in Thauera linaloolentis strains 47Lol (GCA_000621305.1) and TL0 (GCA_023805265.1) (not shown for the T. linaloolentis genome assemblies since the ICE-like fragments are on at least five contigs). Additional sequencing is required to determine whether the pdt genes in these 4 genomes are located on an ICE.
Content, Diversity, and Genomic Context of the pdt Locus among Gammaproteobacteria
To further investigate the genomic distribution of pdt genes, genome assemblies of bacterial isolates available in GenBank (last accessed January 2024) were searched for homologs of pdtFGH from P. stutzeri KC and P. putida DSM3601 using BLASTN and BLASTP with the predicted protein sequences. Co-located homologs of the three genes were only found in the gammaproteobacterial orders Pseudomonadales, Alteromonadales, Burkholderiales, Nitrosomonadales, and Oceanospirillales, indicating that their co-located presence is unique to Gammaproteobacteria (online suppl. Table S6). A tree constructed using the concatenated amino acid sequence alignments of representative PdtFGH homologs reveals a separation into three major clades that is overall consistent with the phylogeny of the host organisms (Fig. 3). Exceptions are as follows: Alcaligenes faecalis, a member of the order Burkholderiales that has PdtFGH homologs similar to Pseudomonadales, and Hahella chejuensis KCTC 2396, Halomonas sp. TZB202, Halomonas huangheensis BJGMM-B45 (all Oceanospirillales), and Marinobacterium georgiense DSMZ 11526 (Alteromonadales), which are distributed among the three major clades. The discordance between the phylogenies of these pdt genes and those of their hosts is evidence for HGT. Previous phylogenetic analyses already suggested that ThiF-domain-containing proteins, such as PdtF, originated within the phylum Mycobacteria or Cyanobacteria and were horizontally acquired by Gammaproteobacteria [8, 70]. The PdtFGH sequences of P. stutzeri KC are most similar to those from the genome assemblies of P. stutzeri DCP-Ps1 and Pseudomonas sp. HMP271, where the pdt genes are associated with incomplete ICEs.
Concatenated tree showing relationship between amino acid sequence alignments of PdtF, PdtG, and PdtH homologs and corresponding gene organization. Sequences were aligned with ClustalW and the tree was constructed using PhyML with 50 bootstrap replicates. Organism names are colored based on taxonomic order: pink, Pseudomonadales; teal, Nitrosomonadales; blue, Burkholderiales; gold, Oceanospirillales; brown, Alteromonadales. PDTC and QB biosynthesis genes are labeled by letter. Genes in yellow are those required for PDTC biosynthesis. Genes in pink and teal are those found primarily among Pseudomonadales and Nitrosomonadales, respectively. The pdtO genes from P. stutzeri KC and Halomonas sp. TZB202 are shaded gold. Genes colored grey encode hypothetical proteins. Genes notated by a number are (1) alginate family export protein, algE, (2) amidohydrolase family protein, (3) alpha/beta hydrolase. Black stars are used to denote pdt genes associated with ICE similar to ICEPsstKC. Closed circles are pdt genes on other putative ICE. Open circles are used to denote genes found on plasmids.
Examination of the genes proximal to pdtFGH homologs revealed synteny of seven genes required for PDTC biosynthesis according to the current model: pdtFGHIJ [25, 26], an acyl-CoA dehydrogenase (ACAD)-encoding gene (either pdtO or pdtO′), and pdtN. A previous study noted that PdtO (AAF33139.1) from P. stutzeri KC is distinct from PdtO′ from P. putida DSM 3601, sharing only 28% aa sequence identity, and may have been independently recruited for the same biosynthetic function (Criddle et al., 2013). The only PDTC biosynthesis clusters with pdtO homologs are those of Pseudomonas sp. HMP271, P. stutzeri DCP-Ps1, and Halomonas sp. TBZ202, whose PdtO homologs share amino acid similarities of 97.91%, 94.46%, and 87.07%, respectively, with PdtO from strain KC. In contrast, the amino acid similarities between PdtF, PdtG, and PdtH of Halomonas sp. TBZ202 and their homologs in P. stutzeri KC are only 63.9%, 42.8%, and 56.7%, respectively. Because pdtO is markedly more conserved between these hosts than the neighboring pdtFGH genes, these observations suggest a more recent transfer of pdtO to strain KC.
Except for Nitrosomonas spp., putative PDTC biosynthesis gene clusters in other organisms encode a TonB-dependent receptor (PdtK) and an AraC family transcriptional regulator (PdtC). TonB receptors mediate substrate-specific transport across the outer membrane; hence, it is unclear how Nitrosomonas regulates PDTC synthesis and accomplishes its translocation across the outer membrane if it is indeed used for the purpose of metal sequestration by this bacterium. Most Pseudomonadales-type PDTC biosynthesis clusters encode a membrane transport protein (PdtE) [12], a thiamine pyrophosphate-binding enzyme (PdtL), a pyridoxal phosphate-containing aminotransferase (PdtM), and an S-adenosylmethionine-dependent O-methyltransferase (PdtP), which have been described in previous studies [23, 25]. Using the dbCAN2 server [50], we found that all putative PDTC-biosynthesis clusters contain a glycoside hydrolase classified as CAZy family 88 (GH88) [71]. The GH88 enzymes are unsaturated β-glucuronyl hydrolases (EC 3.2.1.) that use a vinyl ether hydration mechanism to catalyze the cleavage of thioglycosides and inverted anomeric glycosides that resist hydrolysis by classical glycoside hydrolases [72]. The ubiquity of GH88 in association with pdtFGHIJ homologs suggests a role in the biosynthesis or utilization of PDTC.
As previously reported, the pdt gene cluster shares eight homologs with the thioquinolobactin (QB) biosynthesis cluster, qbs, which encodes a small siderophore produced by P. fluorescens ATCC 17400 [73, 74]: (1) qbsA, a homolog of pdtC; (2) qbsC, a homolog of pdtF; (3) qbsD, a homolog of pdtG; (4) qbsE, a homolog of pdtH; (5) qbsI, a homolog of pdtK; (6) qbsJ, a homolog of pdtP; (7) qbsK, a homolog of pdtI; (8) qbsL, a homolog of pdtJ. In the tree shown in Figure 3, QbsCDE is in a distinct clade separate from the other PdtFGH homologs.
The genomic context of the pdt cluster in the identified hosts was further examined to investigate potential HGT mechanisms for these genes. In addition to the apparent association of pdt genes with ICEs in the draft genomes of P. stutzeri DCP-Ps1 and Pseudomonas sp. HMP271, and possibly Pseudomonas sp. NBRC11128, P. indica PIC105, and T. linaloolentis mentioned above, we manually screened the remaining pdt gene-containing genomes for the presence of genes frequently associated with mobile genetic elements in the vicinity of the pdt locus. Integrase and MPF genes were found proximal to the pdt genes in Pseudomonas xanthomarina UASWS0955, Pseudomonas brassicacearum DF41, and Halomonas sp. TBZ202. In H. chejuensis KCTC 2396, the pdt locus is adjacent at the 5′ end to an incomplete prophage, as predicted by PHASTER [51], and at the 3′ end to a site-specific integrase gene. Similarly, in Ralstonia insidiosa ATCC 49129, the pdt locus is flanked by genes for a recombinase and an integrase and lies adjacent to another predicted prophage not found in other genomes deposited in GenBank. In 30 out of >400 genomes of the related Ralstonia solanacearum and Ralstonia syzygii, we found the pdt locus in the highly variable region of the megaplasmid (1.8–2.1 Mb) of these species [75]. In Burkholderia gladioli BSR3, the pdt genes are located on a 403 kb plasmid that has not yet been typed but is predicted to be conjugative based on our MOB-Recon analysis (MOBH relaxase, MOBT MPF type) [52]. Therefore, there are multiple other genomes in which the presence of the pdt locus may be due to HGT, albeit not in association with ICEPsstKC. This finding suggests that the ability to synthesize PDTC may confer a considerable ecophysiological advantage to various hosts.
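The screening described above was performed manually. As a hedged illustration of how a comparable first-pass screen could be scripted, the Python sketch below flags annotated genes near a locus whose product descriptions contain keywords typical of mobile genetic elements. The Gene record, the function mge_genes_near, the keyword list, and the 20 kb default window are all hypothetical choices made for this example and are not taken from the study.

```python
from dataclasses import dataclass

# Assumed keyword list; a real screen would need curation and manual review.
MGE_KEYWORDS = ("integrase", "recombinase", "transposase",
                "relaxase", "conjugal transfer", "phage")


@dataclass(frozen=True)
class Gene:
    start: int    # 1-based start coordinate on the contig
    end: int      # end coordinate (inclusive)
    product: str  # free-text product annotation


def mge_genes_near(genes, locus_start, locus_end, window=20000):
    """Return MGE-associated genes within `window` bp of a locus, sorted by start.

    Genes lying entirely inside the locus are excluded, so only flanking
    genes are reported.
    """
    hits = []
    for g in genes:
        inside_locus = g.start >= locus_start and g.end <= locus_end
        near = g.end >= locus_start - window and g.start <= locus_end + window
        is_mge = any(k in g.product.lower() for k in MGE_KEYWORDS)
        if near and not inside_locus and is_mge:
            hits.append(g)
    return sorted(hits, key=lambda g: g.start)
```

Keyword matching on product annotations is only a coarse filter; hits from such a sketch would still need to be checked against dedicated predictions (for example, prophage or relaxase typing) before any conclusion about HGT is drawn.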
In summary, the accessory genes carried by P. stutzeri KC on the novel ICEPsstKC enable degradation and detoxification of CCl4 and can potentially modulate antibiotic activity through toblerol synthesis. The discovery of the pdt genes on an ICE is an important finding for the bioremediation of CCl4-contaminated sites augmented with P. stutzeri KC. The presence of biotechnologically relevant genes on mobile genetic elements has important implications for the bioremediation of contaminated sites and needs to be taken into account when selecting suitable strains.
Acknowledgments
We thank Veronica Brand for her comments on the manuscript. We also thank David Thiele for sequencing library generation, Florian Lenk for bioinformatics support, and Tom Lewis for helpful discussions.
Statement of Ethics
An ethics statement was not required for this study type since no human or animal subjects or materials were used.
Conflict of Interest Statement
The authors declare no conflict of interest.
Funding Sources
This study was funded by the Helmholtz Association of German Research Centers through its research program “PoF IV” and by the State of Baden-Württemberg, Germany, through bwHPC.
Author Contributions
Conceptualization: C.S.C. and A.K.K.; methodology: H.L.S., S.G.W., S.Y.K., and J.A.M.; formal analysis: H.L.S., S.G.W., S.Y.K., and J.A.M.; writing – original draft preparation: H.L.S. and J.A.M.; writing – review and editing: all authors; supervision: C.S.C. and A.K.K.; project administration: C.S.C. and A.K.K.; and funding acquisition: C.S.C. and A.K.K. All authors have read and agreed to the published version of the manuscript.
Data Availability Statement
The sequence reads and genome assembly of P. stutzeri strain KC are available in the NCBI database under BioProject ID PRJNA1044571.