Abstract
DNA supercoiling and nucleoid-associated proteins (NAPs) are two of the factors that govern the architecture of the bacterial genome, influencing the expression of the genetic information that it contains. Alterations to DNA topology, and to the numbers and types of NAPs, have pleiotropic effects on gene expression, suggesting that modifications to the production patterns of DNA topoisomerases and/or NAPs are likely to result in marked impacts on bacterial physiology. Knockout mutations in the genes encoding these proteins (where the mutants remain viable) result in clear physiological effects. However, genetic modifications that involve rewiring, or repositioning, of topoisomerase or NAP genes produce much more subtle outcomes. These findings demonstrate that the high-level regulatory circuitry of bacteria is robust in the face of genomic rearrangements that, a priori, might be expected to produce significant changes in bacterial lifestyle. Examples from genomic rewiring experiments, performed chiefly with the Gram-negative model bacteria Escherichia coli K-12 and Salmonella enterica serovar Typhimurium, will be used to illustrate these features. The results show not only the ability of naturally occurring bacteria to tolerate regulatory rewiring but also indicate the limits within which experiments in synthetic biology may be designed.
Introduction
Gene regulation in bacterial cells has both a networked and a hierarchical character, with each regulator taking up a position in its hierarchy that reflects influence on the gene expression programme of the cell and hence its physiology [Martinez-Antonio and Collado-Vides, 2003; Muskhelishvili et al., 2010; Treviño et al., 2012; Carrera et al., 2014; Dorman, 2020; Dorman and Geoghegan, 2020]. The structures and activities of networked regulatory hierarchies are both strongly influenced by the growth cycle of the bacterial culture: bacteria in the early exponential phase of growth exhibit a profile of regulators that is somewhat distinct from that found in bacteria that have entered the stationary phase of the cycle [Rolfe et al., 2012; Schellhorn, 2020]. Seniority in the hierarchy may be conditional, changing with the bacterium’s stage of growth: high-level regulators may be demoted to middle management, or even disappear completely, as the growth cycle advances [Martinez-Antonio and Collado-Vides, 2003; Bhardwaj et al., 2010]. The experience of shocks during growth, such as thermal or osmotic stress, leads to a shift in the population of regulators as the bacterium seeks to adjust to its new environmental circumstances [Gottesman, 2019]. Gene regulatory hierarchies have been investigated in great detail over many decades, and a wealth of molecular information is available, describing how the production and activities of regulators are controlled, how they interact with their target genes, and how they are integrated into the wider gene control networks and hierarchies of the cell [Rhen and Dorman, 2005]. This information has been invaluable in advancing our understanding of microbial life and has been exploited in studies of bacteria in the natural environment, in medicine, and in biotechnology. 
More recently, it has informed strategies in synthetic biology, where there is a need to employ gene regulatory circuits to solve practical problems in the development of synthetic organisms for a variety of uses [Murakami et al., 2015; Sorg et al., 2020; Fang et al., 2021].
DNA Supercoiling
DNA supercoiling, generated by transcription and DNA replication, is a major organizing principle of the bacterial nucleoid [Pruss and Drlica, 1986; Le et al., 2013; Lal et al., 2016; Le and Laub, 2016; Lioy et al., 2018; Visser et al., 2022]. DNA topological perturbation, in turn, has the potential to influence gene expression [Sternglanz et al., 1981; Drlica, 1984; Liu and Wang, 1987; Leirmo and Gourse, 1991; Dorman, 2019; Klein et al., 2021]. Since genetic information in bacterial cells is written into the DNA, this molecule’s variable topology can facilitate or impede the expression of this information. The need to resolve local DNA topological impediments during transcription elongation has been identified as an important contributor to the noisy, burst-like nature of transcription in bacteria [Chong et al., 2014]. These burst-like patterns are likely to impose gene-copy-to-gene-copy stochasticity on transcription across a population of genetically identical bacterial siblings, producing physiological diversity among those cells. For this reason, variable DNA topology may be assigned a high ranking in the hierarchy of gene expression’s influencers [Dorman, 2020].
Nucleoid-Associated Proteins
The nucleoid-associated proteins (NAPs) are another highly ranked group of regulators. These proteins combine genomic architectural properties with the ability to influence gene expression. Although much effort has been expended in studying the ability of NAPs to affect transcription [Dorman et al., 2020], it is clear that these proteins can also affect other stages of the gene expression process, such as translation [Balandina et al., 2001; Park et al., 2010]. NAPs can recognize their targets in DNA through direct readout, based on the base sequence, or through indirect readout, based on DNA conformation, or they can employ a combination of both methods [Dillon and Dorman, 2010]. They interact widely with large segments of the genome [Grainger et al., 2006; Cho et al., 2008; Kahramanoglou et al., 2011; Prieto et al., 2012; Ayala et al., 2015; Antipov et al., 2017]. Many NAPs control large collectives of genes, and the memberships of different NAP regulons can overlap extensively. Depending on the protein, a NAP can be produced at a relatively constant level over the growth cycle, or it may be produced in large bursts in one portion of the cycle, only to disappear in another [Dillon and Dorman, 2010]. These features allow NAPs to play a coordinating role in determining the gene expression profile of the bacterium in any given set of environmental circumstances. The distinction between NAPs and transcription factors (TFs) is not always clear, with some NAPs displaying properties that are distinctly TF-like in their gene regulatory roles [Browning et al., 2019; Dorman et al., 2020; Heyde et al., 2021]. NAPs also interact cooperatively, or anti-cooperatively, with TFs, and with variable DNA topology, to set and to reset the transcriptome of the cell [Martinez-Antonio and Collado-Vides, 2003; Zhou and Yang, 2006; Balleza et al., 2009; Browning and Busby, 2016; Dorman and Dorman, 2016].
Analyses at the whole-genome-scale in model organisms have confirmed the importance of DNA topology and NAPs in the management of the transcriptome [Lal et al., 2016; Sobetzko, 2016; Yus et al., 2019; Guo et al., 2021; Le Berre et al., 2022; Visser et al., 2022] and demonstrated the influence of the transcriptome on chromosome organization as a function of the bacterium’s physiological state [Le et al., 2013; Le and Laub, 2016; Lioy et al., 2018]. The H-NS NAP and DNA supercoiling have been implicated as major contributors to transcriptional noise in E. coli and hence to the generation of physiological diversity in clonal bacterial cultures [Urchueguía et al., 2021]. Here, we will consider the influence of chromosomal gene location on the expression pattern of regulatory genes encoding either NAPs or factors involved in the control of DNA topology. The effects on bacterial physiology of relocating these genes and of tinkering with their regulatory circuits through genetic rewiring will be described. While it should be noted that genes encoding NAP-like proteins are also found on extrachromosomal genetic elements [Doyle et al., 2007; Takeda et al., 2011; Fitzgerald et al., 2020], this topic is beyond the scope of the present discussion.
Chromosomal Geography
The model bacterium Escherichia coli K-12 has a circular DNA chromosome of 4.6 million base pairs (mbp) [Blattner et al., 1997]. It undergoes bidirectional replication from a unique origin of replication, oriC, to a terminus region that is located diametrically opposite to oriC [Hill et al., 1987]. Following replication, the chromosome copies are decatenated and segregated, without the intervention of specialized partition (Par) proteins, into two daughter cells that are then separated by binary fission [Reyes-Lamothe et al., 2012]. The replicating chromosome undergoes a carefully choreographed set of translocations during the cell cycle that involve the positioning and repositioning of the origin and terminus regions with respect to the cell poles and the mid-cell, with the latter becoming the site of septum formation at cell division [Danilova et al., 2007]. Other model organisms have cell cycles that involve specialized proteins to tether the chromosome to the cell pole and partitioning proteins to move the daughter chromosomes apart at mitosis [Badrinarayanan et al., 2015]. The movements of the chromosome as it is being copied cause the genes that it carries to sample a variety of locations within the cytosol as the cell cycle progresses.
The circular chromosome appears to have three levels of looped organization: at approximately 16 kbp, at 100–125 kbp, and at 600–800 kbp [Jeong et al., 2004; Képès, 2004; Wright et al., 2007; Mathelier and Carbone, 2010; Xiao et al., 2011; Junier et al., 2012]. The smallest loops approximate to the 12-kbp microdomains reported by Postow et al. [2004], of which E. coli is thought to have approximately 400 [Postow et al., 2004; Deng et al., 2005]. The chromosome also has a macrodomain structure, composed of distinct, large-scale chromosomal regions that do not interact, as determined experimentally by site-specific recombination reactions catalysed by the bacteriophage lambda tyrosine integrase, Int [Garcia-Russell et al., 2004, 2007; Valens et al., 2004]. There are four macrodomains in the E. coli chromosome, known (clockwise) as Ori, Right, Ter, and Left, and two nonstructured (NS) domains, NS-Right and NS-Left [Niki et al., 2000; Valens et al., 2004; Esnault et al., 2007; Messerschmidt and Waldminghaus, 2014; Duigou and Boccard, 2017] (shown in Fig. 1). These features are shared with Salmonella enterica serovar Typhimurium [Rebollo et al., 1988; Niki et al., 2000; Garcia-Russell et al., 2004, 2007; Rocha, 2008; Cameron et al., 2017], a model pathogenic organism with which E. coli is often compared. Given that E. coli and S. typhimurium separated from their common ancestor 100 million years ago [Doolittle et al., 1996; Gordienko et al., 2013], the striking similarity of their genomes, in terms of gene content and gene location, might suggest that there are limits to the extent to which genes can be repositioned while maintaining physiology within tolerable limits. However, a comparison of the 4.6-mbp chromosome of E. coli with its slightly larger (4.9-mbp) counterpart in S. typhimurium shows that while the gene content and gene order are indeed similar, the latter has absorbed at least seventeen pathogenicity islands via horizontal gene transfer [Ilyas et al., 2017], and there has been a large inversion of the chromosome, centred at the replication terminus [Jacobsen et al., 2011; Bartke et al., 2020]. Indeed, gene amplification [Andersson and Hughes, 2009], horizontal gene transfer [Syvanen, 2012], and symmetrical inversions around the chromosomal terminus [Khedkar and Seshasayee, 2016] are commonplace across the bacterial world. The E. coli and Salmonella comparison shows that while a great deal of genetic tinkering can be tolerated, an underlying, fundamental structure is preserved.
Fig. 1. Genetic map of the chromosome of Salmonella enterica serovar Typhimurium strain LT2 showing the transcription units discussed in the text. The innermost (green) circle shows the coordinates of the chromosome calibrated in megabases. The next (blue) circle shows genetic distances using more refined increments. The black semicircular arrows represent the two replichores, originating at the origin of chromosome replication (oriC) and extending to the unique dif site within the terminus region. Each segment of the outermost (purple) circle shows the location of one of the four macrodomains (Ori, Right, Ter, and Left) or the two NS regions, NS-Right and NS-Left. Within these regions, the positions of the seven rrn operons are shown, together with their directions of transcription (small black arrows). The locations of seventeen pathogenicity islands are also shown, together with the positions of other horizontally acquired genetic elements, such as the Fels-1, Fels-2, Gifsy-1, and Gifsy-2 prophages, that contribute to Salmonella virulence. The locations of key genes involved in the organization of the bacterial nucleoid and the management of DNA topology are indicated around the circumference of the figure; the directions of transcription of the repositioned and rewired genes and operons discussed in the text are shown by the small red arrows. Reciprocal exchanges have been studied of the protein-encoding regions of fis and dps, ihfA and ihfB, and hns and stpA, together with the engineering of a gyrBA operon at the native location of the gyrB gene (with the concomitant deletion of the now-redundant gyrA gene from its native location indicated by ΔgyrA). A summary of the structures of these modified genes is given around the periphery of the figure. “P” represents the promoter, followed by the name of the gene from which it comes. Thus, PihfA is the promoter of the ihfA gene. The source of the protein-coding region is written next, so PihfA-ihfB is the promoter of the ihfA gene driving the transcription of the protein-coding region of the ihfB gene. In the case of the rewired topA gene, the source of the replacement promoter region was the E. coli (EC) topA gene. The positions of other genes whose products are discussed in the text (sigma factor genes rpoD and rpoS, and topoisomerase genes parC, parE, topA, and topB) are also indicated, with small black arrows to indicate the direction of transcription.
Gene Location and Physiology
It has been suggested that the highly conserved order of genes along each replichore in the model bacterium E. coli K-12 reflects an intimate, hard-wired connection between gene position and the bacterial growth cycle, through the lag, the exponential, and the stationary phases of growth [Sobetzko et al., 2012]. If this is true, what might the consequences be for the life of the cell if genes are relocated, especially if those genes encode regulatory proteins with wide-ranging influences on the transcriptome? The distance between a gene of interest and oriC correlates with the gene copy number in rapidly growing bacteria because chromosome replication restarts before the previous replication round terminates (shown in Fig. 2a, b). This phenomenon generates more copies of oriC-proximal genes compared to oriC-distal genes in exponentially growing cultures [Cooper and Helmstetter, 1968].
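The link between origin distance and gene dosage can be made concrete with the Cooper and Helmstetter model. The sketch below is a minimal illustration, not a description of any experiment cited here. It assumes a constant chromosome replication time (C) and a constant interval between termination and cell division (D); the default values of 40 and 20 min, and the function names, are assumptions chosen for illustration only.

```python
# Minimal sketch of the Cooper-Helmstetter gene dosage relationship.
# Assumptions (not taken from the text): C = 40 min, D = 20 min, and both
# are treated as constants that do not vary with growth rate.

def mean_copy_number(m, tau, c_period=40.0, d_period=20.0):
    """Average copies per cell of a gene at fractional position m.

    m   : distance from oriC as a fraction of replichore length
          (0.0 = at oriC, 1.0 = at the terminus)
    tau : culture doubling time in minutes
    """
    if not 0.0 <= m <= 1.0:
        raise ValueError("m must lie between 0 (oriC) and 1 (terminus)")
    if tau <= 0:
        raise ValueError("tau must be positive")
    return 2.0 ** ((c_period * (1.0 - m) + d_period) / tau)


def ori_ter_ratio(tau, c_period=40.0):
    """Ratio of origin-proximal to terminus-proximal gene dosage."""
    return 2.0 ** (c_period / tau)


if __name__ == "__main__":
    for tau in (20.0, 40.0, 80.0):
        print(f"doubling time {tau:5.1f} min: ori/ter dosage ratio = "
              f"{ori_ter_ratio(tau):.2f}")
```

Under these assumptions, a doubling time of 20 min gives an origin-to-terminus dosage ratio of 4, and the ratio falls toward 1 as growth slows, in line with the qualitative statement that copy numbers converge when growth slows or ceases.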
Fig. 2. Effect of gene location on the transcriptional output. (a) Multicopy effect of chromosome replication on oriC-proximal and oriC-distal genes. The chromosome is undergoing consecutive rounds of replication initiation, producing four copies of the origin (green disc) and six replication forks (black arrows). The rrnC operon is adjacent to oriC and is present in four copies. At the same time in the cell cycle, the topA gene (shown in the Salmonella location), which is located close to the replication terminus (red disc), is present in just one copy. This pattern applies during rapid growth. With slow growth, or following cessation of growth, the copy numbers of rrnC and topA will each approach one. (b) A notional gene (consisting of its native transcription signals and protein-coding region), represented by the orange filled rectangle, is placed in four different chromosomal locations. Its native location is close to the terminus of replication and the baseline level of transcription has a modest value. When moved to the location adjacent to oriC, the gene experiences the copy-number-associated boost described in (a). This location also has a DNA topological profile that favours high activity of the gene’s promoter, and this chromosomal region is free from nucleoprotein complexes that can interfere with transcription. The result is a high level of transcription, exceeding that normally seen at the native location. A third location, at the 8 o’clock position on the chromosome, produces weak transcription and this is due to unfavourable DNA supercoiling in that chromosomal region. In a fourth location, the gene is found to be completely silent. This is because it has been placed in a transcriptionally silent Extended Protein Occupied Domain (tsEPOD) where nucleoprotein complexes, probably composed of NAPs such as H-NS, prevent the gene from being expressed. (c) A regulatory gene, located close to the replication terminus, controls a regulon of genes distributed around the chromosome. Does the distance-dependent sphere of influence of the regulatory gene product extend evenly across the regulon, or are target genes that are located closest to the source of the regulatory protein more likely to be bound by the protein? (Strong regulatory connections are indicated by thicker arrows; weak ones by dashed arrows.) How does the simplistic pattern illustrated here, with a fully open chromosome, compare to the situation in a living bacterium where the DNA is folded within the complex structure of the nucleoid? Experimental evidence, described in the text, indicates that this “sphere of influence” model is too simplistic to describe reliably the gene regulatory situation in the folded chromosome within the nucleoid.
Many investigations of the effects of gene translocations around the bacterial chromosome have concluded that copy number effects associated with distance from the origin are the dominant influences on chromosome position-dependent transcription outputs [Beckwith et al., 1966; Cooper and Helmstetter, 1968; Chandler and Pritchard, 1975; Schmid and Roth, 1987; Miller and Simons, 1993; Pavitt and Higgins, 1993; Sousa et al., 1997; Loconto et al., 2005; Block et al., 2012]. This is not uniformly true: local effects imposed by NAP-dependent nucleoprotein structures and DNA supercoiling can override the copy number effects in some cases [Bryant et al., 2014; Brambilla and Sclavi, 2015; Gerganova et al., 2015; Frimodt-Møller et al., 2016; Inoue et al., 2016; Cooke et al., 2019]. For examples of these phenomena, see the section below on gene position and transcription output.
The ability of large, highly expressed transcription units to influence nucleoid structure [Le and Laub, 2016] might place limits on where these can be located on the chromosome. Ribosomal protein genes are typically found near to oriC and are transcriptionally aligned with the direction of replication fork movement in bacteria capable of fast rates of growth. They have promoters that respond to changes in DNA supercoiling [Leirmo and Gourse, 1991; Ohlsen and Gralla, 1992; Gaal et al., 1997] and are members of several NAP regulons [Hillebrand et al., 2005; Pul et al., 2007]. When highly expressed, these transcription units exert DNA topological influences over long distances, affecting up to forty neighbouring genes [Visser et al., 2022]. Relocating these genes toward the terminus region in Vibrio cholerae imposes a fitness penalty, even during slow growth, consistent with the natural position of the gene being of authentic physiological significance [Soler-Bistué et al., 2017].
Gene orientation, as well as chromosome position, is an important feature of highly expressed genes. Experiments where collisions between the replisome and RNA polymerase are engineered reveal the cis effects of transcription unit orientation on chromosome replication [Mirkin and Mirkin, 2005]. Pausing, or pausing and backtracking, of RNA polymerase leads to R-loop formation, and collisions with the DNA replication complex have the potential to cause lethal DNA damage [Dutta et al., 2011; Brüning and Marians, 2021]. These gene orientation studies have demonstrated the importance of resolving transcription and replication conflicts, the assignment of highly expressed genes (and essential genes) to the leading strand of the chromosome, and the significance of maintaining GC-skew for the genetic and structural stability of the chromosome [Lopez and Philippe, 2001; Rocha and Danchin, 2003; Campo et al., 2004; Rocha, 2008]. Overall, the data indicate a tendency for genes in natural chromosomes to maintain their relative distances from oriC. Although large-scale chromosomal rearrangements occur, and are important drivers of genome evolution, these often involve symmetrical inversions around the origin or the terminus regions that preserve gene-to-oriC distance [Hughes, 2000; Kresse et al., 2003; Khedkar and Seshasayee, 2016].
Gene-Gene Communication
TFs govern the expression of genes in their regulons, positively or negatively, by direct interaction with cis-acting sites in the DNA that are close to, or that overlap, the target promoters [Browning and Busby, 2016]. TFs differ in the number of genes that they control, with some controlling single targets and others regulating the transcription of many hundreds [Shimada et al., 2018]. The TF must make a journey from its site of production to the target gene, taking into account constraints that may inhibit the diffusion of its mRNA from its site of transcription [Montero Llopis et al., 2010; Moffitt et al., 2016; Kannaiah et al., 2019] and barriers imposed by compartmentalization of the cytosol by the bulk of the nucleoid (shown in Fig. 2c). The length of the journey is influenced by the distance between the TF’s gene and the target site, but within the tiny 3-dimensional space of the nucleoid-occupied cytosol, the distance will likely be on a sub-micrometre scale: the volume of an E. coli cell is approximately 1 femtolitre or 10⁻¹⁵ L [Kubitschek and Friske, 1986]. The LacI repressor protein conducts its journey by a facilitated diffusion mechanism [Hammar et al., 2012], involving a combination of through-space (3-dimensional) translocations and 1-dimensional sliding along the chromosome, making nonspecific and specific interactions with the DNA as it goes [Elf et al., 2007; Garza de Leon et al., 2017]. There are approximately 5 to 40 copies of LacI in E. coli [Gilbert and Muller-Hill, 1966; Garza de Leon et al., 2017], and the presence of its inducer, isopropyl-β-D-thiogalactopyranoside, decreases the number of specific (but not nonspecific) interactions that LacI makes with DNA as it traverses the nucleoid [Garza de Leon et al., 2017]. Based on its diffusion coefficient of D = 0.4 μm²/s [Elf et al., 2007], LacI might be expected to become evenly distributed in the E. coli cell by simple diffusion.
In contrast, the facilitated diffusion distribution pattern of LacI is reported to depend on the genomic location of the lacI gene, the compaction state of the nucleoid, and the physiological status of the bacterium [Kuhlman and Cox, 2012]. While the distance between the site of production of a TF and its genomic target may influence communication for regulators of very low abundance, what happens in the case of proteins, such as NAPs, that are produced in hundreds of copies? This geographical question is of special relevance to NAPs with production patterns in which periods of abundance alternate with periods of scarcity. The question is also relevant to considerations of TF activity because the distinction between TFs and NAPs is not amenable to precise definition [Browning et al., 2019; Dorman et al., 2020; Heyde et al., 2021].
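The expectation that a protein such as LacI would spread evenly by simple diffusion can be checked with a rough calculation. The sketch below uses the diffusion coefficient (0.4 μm²/s) and cell volume (1 fL) quoted above; the 2 μm path length, the function names, and the treatment of the quoted coefficient as describing unobstructed Brownian motion are assumptions made for illustration.

```python
# Back-of-envelope estimates for a regulator diffusing in an E. coli cell.
# D = 0.4 um^2/s and the 1 fL cell volume are quoted in the text; the 2 um
# path length used below is an illustrative assumption.

AVOGADRO = 6.02214076e23  # molecules per mole


def diffusion_time_s(distance_um, d_um2_per_s=0.4, dimensions=3):
    """Time for the mean squared displacement to reach distance_um**2.

    Uses <r^2> = 2 * dimensions * D * t for unobstructed Brownian motion.
    """
    return distance_um ** 2 / (2 * dimensions * d_um2_per_s)


def concentration_nm(copies, volume_fl=1.0):
    """Molar concentration (in nM) of `copies` molecules in volume_fl fL."""
    litres = volume_fl * 1e-15
    return copies / (AVOGADRO * litres) * 1e9


if __name__ == "__main__":
    print(f"time to diffuse ~2 um: {diffusion_time_s(2.0):.2f} s")
    for copies in (5, 40):
        print(f"{copies:2d} copies per cell = {concentration_nm(copies):.1f} nM")
```

On these assumptions, simple diffusion would carry the protein across the cell in a second or two, and 5 to 40 copies correspond to a concentration in the low nanomolar range. The estimate ignores DNA binding, sliding, and exclusion by the nucleoid, which are the features that distinguish facilitated diffusion from this naive picture.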
NAP-Encoding Genes
Unlike many bacterial TFs, NAPs are often present in hundreds of copies, reflecting the very large numbers of targets that these have in the cell (Table 1). Their abundance also reflects the bifunctional nature of some NAPs, with an ability both to regulate gene expression directly and to influence the architecture of the nucleoid [Dorman et al., 2020]. The distribution of NAPs along the chromosome is not random [Grainger et al., 2006; Cho et al., 2008; Kahramanoglou et al., 2011; Prieto et al., 2012; Ayala et al., 2015; Antipov et al., 2017], and some, such as the H-NS protein, have well-characterized abilities to silence transcription, targeting those horizontally acquired genes that have an A+T content exceeding that of the background genome [Lucchini et al., 2006; Navarre et al., 2006; Oshima et al., 2006] (Table 1).
H-NS and Its Paralogue StpA
The hns gene that encodes H-NS is found in the Ter macrodomains of the E. coli and S. typhimurium chromosomes, as part of a large inversion that has placed it in opposite replichores in each of these species [Blattner et al., 1997; McClelland et al., 2001]. There appears to be no stage of growth at which H-NS is absent from the cell, although fluctuations in its intracellular concentration have been reported [Ueguchi et al., 1993; Dersch et al., 1993; Free and Dorman, 1995; Atlung and Ingmer, 1997]. H-NS is not essential, in part because it is accompanied by the paralogous StpA protein, with which it shares many properties [Fitzgerald et al., 2020].
One distinguishing feature of StpA concerns its influence on the production of the RpoS stationary phase and stress sigma factor of RNA polymerase [Lucchini et al., 2009]. The stpA gene is located in the NS-Left region of the chromosome in both E. coli and S. typhimurium. Bacteria in which the protein coding regions of the stpA and hns genes have been reciprocally exchanged display a marked rescheduling of RpoS production. Instead of being confined to the stationary phase of the growth cycle, RpoS is produced prematurely during exponential growth, with a positive impact on the competitive fitness of the rewired bacterium when grown under conditions of osmotic stress [Fitzgerald et al., 2015] (Table 2). Interpreting the underlying cause of the change to RpoS production with precision is complicated because the rewired strain has undergone both a reciprocal exchange of the chromosomal locations of the stpA and hns protein coding regions and a reciprocal reconnection of each to the regulatory sequences of the paralogous gene. The most straightforward interpretation is that dysregulation of stpA expression has affected RpoS output through the known effects of StpA on the expression of the rssC gene, encoding the RssC anti-adaptor protein that helps to control the proteolytic turnover of RpoS in exponential phase cultures [Lucchini et al., 2009]. The hns gene is autoregulated negatively by H-NS [Dersch et al., 1993; Ueguchi et al., 1993].
Moving the hns gene’s promoter (with transcriptionally fused reporters) to different positions on the chromosome changes the output of this transcription unit when the new location is in an A+T-rich target site for H-NS binding; movement of the hns promoter into such a chromosomal territory causes it to become silent. Based on transcription output data from insertions at six different chromosome positions, three in each replichore, gene position does not otherwise affect the hns promoter beyond influences that can be attributed to copy number effects arising from the distance between the insertion site and oriC [Brambilla and Sclavi, 2015] (Table 2). The suggestion that the native location of the hns gene might produce a localized sphere of influence for the H-NS protein in this gene’s vicinity [Sobetzko et al., 2012] (as shown in Fig. 2c) is not supported by these hns gene translocation data [Brambilla and Sclavi, 2015].
Fis and Dps: NAPs Produced at Opposite Ends of the Growth Cycle
The Fis NAP (Table 1) is involved as an architectural component in a multitude of molecular mechanisms, including the initiation of chromosome replication [Ryan et al., 2004; Miyoshi et al., 2021], site-specific recombination [Johnson et al., 1986; Koch and Kahmann, 1986], transposition [Weinreich and Reznikoff, 1992], and transcription [Bokal et al., 1997; Auner et al., 2003; John et al., 2022]. The hns gene is a member of the Fis regulon, illustrating the regulatory interconnectedness of the NAP-encoding genes [Falconi et al., 1996]. The Dps protein (Table 1) is usually classified as a NAP, although its contributions to the life of the cell lie chiefly in its ferritin-like iron binding properties and its ability to protect the genome from chemical and enzymatic damage [Nair and Finkel, 2004; Karas et al., 2015; Janissen et al., 2018].
Fis and Dps are produced at opposite stages of the growth cycle: Fis is abundant in early exponential phase but declines in concentration thereafter, while Dps is produced in late stationary phase [Nair and Finkel, 2004; Karas et al., 2015; Janissen et al., 2018]. The genes that encode these proteins are also found in opposite replichores, at diametrically opposite locations on the chromosome, with fis being in the NS-Left region and dps in the Right macrodomain (shown in Fig. 1). The negatively autoregulated fis gene is the downstream component of the bicistronic dusB-fis operon. The dps gene depends on the RpoS sigma factor and the integration host factor (IHF) NAP for transcription in stationary phase, but it can be transcribed using RpoD in bacteria experiencing oxidative stress during exponential growth; in the absence of oxidative stress, Fis represses this RpoD-dependent transcription of dps [Altuvia et al., 1994; Grainger et al., 2008]. Transplanting the dusB-fis operon to the Ter region of the E. coli chromosome alters the ability of the new bacterial strain to manage its DNA topology and its resistance to stress. It does so without causing a detectable change in Fis protein production patterns, because a position-associated increase in fis transcription is offset by a reduction in fis copy number during exponential growth [Gerganova et al., 2015] (Table 2).
Fis influences bacterial DNA topology at several levels: indirectly by acting as a TF at the genes that encode DNA topoisomerase I [Weinstein-Fisher and Altuvia, 2007] and DNA gyrase [Schneider et al., 1999; Keane and Dorman, 2003] and directly through its ability to constrain DNA supercoils [Rochman et al., 2004; Cameron et al., 2011]. For these reasons, it is difficult to determine, with precision, the mechanism by which a change in fis gene location affects global DNA supercoiling in E. coli. Exchanging the complete dusB-fis operon reciprocally with the dps gene on the chromosome of S. typhimurium results in increased binding of Fis to the Ter region and reduced binding to the Ori region, but no gross changes to the physiological character of the rewired bacterium are observed [Bogue et al., 2020] (Table 2). When just the protein-encoding regions of fis and dps are exchanged, these binding patterns are more pronounced and the large Fis regulon becomes dysregulated. In this second genetic rearrangement, the growth phase-dependent production patterns of Fis and Dps are reversed: Fis is now produced in the stationary phase and Dps in the early exponential phase. One of the phenotypic effects of this reversal is a reduction in the ability of the facultative intracellular pathogen S. typhimurium to infect cultured HeLa cells. These changes correlated with the altered expression of virulence genes in the horizontally acquired pathogenicity islands SPI1 and SPI2 [Bogue et al., 2020] (Table 2).
The changes to Fis binding patterns and Fis regulon gene expression extended beyond the chromosome to two resident plasmids, underlining the association between Fis production patterns and the expression of genes acquired by horizontal transfer. These findings show that for Fis in S. typhimurium, the pattern of its production during the growth cycle is more significant than the location on the chromosome of the fis gene. Alterations to Dps production in the rewired S. typhimurium strains have negligible impacts on global gene expression patterns [Bogue et al., 2020], in agreement with the known inability of Dps to affect transcription in E. coli [Janissen et al., 2018] (Table 2).
The Heterodimeric NAP IHF
The IHF (Table 1) is an intensively studied NAP that binds to DNA sites that are close matches to the consensus sequence WATCAANNNNTTR (where W is A or T, R is purine, and N is any base), introducing bends of up to 180° [Rice et al., 1996]. This allows IHF to play architectural roles in site-specific recombination [Corcoran and Dorman, 2009]; transposition [Chalmers et al., 1998]; the replication of chromosomes [Ryan et al., 2004; Kasho et al., 2021], of plasmids, and of bacteriophage [Fekete et al., 2006]; the acquisition of novel DNA sequences by CRISPR-cas systems [Wright et al., 2017]; integrative and conjugative element transmission [McLeod et al., 2006]; conjugative plasmid transfer [Williams and Schildbach, 2007]; and transcription [Mangan et al., 2006]. IHF is a heterodimer composed of one alpha and one beta subunit, and these are produced by the ihfA and ihfB genes, located 350 kilobase pairs apart, in the Right replichore of the S. typhimurium chromosome (shown in Fig. 1). The ihfA gene is in the Right macrodomain, while ihfB is in the Ter macrodomain; both genes align with the direction of chromosome replication (shown in Fig. 1). Whole-genome analysis indicates that oriC, which lies at a considerable distance from the genes encoding IHF, is a cell cycle-dependent high-affinity site for IHF binding, a property that it retains even when relocated to the Ter region of the chromosome [Kasho et al., 2021]. The subunits of IHF show significant amino acid sequence similarity to one another and to the subunits of the related heterodimeric NAP HU [Swinger and Rice, 2004] (Table 1).
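The consensus WATCAANNNNTTR quoted above lends itself to a simple sequence scan. The following is a minimal sketch, assuming a plain sliding-window comparison that counts mismatches against the degenerate consensus on both strands; it is not the method used in any of the cited binding studies, and the function names and the max_mismatches parameter are hypothetical. Real IHF site prediction also depends on flanking sequence and DNA shape, which this example ignores.

```python
# Degenerate codes used in the consensus: W = A or T, R = purine (A or G), N = any base.
IUPAC = {
    "A": {"A"}, "C": {"C"}, "G": {"G"}, "T": {"T"},
    "W": {"A", "T"}, "R": {"A", "G"}, "N": {"A", "C", "G", "T"},
}
CONSENSUS = "WATCAANNNNTTR"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def count_mismatches(window, consensus=CONSENSUS):
    """Number of positions where the window violates the consensus."""
    return sum(base not in IUPAC[code] for base, code in zip(window, consensus))


def find_sites(seq, max_mismatches=0):
    """Scan both strands; return sorted (start, strand, site, mismatches) tuples.

    start is the 0-based coordinate of the leftmost base of the site on the
    input (top) strand, whichever strand carries the match.
    """
    seq = seq.upper()
    n = len(CONSENSUS)
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for i in range(len(s) - n + 1):
            window = s[i:i + n]
            mm = count_mismatches(window)
            if mm <= max_mismatches:
                start = i if strand == "+" else len(seq) - i - n
                hits.append((start, strand, window, mm))
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical test sequence with one perfect site embedded in flanking G/C.
    print(find_sites("GGGAATCAAGCTGTTACCC"))
```

Raising max_mismatches above zero is one crude way to capture the "close matches" that the text refers to, at the cost of more false positives.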
Although required in a one-to-one stoichiometry, the IhfB subunit is produced in 2-fold excess over IhfA [Pozdeev et al., 2022]. This appears to reflect a mutual dependence of the two subunits for protein stability, with IhfA being especially unstable in the absence of IhfB [Zulianello et al., 1994]. Each gene is embedded in a complex operon that specifies components of the cellular translational machinery. Exchanging just the protein-encoding regions of ihfA and ihfB reciprocally allowed the significance of gene location and complex operon membership to be explored. The strain that underwent the exchange exhibited a reversal of IHF subunit production: now IhfA was present in 2-fold excess over IhfB, chiefly due to differences in the stabilities of the mRNAs expressed by the hybrid operons [Pozdeev et al., 2022] (Table 2). At a cellular level, the genetic exchange produced no difference in competitive fitness between the rewired strain and the wild type, but all three of the regulons encoding type III secretion systems had altered expression. These were the motility apparatus, the virulence genes encoded by the SPI1 pathogenicity island for invasion of host cells, and the macrophage vacuole survival genes encoded by the SPI2 pathogenicity island. In the rewired strain, motility was reduced and SPI1 and SPI2 gene expression each showed a modest growth phase-dependent decline. However, these reductions in SPI1 and SPI2 expression did not translate into measurable reductions in infectivity of cultured mammalian cells, indicating that the ihf gene rewiring was well tolerated by the bacterium at the level of its virulence gene expression programme.
Gene Position and Transcription Output
Is distance from the origin of chromosome replication the principal determinant of gene expression output, or can other features of the nucleoid exert an influence too? Experiments with hns promoter relocation show that local nucleoprotein structure plays a role, with segments of the chromosome that experience H-NS binding being nonpermissive for transcription [Brambilla and Sclavi, 2015]. The E. coli lac promoter also displays chromosome position-dependent variations in transcription output that are due to local nucleoprotein complexes [Bryant et al., 2014; Cooke et al., 2019]. These regions of the chromosome, called transcriptionally silent extended protein occupancy domains (tsEPODs), have an average length of 2 kilobases and are enriched in NAP binding [Vora et al., 2009]. The A+T-rich, horizontally acquired pathogenicity islands of S. typhimurium that are targets for transcriptional silencing by H-NS have the characteristics of tsEPODs, although they are not usually defined in this way. These islands, especially SPI1 and SPI2, show altered transcription patterns when the Fis NAP is produced from the Ter region of the chromosome, especially when the dps promoter drives its production [Bogue et al., 2020]. Thus, tsEPODs can affect the output from promoters that have been inserted artificially, and promoters located within tsEPODs can become dysregulated, when transcriptional regulatory proteins are produced from a new chromosome location.
A comprehensive survey of transcription propensity as a function of E. coli chromosome location, involving over 100,000 sites, reveals that the halves of each replichore that are on the oriC side of the chromosome are the most transcriptionally active [Scholz et al., 2019]. This “Northern Hemisphere” of the chromosome contains those transcription units, such as the ribosomal operons (shown in Fig. 1), that are known to be among the most heavily transcribed. In addition, high transcription correlates with regions that are enriched for Fis NAP binding, while low transcription correlates with regions that are enriched for H-NS and tsEPODs [Scholz et al., 2019]. DNA supercoiling is distributed unevenly around the chromosome [Lal et al., 2016; Guo et al., 2021; Visser et al., 2022], so a supercoiling-sensitive promoter that is inserted in a genomic location that differs topologically from its native position may exhibit a new expression pattern. This was found to be the case with the lac promoter in E. coli, when its insertion at some chromosome locations resulted in enhanced expression that depended on DNA gyrase activity and correlated with negative supercoiling of reporter plasmid DNA [Bryant et al., 2014]. If chromosome topology is part of the global gene regulatory hierarchy, what are the consequences of rewiring genes that encode topoisomerases?
DNA Topoisomerases and Topological Balance
Replication of the chromosome and transcription are processes that produce local under-twisting and over-twisting of the DNA. These topological disturbances are resolved by DNA topoisomerases, of which E. coli and S. typhimurium each has four (Table 1). Of these, the type II topoisomerase, DNA gyrase, has the unique ability to introduce negative supercoiling into relaxed DNA, using an ATP-dependent mechanism in which an intact segment of DNA is passed through a double-stranded gate that is generated by gyrase [McKie et al., 2021]. This enzyme can also eliminate positive supercoils from over-twisted DNA. Its structural paralogue, topoisomerase IV, also a type II enzyme, resolves catenanes in chromosome copies during the cell cycle, preventing entanglements that would interfere with successful segregation of the chromosome copies at cell division [Zechiedrich and Cozzarelli, 1995].
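As a rough numerical illustration of under-twisting, the degree of supercoiling is commonly expressed as the superhelical density σ = (Lk − Lk0)/Lk0, where Lk is the linking number of a closed circular DNA and Lk0 ≈ N/h is its relaxed value for N base pairs with helical repeat h. The sketch below assumes h = 10.5 bp per turn and a hypothetical 4,200-bp circular molecule; neither value comes from the studies reviewed here, and the function names are illustrative. It relies on the general property that a type II enzyme such as gyrase changes Lk in steps of two per strand-passage cycle.

```python
HELICAL_REPEAT_BP = 10.5  # assumed helical repeat of relaxed B-form DNA, bp per turn


def relaxed_linking_number(n_bp, h=HELICAL_REPEAT_BP):
    """Approximate linking number (Lk0) of a relaxed circle of n_bp base pairs."""
    return n_bp / h


def superhelical_density(lk, n_bp, h=HELICAL_REPEAT_BP):
    """sigma = (Lk - Lk0) / Lk0; negative values indicate under-twisted DNA."""
    lk0 = relaxed_linking_number(n_bp, h)
    return (lk - lk0) / lk0


def after_gyrase_cycles(lk, cycles):
    """Linking number after a number of gyrase cycles, each lowering Lk by 2."""
    return lk - 2 * cycles


if __name__ == "__main__":
    n_bp = 4200                                # hypothetical circular DNA
    lk0 = relaxed_linking_number(n_bp)         # 400 turns when relaxed
    lk = after_gyrase_cycles(lk0, cycles=12)   # 12 cycles remove 24 links
    print(lk, superhelical_density(lk, n_bp))
```

In this example, 12 cycles take the molecule from Lk = 400 to Lk = 376, a superhelical density of about −0.06.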
Gyrase has an α2β2 heterotetrameric structure, with the alpha (GyrA) subunit encoded by the gyrA gene and the beta (GyrB) subunit by gyrB; these genes are found at separate locations on the chromosome in E. coli and S. typhimurium (shown in Fig. 1). This separation refers to the linear distance between gyrA and gyrB along the circumference of the circular chromosome; in the folded chromosome of the nucleoid, the through-space distance between these Left-replichore-located genes is likely to be much shorter. The independently transcribed parC and parE genes, encoding topo IV, are co-located (shown in Fig. 1) but do not form an operon. They can undergo copy number amplification in mutants that are impaired for the production of topoisomerase I, an important contributor to the relaxation of under-twisted (negatively supercoiled) DNA. Loss of topo I is associated with hypernegative supercoiling of the DNA in the wake of elongating RNA polymerase during transcription, a process that encourages the formation of R-loops [Brochu et al., 2018]. R-loop formation can be suppressed by the DNA-relaxing activities of type I topoisomerases topo I and topo III [Brochu et al., 2018], although the primary cellular role of topo III is as a decatenase [Nurse et al., 2003; Perez-Cheeks et al., 2012]. Mutations that knock out topA, the gene that encodes topo I, are tolerated because these mutants rapidly accumulate second-site lesions [Leela et al., 2021], such as those that increase the copy numbers of parC and parE and, hence, the production of topo IV and its DNA-relaxing activity (Table 2).
These observations, combined with earlier reports that compensatory mutations diminish the negative supercoiling activity of gyrase [DiNardo et al., 1982; Pruss et al., 1982; Raji et al., 1985; Dorman et al., 1989; McNairn et al., 1995], point to a subtle, homoeostatic circuitry involving the key topoisomerases that maintains an appropriate balance of DNA relaxation and supercoiling activities in the cell at all times. Interspecies differences in the response of topA transcription to the Fis NAP in E. coli and Salmonella have revealed subtle distinctions in the ways that these closely related bacteria manage their DNA topology. When the Salmonella topA gene is rewired with the promoter region from the E. coli topA gene, its response to the Fis NAP and to environmental stress is altered; a reciprocal effect is seen in E. coli when topA transcription is driven from the Salmonella topA promoter region [Cameron et al., 2011] (shown in Fig. 1 and described in Table 2). These results show that even highly conserved genes in related species diverge in their expression patterns as the bacteria adapt to their environmental niches. When the distinctions in expression affect high-ranking members of the regulatory hierarchy, such as DNA topoisomerase I, wide-ranging effects on physiology are likely to follow.
Converting the Independent gyrA and gyrB Genes to a gyrBA Operon
DNA gyrase is essential for bacterial life, and the genes that encode it cannot be deleted. Uniquely among topoisomerases, gyrase has the ability to introduce negative supercoils into double-stranded DNA. Despite producing subunits that are required in equal quantities, the gyrA and gyrB genes lie at opposite ends of the Left replichore of the chromosome in E. coli and S. typhimurium (and many other bacteria), have independent transcriptional control (shown in Fig. 1), and have different copy numbers in rapidly growing bacteria. The genes exhibit growth cycle-dependent transcription patterns, with the highest levels of transcription being seen in early exponential phase. During this period of the cycle, the Ter-proximal gyrA gene is transcribed at approximately twice the rate of its oriC-proximal gyrB counterpart, perhaps to compensate for the difference in gyrB and gyrA copy numbers in fast-growing cells [Pozdeev et al., 2021]. Curiously, the genes are arranged as a gyrBA operon in many other bacteria, such as Listeria monocytogenes [Glaser et al., 2001], Mycobacterium spp. [Unniraman and Nagaraja, 1999, 2002], and Staphylococcus aureus [Baba et al., 2008]. An operon arrangement would appear to offer advantages, such as co-expression and co-regulation of the two genes [Price et al., 2005], with co-production of the proteins in equal amounts, at the same site, leading to efficient assembly of the mature topoisomerase [Dandekar et al., 1998; Pal and Hurst, 2004; Swain, 2004]. Indeed, the coupling of transcription and translation in prokaryotes is likely to aid in the production of proteins in stoichiometric quantities from operons [Rocha, 2008; Li et al., 2014]. An operonic arrangement of genes that contribute to the same pathway is commonplace in bacteria [Lawrence and Roth, 1996; de Daruvar et al., 2002; Rogozin et al., 2002; Price et al., 2006]. Why has this arrangement not been adopted uniformly by natural bacterial species? Should it be considered when designing the chromosomes of synthetic organisms?
In an attempt to answer these questions, a gyrBA operon was introduced at the site of the gyrB gene in S. typhimurium (shown in Fig. 1), with precise elimination of the now-redundant second copy of gyrA from that gene’s native site [Pozdeev et al., 2021]. The production of gyrase from an operon did not lead to obvious improvements in the physiology of the bacterium, although it did differ from the wild type in a number of respects. Although the gyrBA strain and its wild-type ancestor had equal competitive fitness indices, the operonic strain was marginally less motile. When tested for sensitivity to DNA gyrase inhibiting drugs, the two strains were equally sensitive to quinolones, while the operonic strain exhibited enhanced resistance to the coumarin drug novobiocin and greater sensitivity to coumermycin. Having dispensed with the stand-alone gyrA gene, the production of gyrase in the operonic strain depended on the gyrB promoter alone, and this promoter is sensitive to coumarin stimulation while that of gyrA is not [Neumann and Quiñones, 1997].
Although the strain with the gyrBA operon failed to supercoil DNA negatively to the same degree as the wild type in rich growth medium, it had enhanced negative supercoiling activity in a minimal medium that mimics conditions in a macrophage vacuole, an environment in which the gyrase of S. typhimurium plays a pivotal adaptive role [Colgan et al., 2018]. Consistent with this observation, the gyrBA operonic strain exhibited dysregulation of its SPI2 pathogenicity island, which contains genes required for survival in macrophages, and it survived less well than the wild type in cultured macrophages [Pozdeev et al., 2021] (Table 2). Rewiring the natural gyrase operon in M. smegmatis, such that it no longer responds to fluctuations in local DNA supercoiling, removes the gyrBA operon from its native homoeostatic DNA topological control circuitry, in which transcription is triggered by DNA relaxation. This allows the effects of aberrant gyrase production to be studied in the absence of inhibiting drugs. This strain exhibits a number of transcriptomic, morphological, and growth defects, together with a reduction in topoisomerase I to compensate for the paucity of gyrase and increased expression of the NAPs HupB and Lsr2 [Gordon et al., 2010; Guha et al., 2018] (Table 2). It is clear that rewiring the expression signals of the gyrase genes promotes large changes to bacterial physiology, whereas forming a bicistronic operon from the separated gyrA and gyrB genes has much more subtle effects.
Conclusion
Gene rewiring is a natural process that contributes to bacterial evolution [Perez and Groisman, 2009; Oren et al., 2014; Murakami et al., 2015; Baumgart et al., 2021], and it offers potential for exploitation in synthetic biology. However, engineering bacterial genomes to have reduced gene numbers for synthetic biological purposes can produce organisms with unexpected growth limitations, illustrating our incomplete understanding of the workings of the gene regulatory networks and hierarchies that natural genomes specify [Choe et al., 2019]. Rather than making very large numbers of changes to the genetic complement of a bacterium, the studies described in this article have investigated the impact of rewiring genes that encode pleiotropic regulators, namely NAPs and topoisomerases. Two types of rewiring are involved: altering the location of the gene of interest and/or providing the gene with a novel set of regulatory signals.
Model bacteria such as E. coli are impressively tolerant of even extensive regulatory rewiring [Isalan et al., 2008], and the same is true of the specific gene relocations and regulatory region alterations described above. It is perhaps significant that the Salmonella systems most sensitive to rearrangements of the genes encoding Fis, IHF, and DNA gyrase are the horizontally acquired, physiologically “expensive” genes found in the SPI1 and SPI2 pathogenicity islands. These lie in regions that are normally transcriptionally quiescent and subject to silencing by the H-NS NAP. Fis and IHF both play positive roles in the expression of these genes [Schechter et al., 2003; Kelly et al., 2004; Mangan et al., 2006; Ó Cróinín et al., 2006; Fass and Groisman, 2009; Cameron et al., 2011; Queiroz et al., 2011], and Fis is a known antagonist of H-NS repression [Falconi et al., 1996]. Tinkering with the production of pleiotropic regulators may create impressions of tolerance and robustness on the part of the bacteria growing under laboratory conditions. Subtle changes to the abilities of the organism to amplify or to dampen regulatory noise in the transcriptome [Chalancon et al., 2012; Urchueguía et al., 2021] due to regulatory gene rewiring may produce less subtle effects that only become apparent when the bacterium is placed under stress and/or is growing in its natural environment. It would be advantageous to conduct long-term evolution studies with bacteria that have rewired NAP and topoisomerase genes. Studies of this type with wild-type E. coli have already shown that mutations accumulate in NAP (dusB-fis) and topoisomerase genes (topA, but not gyrA or gyrB) and that these correlate with changes to global DNA supercoiling [Crozat et al., 2010; El Houdaigui et al., 2019]. Work of this nature would provide deeper insights not only into what is physiologically tolerable but also what is tolerable over extended periods of time.
Conflict of Interest Statement
The authors have no conflicts of interest to declare.
Funding Sources
This research was supported by Principal Investigator Award 13/IA/1875 from the Science Foundation Ireland to Charles James Dorman. This research was funded in whole, or in part, by the Wellcome Trust (grant 108413/A/15/D). For the purpose of open access, the authors have applied a CC BY public copyright licence to any author-accepted manuscript version arising from this submission. The funders had no role in the study design, data collection or analysis, decision to publish, or the preparation of the manuscript.
Author Contributions
Charles James Dorman and Matthew James Dorman researched the topic, analysed the data, and wrote the manuscript.