Human papillomaviruses (HPVs) are the etiologic agents of cervical cancer, the only human neoplasia known to have a single necessary cause. The diversity of HPVs is well described, with more than 200 HPV types recognized as distinct taxonomic units, each designated by an Arabic number. Clinically, they are usually grouped by site of occurrence and disease association. The types that infect the anogenital mucosa are the most intensively studied and are further divided into cancer-associated HPVs, termed ‘high risk’ (HR), and those linked to benign proliferative lesions, termed ‘low risk’. HPV16 is responsible for approximately 50% of all invasive cervical cancer (ICC) cases and, paradoxically, is also one of the most prevalent types among healthy women. Longitudinal studies have shown that when an incident HPV16 infection becomes persistent, the risk of developing high-grade lesions increases. However, it is unknown why some persistent HPV16 infections (or infections by other HR-HPV types) progress to cervical intraepithelial neoplasia grade 3 or worse (CIN3+) while most clear spontaneously. Several epidemiological investigations have focused on cofactors, from the most obvious, such as cigarette smoking and other carcinogenic exposures, to coinfections with other sexually transmitted infections such as chlamydia, with no significant findings. Thus, the current focus is on genomic variation in both virus and host. Such studies have been empowered by the enormous technical advances in nucleic acid sequencing, which allow this relationship to be broadly interrogated. Corroborating subgenomic data from decades ago, an association between HPV16 lineages and carcinogenesis is being revealed. However, this effect does not seem to apply uniformly across female populations from different continents and ethnicities, again highlighting a role for HPV16 adaptation to, and evasion of, the host over time.
Papillomaviruses (PVs) comprise a group of small, nonenveloped viruses with a double-stranded DNA genome of approximately 8 kbp that infect vertebrates. PVs have been isolated from diverse hosts, including mammals, birds, turtles, and snakes, suggesting they infect all amniotes. It is believed that their origin dates to around 350 million years ago and is linked to changes in the epithelium of their ancestral host, the first reptiles. Viruses that slowly evolve with their hosts, as PVs do, usually cause asymptomatic infections rather than severe disease. However, certain human papillomavirus (HPV) infections can cause benign and malignant proliferative disorders, such as skin warts, epidermodysplasias, condylomata acuminata and malignant neoplasms of the anogenital region, mainly in the cervix. Molecular evidence suggests that HPV is also involved in a proportion of cancer cases in the vagina, vulva, anus, penis, and in the head and neck. To date, more than 200 HPVs have been fully sequenced (http://pave.niaid.nih.gov/). HPVs are classified into 5 genera, with different types showing differential associations and disease spectra [4,5,6].
Origin of PV Diversity and Mutation Rates
In the infected epithelial cell, PV replication occurs during the S-phase using high-fidelity cellular DNA polymerases with error correction, leading to mutation rates close to those of the host. However, codon usage varies between species and between genes within the same genome, and HPVs do not share the codon usage preferences of the host, since the viral genome is enriched in A+T [8,9]. All PV genes show poor adaptation to human codon usage preferences, and orthologous genes in PVs with a similar clinical presentation display similar codon usage preferences. Regarding mutation, the host APOBEC3 cytidine deaminases have been shown to target PV DNA, introducing directional C>T changes [11,12]. The APOBEC3 locus has undergone a large expansion in the human lineage, which may partially account for the broad diversity of human PVs. The substitution rate estimates for PV coding regions are between 2 × 10⁻⁸ and 5 × 10⁻⁹ substitutions per site per year [14,15].
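To give a feel for what such slow rates imply, the following minimal sketch multiplies a rate by genome length and elapsed time. It reads the two rates quoted above as 2e-8 and 5e-9 substitutions per site per year and applies them to the whole ~8 kbp genome, which is a simplifying assumption because the estimates refer to coding regions. The function name and the one-million-year horizon are illustrative only, and the linear estimate ignores multiple hits at the same site and any effect of selection.

```python
# Back-of-the-envelope estimate of accumulated substitutions in a PV genome.
# Assumptions (not from the source): a uniform rate across the whole genome,
# no repeated substitutions at the same site, no selection.

GENOME_LENGTH_BP = 8_000  # approximate PV genome size quoted in the text


def expected_substitutions(rate_per_site_per_year, years,
                           genome_length_bp=GENOME_LENGTH_BP):
    """Return the expected number of substitutions per genome after `years`."""
    return rate_per_site_per_year * genome_length_bp * years


if __name__ == "__main__":
    horizon_years = 1_000_000  # hypothetical time span for illustration
    for rate in (2e-8, 5e-9):
        subs = expected_substitutions(rate, horizon_years)
        fraction_pct = 100 * subs / GENOME_LENGTH_BP
        print(f"rate={rate:g}: ~{subs:.0f} substitutions "
              f"(~{fraction_pct:.1f}% of the genome)")
```

Under these assumptions, one million years of divergence corresponds to roughly 40 to 160 substitutions per genome, that is, about 0.5 to 2% of sites.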
HPV Diversity and Cancer
HPVs are classified phylogenetically at the genus and species levels, according to the International Committee on Taxonomy of Viruses (ICTV). Below the species level, the PV research community classifies viruses that are at least 10% dissimilar in the L1 gene as different HPV types, and as subtypes when they are 2-10% dissimilar. Recently, it was proposed that differences ranging from 1 to 2% of the full genome define the variant level and from 0.5 to 1% define sublineages [4,16].
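The thresholds above can be read as a simple decision rule. The sketch below is only an illustration of that rule: it assumes two sequences that are already aligned to equal length, skips gapped columns, and treats the boundary values (exactly 10%, 2%, 1% or 0.5%) in a way the text does not specify. The function names are hypothetical and do not correspond to any published tool.

```python
# Illustrative decision rule for the dissimilarity thresholds quoted above.
# Assumptions: inputs are pre-aligned, equal-length strings; columns with a
# gap ('-') in either sequence are ignored; boundary handling is arbitrary.


def percent_dissimilarity(aligned_a, aligned_b):
    """Percentage of differing positions among ungapped aligned columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = mismatches = 0
    for base_a, base_b in zip(aligned_a.upper(), aligned_b.upper()):
        if base_a == "-" or base_b == "-":
            continue
        compared += 1
        if base_a != base_b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * mismatches / compared


def classify_by_l1(l1_dissimilarity_pct):
    """Type and subtype level, based on the L1 gene."""
    if l1_dissimilarity_pct >= 10:
        return "different type"
    if l1_dissimilarity_pct >= 2:
        return "subtype"
    return "same type"


def classify_by_full_genome(genome_dissimilarity_pct):
    """Variant and sublineage level, based on the complete genome."""
    if 1 <= genome_dissimilarity_pct <= 2:
        return "variant lineage"
    if 0.5 <= genome_dissimilarity_pct < 1:
        return "sublineage"
    return "outside the ranges given in the text"


if __name__ == "__main__":
    toy_a = "ACGTACGTAC"  # toy sequences, not real HPV data
    toy_b = "ACGTACGTAA"
    d = percent_dissimilarity(toy_a, toy_b)
    print(d, classify_by_l1(d))
```

In practice the dissimilarity would be computed from a proper multiple sequence alignment of L1 genes or complete genomes rather than from toy strings.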
Infections by most HPVs are asymptomatic, but 12 types are classified by the International Agency for Research on Cancer as definitely oncogenic: HPV16, 18, 31, 33, 35, 39, 45, 51, 52, 56, 58 and 59. Carcinogenic potential is not evenly distributed among HPVs, with types 16 and 18 responsible for 70% of invasive cervical cancer (ICC) cases worldwide [19,20].
Different HPV16 and 18 variants and sublineages have been identified based on partial regions of the genome [21,22,23,24,25] and on the complete genome [26,27,28,29], and variants of both types are associated with different risks of ICC development [30,31,32,33]. Differences in the oncogenic potential of variants of the same HPV type have been described: HPV16 B, C, D2 and D3 variants confer a 3-fold higher risk of cervical cancer development compared to the A1 sublineage. The non-European HPV18 variants are identified more frequently in cancer tissues and high-grade squamous lesions [31,34,35,36]. HPV33 (C7732G) and HPV58 (C632T and G760A) variants have been associated with a higher risk of cervical cancer development [37,38]. In addition, the HPV16 D2 and D3 sublineages are associated with adenocarcinoma [36,39]. Some studies suggest that a higher risk of cervical cancer development depends not only on the HPV variant/sublineage, but also on ethnicity. A study that analyzed the risk of CIN3+ development in women infected by HPV16 according to ethnicity showed that white women infected by the HPV16 A1/A2 sublineages have a higher risk of CIN3+ than women of all other ethnicities. In contrast, this risk is higher for Asian women infected with the HPV16 A4 sublineage and for Hispanic women infected with the D2/D3 sublineages. Moreover, rare cases of cervical cancer in which the only HPV type isolated was a low-risk HPV have been described [19,40], suggesting a particular susceptibility in the host.
Several research groups have sought to identify mutations or a mutation profile that could be associated with the transforming activity of an individual isolate. Researchers analyzed the PV upstream regulatory region (URR), which contains transcription factor binding sites that are partly common to all PVs and partly type specific [41,42,43]. These studies led to the conclusion that differences in replicative behavior and carcinogenic potential among variants could be due to point mutations in the URR [44,45]. However, in-depth analysis of the E2-L2 region, which encodes the E5 protein of mucosal HPVs, revealed 4 phylogenetically distinct families that could be related to the ability to produce benign or malignant proliferative disorders. These data suggest that a single region of the HPV genome is not sufficient to explain the different infection outcomes.
A study conducted by Chen et al. underscores the importance of sequencing complete HPV genomes in such studies, since mutations are not uniformly distributed across the genome. More recently, with the advent of next-generation sequencing, some groups have started to work with technology that generates 3 to 4 orders of magnitude more information than Sanger sequencing. This information density, provided by the deep sequencing approach, is fundamental for the study of viral diversity because it allows the simultaneous detection of minor variants. This resolving power makes these techniques the current standard tool for inter- and intrahost viral diversity analyses.
Studies that analyzed the near full-length HPV16 genome using deep sequencing did not observe any deletion event in the invasive cervical cancer samples analyzed [39,48]. However, a significant increase in read depth close to L1 and L2, compatible with duplication of a genome fragment spanning these regions, was observed. Similar partial duplications of the viral genome have been independently reported by different laboratories in lung cancer cases evolving from respiratory papillomatosis caused by low-risk HPV types [49,50].
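One simple way to look for the kind of read-depth increase described above is to compare windowed mean depth against the genome-wide median. The sketch below is not the method used in the cited studies; the window size, the 1.5-fold ratio and the function name are all hypothetical choices, and the depth values are synthetic.

```python
# Flag genome windows whose mean read depth is elevated relative to the
# genome-wide median, as a crude indicator of a possible duplication.
# Assumptions: `depths` holds one depth value per genome position; the
# window size and ratio threshold are arbitrary illustrative values.
from statistics import median


def elevated_depth_windows(depths, window=100, min_ratio=1.5):
    """Return (start, end, ratio) for windows with mean depth >= min_ratio x median."""
    baseline = median(depths)
    if baseline == 0:
        return []
    flagged = []
    for start in range(0, len(depths), window):
        chunk = depths[start:start + window]
        ratio = (sum(chunk) / len(chunk)) / baseline
        if ratio >= min_ratio:
            flagged.append((start, start + len(chunk), ratio))
    return flagged


if __name__ == "__main__":
    # Synthetic profile: uniform depth with one 100-position doubled segment.
    synthetic = [100] * 500 + [200] * 100 + [100] * 400
    print(elevated_depth_windows(synthetic))
```

A real analysis would also need to account for amplification and mapping biases, which can produce uneven depth without any underlying duplication.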
Intrahost HPV genetic diversity is poorly studied but deserves more attention, since it could be the main explanation for different outcomes after infections by the same HPV type. Intrahost next-generation sequencing data have shown that HPV16 episomes present in the W12 cell line did not accumulate sequence variation above 0.5%. However, the same authors reported that more than 40% of the clinical samples contained polymorphic sites reaching frequencies up to 5%. In another study, polymorphic sites were detected in all ICC samples submitted to deep sequencing. Viral genomic diversity accumulated during the course of a natural infection contrasts sharply with the absence of variation generated during in vitro subculture of the W12 episome-containing cells.
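The notion of a polymorphic site at a given frequency can be made concrete with per-position base counts from a deep sequencing run. The sketch below is a minimal illustration: it takes the frequency of the second most common base as the minor variant frequency and uses 0.5% as the reporting threshold, borrowing the figure quoted for W12 episomes as an assumed cutoff. The minimum depth, the function names and the example counts are hypothetical, and sequencing error is not modeled.

```python
# Identify polymorphic sites from per-position base counts.
# Assumptions: `pileup` maps genome position -> {base: read count};
# the 0.5% frequency cutoff and 1000x minimum depth are illustrative.


def minor_variant_frequency(base_counts):
    """Frequency of the second most common base at one position."""
    depth = sum(base_counts.values())
    if depth == 0:
        return 0.0
    ranked = sorted(base_counts.values(), reverse=True)
    second = ranked[1] if len(ranked) > 1 else 0
    return second / depth


def polymorphic_sites(pileup, min_frequency=0.005, min_depth=1000):
    """Return {position: minor variant frequency} for sites passing both filters."""
    result = {}
    for position, base_counts in pileup.items():
        if sum(base_counts.values()) < min_depth:
            continue  # too few reads to estimate a low frequency reliably
        frequency = minor_variant_frequency(base_counts)
        if frequency >= min_frequency:
            result[position] = frequency
    return result


if __name__ == "__main__":
    # Hypothetical counts, not real HPV16 data.
    example = {
        100: {"A": 9900, "G": 100},  # 1% minor variant, reported
        200: {"C": 9990, "T": 10},   # 0.1% minor variant, below cutoff
        300: {"G": 50, "A": 5},      # depth too low, skipped
    }
    print(polymorphic_sites(example))
```

Choosing the cutoff is the critical design decision: it has to sit above the platform's error rate, which is why a depth filter accompanies the frequency filter here.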
It could be argued that intralesion viral diversity instead reflects an original infection by several very closely related HPV16 genotypes. However, our current understanding of HPV-associated disease suggests that individual lesions are associated with individual infection events [51,52]. Instead, we propose here that at least a fraction of the changes accumulated during infection may arise from selection pressures associated with the immune response. Regarding the adaptive immune response, polymorphisms observed in the E6 gene could be a result of immune-selective pressure. Many linear epitopes have already been described in this region, and these polymorphisms could also be related to the affinity of the interactions of the E6 protein with E6-AP and p53. Similarly, polymorphisms in the E1 gene, present in different HPV16 subvariants, could be related to the capacity of the DNA binding domain, but do not affect replication activity. Finally, humoral epitopes are known in the E2 region and could drive immune selection.
Genetic variability of the host may play an important role in HPV infection outcome. Several host genetic studies show that genes in the major histocompatibility complex region are involved in susceptibility to HPV-associated diseases. HLA-DRB1 and HLA-DQB1 are associated with susceptibility to cervical carcinoma [56,57]. A positive association with squamous cell carcinoma (SCC) has also been described for the human leukocyte antigen alleles DRBQ*15, DRB1*11, DRB1*04, and DRB1*07, whereas protection against SCC has been described especially for DRB1*13.
The association of HPV type with cancer risk is well established. However, an important gap remains in our understanding of why the same HPV type leads to such variable clinical outcomes. Variants of HPV16 are related to an increased risk of CIN3+, a finding restricted to the ethnic group in which these variants were first described, suggesting an evolving ability of the prevailing variants to evade the host's immune control. In addition, intralesion diversity, a common feature of ICC, may be generated during the course of a chronic infection, possibly fueled by immune selection pressures, either innate, for example APOBEC3-directed mutation, or adaptive, such as humoral or cellular responses.
Mutational patterns that may be responsible for the differential transforming activity within and between HPV types, subtypes, variants and lineages may emerge in the near future, as more whole-genome sequences linked to epidemiological and clinical data become available.
The authors have no conflicts of interest to declare.