Abstract
Sequence heterogeneity is a feature of hepatitis B virus (HBV), the prototype member of the family Hepadnaviridae. Based on an intergroup divergence of greater than 7.5% across the complete genome, HBV has been classified phylogenetically into 9 genotypes, A-I, with a putative 10th genotype ‘J', isolated from a single individual. With between approximately 4 and 8% intergroup nucleotide divergence across the complete genome and good bootstrap support, genotypes A-D, F, H, and I are classified further into subgenotypes. There is a broad and highly statistically significant correlation between serological subtypes and genotypes, and in some cases, serological subtypes can be used to differentiate subgenotypes. The genotypes, and certain subgenotypes, have distinct geographical distributions and are important in both the clinical manifestation of infection and response to antiviral therapy. HBV genotypes/subgenotypes and genetic variability of HBV are useful in epidemiological and transmission studies, tracing human migrations, and in predicting the risk for the development of severe liver disease and response to antiviral therapy. Moreover, knowledge of the genotype/subgenotype is important in implementing preventative strategies. Thus, it is crucial that new strains are correctly assigned to their respective genotype/subgenotype and consistent, unambiguous, and generally accepted nomenclature is utilized.
Hepatitis B virus (HBV), the prototype of the family Hepadnaviridae, has a partially double-stranded circular DNA genome of approximately 3,200 base pairs. This compact genome contains four partly or completely overlapping open reading frames: preC/C, which encodes e antigen (HBeAg) and core protein (HBcAg); P for polymerase (reverse transcriptase) (POL); S for surface proteins [three forms of HBsAg, small (S), middle (M) and large (L)]; and X for a transcriptional transactivator protein. HBV replicates by reverse transcription of the pregenomic RNA, a 3.5-kb RNA intermediate, which is transcribed by the cellular RNA polymerase II from the covalently closed circular form of HBV DNA in the hepatocyte nucleus. Sequence heterogeneity is a feature of HBV because POL lacks proofreading ability. However, as a result of the constraints of the overlapping open reading frames and the presence of secondary RNA structures, such as epsilon, coded by nonoverlapping regions, the mutation rate of the various regions of the HBV genome varies [1, 2]. The HBV genome has been estimated to evolve at a rate of approximately 10⁻³ to 10⁻⁶ nucleotide substitutions/site/year [3, 4, 5, 6, 7, 8, 9]. To date, based on an intergroup divergence >7.5% across the complete genome, HBV has been classified phylogenetically into 9 genotypes, A-I [10, 11, 12], with a putative 10th genotype, ‘J', isolated from a single individual [13]. With between approximately 4 and 8% intergroup nucleotide difference across the complete genome and good bootstrap support, genotypes A-D, F, H, and I are classified further into at least 35 subgenotypes (fig. 1; table 1) [11, 12, 14].
Comparison of the virological and clinical characteristics of the genotypes of HBV

A rooted phylogenetic tree of 250 full genome sequences of HBV representative of the 9 genotypes, established using neighbor joining. Bootstrap statistical analysis was performed using 1,000 data sets and the numbers on the nodes indicate the percentage of occurrences. Subgenotypes are shown in red (color refers to the online version only). GenBank accession numbers for the representative sequences included in the phylogenetic analyses are as follows: Genotype A: subgenotype A1 (AB076678Malawi, AY233288South Africa, AB076679Malawi, FJ692592Haiti, AY233275South Africa, AY233284South Africa, FM199974Rwanda, FM199979Rwanda, AY934765Somalia, AY934771Somalia, DQ020003UAE, AB116094Philippines, AB116093Philippines, AB116086India, AB453988Japan, AB453989Japan, AB116082Bangladesh, AB116085Bangladesh, AB116088Nepal, AB116089Nepal, M57663Philippines, AB116087India, AF090842Belgium, AF043560Argentina), subgenotype A2 (AY233280South Africa, AJ344115France, AY233286South Africa, Z35717Poland, Z72478Germany, AJ309370France, AF536524USA, AB014370Japan, AY034878USA, X70185Germany, AY128092Canada, X02763USA, AB064314USA, AF537371USA, AF090838Belgium, AF090841Belgium, X51970Germany), quasi-subgenotype A3 (AY934763Gambia, AM180623Mali, AM180624Cameroon, AB194950Cameroon, AM184125Gambia, AM184126Gambia, AB194951Cameroon, AB194952Cameroon, FJ692554Nigeria, FJ692556Nigeria, FJ692607Haiti, FJ692600Haiti, FJ692604Haiti, FJ692602Haiti, FJ692606Haiti, FJ692601Haiti, FJ692603Haiti, FJ692605Haiti, AY934764Gambia, GQ161813Guinea), subgenotype A4 (GQ331048Belgium). 
Genotype B: subgenotype B1 (AB010290Japan, AB010291Japan, D00329Japan, AB073855 Japan, D23768Japan, D23679Japan), subgenotype B2 (AB073831Thailand, AF282918China, AB073832Thailand, AB073836Taiwan, AB073839Taiwan), quasi-subgenotype B3 (D00331Indonesia, AB033555Sumatra, M54923Indonesia, AB219428Philippines, AB219427Philippines, AB241117Philippines, AB219426Philippines, AB219429Philippines), subgenotype B4 (AY033072Netherlands*, AB115551Cambodia, AB117759Cambodia, AB073835Vietnam, AY033073Netherlands*, AB205122Vietnam), subgenotype B5 (DQ463797Canada, DQ463801Canada, DQ463799Canada, DQ463795Canada, DQ463802Canada, DQ463796Canada, DQ463798Canada, DQ463800Canada). Genotype C: subgenotype C1 (AB112472Thailand, AB112471Thailand, AB074756Thailand, AB117758Cambodia, AF068756Thailand, AB112066Myanmar, AB112348Myanmar, AY167099Taiwan, AB112063Vietnam, AB112065Vietnam), quasi-subgenotype C2 (AY057947Tibet, AB014376Japan, AY066028China, AB014362Japan, AB014382Japan, S75184Japan, AB205123China, AY057947Tibet), subgenotype C3 (X75656Polynesia, AF458664China, X75665New Caledonia), subgenotype C4 (AB048704Australia, AB048705Australia), subgenotype C5 (AB241109Philippines, AB241110Philippines, AB241111Philippines), subgenotype C6 (AB493842Indonesia, AB554014Indonesia, AB493837Indonesia, AB554021Indonesia), subgenotype C7 (EU670263Philippines, GU721029South Korea), subgenotype C8 (AP011104Indonesia, AP011107Indonesia, AP011105Indonesia, AP011106Indonesia), subgenotype C9 (AP011108Indonesia), subgenotype C10 (AB540583Indonesia), subgenotype C11 (AB554020Indonesia, AB554019Indonesia), subgenotype C12 (AB560662Indonesia, AB554025Indonesia, GQ358157Indonesia, AB554018Indonesia), subgenotype C13 (AB644280, AB644281), subgenotype C14 (GQ377555Indonesia, HM011493Indonesia), subgenotype C15 (AB644286Indonesia), subgenotype C16 (AB644287Indonesia). 
Genotype D: subgenotype D1 (FJ899792China, AB104711Egypt, JN642129Lebanon, GU456684Iran, AB583680Pakistan, FJ904415Tunisia, FJ386590China, GU456664Iran, GU456670Iran, GU456673Iran, JF754607Turkey, JF754629Turkey), subgenotype D2 (GU456635Iran, EU594410Estonia, GQ477453Poland, JF754597Turkey, JN642144Lebanon), subgenotype D3 (GQ922000Canada, EU594435Estonia, EU594434Estonia, AB493846Papua, AB493845Papua, AB493848Papua, AY233296South Africa, U95551USA, AY902773USA, AY902776USA, FJ692507Haiti, X85254Italy), subgenotype D4 (AB033559Papua, AB048701Australia, FJ692536Haiti, FJ692532Haiti, FJ692533Haiti), subgenotype D5 (AB033558Japan, GQ205382India, GQ205389India, GQ205384India, GQ205377India, GQ205385India, GQ205378India, DQ355779India), subgenotype D6 (FJ904433Tunisia, FJ904441Tunisia, FJ904438 Tunisia, FJ904394Tunisia, FJ904395Tunisia, FJ904416Tunisia, FJ904398 Tunisia, FJ904407Tunisia, FJ904408Tunisia). Genotype E: (X75664Senegal, AB091255Cote d'Ivoire, AB032431, AB091256Cote d' Ivoire, X75657Western Africa). Genotype F: subgenotype F1 (AY090461El Salvador, AY090459Costa Rica, AY090458Costa Rica, AF223963Argentina), subgenotype F2 (X69798Brazil, AY090455Nicaragua), subgenotype F4 (AF223965Argentina, AF223962Argentina, X75658France). Genotype G: (AB056513USA, AB056514USA, AB056515USA, AB064310USA, AB064311USA, AB064312USA, AB064313USA, AF160501USA, AF405706Germany). Genotype I: subgenotype I1 (AB231908Vietnam, AF241407Vietnam, AF241408Vietnam, FJ023661Laos, FJ023660Laos, FJ023663Laos, AF241409Vietnam, GU357844China), subgenotype I2 (FJ023672Laos, EU833891Canada*, FJ023667Laos, FJ023664Laos, FJ023673Laos, FJ023665Laos, FJ023666Laos, FJ023669Laos, FJ023670Laos, EU835240India, EU835241India, EU835242India).
The genotypes differ in genome length (table 1), the size of open reading frames, and the proteins translated [11], as well as the development of various mutations [15]. Based on HBsAg heterogeneity [11], 9 serological subtypes, ayw1, ayw2, ayw3, ayw4, ayr, adw2, adw4, adwq, adr, and adrq-, have been identified. A broad, highly statistically significant correlation exists between serological subtypes and genotypes: adw is associated with genotypes A, B, F, G, and H, adr with C, and ayw with D and E [14], but many exceptions exist. Moreover, mild immune selection may lead to mutations in the subtype determinants [W. Gerlich, pers. commun.]. In some cases, serological subtypes can be used to differentiate subgenotypes (table 1) [14]. Using the logistic regression method, 1802-1803CG in the basic core promoter region was shown to be characteristic of genotypes A, D, and E, whereas 1802-1803TT is characteristic of genotypes B, C, and F; 1858C is positively associated with genotypes A, F, and H and 1858T with genotypes B, D, and E [14]. Subgenotype C2 has 1858T as opposed to C1 which has 1858C, and F1/F4 can be differentiated from F2/F3 by having 1858T instead of 1858C [14]. 1888A is positively associated with subgenotype A1 [14].
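The signature nucleotides listed above can be read as a lookup from an observed base to the genotypes with which it is associated. The following is a minimal sketch of that idea only: the table layout and the function name `candidate_genotypes` are illustrative assumptions, and because the reported associations are statistical rather than absolute, the result is a set of candidates, not a genotype assignment.

```python
# Signature nucleotides and associated genotypes as stated in the text [14].
# The dictionary layout and function name are illustrative assumptions.
SIGNATURES = {
    ("1802-1803", "CG"): {"A", "D", "E"},
    ("1802-1803", "TT"): {"B", "C", "F"},
    ("1858", "C"): {"A", "F", "H"},
    ("1858", "T"): {"B", "D", "E"},
}


def candidate_genotypes(observed):
    """Intersect the genotype sets associated with each observed signature.

    `observed` maps a position label (e.g. "1858") to the base(s) seen there.
    Observations with no reported association are ignored.
    """
    candidates = None
    for position, bases in observed.items():
        associated = SIGNATURES.get((position, bases.upper()))
        if associated is None:
            continue  # nothing reported for this position/base combination
        if candidates is None:
            candidates = set(associated)
        else:
            candidates &= associated
    return candidates if candidates is not None else set()


if __name__ == "__main__":
    # 1802-1803CG together with 1858T leaves genotypes D and E as candidates.
    print(sorted(candidate_genotypes({"1802-1803": "CG", "1858": "T"})))
```

Any candidate set obtained this way would still need confirmation by phylogenetic analysis of the complete genome.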
Genotype A and Its Subgenotypes
Genotype A is characterized by a 6-nucleotide insert at the carboxyl end of the core gene. Comprehensive analysis of genotype A [16, 17] has led to the classification of this genotype into subgenotypes A1, A2, A4, and quasi-subgenotype A3 because the latter group of sequences does not meet the criteria for a subgenotype classification [11, 14]. Subgenotypes A1, A4, and quasi-subgenotype A3 are found mainly in Africa, whereas A2 prevails in northern and central Europe and North America [18].
Genotype B and Its Subgenotypes
Using phylogenetics and sequence divergence of >4%, the subgenotypes of B have been reclassified into 6 subgenotypes: B1, B2, B4-B6, and quasi-subgenotype B3 (table 1) [19]. Subgenotype B1, found mainly in Japan [20], and B5 (previously B6) from a Canadian Inuit population [21] represent genotype B without recombination with genotype C in the precore/core region, as opposed to the remaining subgenotypes of B that have this recombination [20]. Subgenotype B1 was probably the ancestor of B5, possibly carried by indigenous peoples during migration from Siberia and Alaska to North America and Greenland [22, 23].
Genotype C and Its Subgenotypes
According to Paraskevis et al. [8] genotype C is the oldest HBV genotype. It has the highest number of subgenotypes, C1-C16 [24, 25], reflecting the long duration of its endemicity in humans. A large number of subgenotypes circulate in Indonesia [24]. Subgenotype C4 is exclusively found in indigenous people of northern Australia [26], who are descended from a founder group that emigrated from Africa at least 50,000 years ago [27].
Genotype D and Its Subgenotypes
In a recent systematic and comparative analysis of the subgenotypes of D, it was concluded that there are 6, not 8, subgenotypes. Subgenotypes D1-D6 can be differentiated by distinct clustering with high bootstrap support and signature amino acids. Subgenotypes D3 and ‘D6' were reclassified as a single subgenotype D3, and ‘D8' was shown to be a genotype D/E recombinant rather than a subgenotype [28]. Subgenotype D4, which is found in aboriginal populations in Papua New Guinea and Australia [12] and in a small percentage of the Canadian Inuit population [29], may be an early subgenotype dating from the time of early human intercontinental migrations. Moreover, a recombinant of subgenotype D4 has been identified in Sudan [30]. It is possible that D4 originated in Africa but has subsequently been replaced by other subgenotypes of D and the recombinant is a remnant of the original strain/s [30]. Subgenotype D4 carriers were significantly older than both D3 and B6 carriers [23]. Although it has been suggested that subgenotype D5 is the most ancient of the subgenotypes of D [31], this conclusion is neither supported by the relatively low intragroup divergence of this subgenotype [28], nor by the dated HBV phylogeny of the subgenotypes of D [8]. Additional sequences of subgenotype D5 from geographical regions besides the Paharia tribe of India [31] may resolve these discrepancies.
Genotype E
Genotype E has the unique serological subtype ayw4 and can be differentiated from genotypes A-D, F, H, and I by a 3-nucleotide deletion in the preS1 region. This genotype is endemic in western and central Africa, with a low genetic diversity, intimating a recent emergence of 200 years or less [4]. As opposed to subgenotype A1, which was dispersed by the slave trade [18], genotype E is rarely found outside Africa, except in individuals of African descent, further supporting its recent emergence after the forced migrations of slavery. Using Bayesian inference, a median time of evolution from a most recent common ancestor (tMRCA) of 130 years has been calculated [4]. This differs from a tMRCA of 6,000 years estimated by others [8]. However, as suggested previously, it is possible that genotype E existed in indigenous African populations and has recently been reintroduced [32]. Genotype E has been isolated from Pygmies [33] and the Khoi San [Kramvis, unpubl. data], and in Colombia [34] and India [35] in individuals with no history of travel to or from Africa. Without an accurate determination of the nucleotide substitution rate of HBV, the variance of the estimated age of genotype E will be difficult to resolve.
Genotype F and Its Subgenotypes
All genotype F isolates belong to serological subtype adw4 and cluster into 4 subgenotypes, F1-F4. This genotype is found in the Amerindian populations of Central and South America, as well as Alaska.
Genotype G
Genotype G is characterized by a 36-nucleotide insert, 3′ of position 1905, and two translational stop codons at positions 2 and 28 of the precore/core region, abrogating HBeAg expression [11]. This genotype can establish chronic infection only in the presence of other genotypes, most frequently genotype A, that can supply HBeAg in trans. A major risk factor is sexual transmission by men who have sex with men. Genotype G is least divergent from genotype E, with which it shares the 3-nucleotide deletion in the core region and a unique sequence in the preS [11, 36]. Although not yet detected in Africa, an African origin of genotype G has been postulated [36].
Genotype H
Genotype H prevails in Mexico in both the indigenous populations and mestizos (mixed descent), suggesting this genotype has a long history among the descendants of the Aztecs, before the arrival of Europeans [37]. It is most closely related to genotype F.
Genotype I and Its Subgenotypes
In 2008, sequence analysis of the complete genome of a single isolate (AB231908) from a Vietnamese male found it to be closely related to 3 previously described ‘aberrant' Vietnamese strains [38] and a 9th genotype, I, was proposed [39]. This proposal was not accepted because the mean genetic divergence of these 4 strains from genotype C was 7% and the recombination analysis was not robust [40]. Subsequently, sequences derived from Laos [41], the Idu Mishmi tribe in northeast India [42], a Canadian of Vietnamese descent [43], and China [10] have expanded the number of sequences. The nucleotide divergence of most of these sequences relative to genotype C was at least 7.5%, with good bootstrap support for the group, thus meeting the criteria for genotype assignment [14]. Two subgenotypes, I1 and I2, with serological subtypes adw2 and ayw2, respectively, were described [41]. This separation into subgenotypes was questioned when additional strains from India clustering within subgenotype I2 were sequenced and the intersubgenotype divergence was calculated to be <4% [42]. By analyzing all 19 complete genotype I genomes, without indels, available in the GenBank, the intergroup divergence between subgenotype I1 and I2 was found to be 3.40 ± 0.30% (mean ± SD), below the 4% cutoff. However, as for subgenotype D1 and D2 [28], an exception can be made because of the different serological subtypes. The highest intergroup divergence of 4.1% was between the Laotian strain FJ023663 (I1) and the Indian strain EU835242 (I2). The wide geographical distribution suggests this genotype has been endemic in a wide area of Asia for a long time [10]. Genotype I is a recombinant of genotypes A/C/G and an indeterminate genotype [10, 41, 42, 43], which clusters close to genotype C when the complete genome is analyzed, and with genotype A in the polymerase [10]. The genotype A and C regions are closely related to subgenotypes A3 and C3, respectively [10, 41, 42, 43]. 
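The intergroup divergence quoted above (mean ± SD between subgenotypes I1 and I2) is a summary of pairwise distances between sequences drawn from the two groups. The sketch below illustrates one way such a figure could be computed, assuming pre-aligned, equal-length sequences without indels and an uncorrected p-distance; the distance model actually used for the reported values is not stated here, and the toy sequences and function names are hypothetical.

```python
from itertools import product
from statistics import mean, stdev


def p_distance(seq_a, seq_b):
    """Uncorrected percentage of differing sites between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(a != b for a, b in zip(seq_a, seq_b))
    return 100.0 * differences / len(seq_a)


def intergroup_divergence(group_1, group_2):
    """Mean and sample SD of all pairwise distances between two groups.

    Requires at least two between-group pairs, otherwise the SD is undefined.
    """
    distances = [p_distance(a, b) for a, b in product(group_1, group_2)]
    return mean(distances), stdev(distances)


if __name__ == "__main__":
    # Toy 10-nucleotide "genomes" (hypothetical, for illustration only).
    group_i1 = ["ACGTACGTAC", "ACGTACGTAA"]
    group_i2 = ["ACGTTCGTAC", "ACGAACGTAC"]
    avg, sd = intergroup_divergence(group_i1, group_i2)
    print(f"{avg:.2f} ± {sd:.2f}%")
```

Because the pairwise distances share sequences, the SD here is descriptive rather than a measure of independent sampling error.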
Genotype I has been functionally characterized in both Huh7 cells and by acute hydrodynamic infection of a mouse. In both systems, genotype I secreted HBsAg at levels comparable to genotypes A, B, and C, and higher than D, but HBeAg at similar levels to genotype A but lower than B, C, and D [10].
‘Genotype J'
This strain was isolated from a single Japanese man with hepatocellular carcinoma (HCC), who had lived in Borneo for a prolonged period of time [13]. The complete genome clusters with nonhuman HBV, including isolates from gibbons, orangutans, chimpanzees, and gorillas. When compared with 1,440 human and nonhuman HBV strains, its sequence diverged by 10.7-15.7% from other genotypes and did not show any evidence of recombination [13]. In a later analysis, using additional gibbon/orangutan HBV sequences for comparison, Locarnini et al. [44] concluded that genotype ‘J' is in fact a recombinant of genotype C and gibbon HBV in the S region. Thus, although the high intergroup divergence of genotype ‘J' meets the criterion for classification into a separate genotype, it may represent a cross-species transmission [13], and identification and analysis of additional sequences will be required before the existence of this 10th genotype can be confirmed.
Recombinants
Variation and evolution of HBV has been influenced by recombination between genotypes [45]. By carrying out a recombination analysis of 3,400 nearly full-length HBV sequences, Shi et al. [46] identified 44 patterns of intergenotypic recombination. Only genotype H and putative genotype ‘J' showed no recombination, whereas genotype I was composed entirely of recombinants and 93% of genotype B were recombinants [46]. The breakpoints occurred most frequently in the basic core promoter/precore (BCP/PC) region, with breakpoints also found in the small S and core regions [45, 46].
Geographical Distribution of Genotypes/Subgenotypes
HBV genotypes, and in some cases subgenotypes, have distinct geographical distribution both globally and locally [11] (table 1). Genotype A circulates in Africa, Europe and the Americas, genotypes B and C in Asia, and genotypes F and H in Southern and Central America. Subgenotype A1 prevails in Africa and outside Africa in regions with a history of slave trade, whereas A2 is found mainly outside Africa [18]. Similarly, subgenotype B1 is generally confined to Japan and B2 is found in Southeast Asia. In the Canadian Arctic, genotype D prevails in the west whereas subgenotype B6 is found in the eastern regions [23]. These genotypes also have a distinct ethnic distribution, with genotype B found in the Inuits and genotype D in Denes [23]. The geographical distribution can be influenced by both ethnicity and migration between regions. In the USA, immigrants from Asia are infected with either genotype B or C, whereas in Europe, immigrants from Africa are generally infected with genotype E. Genotype I has been spread from Vietnam to Canada [43] and France [47] by immigration and adoption, respectively.
Effect of Genotypes/Subgenotypes on the Natural History of HBV Infection and Response to Antiviral Therapy
HBV genotypes may be responsible for differences in the natural history of chronic HBV infection [48], and therefore play a role in the clinical manifestation of infection and response to antiviral therapy [15, 49, 50]. Patients infected with genotypes A, B, D, and F show earlier and more frequent spontaneous HBeAg seroconversion compared to those infected with genotype C, independent of ethnicity [48, 51, 52]. Patients infected with genotype E have a higher frequency of HBeAg positivity and higher viral loads compared to patients infected with genotype D [30]. Individuals infected with subgenotype A1 lose HBeAg much earlier than those infected with subgenotype A2 [53]. Although HBeAg is not required for viral replication, it has been shown to act as an immune tolerogen [54] and to play a role in the transmission of HBV, viral persistence, and the establishment of chronic hepatitis B [55]. An essential requirement for acute HBV infection becoming chronic is the expression of HBeAg [56, 57]. In geographical regions with early HBeAg seroconversion, perinatal HBV transmission is rare because few women are HBeAg positive during their reproductive phase [58, 59]. Conversely, perinatal transmission is common in areas where genotypes C and I prevail (table 1). Transmission of HBV occurs in a higher proportion of children born to genotype C-infected mothers than genotype B-infected mothers [60].
HBV genotypes/subgenotypes A1, C, B2-B4, F1, and perhaps D carry a higher risk of serious complications of HBV infection, including cirrhosis and development of HCC, compared to A2, B1, and B5 [50]. Patients infected with genotype C have an increased and earlier risk of liver inflammation, liver fibrosis, cirrhosis, and HCC compared to those infected with other genotypes [61]. The subgenotypes of B with genotype C recombination also cause more severe liver disease than the subgenotypes of B without recombination. The risk of southern Africans infected with subgenotype A1 developing HCC is 4.5 times that of patients infected with other genotypes, and they develop HCC 6.5 years earlier [62]. A similar association of subgenotype A1 with HCC and its development at a younger age is also found in southern Indians [63]. Subgenotype A2 is well adapted to sexual transmission and can establish chronic infection in adults [61]. In India, subgenotype D1 is significantly associated with chronic liver disease and D3 with occult HBV infection [64]. Alaskans infected with genotype C or F are more likely to revert to HBeAg positivity and are therefore at greater risk of developing severe liver disease than those infected with genotypes A, B, and D [52]. In Sudan, genotype E is found to prevail in asymptomatic blood donors [65], whereas genotype D predominates in liver disease patients [30]. Occult HBV infection, characterized by low viral loads, is common in patients infected with genotype H, and the development of HCC is rare [37].
Patients infected with genotype A or B respond better to interferon-based therapy compared to patients infected with either genotype C or D (reviewed in [15, 66]). No significant difference in response of the different genotypes/subgenotypes to nucleos(t)ide analogue therapy has been found [66].
The role of HBV genotypes in the clinical manifestation of disease following HBV infection may be related to the development of specific mutations in the BCP/PC region as well as deletions in the preS region of the genome. Genotype C has a higher propensity to develop mutations compared to genotype B [67]. For example, individuals infected with B5 (previously B6) did not develop HCC, which may be related to the fact that BCP/PC mutations and preS deletions do not develop in this subgenotype, whereas they were frequent in genotype F, which led to the development of HCC in infected individuals [68]. A discussion of the development of these mutations is beyond the scope of the present review and the reader is referred to the following papers and references cited therein [14, 15, 66, 67].
A number of studies have shown the genotypes/subgenotypes of HBV are important in both clinical manifestation of infection and response to antiviral therapy. HBV genotypes/subgenotypes and the genetic variability of HBV are useful in epidemiological studies, tracing human migrations, and in predicting the risk of development of severe liver disease and response to antiviral therapy. Moreover, knowledge of the genotype/subgenotype is important for implementing preventative strategies. Thus, it is crucial that new strains are correctly assigned to their respective genotype/subgenotype and that a consistent, unambiguous, and generally accepted nomenclature is utilized. With the exponential growth in the number of sequences published in public databases, misclassifications have been described and corrected in some cases [16, 17, 19, 25, 69]. A set of minimum criteria should be followed when proposing a new genotype or subgenotype:
(1) A number of sequences of complete genomes of the correct length for the respective genotype/subgenotype are essential [11, 70]; the clinical relevance and epidemiology of a genotype/subgenotype is impossible to determine from a single strain; single strains should be designated as a putative genotype/subgenotype until additional strains become available
(2) A nucleotide divergence of at least 7.5% to separate strains into genotypes [14]
(3) A nucleotide divergence of between approximately 4 and 7.5%, monophyletic clustering and good bootstrap support to separate subgenotypes [14]
(4) Distinct geographical separation (D1 vs. D2) and/or different serological subtypes (D1 vs. D2, and I1 vs. I2), monophyletic clustering and good bootstrap support, to separate subgenotypes, when the nucleotide divergence is <4%
(5) As many available sequences as possible, instead of a few representative ones, should be used for comparison, in order to make the analysis robust; the use of too few sequences can lead to misclassifications
(6) Sequences selected for comparison should not have indels or be recombinants of defined genotypes, as these can lead to an artificial increase in nucleotide divergence
(7) New strains should be checked for recombinations between genotypes/subgenotypes [46]
(8) Recent literature should be consulted in order to obtain up-to-date information on genotype/subgenotype assignment, misclassifications, or new assignments [16, 17, 19, 69]
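Criteria (2) to (4) amount to a decision rule on nucleotide divergence combined with phylogenetic and serological or geographical evidence. The sketch below encodes that rule under stated assumptions: the "approximately 4%" boundary is treated as a hard 4.0% cutoff, the function and parameter names are hypothetical, and the remaining criteria (number and quality of sequences, exclusion of indels and recombinants, literature checks) are presumed to have been satisfied beforehand.

```python
def classify_divergence(divergence_pct,
                        monophyletic_with_support=False,
                        distinct_serotype_or_geography=False):
    """Suggest a classification level from intergroup nucleotide divergence.

    Thresholds follow criteria (2)-(4) in the list above; treating them as
    exact cutoffs is a simplifying assumption of this sketch.
    """
    if divergence_pct >= 7.5:
        # Criterion (2): divergence of at least 7.5% separates genotypes.
        return "genotype"
    if divergence_pct >= 4.0:
        # Criterion (3): 4-7.5% plus monophyly and good bootstrap support.
        if monophyletic_with_support:
            return "subgenotype"
        return "unresolved"
    # Criterion (4): <4% divergence can still define a subgenotype when
    # geography and/or serological subtype differ and clustering is supported.
    if monophyletic_with_support and distinct_serotype_or_geography:
        return "subgenotype (exception)"
    return "same subgenotype"


if __name__ == "__main__":
    # The I1 vs. I2 comparison in the text: 3.40% divergence, different
    # serological subtypes.
    print(classify_divergence(3.40, True, True))
```

A function of this kind is only a screening aid; the final assignment rests on the full phylogenetic and recombination analyses described in the criteria.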