The database at Nutrigenetics.net has been under development since 2007 to facilitate the identification and classification of PubMed articles relevant to human genetics. A controlled vocabulary (i.e., standardized terminology) is used to index these records, and every article title links back to PubMed. This enables the display of indexes (alphabetical subtopic listings) for any given topic, or for any given combination of topics, including genes and specific genetic variants. Stepwise use of such indexes (first for one topic, then for combinations of topics) can reveal relationships that are otherwise easily overlooked. These relationships include environmental and lifestyle variables with potential relevance to risk modification (both beneficial and detrimental) and to prevention, or at least to the potential delay of symptom onset, for health conditions such as Alzheimer disease, among many others. Thirty-four specific genetic variants have each been mentioned in ≥1,000 PubMed titles/abstracts, and these numbers are steadily increasing. The benefits of indexing with standardized terminology are illustrated for genetic variants like MTHFR 677C-T and its various synonyms (e.g., rs1801133 or Ala222Val). Such use of a controlled vocabulary is also helpful for numerous health conditions and for potential risk/effect modifiers.

The PubMed database at https://pubmed.ncbi.nlm.nih.gov/ is an online resource freely provided by the National Library of Medicine which allows users to browse through >30 million journal article records. For highly specific queries, the number of search results can be kept to a manageable number of citations. However, for larger result sets, PubMed does not provide a way to sort the results by subtopics. A breakout by subtopics could help identify not only individual variables or subtopics of interest, but also general classes of subtopics like health conditions, potential risk (or effect) modifiers like drugs or dietary practices, lifestyle variables such as exercise or tobacco use, genetic variables such as mutations, common genetic variants, or gene expression, etc.

Some topics like gene expression have been mentioned in hundreds of thousands of PubMed articles. Regarding genetic variants, APOE4 has been specifically mentioned using one of its various synonyms in >9,000 PubMed records. The BRAF p.Val600Glu and MTHFR 677C-T variants have both been mentioned in >5,000 PubMed titles or abstracts. Table 1 lists 34 genetic variants which have each been mentioned in at least 1,000 PubMed records, and the volume of articles on these and many other variants continues to grow steadily.

Table 1.

PubMed record counts for the listed genetic variants, which are frequently mentioned in PubMed titles/abstracts, for the publication years 2000 to mid-2020 (a combined total of >64,000 PubMed records), with counts for those same PubMed records which also mention, in titles/abstracts/MeSH terms, at least one of the displayed health conditions or the displayed potential risk/effect modifiers.


In addition to the daunting volume of this literature, another challenge when using PubMed is the inconsistent terminology used by different authors, particularly when naming the genetic variants being reported. For anyone not already familiar with most of the synonyms, this can result in missing many of (sometimes even most of) the potentially relevant articles when searching in PubMed.

To help deal with such challenges, the development of the database at Nutrigenetics.net began in early 2007, with the goal of identifying PubMed articles relevant to human genetics and the related omics. Each record created in the Nutrigenetics.net database corresponds to a PubMed record but is further indexed using a controlled vocabulary which also takes into account the most common synonyms.

Perhaps the greatest benefit of using such indexing is that it enables individual online database users to create their own customized index (i.e., an alphabetical subtopic listing with subtopic article number counts) for any given topic, or combination of topics, including genes and genetic variants. Stepwise use of such indexes (first for 1 topic only, and then for ≥2 in combination) enables a multitopic exploration of potentially important relationships that can otherwise be easily overlooked. Once a topic or combination of topics is selected using the indexes for guidance, the corresponding PubMed titles can be easily displayed via the Index/Search drop-down box, and each title serves as a link back to PubMed. More complete bibliographic information, including the PubMed article IDs (PMID numbers) can be found by clicking on the “View Printer-Friendly Indexes/Searches” link.
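The stepwise indexing described above can be illustrated with a minimal sketch. The records, topic names, and the function `build_index` are hypothetical and are not taken from the Nutrigenetics.net implementation, which is proprietary; the sketch only shows how an alphabetical subtopic listing with article counts could be derived for one topic and then for a combination of topics.

```python
from collections import Counter

# Hypothetical records: each key stands in for a PubMed record and maps
# to the set of controlled-vocabulary terms used to index it.
RECORDS = {
    "rec1": {"MTHFR 677C-T", "Folic acid", "Homocysteine"},
    "rec2": {"MTHFR 677C-T", "Folic acid", "Neural tube defects"},
    "rec3": {"MTHFR 677C-T", "Homocysteine", "Stroke"},
    "rec4": {"APOE4", "Alzheimer disease", "Exercise"},
}


def build_index(records, selected_topics):
    """Return alphabetical (subtopic, count) pairs for the records that
    are indexed with every one of the selected topics."""
    selected = set(selected_topics)
    counts = Counter()
    for terms in records.values():
        if selected <= terms:  # record matches all selected topics
            counts.update(terms - selected)  # count the remaining subtopics
    return sorted(counts.items(), key=lambda item: item[0].lower())


# Step 1: index for a single topic.
for subtopic, n in build_index(RECORDS, ["MTHFR 677C-T"]):
    print(f"{subtopic}: {n}")

# Step 2: narrow to a combination of two topics.
for subtopic, n in build_index(RECORDS, ["MTHFR 677C-T", "Folic acid"]):
    print(f"{subtopic}: {n}")
```

Each step narrows the record set and recomputes the subtopic counts, which is what allows less obvious co-occurring topics to surface.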

Exploring indexes with the aim of discovering/recognizing less obvious relationships can also stimulate the interest and engagement of a wider audience. For researchers and clinicians, this can lead to both investigational and translational opportunities, including those relevant to the prevention of health consequences. Beyond health professionals, students, educators, and other members of the public like journalists can benefit, especially in the light of the abundance of new articles which appear continuously. The large new-article volume makes it difficult for journalists and others to effectively inform members of the public, resulting in a growing information gap [1].

Another system of terminology standardization for genes and gene-related proteins has been developed by the Gene Ontology Consortium [2], but with different aims. The Nutrigenetics.net database provides a unique tool for enhancing the identification of genetics-relevant PubMed articles for almost anyone who is interested, and is especially useful for the purposes of education, engagement, and information dissemination. Because of the inherent intricacies associated with genetics and the related omics, the goal here is to equip both the public and professionals to hold a more effective dialogue, thereby facilitating partnerships that can result in appropriate applications of the emerging science.

The terminology used by the Nutrigenetics.net database for indexing is often based on the National Library of Medicine’s Medical Subject Headings or MeSH terms (https://meshb.nlm.nih.gov/search). For most genes, PubMed lists a corresponding gene product/protein in its MeSH terms; however, because the Nutrigenetics.net database is gene-centric, official gene symbols (https://www.genenames.org) are used to index both the genes themselves and any corresponding protein products. Each gene variant name uses the gene symbol as a prefix, e.g., MTHFR 677C-T. This allows the gene and any related gene variants to be grouped together within an alphabetical index listing.

Gene variant terminology presents a particular challenge, since more than one naming convention is often used. With very few exceptions, PubMed does not list gene variants within its MeSH terms. The magnitude of the terminology challenge can be illustrated with variants that occur within a coding region of a gene, where they often result in an amino acid substitution. For gene variants like MTHFR 677C-T, some authors refer not to the nucleotide substitution but to the corresponding amino acid substitution, using 3-letter abbreviations for the amino acid change (e.g., Ala222Val), while other authors use single-letter abbreviations (A222V). If the gene variant happens to be a single-nucleotide polymorphism (SNP) like MTHFR 677C-T, then some authors prefer to use the reference SNP number (“rs number”), which for this variant is rs1801133 (https://www.ncbi.nlm.nih.gov/snp/). Others may prefer to use an alternative nucleotide substitution designation for the variant, e.g., C665T or C677T.

Clearly, the multiple naming options make it more difficult to find all of the relevant literature in PubMed. When PubMed article records are carefully searched for all of the potential naming options in combination with the gene symbol or protein name (methylenetetrahydrofolate reductase in this example), the total number of PubMed articles for this gene variant is >5,000. If, instead, PubMed is searched for rs1801133 alone, without the other synonyms, then only about 500 records are currently found, i.e., only around 10% of the total number of articles that are actually of potential relevance.
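The synonym-combining search described above can be sketched as a small query builder. This is only an illustration: the function name `build_query` is hypothetical, the synonym list is limited to the examples given in this article rather than the database's full vocabulary, and `[tiab]` is PubMed's Title/Abstract field tag. How PubMed actually tokenizes quoted or hyphenated terms may differ from what this simple string assembly suggests.

```python
def build_query(gene_names, variant_synonyms, field="tiab"):
    """Combine gene/protein names and variant synonyms into one boolean
    query string: (gene OR protein) AND (synonym1 OR synonym2 ...)."""

    def group(terms):
        # Quote each term and tag it with the search field.
        return "(" + " OR ".join(f'"{term}"[{field}]' for term in terms) + ")"

    return group(gene_names) + " AND " + group(variant_synonyms)


# Names and synonyms taken from the MTHFR example in the text.
gene_names = ["MTHFR", "methylenetetrahydrofolate reductase"]
variant_synonyms = ["677C-T", "C677T", "C665T", "Ala222Val", "A222V", "rs1801133"]

print(build_query(gene_names, variant_synonyms))
```

Searching on the full OR-group rather than on a single synonym is what closes the gap between roughly 500 and >5,000 records reported above.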

Another caveat is that when more than just a few rs numbers are mentioned in an article, typographical errors may easily creep in. Moreover, some authors append information about the nucleotide changes (alleles) to the end of the rs numbers, which can then result in missing the article when conducting a search in PubMed. For instance, a PubMed search for rs1801133 will not find rs1801133C or rs1801133C-T [3] unless a wildcard symbol (*) is added to the end of the rs number, e.g., rs1801133*.
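The effect of appended allele information can be reproduced with a small local text-matching sketch. It does not model PubMed's own search engine; the two function names are hypothetical, and the regular expressions are simply one way to contrast a whole-token match with a match that tolerates a non-digit suffix (comparable in spirit to adding a wildcard, but excluding longer rs numbers that merely share the same leading digits).

```python
import re


def whole_token_match(text, rs_id):
    """True only if rs_id appears as a complete token, with no letters
    or digits attached on either side."""
    pattern = rf"(?<![A-Za-z0-9]){re.escape(rs_id)}(?![A-Za-z0-9])"
    return re.search(pattern, text, re.IGNORECASE) is not None


def suffix_tolerant_match(text, rs_id):
    """True if rs_id appears, optionally followed by allele letters,
    but not if it is followed by further digits (a different rs number)."""
    pattern = rf"(?<![A-Za-z0-9]){re.escape(rs_id)}(?!\d)"
    return re.search(pattern, text, re.IGNORECASE) is not None


# Hypothetical title fragments for illustration.
titles = [
    "MTHFR rs1801133 and folate status",
    "Association of rs1801133C-T with disease susceptibility",
    "Carriers of the rs1801133C allele",
]
for title in titles:
    print(whole_token_match(title, "rs1801133"),
          suffix_tolerant_match(title, "rs1801133"),
          title)
```

In this sketch the whole-token match finds only the first fragment, while the suffix-tolerant match finds all three.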

The use of rs numbers can be helpful for many gene variants, and their use has been encouraged [4, 5]. One major limitation is that rs numbers have not been assigned to every type of genetic variant. Even when assigned, some rs numbers like rs121912438 can refer to >1 nucleotide substitution, which can result in >1 amino acid substitution. For the multiple reasons mentioned above, a complete transition to the use of rs numbers for all types of gene variants is unlikely to occur. Accordingly, the Nutrigenetics.net database selects one of the most commonly used synonyms as a standard, and then adds cross-references to the other synonyms.
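The idea of one standard term with cross-references to its synonyms can be sketched as a simple lookup table. The data structure and the function names below are assumptions made for illustration and are not the database's actual design; the synonyms are those mentioned in this article.

```python
# Hypothetical controlled vocabulary: standard term -> known synonyms.
VOCABULARY = {
    "MTHFR 677C-T": [
        "rs1801133",
        "MTHFR C677T",
        "MTHFR C665T",
        "MTHFR Ala222Val",
        "MTHFR A222V",
    ],
    "FTO rs9939609": [],  # essentially no synonyms in PubMed, per the text
}


def build_cross_references(vocabulary):
    """Map every standard term and synonym (case-insensitively) to its
    standard term."""
    lookup = {}
    for standard, synonyms in vocabulary.items():
        lookup[standard.casefold()] = standard
        for synonym in synonyms:
            lookup[synonym.casefold()] = standard
    return lookup


def resolve(term, lookup):
    """Return the standard term for a user-entered term, or None if the
    term is not in the vocabulary."""
    return lookup.get(term.strip().casefold())


lookup = build_cross_references(VOCABULARY)
print(resolve("RS1801133", lookup))
print(resolve("mthfr a222v", lookup))
```

With such a table, a user who enters any one synonym is directed to the same standard term, and therefore to the same set of indexed records.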

Besides rs numbers, another major naming convention for gene variants is to refer to the amino acid substitution, which creates its own opportunities for misinterpretation. For any gene variants that are unfamiliar, note that single-letter amino acid abbreviations can be easily confused with nucleotide base names (e.g., “A” can refer to either alanine or adenine, “T” to either threonine or thymine, “C” to either cysteine or cytosine, and “G” to either glycine or guanine). Therefore, the Nutrigenetics.net database follows the naming convention used by the OMIM database (https://www.ncbi.nlm.nih.gov/omim), where nucleotide substitutions are denoted by the letter(s) following the location, e.g., 677C-T is used instead of C677T for the example of MTHFR described above. Although such confusion is unlikely for substitution variant names like MTHFR A222V, it is more apt to occur for others like SOD1 Gly93Ala, which is frequently abbreviated in PubMed as G93A. The database at Nutrigenetics.net deals with this by adding the following type of cross-reference to the auto-suggest feature when selecting the topic(s) of interest: SOD1 G93A (select SOD1 GLY93ALA instead). This database also displays the most common synonyms for genes and genetic variants (and many other topics) whenever an alphabetical index listing of subtopics is generated, such as that in the description column of Figure 1 for the gene MTHFR.
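The naming conventions discussed above lend themselves to a short worked sketch. The function names are hypothetical and the code is not the database's own logic; it only shows how a name like C677T could be rewritten position-first, how single-letter amino acid codes could be expanded to the standard 3-letter codes, and why a name like G93A is ambiguous while A222V is not.

```python
import re

# Standard one-letter to three-letter amino acid codes.
AMINO_ACIDS = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}
NUCLEOTIDES = set("ACGT")
SHORT_FORM = re.compile(r"^([A-Z])(\d+)([A-Z])$")  # e.g. C677T or G93A


def to_position_first(name):
    """Rewrite a nucleotide substitution such as C677T as 677C-T."""
    m = SHORT_FORM.match(name)
    if not m or not {m.group(1), m.group(3)} <= NUCLEOTIDES:
        raise ValueError(f"not a nucleotide substitution: {name}")
    return f"{m.group(2)}{m.group(1)}-{m.group(3)}"


def to_three_letter(name):
    """Expand a single-letter amino acid substitution, e.g. G93A -> Gly93Ala."""
    m = SHORT_FORM.match(name)
    if not m or m.group(1) not in AMINO_ACIDS or m.group(3) not in AMINO_ACIDS:
        raise ValueError(f"not an amino acid substitution: {name}")
    return f"{AMINO_ACIDS[m.group(1)]}{m.group(2)}{AMINO_ACIDS[m.group(3)]}"


def is_ambiguous(name):
    """True if both letters could be read as nucleotides as well as amino acids."""
    m = SHORT_FORM.match(name)
    return bool(m) and {m.group(1), m.group(3)} <= NUCLEOTIDES


def cross_reference(gene, short_name):
    """Build an auto-suggest style cross-reference like the one quoted above."""
    full = to_three_letter(short_name).upper()
    return f"{gene} {short_name} (select {gene} {full} instead)"


print(to_position_first("C677T"))
print(to_three_letter("A222V"))
print(is_ambiguous("G93A"), is_ambiguous("A222V"))
print(cross_reference("SOD1", "G93A"))
```

Because A, C, G, and T are all valid amino acid codes, any short name built only from those letters can be read either way, which is the case the cross-reference is meant to resolve.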

Fig. 1.

The Web page at nutrigenetics.net/StartYourResearch.aspx is for the creation of an index or a search for any given topic or combination of topics by using the Index/Search drop-down box. Topics which are available within this database can be found using the auto-suggest feature while typing, but can also be found by looking at a user-created index. Notice the synonyms provided in the Description column in this Index example. Note also that the special search topic “34 Selected genetic variants” can be used in search topic combinations to find the PubMed records which correspond to the article counts shown in Table 1.


Now that the PubMed evidence base has become quite extensive for genetics-related topics, database assistance with information management is clearly useful. The 34 genetic variants listed in Table 1 (each of which has been mentioned in ≥1,000 PubMed records) represent a total of >64,000 PubMed records when combined (for the publication years 2000 up to mid-2020). The middle and right-most sections of Table 1 show how many articles from within this group of >64,000 PubMed citations also mention ≥1 of the various health conditions or potential risk (or effect) modifiers listed.

Lists of potential risk/effect modifiers like those shown in Table 1 can be useful for identifying opportunities for more research or for practical applications, including opportunities for managing or mitigating risks. However, it should be cautioned that risk/effect modifiers can be either helpful or harmful depending on the circumstances, which is why individualized, precision healthcare has been gaining attention.

The potential role of both environmental and lifestyle factors is illustrated by even just a cursory inspection of the potential risk/effect modifiers listed, including things like nutrition, smoking, alcohol consumption, stress, or exercise, to name just a few. Although many of these are already recognized by health professionals, this type of listing, especially when correlated with the matching PubMed records, can be extremely useful for educational purposes, disseminating information, and engaging members of the public. Several genomics-guided studies have also found encouraging results with regard to study outcomes [6-8].

There are tens of thousands of PubMed articles which touch on genetics/genomics and the environment, with articles on lifestyle also gaining momentum [9-11]. As might be expected, some of the most commonly mentioned potential lifestyle/environment risk/effect modifiers include smoking, exercise or the lack thereof (a sedentary lifestyle), pollution of various types, alcohol, social environment, vitamins or dietary supplements, dietary fats, stress, intestinal microbiota, pesticides, fruits and vegetables, and meats (especially red or processed meats).

Awareness of potential risk/effect modifiers can also be an important factor when it comes to prevention. The number of PubMed article records relevant to risk score, gene score, polygenic score, or genotype score has increased dramatically in recent years, and some of these scores may be on the verge of clinical application [12].

The Nutrigenetics.net database is a subset of PubMed for the publication years 2000 up to the present for articles relevant to human genetics/epigenetics and the related omics (and potentially relevant models) but has been indexed further with a controlled vocabulary. Access to the database is freely available to the public on weekends (US Pacific time) via the complimentary login, to be found at: https://www.nutrigenetics.net/Login.aspx.

The original focus of the Nutrigenetics.net database was nutrition. However, because nutrition is a profound example of an environmental variable with many overlaps into other health-related areas, the database soon expanded its coverage to include all types of interactions such as lifestyle, social environment, pharmaceuticals, etc., all of which can be important for physical and mental health and well-being as well as performance.

Except for genes and genetic variants, most of the standardized terminology used for indexing in the Nutrigenetics.net database follows the conventions used by the Medical Subject Headings (https://meshb.nlm.nih.gov/search). For genetic variants, it should be noted that most of the PubMed records contain traditional naming conventions. The Nutrigenetics.net database therefore often uses a traditional term for a genetic variant but includes cross-references with other synonyms, including rs numbers where applicable.

An effort is made to keep the Nutrigenetics.net database reasonably current, especially for genetic variants and related subtopics frequently mentioned in PubMed. For instance, a higher priority is placed on adding records where PubMed mentions an rs number in titles or abstracts, e.g., FTO rs9939609. Indexing for certain topics like FTO rs9939609 is straightforward because there are essentially no synonyms for this particular variant in PubMed. In sharp contrast, for rs numbers like rs1801133 (MTHFR 677C-T), the indexing process involves multiple synonyms that must all be taken into consideration, searched, and combined in order to identify all of the relevant PubMed records, and priority is given to updating the indexing for the more frequently cited genetic variants.

The Nutrigenetics.net database uses a proprietary combination of both manual and semiautomated processes for identifying the relevant PubMed records and indexing them with a controlled vocabulary. Controlled-vocabulary indexing with cross-referencing requires extra effort compared to simple full-text indexing, so some variability in update frequency does occur. The Nutrigenetics.net database serves as an auxiliary, multitopic indexing tool which can reveal otherwise easily overlooked relationships. It is intended to further enhance PubMed’s usefulness but not to compete with or replace PubMed. New topics and cross-references to synonyms are added on an ongoing basis and in response to user requests submitted to info@Nutrigenetics.net.

Since its inception, the database at Nutrigenetics.net has been adding records which correspond to PubMed at an average rate of >100,000 citations per year, with the current total count at >1.7 million records (the latest count, plus additional information, can be found at https://nutrigenetics.net/AboutUs/FrequentlyAskedQuestions.aspx).

With the continuing emergence of environmental and lifestyle evidence, along with abundant reports on genetics/epigenetics and the related omics, there is now a growing opportunity for its practical use [13]. Any translational applications must be chosen and applied responsibly; this presents its own challenges including the acute need for genetics literacy/education, information dissemination, partnering with and among healthcare professionals, community and public engagement, and avoiding/minimizing healthcare disparities [14-20]. As more PubMed records appear, the aim of the Nutrigenetics.net database is to become increasingly useful to all potential audiences.

R.L.M. is the founder and president of Nutrigenetics Unlimited, Inc., which produces the database at Nutrigenetics.net. He is one of the original members of the International Society of Nutrigenetics/Nutrigenomics (ISNN) and has been providing ISNN members with 24/7 access to the Nutrigenetics.net database.

There was no funding.

1. Sugawara Y, Narimatsu H, Fukao A. Coverage of genomic medicine: information gap between lay public and scientists. Risk Manag Healthc Policy. 2012;5:83-90.
2. Shoop E, Casaes P, Onsongo G, Lesnett L, Petursdottir EO, Donkor EK, et al. Data exploration tools for the Gene Ontology database. Bioinformatics. 2004 Dec;20(18):3442-54.
3. Salimi S, Keshavarzi F, Mohammadpour-Gharehbagh A, Moodi M, Mousavi M, Karimian M, et al. Polymorphisms of the folate metabolizing enzymes: association with SLE susceptibility and in silico analysis. Gene. 2017 Dec;637:161-72.
4. Yu W, Ned R, Wulf A, Liu T, Khoury MJ, Gwinn M. The need for genetic variant naming standards in published abstracts of human genetic association studies. BMC Res Notes. 2009 Apr;2(1):56.
5. Wei CH, Phan L, Feltz J, Maiti R, Hefferon T, Lu Z. tmVar 2.0: integrating genomic variant information from literature with dbSNP and ClinVar for precision medicine. Bioinformatics. 2018 Jan;34(1):80-7.
6. Horne J, Gilliland J, O’Connor C, Seabrook J, Madill J. Enhanced long-term dietary change and adherence in a nutrigenomics-guided lifestyle intervention compared to a population-based (GLB/DPP) lifestyle intervention for weight management: results of the NOW randomised controlled trial. BMJ Nutrition, Prevention & Health. 2020;0. (https://nutrition.bmj.com/content/bmjnph/early/2020/05/21/bmjnph-2020-000073.full.pdf)
7. Horne J, Madill J, O’Connor C, Shelley J, Gilliland J. A systematic review of genetic testing and lifestyle behaviour change: are we using high-quality genetic interventions and considering behaviour change theory? Lifestyle Genomics. 2018;11(1):49-63.
8. Nielsen DE, El-Sohemy A. Disclosure of genetic information and change in dietary intake: a randomized controlled trial. PLoS One. 2014 Nov 14;9(11):e112665. eCollection 2014.
9. Mutch DM, Zulyniak MA, Rudkowska I, Tejero ME. Lifestyle Genomics: addressing the multifactorial nature of personalized health. Lifestyle Genomics. 2018;11(1):1-8.
10. Katz DL, Frates EP, Bonnet JP, Gupta SK, Vartiainen E, Carmona RH. Lifestyle as Medicine: The Case for a True Health Initiative. Am J Health Promot. 2018 Jul;32(6):1452-8.
11. Zubair N, Conomos MP, Hood L, Omenn GS, Price ND, Spring BJ, et al. Genetic predisposition impacts clinical changes in a lifestyle coaching program. Sci Rep. 2019 May;9(1):6805.
12. Dron JS, Hegele RA. The evolution of genetic-based risk scores for lipids and cardiovascular disease. Curr Opin Lipidol. 2019 Apr;30(2):71-81.
13. Groopman JD. Environmental health in the biology century: transitions from population to personalized prevention. Exp Biol Med (Maywood). 2019 Jun;244(9):728-33.
14. Mazzarella L. Are we ready for routine precision medicine? Highlights from the Milan Summit on Precision Medicine, Milan, Italy, 8-9 February 2018. Ecancermedicalscience. 2018 Mar;12:817.
15. Lacombe D, Bogaerts J, Tombal B, Maignen F, Osipienko L, Sullivan R, et al. Late translational research: putting forward a new model for developing new anti-cancer treatments that addresses the needs of patients and society. Mol Oncol. 2019 Mar;13(3):558-66.
16. Drake TM, Knight SR, Harrison EM, Søreide K. Global inequities in precision medicine and molecular cancer research. Front Oncol. 2018 Sep;8:346.
17. Laviolle B, Denèfle P, Gueyffier F, Bégué É, Bilbault P, Espérou H, et al; participants of Giens XXXIV Round Table “Translational research”. The contribution of genomics in the medicine of tomorrow, clinical applications and issues. Therapie. 2019 Feb;74(1):9-15.
18. Horgan D, Lal JA. Making the most of innovation in personalised medicine: an EU strategy for a faster bench to bedside and beyond process. Public Health Genomics. 2018;21(3-4):101-20.
19. Meyer SL. Toward precision public health. J Public Health Dent. 2019 Apr;•••.
20. Roberts MC, Mensah GA, Khoury MJ. Leveraging implementation science to address health disparities in genomic medicine: examples from the field. Ethn Dis. 2019 Feb;29 Suppl 1:187-92.
Open Access License
This article is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (CC BY-NC-ND). Usage and distribution for commercial purposes as well as any distribution of modified material requires written permission.