Molecule Tutorials - Herong's Tutorial Examples - v1.26, by Herong Yang
Gene Mutation Naming Convention
Provides a quick introduction to the gene mutation naming convention, also referred to as the 'nomenclature system for human gene mutations', based on recommendations from the 'Human Genome Variation Society (hgvs.org)'.
What Is Mutation Naming Convention? - The mutation naming convention, also referred to as the "nomenclature system for human gene mutations", is a notation system for describing gene mutations.
Currently, the "Human Genome Variation Society (hgvs.org)" is the leading organization coordinating recommendations on the mutation naming convention.
Some basic recommendations are listed below in different categories:
1. General recommendations -
1.1. Reference sequence - A reference sequence ID followed by a colon (:) should be provided to identify the original sequence that the mutation is described against. For example, NM_000546 is the RefSeq accession number of an mRNA transcript of gene TP53 and can be used as the reference sequence ID.
1.2. Letter prefix - A prefix letter followed by a dot (.) is required to indicate the type of the sequence to be described. Examples are:
    "c." for a coding DNA reference sequence
    "g." for a linear genomic reference sequence
    "r." for an RNA reference sequence (transcript)
    "p." for a protein reference sequence
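The two general recommendations above can be sketched as a small parser. This is only an illustration: the function name split_variant and the regular expression are assumptions made for this example, and only the four prefixes listed above are handled.

```python
import re

# Hypothetical pattern: "reference:prefix.description".
# Only the prefixes c, g, r and p from the list above are accepted.
HGVS_PARTS = re.compile(
    r"^(?P<reference>[^:\s]+):(?P<prefix>[cgrp])\.(?P<description>\S+)$"
)

SEQUENCE_TYPES = {
    "c": "coding DNA",
    "g": "linear genomic",
    "r": "RNA",
    "p": "protein",
}


def split_variant(text):
    """Split a variant name into reference ID, prefix letter and description."""
    match = HGVS_PARTS.match(text)
    if match is None:
        raise ValueError(f"not in reference:prefix.description form: {text!r}")
    parts = match.groupdict()
    parts["sequence_type"] = SEQUENCE_TYPES[parts["prefix"]]
    return parts


parts = split_variant("NC_000023.10:g.33038255C>A")
print(parts["reference"], parts["sequence_type"], parts["description"])
```

The description part ("33038255C>A" here) is what the category-specific syntaxes in the following sections describe.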
2. Coding DNA and linear genomic recommendations -
2.1. Substitution - A substitution refers to one nucleotide being replaced by one other nucleotide, and follows the syntax below:
    "position""original">"new"
For example:
    NC_000023.10:g.33038255C>A
2.2. Deletion - A deletion refers to one or more nucleotides being deleted, and follows the syntax below:
    "position"del
    "start"_"end"del
For example:
    NG_012232.1:g.19del
    NG_012232.1:g.19_21del
    LRG_199t1:c.720_991del
2.3. Insertion - An insertion refers to one or more nucleotides being inserted, and follows the syntax below:
    "position"_"next"ins"sequence"
    "position"_"next"ins("number")
where
    "next" = "position" + 1
    "number" = number of nucleotides inserted
For example:
    NC_000023.10:g.32862923_32862924insCCT
    LRG_199t1:c.240_241insAGG
2.4. Inversion - An inversion refers to a run of more than one nucleotide being replaced by the reverse complement of the original sequence, and follows the syntax below:
    "start"_"end"inv
For example:
    NC_000023.10:g.32361330_32361333inv
    NM_004006.2:c.5657_5660inv
2.5. Deletion-insertion - A deletion-insertion refers to one or more nucleotides being replaced by one or more other nucleotides, and follows the syntax below:
    "position"delins"sequence"
    "start"_"end"delins"sequence"
For example:
    NC_000023.11:g.32386323delinsGA
    NM_004006.2:c.6775_6777delinsC
    LRG_199t1:c.145_147delinsTGG
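3. Protein recommendations - Both 3-letter and 1-letter amino acid codes can be used. "Ter" or "*" is used as the amino acid code for the stop codon. "=" can be used in place of the new amino acid code to indicate a silent mutation, where the amino acid is unchanged. A position is written as the amino acid code followed by its number, for example Val7 or V7.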
3.1. Substitution - A substitution refers to one amino acid being replaced by one other amino acid, and follows the syntax below:
    "original""position""new"
where "original" and "new" can each be a 3-letter or 1-letter amino acid code.
For example:
    NP_003997.1:p.Trp24Cys or NP_003997.1:p.W24C
    LRG_199p1:p.Trp24Ter, or p.Trp24*, or p.W24*
    NP_003997.1:p.Cys188Cys, or Cys188=, or C188C, or C188=
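Since the 3-letter and 1-letter forms describe the same substitution, one can be converted to the other with a lookup table. The sketch below is an illustration only: the name to_one_letter is hypothetical, and only the 20 standard amino acids plus "Ter" are covered.

```python
import re

THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
    "Ter": "*",
}

# 3-letter original, position, then a 3-letter code, "*" or "="
PROTEIN_SUBSTITUTION = re.compile(r"^p\.([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2}|\*|=)$")


def to_one_letter(description):
    """Convert a 3-letter protein substitution to its 1-letter form."""
    match = PROTEIN_SUBSTITUTION.match(description)
    if match is None:
        raise ValueError(f"not a 3-letter protein substitution: {description!r}")
    original, position, new = match.groups()
    for code in (original, new):
        if code not in ("*", "=") and code not in THREE_TO_ONE:
            raise ValueError(f"unknown amino acid code: {code}")
    new_code = new if new in ("*", "=") else THREE_TO_ONE[new]
    return f"p.{THREE_TO_ONE[original]}{position}{new_code}"


print(to_one_letter("p.Trp24Cys"))  # p.W24C
print(to_one_letter("p.Trp24Ter"))  # p.W24*
print(to_one_letter("p.Cys188="))   # p.C188=
```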
3.2. Deletion - A deletion refers to one or more amino acids being deleted, and follows the syntax below:
    "position"del
    "start"_"end"del
For example:
    LRG_199p1:p.Val7del or p.V7del
    NP_003997.1:p.Lys23_Val25del or p.K23_V25del
3.3. Insertion - An insertion refers to one or more amino acids being inserted, and follows the syntax below:
    "position"_"next"ins"sequence"
    "position"_"next"ins"number"
    "position"_"next"ins*"number"
where
    "next" = "position" + 1
    "number" = number of amino acids inserted
For example:
    ...:p.His4_Gln5insAla or p.H4_Q5insA
    ...:p.Lys2_Gly3insGlnSerLys or p.K2_G3insQSK
    ...:p.Pro46_Asn47insSerSerTer or p.Pro46_Asn47insSerSer*
    ...:p.Arg78_Gly79ins23 (23 amino acids inserted)
    ...:p.Gln746_Lys747ins*63 (63 amino acids inserted with stop codon)
3.4. Deletion-insertion - A deletion-insertion refers to one or more amino acids being replaced by one or more other amino acids, and follows the syntax below:
    "position"delins"sequence"
    "start"_"end"delins"sequence"
For example:
    ...:p.Cys28delinsTrpVal (or p.C28delinsWV)
    ...:p.Glu125_Ala132delinsGlyLeuHisArgPheIleValLeu (or p.E125_A132delinsGLHRFIVL)
Note that some medical reports still use ">" instead of "delins" for protein deletion-insertion mutations.
For example:
    TP53:NM_000546:exon8:c.844_845insAGACCT:p.D281_R282>DQTW
Same as:
    TP53:NM_000546:exon8:c.844_845insAGACCT:p.D281_R282delinsDQTW
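Converting the older ">" form to "delins" is a simple text rewrite of the protein field. The sketch below assumes the colon-separated layout of the report line above, with the protein description in the last field; the name legacy_to_delins is hypothetical.

```python
import re

# Older protein form: p.<position>[_<position>]><sequence>
LEGACY = re.compile(r"^(p\.[A-Za-z]+\d+(?:_[A-Za-z]+\d+)?)>([A-Za-z*]+)$")


def legacy_to_delins(description):
    """Rewrite 'p.D281_R282>DQTW' as 'p.D281_R282delinsDQTW'.

    Anything that does not match the older protein form is returned unchanged.
    """
    match = LEGACY.match(description)
    if match is None:
        return description
    return f"{match.group(1)}delins{match.group(2)}"


annotation = "TP53:NM_000546:exon8:c.844_845insAGACCT:p.D281_R282>DQTW"
fields = annotation.split(":")
fields[-1] = legacy_to_delins(fields[-1])
print(":".join(fields))
# TP53:NM_000546:exon8:c.844_845insAGACCT:p.D281_R282delinsDQTW
```

Because the pattern requires the "p." prefix, a nucleotide substitution such as "c.5C>A" is left untouched.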
For more information, see the "Sequence Variant Nomenclature" website at http://varnomen.hgvs.org.