Gene Mutation Naming Convention

This section provides a quick introduction to the gene mutation naming convention, also referred to as the 'nomenclature system for human gene mutations', based on recommendations proposed by the 'Human Genome Variation Society (hgvs.org)'.

What Is Mutation Naming Convention? - The mutation naming convention, also referred to as the "nomenclature system for human gene mutations", is a notation system for describing gene mutations in a standard, unambiguous way.

Currently, the "Human Genome Variation Society (hgvs.org)" is the leading organization coordinating recommendations on the mutation naming convention.

Some basic recommendations are listed below in different categories:

1. General recommendations -

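1.1. Reference sequence - A reference sequence ID followed by a colon (:) should be provided to identify the sequence against which the mutation is described. For example, NM_000546 is the RefSeq accession number of a TP53 mRNA (transcript) sequence and can be used as the reference sequence ID. Including the version number, as in NC_000023.10 in the examples below, makes the reference unambiguous.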

1.2. Letter prefix - A prefix letter followed by a dot (.) is required to indicate the type of the sequence to be described. Examples are:

“c.” for a coding DNA reference sequence
“g.” for a linear genomic reference sequence
“r.” for an RNA reference sequence (transcript)
“p.” for a protein reference sequence
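To make the "reference ID, colon, prefix letter, dot, change" layout concrete, here is a minimal Python sketch that splits a description into its three parts. The name split_description and the regular expression are illustrative assumptions, not an official HGVS grammar or tool, and only the four prefixes listed above are accepted.

```python
import re

# Illustrative pattern (an assumption, not an official HGVS grammar):
# everything before the first colon is the reference ID, followed by one
# of the four prefix letters, a dot, and the change itself.
DESCRIPTION = re.compile(r"^(?P<ref>[^:]+):(?P<prefix>[cgrp])\.(?P<change>.+)$")

PREFIX_MEANING = {
    "c": "coding DNA",
    "g": "linear genomic",
    "r": "RNA",
    "p": "protein",
}


def split_description(description: str) -> tuple:
    """Return (reference ID, sequence type, change) for 'REF:x.CHANGE'."""
    match = DESCRIPTION.match(description)
    if match is None:
        raise ValueError(f"not a REF:x.CHANGE description: {description!r}")
    return (
        match.group("ref"),
        PREFIX_MEANING[match.group("prefix")],
        match.group("change"),
    )


print(split_description("NC_000023.10:g.33038255C>A"))
# ('NC_000023.10', 'linear genomic', '33038255C>A')
```

The reference ID is taken as everything before the first colon, so multi-field annotations (such as the TP53 report line shown later) are rejected rather than misread.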

2. Coding DNA and linear genomic recommendations -

2.1. Substitution - A substitution refers to one nucleotide being replaced by one other nucleotide, and follows the syntax below:

"position""original">"new"

For example:
  NC_000023.10:g.33038255C>A
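The substitution syntax can be read back with a short regular expression. This is a sketch under the assumption that positions are plain integers; the function name parse_substitution is hypothetical, and intronic offsets are not handled.

```python
import re

# Change part of a simple nucleotide substitution, e.g. "33038255C>A".
# Assumption: positions are plain integers; intronic offsets such as
# "88+1G>T" are outside the scope of this sketch.
SUBSTITUTION = re.compile(r"^(\d+)([ACGT])>([ACGT])$")


def parse_substitution(change: str) -> tuple:
    """Return (position, original, new) for a single-nucleotide substitution."""
    match = SUBSTITUTION.match(change)
    if match is None:
        raise ValueError(f"not a simple substitution: {change!r}")
    position, original, new = match.groups()
    if original == new:
        # A substitution replaces a nucleotide by a different one.
        raise ValueError("original and new nucleotides must differ")
    return int(position), original, new


print(parse_substitution("33038255C>A"))  # (33038255, 'C', 'A')
```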

2.2. Deletion - A deletion refers to one or more nucleotides being deleted, and follows the syntax below:

"position"del
"start"_"end"del

For example:
  NG_012232.1:g.19del
  NG_012232.1:g.19_21del
  LRG_199t1:c.720_991del
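The two deletion forms differ only in whether a single position or a start_end range is written. A minimal Python sketch (the function name format_deletion is hypothetical) that reproduces the examples above:

```python
from typing import Optional


def format_deletion(reference: str, prefix: str, start: int,
                    end: Optional[int] = None) -> str:
    """Build 'REF:x.POSdel' (one nucleotide) or 'REF:x.START_ENDdel' (a range)."""
    if end is None or end == start:
        change = f"{start}del"
    elif end > start:
        change = f"{start}_{end}del"
    else:
        raise ValueError("end must not be smaller than start")
    return f"{reference}:{prefix}.{change}"


print(format_deletion("NG_012232.1", "g", 19))       # NG_012232.1:g.19del
print(format_deletion("NG_012232.1", "g", 19, 21))   # NG_012232.1:g.19_21del
print(format_deletion("LRG_199t1", "c", 720, 991))   # LRG_199t1:c.720_991del
```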

2.3. Insertion - An insertion refers to one or more nucleotides being inserted, and follows the syntax below:

"position"_"next"ins"sequence"
"position"_"next"ins("number")
  where "next" = "position"+1
    "number" = number of nucleotides inserted

For example:
  NC_000023.10:g.32862923_32862924insCCT
  LRG_199t1:c.240_241insAGG
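Because an insertion sits between two neighbouring nucleotides, the second position is always "position"+1. The sketch below (format_insertion is a hypothetical name) builds either form: the inserted sequence when it is known, or only the number of inserted nucleotides in parentheses.

```python
from typing import Union


def format_insertion(reference: str, prefix: str, position: int,
                     inserted: Union[str, int]) -> str:
    """Build 'REF:x.POS_NEXTins...' where NEXT = POS + 1.

    `inserted` is either the inserted sequence (e.g. "AGG") or, when only
    the length is known, the number of inserted nucleotides.
    """
    next_position = position + 1  # the two flanking nucleotides
    if isinstance(inserted, int):
        payload = f"({inserted})"
    elif inserted and set(inserted) <= set("ACGT"):
        payload = inserted
    else:
        raise ValueError(f"unexpected inserted sequence: {inserted!r}")
    return f"{reference}:{prefix}.{position}_{next_position}ins{payload}"


print(format_insertion("LRG_199t1", "c", 240, "AGG"))  # LRG_199t1:c.240_241insAGG
```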

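2.4. Inversion - An inversion refers to a stretch of two or more nucleotides being replaced by its reverse complement (the segment is reversed and each base is swapped for its complementary base), and follows the syntax below:

"start"_"end"inv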

For example:
  NC_000023.10:g.32361330_32361333inv
  NM_004006.2:c.5657_5660inv
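In HGVS an inverted segment is the reverse complement of the original, not just the reversed bases. The following sketch applies "start"_"end"inv to a toy sequence; the sequence and the function name apply_inversion are hypothetical, and positions are 1-based and inclusive.

```python
# Complement table for the four DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def apply_inversion(sequence: str, start: int, end: int) -> str:
    """Apply 'START_ENDinv' to a sequence (1-based, inclusive positions)."""
    if not 1 <= start < end <= len(sequence):
        raise ValueError("an inversion needs 1 <= start < end <= len(sequence)")
    segment = sequence[start - 1:end]
    # Inversion = reverse complement, not just reverse.
    inverted = segment.translate(COMPLEMENT)[::-1]
    return sequence[:start - 1] + inverted + sequence[end:]


# Toy sequence (hypothetical): "3_6inv" turns ACTG at positions 3-6 into CAGT.
print(apply_inversion("AAACTGGTT", 3, 6))  # AACAGTGTT
```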

2.5. Deletion-insertion - A deletion-insertion refers to one or more nucleotides being replaced by one or more other nucleotides, and follows the syntax below:

"position"delins"sequence"
"start"_"end"delins"sequence"

For example:
  NC_000023.11:g.32386323delinsGA
  NM_004006.2:c.6775_6777delinsC
  LRG_199t1:c.145_147delinsTGG
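A deletion-insertion removes the nucleotides at "position" (or "start" to "end") and puts "sequence" in their place. This sketch applies one to a hypothetical toy sequence and writes the matching change part; the function names apply_delins and describe_delins are illustrative only.

```python
def apply_delins(sequence: str, start: int, end: int, replacement: str) -> str:
    """Replace positions start..end (1-based, inclusive) by `replacement`."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("need 1 <= start <= end <= len(sequence)")
    if not replacement:
        raise ValueError("a deletion-insertion must insert at least one nucleotide")
    return sequence[:start - 1] + replacement + sequence[end:]


def describe_delins(start: int, end: int, replacement: str) -> str:
    """Change part: 'POSdelinsSEQ' for one position, 'START_ENDdelinsSEQ' otherwise."""
    location = str(start) if start == end else f"{start}_{end}"
    return f"{location}delins{replacement}"


reference = "ATGCCTGAA"  # toy sequence (hypothetical)
print(describe_delins(4, 6, "C"), apply_delins(reference, 4, 6, "C"))
# 4_6delinsC ATGCGAA
```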

3. Protein recommendations - Both 3-letter and 1-letter amino acid codes can be used. "Ter" or "*" is used as the amino acid code for the stop codon. "=" can be used in place of the new amino acid code to indicate a silent mutation, where the amino acid is unchanged.

3.1. Substitution - A substitution refers to one amino acid being replaced by one other amino acid, and follows the syntax below:

"original""position""new"
  where "original" and "new" can be 3-letter or 1-letter amino acid codes

For example:
  NP_003997.1:p.Trp24Cys or NP_003997.1:p.W24C
  LRG_199p1:p.Trp24Ter, or p.Trp24*, or p.W24*
  NP_003997.1:p.Cys188Cys, or p.Cys188=, or p.C188C, or p.C188=
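Converting between the 3-letter and 1-letter forms of these examples is a table lookup. This sketch covers the 20 standard amino acids plus "Ter"; the names to_one_letter and one_letter are hypothetical, and only the change part after "p." is handled.

```python
import re

# Three-letter to one-letter codes for the 20 standard amino acids,
# plus "Ter" for the stop codon.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
    "Ter": "*",
}

# "original" (3 letters), position, "new" (3 letters, "*" or "=").
PROTEIN_SUBSTITUTION = re.compile(r"^([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2}|\*|=)$")


def one_letter(code: str) -> str:
    if code in ("*", "="):
        return code  # stop codon and "unchanged" symbols stay as they are
    try:
        return THREE_TO_ONE[code]
    except KeyError:
        raise ValueError(f"unknown amino acid code: {code!r}") from None


def to_one_letter(change: str) -> str:
    """Rewrite e.g. 'Trp24Cys' as 'W24C'."""
    match = PROTEIN_SUBSTITUTION.match(change)
    if match is None:
        raise ValueError(f"not a 3-letter protein substitution: {change!r}")
    original, position, new = match.groups()
    return f"{one_letter(original)}{position}{one_letter(new)}"


for change in ["Trp24Cys", "Trp24Ter", "Cys188=", "Cys188Cys"]:
    print(change, "->", to_one_letter(change))
# Trp24Cys -> W24C
# Trp24Ter -> W24*
# Cys188= -> C188=
# Cys188Cys -> C188C
```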

3.2. Deletion - A deletion refers to one or more amino acids being deleted, and follows the syntax below:

"position"del
"start"_"end"del

For example:
  LRG_199p1:p.Val7del or p.V7del
  NP_003997.1:p.Lys23_Val25del or p.K23_V25del
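For a range deletion only the first and last deleted residues are named, each with its position. The sketch below derives the 1-letter description from a hypothetical toy protein sequence; describe_protein_deletion is an illustrative name.

```python
def describe_protein_deletion(protein: str, start: int, end: int) -> str:
    """Name a deletion of residues start..end (1-based) in a one-letter protein string."""
    if not 1 <= start <= end <= len(protein):
        raise ValueError("need 1 <= start <= end <= len(protein)")
    first = f"{protein[start - 1]}{start}"
    if start == end:
        return f"p.{first}del"
    # Only the first and last deleted residues are named, not the ones in between.
    return f"p.{first}_{protein[end - 1]}{end}del"


toy_protein = "MAGKLV"  # hypothetical sequence: M1 A2 G3 K4 L5 V6
print(describe_protein_deletion(toy_protein, 2, 2))  # p.A2del
print(describe_protein_deletion(toy_protein, 3, 5))  # p.G3_L5del
```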

3.3. Insertion - An insertion refers to one or more amino acids being inserted, and follows the syntax below:

"position"_"next"ins"sequence"
"position"_"next"ins"number"
"position"_"next"ins*"number"
  where "next" = "position"+1
    "number" = number of amino acids inserted

For example:
  ...:p.His4_Gln5insAla or p.H4_Q5insA
  ...:p.Lys2_Gly3insGlnSerLys or p.K2_G3insQSK
  ...:p.Pro46_Asn47insSerSerTer or p.Pro46_Asn47insSerSer*
  ...:p.Arg78_Gly79ins23 (23 amino acids inserted)
  ...:p.Gln746_Lys747ins*63 (63 amino acids inserted with stop codon)
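The three insertion forms can be told apart by what follows "ins": a list of residues, a bare number, or "*" plus a number. This sketch assumes the 3-letter form only; parse_protein_insertion and the dictionary keys it returns are hypothetical names chosen for illustration.

```python
import re

# 3-letter form only (an assumption of this sketch).
INSERTION = re.compile(r"^([A-Z][a-z]{2})(\d+)_([A-Z][a-z]{2})(\d+)ins(.+)$")
RESIDUE = re.compile(r"[A-Z][a-z]{2}|\*")


def parse_protein_insertion(change: str) -> dict:
    match = INSERTION.match(change)
    if match is None:
        raise ValueError(f"not a 3-letter protein insertion: {change!r}")
    first_aa, first_pos, second_aa, second_pos, payload = match.groups()
    if int(second_pos) != int(first_pos) + 1:
        raise ValueError('"next" must equal "position" + 1')
    result = {"between": (first_aa + first_pos, second_aa + second_pos)}
    if payload.isdigit():
        # e.g. ins23: only the number of inserted residues is given
        result["count"] = int(payload)
    elif payload.startswith("*") and payload[1:].isdigit():
        # e.g. ins*63: number of inserted residues, with a stop codon
        result["count"] = int(payload[1:])
        result["has_stop"] = True
    else:
        # e.g. insGlnSerLys or insSerSer*: the inserted residues are listed
        residues = RESIDUE.findall(payload)
        if "".join(residues) != payload:
            raise ValueError(f"cannot split inserted sequence: {payload!r}")
        result["residues"] = residues
    return result


print(parse_protein_insertion("Lys2_Gly3insGlnSerLys"))
# {'between': ('Lys2', 'Gly3'), 'residues': ['Gln', 'Ser', 'Lys']}
print(parse_protein_insertion("Gln746_Lys747ins*63"))
# {'between': ('Gln746', 'Lys747'), 'count': 63, 'has_stop': True}
```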

3.4. Deletion-insertion - A deletion-insertion refers to one or more amino acids being replaced by one or more other amino acids, and follows the syntax below:

"position"delins"sequence"
"start"_"end"delins"sequence"

For example:
  ...:p.Cys28delinsTrpVal (or p.C28delinsWV)
  ...:p.Glu125_Ala132delinsGlyLeuHisArgPheIleValLeu
    (or p.E125_A132delinsGLHRFIVL)
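Given a reference and a variant protein sequence, a delins description can be derived naively by trimming the residues the two share at the start and at the end. This is a sketch using hypothetical toy sequences and a hypothetical function name, describe_protein_delins; it does not apply the additional rules HGVS uses to choose between competing descriptions, for example in repeated sequences.

```python
def describe_protein_delins(reference: str, variant: str) -> str:
    """Describe how `variant` differs from `reference` (one-letter strings)
    as a single deletion-insertion, by trimming the shared prefix and suffix."""
    # Length of the common prefix.
    start = 0
    while (start < len(reference) and start < len(variant)
           and reference[start] == variant[start]):
        start += 1
    # Trim the common suffix without running into the prefix.
    ref_end, var_end = len(reference), len(variant)
    while (ref_end > start and var_end > start
           and reference[ref_end - 1] == variant[var_end - 1]):
        ref_end -= 1
        var_end -= 1
    deleted = reference[start:ref_end]
    inserted = variant[start:var_end]
    if not deleted or not inserted:
        raise ValueError("not a deletion-insertion (no change, or pure insertion/deletion)")
    first = f"{deleted[0]}{start + 1}"
    if len(deleted) == 1 and len(inserted) == 1:
        return f"p.{first}{inserted}"  # one-for-one: a substitution, not a delins
    location = first if len(deleted) == 1 else f"{first}_{deleted[-1]}{ref_end}"
    return f"p.{location}delins{inserted}"


print(describe_protein_delins("MKCLE", "MKWVLE"))    # p.C3delinsWV
print(describe_protein_delins("MAEFGHK", "MAWYK"))   # p.E3_H6delinsWY
```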

Note that some medical reports still use ">" instead of "delins" for protein deletion-insertion mutations.

For example: 
  TP53:NM_000546:exon8:c.844_845insAGACCT:p.D281_R282>DQTW

Same as:
  TP53:NM_000546:exon8:c.844_845insAGACCT:p.D281_R282delinsDQTW
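Rewriting the legacy ">" form into "delins" can be automated. This sketch assumes a colon-separated report line like the example above and only rewrites "p." fields of the form "p.X#_Y#>SEQ" with 1-letter codes; modernize_annotation is a hypothetical name, and other legacy variants are left untouched.

```python
import re

# Legacy protein range change such as "p.D281_R282>DQTW" (one-letter codes).
LEGACY_PROTEIN_DELINS = re.compile(r"^(p\.[A-Z]\d+_[A-Z]\d+)>([A-Z*]+)$")


def modernize_annotation(annotation: str) -> str:
    """Rewrite legacy '>' protein ranges as 'delins' in a colon-separated line."""
    fields = annotation.split(":")
    # Only "p." fields in the legacy range form are touched; a "c."
    # substitution keeps its ">".
    converted = [LEGACY_PROTEIN_DELINS.sub(r"\1delins\2", field) for field in fields]
    return ":".join(converted)


legacy = "TP53:NM_000546:exon8:c.844_845insAGACCT:p.D281_R282>DQTW"
print(modernize_annotation(legacy))
# TP53:NM_000546:exon8:c.844_845insAGACCT:p.D281_R282delinsDQTW
```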

For more information, see the "Sequence Variant Nomenclature" website at http://varnomen.hgvs.org.
