Molecule Tutorials - Herong's Tutorial Examples - v1.26, by Herong Yang
What Is NGS (Next-Generation Sequencing)
Provides a quick introduction to NGS (Next-Generation Sequencing), which randomly breaks the DNA in a patient's sample into millions of fragments, reads the fragments as nucleotide strings, then digitally aligns them to a reference genome sequence to construct the patient's genome sequence.
What Is NGS (Next-Generation Sequencing)? - NGS (Next-Generation Sequencing) is a genomic testing technology that can be used to determine the order of nucleotides in entire genomes or targeted regions of DNA or RNA.
Here are the main steps of the NGS process:
1. Library Preparation — A DNA library is prepared from a patient's sample cells: the DNA extracted from the cells is randomly broken into a large number (millions) of DNA fragments. Amplification, purification, and other treatments are performed to increase the efficiency of the library preparation process.
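The random fragmentation in step 1 can be imitated on a nucleotide string. The following is only a toy sketch: the function name `fragment`, the length limits, and the sample string are made up for illustration, and real library preparation is a wet-lab process rather than a string operation.

```python
import random

def fragment(dna, min_len=5, max_len=12, seed=0):
    """Cut a DNA string into consecutive pieces of random length.

    min_len, max_len and seed are illustrative assumptions. The last
    piece may be shorter than min_len if the string runs out.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    fragments = []
    start = 0
    while start < len(dna):
        size = rng.randint(min_len, max_len)
        fragments.append(dna[start:start + size])
        start += size
    return fragments

sample = "ACGTTGCAAGGCTTACCGATAGGCTAACGTTAGC"  # hypothetical sample DNA
pieces = fragment(sample)
print(pieces)
```

Joining the pieces back together in order gives the original string, which is exactly the information a real sequencer does not have: it sees the fragments without their positions.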
2. Sequencing — DNA fragments from the library are loaded onto a flow cell and placed on the sequencer. Then the SBS (Sequencing By Synthesis) process is performed to read the nucleotide string of each DNA fragment.
During the SBS process, chemically modified nucleotides bind to the DNA template strand through natural complementarity. Each nucleotide contains a fluorescent tag and a reversible terminator that blocks incorporation of the next base. The fluorescent signal indicates which nucleotide has been added, and the terminator is cleaved so the next base can bind.
The SBS process can be viewed as a DNA fragment reader. It reads nucleotides from a fragment and records their nucleotide letters sequentially. Each recorded nucleotide letter string is called a "read".
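Sometimes, the same DNA fragment is read twice, forward and backward, producing 2 reads: a forward read and a backward read. This is commonly called paired-end sequencing. Having both reads helps the alignment step that follows and should improve the overall quality of the NGS process.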
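The forward and backward reads of one fragment can be sketched in a few lines. This sketch assumes that the backward read is reported as the reverse complement of the fragment, read from its other end, which is the usual convention for paired reads. The function names and the read length are hypothetical.

```python
# Watson-Crick base pairing: A pairs with T, C pairs with G
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def reverse_complement(seq):
    """Return the sequence of the opposite strand, read in its own direction."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def paired_reads(fragment, read_len):
    """Return (forward read, backward read) of a fragment.

    The forward read starts at the left end of the fragment. The
    backward read starts at the right end, on the opposite strand.
    """
    forward = fragment[:read_len]
    backward = reverse_complement(fragment)[:read_len]
    return forward, backward

fwd, bwd = paired_reads("ACGTTGCAAG", 4)
print(fwd, bwd)  # prints: ACGT CTTG
```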
The diagram (source: illumina.com) below shows the sequencing step of NGS:
3. Data Analysis — Reads (nucleotide letter strings) generated from the previous step are then aligned to a reference genome using a computer algorithm. When a read is aligned to a section of the reference genome, each nucleotide letter in the read is recorded as a "hit" of that letter at the aligned position of the reference genome.
After all reads (in millions) are aligned, recorded hits on all positions in the reference genome form a hit distribution, which is then used to construct the genome sequence of the patient.
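The alignment and hit counting described above can be illustrated with a toy example. This sketch swaps in a brute-force search, which places each read at the offset with the fewest mismatches. Real aligners use indexed and far more tolerant algorithms, and the reference, reads, and function names here are invented for illustration.

```python
from collections import Counter

def best_offset(read, reference):
    """Return the reference offset where the read has the fewest mismatches."""
    def mismatches(offset):
        window = reference[offset:offset + len(read)]
        return sum(a != b for a, b in zip(read, window))
    return min(range(len(reference) - len(read) + 1), key=mismatches)

def pile_up(reads, reference):
    """Count, for every reference position, the letters that reads place there."""
    hits = [Counter() for _ in reference]
    for read in reads:
        offset = best_offset(read, reference)
        for i, base in enumerate(read):
            hits[offset + i][base] += 1
    return hits

reference = "GATTACAGGT"                        # hypothetical reference genome
reads = ["GATTA", "ATCAC", "TCACA", "CAGGT"]    # hypothetical reads
hits = pile_up(reads, reference)
for position, counter in enumerate(hits):
    print(position, reference[position], dict(counter))
```

In this made-up data, two of the three reads covering position 3 carry C where the reference has T, so the hit distribution at that position is C: 2, T: 1.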
The diagram (source: illumina.com) below highlights the hit distribution at a given position: C, C, T, C, C, C, C, that is, C at 6/7 and T at 1/7. So the constructed genome sequence should have C at this position; the single T (1/7) can be discarded as a process error. Compared to the reference sequence, which has T at the highlighted position, the result shows a variant (mutation) from T to C.
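The decision at the highlighted position can be written as a simple majority vote. This is a minimal sketch: the function name `call_base` and the 0.5 threshold are assumptions for illustration, and production variant callers also weigh read quality and coverage.

```python
from collections import Counter

def call_base(hits, min_fraction=0.5):
    """Return the most frequent letter and its fraction of all hits.

    If no letter reaches min_fraction (an assumed threshold), "N" is
    returned to mark the position as undetermined.
    """
    base, count = Counter(hits).most_common(1)[0]
    fraction = count / len(hits)
    if fraction < min_fraction:
        return "N", fraction
    return base, fraction

hits = ["C", "C", "T", "C", "C", "C", "C"]  # hit distribution from the text
reference_base = "T"                         # reference letter from the text

base, fraction = call_base(hits)
print(base, round(fraction, 3))  # prints: C 0.857
if base != reference_base:
    print("variant:", reference_base, "->", base)  # prints: variant: T -> C
```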