What Is FASTA

This section provides a quick introduction to FASTA (or FastA), a near-universal file format for representing either a nucleotide sequence or a peptide (protein) sequence, in which nucleotides or amino acids are represented using single-letter codes.

What Is FASTA? - FASTA, or FastA, is a file format for representing either a nucleotide sequence or a peptide (protein) sequence, in which nucleotides or amino acids are represented using single-letter codes.

The FASTA file format was introduced by the FASTA software, a DNA and protein sequence search and alignment tool developed by David J. Lipman and William R. Pearson in 1985.

FASTA has now become a near-universal standard in the field of bioinformatics, and it is supported by nearly all bioinformatics tools.

A FASTA file may contain multiple sequences. Each sequence starts with a single identifier line, which has ">" as its first character, followed by an identifier and other information. After the identifier line, the sequence data is provided on one or more lines.

As an example, Trp-Cage, one of the smallest known proteins at 20 amino acids, can be written in a FASTA format file as:

>1L2Y_1|Chain A|TC5b|null
NLYIQWLKDGGPSSGRPPPS
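
To make the layout concrete, here is a minimal Python sketch that writes one record in this form. The function name format_fasta and the default line width of 70 are assumptions made for illustration; the format as described above only requires the ">" identifier line followed by one or more sequence lines.

```python
def format_fasta(header, sequence, width=70):
    """Return one FASTA record as text.

    header   -- identifier line content, without the leading ">"
    sequence -- single-letter codes as one string
    width    -- maximum characters per sequence line (assumed default)
    """
    # Split the sequence into chunks of at most `width` characters.
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    # The identifier line comes first, prefixed with ">".
    return "\n".join([">" + header] + lines) + "\n"


if __name__ == "__main__":
    # Reproduces the Trp-Cage record shown above.
    print(format_fasta("1L2Y_1|Chain A|TC5b|null", "NLYIQWLKDGGPSSGRPPPS"), end="")
```

Because the sequence has only 20 residues, it fits on a single line; a longer sequence would be wrapped onto several lines.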

Here is another example of a FASTA file, this time with 2 protein sequences:

>gi|186681228|ref|YP_001864424.1| phycoerythrobilin:ferredoxin oxidoreductase
MNSERSDVTLYQPFLDYAIAYMRSRLDLEPYPIPTGFESNSAVVGKGKNQEEVVTTSYAFQTAKLRQIRA
AHVQGGNSLQVLNFVIFPHLNYDLPFFGADLVTLPGGHLIALDMQPLFRDDSAYQAKYTEPILPIFHAHQ
QHLSWGGDFPEEAQPFFSPAFLWTRPQETAVVETQVFAAFKDYLKAYLDFVEQAEAVTDSQNLVAIKQAQ
LRYLRYRAEKDPARGMFKRFYGAEWTEEYIHGFLFDLERKLTVVK
>1L2Y_1|Chain A|TC5b|null
NLYIQWLKDGGPSSGRPPPS
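
Reading such a file back follows directly from the rules above: a line starting with ">" opens a new record, and all following lines up to the next ">" are joined into one sequence. The Python sketch below illustrates this under simple assumptions. The function name parse_fasta is hypothetical, blank lines are skipped, and no validation of the single-letter codes is attempted. For real work, an established library parser is likely a better choice.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA text."""
    records = []
    header = None   # identifier line of the record being read
    chunks = []     # sequence lines collected for that record
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new identifier line closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    # Close the last record at end of input.
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


if __name__ == "__main__":
    # The Trp-Cage record, split over two lines here only to show joining.
    sample = ">1L2Y_1|Chain A|TC5b|null\nNLYIQWLKDG\nGPSSGRPPPS\n"
    for header, sequence in parse_fasta(sample):
        print(header, len(sequence))
```

Each returned tuple holds the identifier line without its ">" prefix and the full sequence as a single string, regardless of how many lines it spanned in the file.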
