SDF Format Specification

This section provides a summary of the SDF (Structure Data File) format specification.

Here is a summary of the main elements of the SDF file format:

1. Text File - An SDF file is a text file that stores one or more molecule structures.

2. Structure Separator Line ($$$$) - Each molecule structure ends with a "$$$$" line, which separates it from the next structure in the file.

3. Structure Header (3 lines) - Each molecule structure starts with a 3-line header. The first line should provide an ID for the structure. The second line should provide the source of the structure. The third line can be used for comments. Below is a 3-line SDF header example:

HY-001
herongyang.com
123456789012345678901234567890123456789012345678901234567890

4. The Counts Line - The 4th line of each structure, right after the 3-line header, is the counts line. It provides 12 values, including the atom count and the bond count. Each value is 3 characters wide except the last one, which is 6 characters wide. The first value specifies the number of atoms. The second value specifies the number of bonds. Below is a counts line example that declares 13 atoms and 14 bonds:

 13 14  0  0  0  0  0  0  0  0  0     0

5. The Atom Block - Following the counts line is the atom block with one atom per line. Each atom line starts with the x, y and z coordinates, taking 10 characters per coordinate. The coordinates are followed by a space and the atom's element type, which takes 3 characters. Additional atom properties can be specified after the element type. Below is an atom line example specifying the location of an "N" atom:

    0.8400   -0.1600    0.0000 N   0  0     0  0  0  0  0  0

6. The Bond Block - Following the atom block is the bond block with one bond per line. Each bond line starts with the indexes of the 2 atoms joined by the bond. The atom indexes are followed by the bond type, the bond stereo flag and other properties. Each value in the bond line takes 3 characters. Below is a bond line example specifying a single, non-stereo bond between atom #2 and atom #1:

  2  1  1  0  2  0  0

7. The Properties Block - Following the bond block is the properties block with one property per line. Each property line starts with "M  xxx" (the letter "M", two spaces, then the property ID "xxx"). "M  END" indicates the end of the properties block. Below is a property line example that assigns a charge of +2 to atom #1. The first "1" after "CHG" is the number of atom/charge pairs on the line:

M  CHG  1   1   2

8. Custom Fields - After the properties block, multiple custom fields can be specified with multiple lines per field. The first line identifies the field name in the form of "> <name>". The following lines give the field value. An empty line ends the field.
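The fixed-width rules above can be turned into a small reader. The Python sketch below is only an illustration under stated assumptions: the function names (split_records, parse_record and the helpers) are hypothetical, the sample record is a made-up 2-atom structure rather than a real molecule, and the code handles only the fields described in this section. It is not a complete SDF parser.

```python
def split_records(sdf_text):
    """Split SDF text into records; each record ends with a "$$$$" line."""
    records, current = [], []
    for line in sdf_text.splitlines():
        if line.strip() == "$$$$":
            records.append("\n".join(current))
            current = []
        else:
            current.append(line)
    return records

def parse_counts_line(line):
    """Atom count and bond count are the first two 3-character values."""
    return int(line[0:3]), int(line[3:6])

def parse_atom_line(line):
    """x, y, z take 10 characters each; the element follows one space."""
    x, y, z = (float(line[i:i + 10]) for i in (0, 10, 20))
    return x, y, z, line[31:34].strip()

def parse_bond_line(line):
    """First atom, second atom, bond type, stereo flag (3 characters each)."""
    return tuple(int(line[i:i + 3]) for i in range(0, 12, 3))

def parse_record(text):
    """Parse one record: header, counts, atoms, bonds, properties, fields."""
    lines = text.split("\n")
    n_atoms, n_bonds = parse_counts_line(lines[3])
    bond_start = 4 + n_atoms
    rest = lines[bond_start + n_bonds:]

    # Property lines run until the "M  END" line.
    properties, i = [], 0
    while i < len(rest) and not rest[i].startswith("M  END"):
        properties.append(rest[i])
        i += 1
    i += 1  # skip the "M  END" line itself

    # Custom fields: a "> <name>" line, value lines, then an empty line.
    fields = {}
    while i < len(rest):
        line = rest[i]
        i += 1
        if line.startswith(">") and "<" in line:
            name = line[line.index("<") + 1:line.rindex(">")]
            value_lines = []
            while i < len(rest) and rest[i].strip() != "":
                value_lines.append(rest[i])
                i += 1
            fields[name] = "\n".join(value_lines)

    return {
        "header": lines[0:3],
        "atoms": [parse_atom_line(l) for l in lines[4:bond_start]],
        "bonds": [parse_bond_line(l) for l in lines[bond_start:bond_start + n_bonds]],
        "properties": properties,
        "fields": fields,
    }

# Hypothetical 2-atom sample record, for illustration only.
SAMPLE = "\n".join([
    "HY-001",
    "herongyang.com",
    "sample comment",
    "  2  1  0  0  0  0  0  0  0  0  0     0",
    "    0.8400   -0.1600    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0",
    "    2.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "  2  1  1  0  0  0  0",
    "M  CHG  1   1   2",
    "M  END",
    "> <ID>",
    "HY-001",
    "",
    "$$$$",
])

if __name__ == "__main__":
    for record in split_records(SAMPLE):
        print(parse_record(record))
```

Slicing by character position, rather than splitting on whitespace, follows the fixed-width layout described above: adjacent values may touch each other when they fill their full width, so whitespace splitting would not be reliable.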

A good explanation of the SDF file format is given by Nonlinear Dynamics at nonlinear.com/progenesis/sdf-studio/v0.9/faq/sdf-file-format-guidance.aspx.

A more detailed description of the SDF file format is given by Accelrys Software Inc. at http://download.accelrys.com/freeware/ctfile-formats/
