Cheminformatics Tutorials - Herong's Tutorial Examples
∟SMILES (Simplified Molecular-Input Line-Entry System)
This chapter introduces SMILES (Simplified Molecular-Input Line-Entry System) with tutorial examples. Topics include SMILES representations for atoms, bonds, rings, disconnected structures, charges, directional bonds, isotopes, and chiral centers.
What Is SMILES
What Is Canonical SMILES
Atom Representations in SMILES
Bond Representations in SMILES
Branch Representations in SMILES
Ring Representations in SMILES
Disconnected Structures in SMILES
Charge Representations in SMILES
Isotope Representations in SMILES
Directional Bonds in SMILES
Tetrahedral Centers in SMILES
Chirality Representations in SMILES
Hydrogen Representations in SMILES
Takeaways:
- SMILES (Simplified Molecular-Input Line-Entry System)
is a specification in the form of a line notation for describing
molecule structures using short ASCII strings.
- Canonical SMILES is a special version of SMILES where each molecule structure
is represented by exactly one SMILES string.
- Each non-hydrogen atom is represented in square brackets [] by its atomic
symbol, followed by H or Hn for its bonded hydrogens.
- Atoms in the organic subset, B, C, N, O, P, S, F, Cl, Br, and I,
may be written without brackets if the number of bonded hydrogens conforms
to the lowest normal valence consistent with the explicit bonds.
- Single, double, triple, and aromatic bonds are represented
by the symbols -, =, #, and :, respectively.
- Single bonds may be omitted.
- Aromatic bonds may be omitted if the atoms are written in lowercase letters.
- An extra bond and its connected branch of atoms
are represented in round brackets (...), inserted after the bonding atom
and before that atom's other bonds.
- A ring is represented by appending a numeric digit to the starting atom of the ring and appending the same digit to the atom that closes the ring.
- Disconnected structures are separated by the '.' symbol.
- Positive/negative charges are represented by repeating '+'/'-' symbols or '+n'/'-n'.
- Isotopes are represented by prefixing the atomic symbol with the mass number
in square brackets [].
- Directional bonds, '/' and '\', are used to distinguish the configurations
of the atoms attached to a pair of double-bonded atoms.
- '@' is appended to a tetrahedral center atom in square brackets []
to indicate that the next 3 branches are in anticlockwise order
when viewed from the preceding atom.
- Different types of chirality are represented by different
chiral class codes in the format [x@ccn].
- Hydrogens are represented explicitly as atoms 'O([H])[H]', explicitly as counts '[OH2]', or implicitly like 'O'.
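The notation rules summarized above can be illustrated with a small tokenizer. The following is only a minimal sketch in standard-library Python, not a full SMILES parser and not how Open Babel or RDKit work internally. The names TOKEN_SPEC, tokenize, charge_value, and parse_bracket_atom are hypothetical helpers introduced for illustration. The sketch assumes well-formed input and does no chemical validation (valence, ring closure pairing, aromaticity).

```python
import re

# Token kinds taken from the takeaways above, tried in this order.
# Two-letter organic symbols (Cl, Br) are listed before one-letter ones.
TOKEN_SPEC = [
    ("bracket_atom", r"\[[^\]]+\]"),        # e.g. [OH2], [13C], [NH4+], [C@@H]
    ("organic_atom", r"Cl|Br|[BCNOPSFI]"),  # organic subset, no brackets needed
    ("aromatic_atom", r"[bcnops]"),         # lowercase marks an aromatic atom
    ("bond", r"[-=#:/\\]"),                 # single, double, triple, aromatic, directional
    ("branch_open", r"\("),
    ("branch_close", r"\)"),
    ("ring_digit", r"%\d\d|\d"),            # ring opening/closing label
    ("dot", r"\."),                         # separates disconnected structures
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

# Inside brackets: isotope, symbol, chirality, hydrogen count, charge (all but symbol optional).
BRACKET_RE = re.compile(
    r"\[(?P<isotope>\d+)?"
    r"(?P<symbol>[A-Z][a-z]?|[a-z][a-z]?)"
    r"(?P<chiral>@[A-Z]{2}\d+|@@?)?"
    r"(?P<hcount>H\d*)?"
    r"(?P<charge>\++|-+|[+-]\d+)?"
    r"\]"
)


def tokenize(smiles):
    """Split a SMILES string into (kind, text) pairs."""
    tokens = []
    pos = 0
    while pos < len(smiles):
        match = TOKEN_RE.match(smiles, pos)
        if match is None:
            raise ValueError(f"unexpected character {smiles[pos]!r} at position {pos}")
        tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


def charge_value(text):
    """Convert '+', '++', '-', '+2', '-3', or None into a signed integer."""
    if not text:
        return 0
    sign = 1 if text[0] == "+" else -1
    magnitude = int(text[1:]) if text[1:].isdigit() else len(text)
    return sign * magnitude


def parse_bracket_atom(token):
    """Break a bracket atom such as '[13CH4]' into its parts."""
    match = BRACKET_RE.fullmatch(token)
    if match is None:
        raise ValueError(f"not a recognized bracket atom: {token!r}")
    isotope = match.group("isotope")
    hcount = match.group("hcount")
    return {
        "isotope": int(isotope) if isotope else None,
        "symbol": match.group("symbol"),
        "chiral": match.group("chiral"),
        # 'H' alone means one hydrogen; 'Hn' means n hydrogens.
        "hydrogens": 0 if hcount is None else int(hcount[1:] or 1),
        "charge": charge_value(match.group("charge")),
    }
```

For example, tokenize("CC(=O)O") yields two organic atoms, an opening bracket, a double bond, an oxygen, a closing bracket, and a final oxygen, while parse_bracket_atom("[NH4+]") reports the symbol N with 4 hydrogens and a charge of +1. A real toolkit such as Open Babel or RDKit should be used for anything beyond illustration.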