Cheminformatics Related Terminologies

This section provides a list of commonly used terminologies related to molecule study.

Assay - An assay is an analysis done to determine the biological or pharmacological potency of a chemical compound or a drug.

Atomic Bond - An Atomic Bond is a lasting attraction between atoms that ties them together to form a molecule.

Binding Affinity - A numeric value that indicates the binding strength of a protein-ligand interaction (also called DTI for Drug-Target Interaction). Binding Affinity is usually reported as Kd, Ki, IC50, or EC50.

CAS (Chemical Abstracts Service) Number - A CAS Number is a unique numerical identifier assigned by the CAS to every chemical substance.

ChEMBL - ChEMBL is a manually curated database of bioactive molecules with drug-like properties provided by EMBL (European Molecular Biology Laboratory).

Chime Plug-in - Chime Plug-in is a free Web browser plug-in code that allows you to visualize molecule structures interactively in 3 dimensions.

Compound or Substance - A compound in molecule study usually refers to a substance composed of a single type of molecule.

CTAB (Connection Table) - A CTAB is a text format that describes the structural relationships and properties of a collection of atoms.

CTfile (Chemical Table File) - A CTfile refers to a set of 6 file formats: Molfile, RGfile, rxnfile, SDfile, RDfile, and XDfile. All CTfile formats use the CTAB format as the base component.

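DTI (Drug-Target Interaction) - A DTI is the interaction between a drug (the ligand) and its target protein. See Binding Affinity.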

DTBA (Drug Target Binding Affinity) - DTBA is the binding affinity of a drug-target interaction. See Binding Affinity.

EC50 (50% Effective Concentration) - EC50 is the concentration of a drug that is required to produce 50% of its maximal effect (the maximum effect divided by 2). EC50 is expressed as a concentration, for example 0.0000123%, or more commonly in molar units such as nM. In an ideal situation, if the biological function responds to the drug concentration linearly, EC50 is identical to IC50.

IC50 (50% Inhibitory Concentration) - IC50 is the concentration of a drug that is required for 50% inhibition of a biological function (halfway between the maximum inhibition and the minimum inhibition). IC50 is expressed as a concentration, for example 0.0000123%, or more commonly in molar units such as nM. In an ideal situation, if the biological function responds to the drug concentration linearly, IC50 is identical to EC50.
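The IC50 definition can be illustrated with a Hill (logistic) dose-response curve. This is a minimal sketch: the Hill model, the function name percent_inhibition, and the sample IC50 value are assumptions made for illustration and do not come from any specific assay.

```python
def percent_inhibition(conc, ic50, hill=1.0, bottom=0.0, top=100.0):
    """Percent inhibition at a given concentration under a Hill model (assumed).

    conc and ic50 must use the same concentration unit, for example nM.
    bottom and top are the minimum and maximum inhibition in percent.
    """
    if conc <= 0:
        return bottom
    return bottom + (top - bottom) / (1.0 + (ic50 / conc) ** hill)


ic50_nm = 25.0  # hypothetical IC50 in nM
for conc_nm in (2.5, 25.0, 250.0):
    # At conc == ic50 the result is halfway between bottom and top.
    print(conc_nm, round(percent_inhibition(conc_nm, ic50_nm), 1))
```

When the concentration equals the IC50, the function returns the midpoint between the minimum and maximum inhibition, which is the definition given above.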

InChI (International Chemical Identifier) - InChI is a text label that denotes a chemical substance, developed under the auspices of IUPAC (International Union of Pure and Applied Chemistry).

Inhibitor - An inhibitor is a substance that slows down or prevents a particular chemical reaction or other process, or that reduces the activity of a particular reactant, catalyst, or enzyme.

Jmol - Jmol is free, open-source software written in Java for interactive 3-dimensional visualization of molecule structures, as a standalone tool or a Web browser plug-in.

Kd (Dissociation Constant) - Kd describes the drug-receptor interaction at equilibrium and is defined as Kd = koff / kon, where koff is the rate constant of dissociation and kon is the rate constant of association. Kd is reported in units of concentration like nM (nanomolar).
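The Kd formula can be sketched in a few lines of Python. The rate constants below are hypothetical values, and the fraction_bound helper assumes simple 1:1 binding with the ligand in excess, which is an added assumption and not part of the definition above.

```python
def dissociation_constant(k_off, k_on):
    """Kd in molar (M), given koff in 1/s and kon in 1/(M*s)."""
    return k_off / k_on


def fraction_bound(ligand_conc, kd):
    """Fraction of receptor occupied at equilibrium (assumes 1:1 binding).

    ligand_conc and kd must use the same concentration unit.
    """
    return ligand_conc / (ligand_conc + kd)


k_off = 1e-3  # hypothetical dissociation rate constant, 1/s
k_on = 1e6    # hypothetical association rate constant, 1/(M*s)
kd_molar = dissociation_constant(k_off, k_on)
print("Kd in nM:", kd_molar * 1e9)
print("Fraction bound at [L] = Kd:", fraction_bound(kd_molar, kd_molar))
```

Under this binding model, half of the receptor is occupied when the ligand concentration equals Kd, so a smaller Kd means tighter binding.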

Ki (Inhibition Constant) - Ki describes the drug-receptor interaction at equilibrium. It is defined in the same way as Kd, but is measured from inhibition kinetics instead of association.

MDL (Molecular Design Limited) - MDL was a computer-aided drug design firm that developed the SDF file format. MDL was acquired by Symyx Technologies in 2007.

Molecule - A Molecule is a group of two or more atoms that form the smallest identifiable unit of a pure substance.

Molecule Formula - A Molecule Formula is an expression of element symbols, each followed by the number of atoms of that element that are bonded together to form the molecule. For example, C2H6O is the formula of ethanol.
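As an illustration of how a formula encodes atom counts, here is a minimal sketch that reads a simple formula string. The function name count_atoms is hypothetical, and the parser deliberately ignores parentheses, charges, and isotopes.

```python
import re
from collections import Counter


def count_atoms(formula):
    """Count atoms in a simple formula such as 'C2H6O'.

    Limitation: no parentheses, charges, hydrates, or isotope labels.
    """
    counts = Counter()
    # An element symbol is one uppercase letter plus an optional lowercase
    # letter; a missing number after the symbol means one atom.
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] += int(number) if number else 1
    return dict(counts)


print(count_atoms("C2H6O"))    # ethanol
print(count_atoms("C6H12O6"))  # glucose
print(count_atoms("NaCl"))     # two-letter element symbols
```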

Molecule Structure - A Molecule Structure is a graphical model of a molecule that shows how its atoms are bonded together, with information about atom types/sizes/locations and bond types.

Molfile (Molecule File) - A Molfile describes a single molecular structure, which can contain disjoint fragments. A Molfile consists of a 3-line header followed by a CTAB.
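The header-plus-CTAB layout can be seen in a small hand-written example. The sketch below assumes the V2000 fixed-width flavor of the format and reads only the counts line, atom symbols, and bonds. The sample molecule, its coordinates, and the function name parse_molfile are made up for illustration.

```python
# A hand-written V2000 Molfile for the heavy atoms of methanol (C-O).
MOLFILE = "\n".join([
    "methanol",                # header line 1: molecule name
    "  hand-written example",  # header line 2: program and timestamp info
    "",                        # header line 3: comment
    "  2  1  0  0  0  0  0  0  0  0999 V2000",  # CTAB counts line
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    1.4000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  1  0",            # bond: atom 1 to atom 2, single bond
    "M  END",
])


def parse_molfile(text):
    """Return (name, atom symbols, bonds) from a V2000 Molfile string."""
    lines = text.split("\n")
    name = lines[0].strip()
    counts = lines[3]  # first CTAB line, right after the 3-line header
    n_atoms, n_bonds = int(counts[0:3]), int(counts[3:6])
    # In each atom line, x/y/z take 10 columns each; the symbol follows.
    atoms = [line[31:34].strip() for line in lines[4:4 + n_atoms]]
    bonds = []
    for line in lines[4 + n_atoms:4 + n_atoms + n_bonds]:
        bonds.append((int(line[0:3]), int(line[3:6]), int(line[6:9])))
    return name, atoms, bonds


print(parse_molfile(MOLFILE))
```

For real files, a toolkit such as Open Babel or RDKit is a safer choice than a hand-rolled parser, since the format has many optional fields.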

Open Babel - Open Babel is a chemical toolbox designed to speak the many languages of chemical data.

pKd - pKd is a binding affinity indicator obtained by taking Kd into the log10 space. pKd is defined as pKd = -log10(Kd in molar). Here are some useful conversion formulas:

pKd = -log10(Kd in molar, or M)
pKd = -log10((Kd in millimolar, or mM)/10^3)
pKd = -log10((Kd in micromolar, or uM)/10^6)
pKd = -log10((Kd in nanomolar, or nM)/10^9)

pKd = 3 == Kd = 1 mM
pKd = 6 == Kd = 1 uM
pKd = 6 == Kd = 1000 nM
pKd = 9 == Kd = 1 nM
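The conversions above can be sketched in Python. The function names pkd and kd_from_pkd and the unit table are illustrative choices, not part of any library.

```python
import math

# Factor that converts a value in the given unit to molar (M).
UNIT_TO_MOLAR = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "nM": 1e-9}


def pkd(kd_value, unit="nM"):
    """pKd = -log10(Kd in molar)."""
    return -math.log10(kd_value * UNIT_TO_MOLAR[unit])


def kd_from_pkd(pkd_value, unit="nM"):
    """Inverse conversion: Kd in the requested unit for a given pKd."""
    return 10 ** (-pkd_value) / UNIT_TO_MOLAR[unit]


for value, unit in [(1, "mM"), (1, "uM"), (1000, "nM"), (1, "nM")]:
    print(f"Kd = {value} {unit}  ->  pKd = {round(pkd(value, unit), 3)}")

print("pKd = 7.5  ->  Kd in nM:", round(kd_from_pkd(7.5, "nM"), 2))
```

Because of the negative sign, a larger pKd corresponds to a smaller Kd, that is, to tighter binding. The same functions apply to pKi if Ki is passed in place of Kd.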

pKi - pKi is a binding affinity indicator by taking the Ki into the log10 space. pKi is defined as pKi = -log10(Ki in molar).

PubChem - PubChem is a database of chemical molecules and their activities against biological assays, provided by the U.S. National Library of Medicine.

PyMol - PyMol is a powerful molecule visualization software written by Warren DeLano in Python. PyMol is able to produce high-quality graphics and movies.

RDfile (RDF - Reaction Data File) - An RDfile or RDF contains multiple reactions as well as molecules, together with their associated data.

Receptor - A receptor is a region of tissue, or a molecule in a cell membrane, that responds specifically to a particular neurotransmitter, hormone, antigen, or other substance.

RGfile (Rgroup File) - An RGfile describes a single molecular query with Rgroups (Relationship Groups). Each RGfile is a combination of CTABs defining the root molecule and each member of each Rgroup in the query.

rxnfile (Reaction File) - An rxnfile contains the structural information for the reactants and products of a single reaction.

SAR (Structure Activity Relationship) - SAR is a study on the relationship between molecular structures and their biological activities. Because similar molecular structures may have similar physical and biological properties, SAR can be used to predict biological activity of a new molecule structure based on known biological properties of similar molecular structures.

SDfile (SDF - Structure Data File) - An SDfile or SDF contains structures and data for any number of molecules. Each molecule is stored as a Molfile followed by a custom data block, and these records are concatenated to hold multiple molecules.
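The record layout of an SDfile can be illustrated with a short sketch. It assumes the usual conventions that each record ends with a "$$$$" line and that a data item starts with a "> <FIELD>" header followed by its value lines and a blank line. The sample records and the function names split_sdf and read_data_items are hypothetical.

```python
import re

# Two hand-written records: a Molfile, a data block, then the "$$$$" delimiter.
SDF_TEXT = "\n".join([
    "water", "  hand-written example", "",
    "  1  0  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0",
    "M  END",
    "> <NAME>", "water", "",
    "$$$$",
    "methane", "  hand-written example", "",
    "  1  0  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "M  END",
    "> <NAME>", "methane", "",
    "$$$$",
])


def split_sdf(text):
    """Split SDfile text into records, each a list of lines."""
    records, current = [], []
    for line in text.split("\n"):
        if line.strip() == "$$$$":
            records.append(current)
            current = []
        else:
            current.append(line)
    return records


def read_data_items(record_lines):
    """Collect '> <FIELD>' data items of one record into a dict."""
    data, key = {}, None
    for line in record_lines:
        header = re.match(r">.*<([^>]+)>", line)
        if header:
            key = header.group(1)
            data[key] = []
        elif key is not None:
            if line.strip() == "":
                key = None  # a blank line ends the current data item
            else:
                data[key].append(line)
    return {field: "\n".join(values) for field, values in data.items()}


for record in split_sdf(SDF_TEXT):
    print(record[0], read_data_items(record))
```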

SMILES (Simplified Molecular-Input Line-Entry System) - SMILES is a specification in the form of a line notation for describing molecule structures using short ASCII strings.

Skeletal Formula - A Skeletal Formula is a graphical representation of a molecule structure with line segments to represent atomic bonds, intersections/ends of line segments to represent atoms, and element symbols to identify atom types.

Structure - A structure is a graphical model of a molecule that shows how its atoms are bonded together, with information about atom types/sizes/locations and bond types.

Substance or Compound - A substance in molecule study usually refers to a substance composed of a single type of molecule.

Substructure - A substructure is a subgraph of the graph that represents a molecule structure.

SVG (Scalable Vector Graphics) - SVG is an XML-based markup language for describing two-dimensional vector graphics. SVG suits molecule Skeletal Formulas better than raster graphic file formats because SVG images can be scaled to any size without losing quality.

Drug Target - A drug target is a protein in the human body that a drug interacts with, either boosting or inhibiting its function, in order to control or cure the related disease.

Wedge and Dash Projections - Wedge and Dash Projections are special graphical lines to represent bonds in Skeletal Formula that are not in the viewing plane.

XDfile (XML Data File) - An XDfile is the XML based data format for transferring recordsets of structure or reaction information with associated data.
