Cheminformatics Tutorials - Herong's Tutorial Examples - v2.01, by Herong Yang
Substructure Library in Binary and SMILES Formats
This section provides a quick introduction to building molecule libraries in binary and SMILES formats to reduce memory usage. Molecule libraries can also be serialized to external files, so they can be restored quickly later.
By default, rdkit.Chem.rdSubstructLibrary.SubstructLibrary stores each molecule as an object. This requires a large amount of memory if you are adding a large number of molecules into the library.
RDKit offers 3 other formats to store molecules in the library to reduce memory usage:
1. Binary Format - Using rdkit.Chem.rdSubstructLibrary.CachedMolHolder to store molecules in binary format.
2. SMILES Format - Using rdkit.Chem.rdSubstructLibrary.CachedSmilesMolHolder to store molecules in SMILES format.
3. Trusted SMILES Format - Using rdkit.Chem.rdSubstructLibrary.CachedTrustedSmilesMolHolder to store molecules in Trusted SMILES format.
To use a specific library format, you need to instantiate the library object with a new holder object of that format as shown below:
from rdkit.Chem import rdSubstructLibrary

# Creating an empty library in binary format
f = rdSubstructLibrary.CachedMolHolder()
l = rdSubstructLibrary.SubstructLibrary(f)

# Creating an empty library in SMILES format
f = rdSubstructLibrary.CachedSmilesMolHolder()
l = rdSubstructLibrary.SubstructLibrary(f)

# Creating an empty library in Trusted SMILES format
f = rdSubstructLibrary.CachedTrustedSmilesMolHolder()
l = rdSubstructLibrary.SubstructLibrary(f)
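As a rough illustration of the holder choices above, the sketch below adds the same two molecules to one library per holder type and runs the same query against each. The SMILES strings and the query are borrowed from the examples later in this section. The names HOLDERS and build_library are made up for this sketch, and rdSubstructLibrary.MolHolder is assumed to be the default holder that stores molecules as objects.

```python
from rdkit import Chem
from rdkit.Chem import rdSubstructLibrary

SMILES = ["O=C(O)c2cc1ccc(Cl)n1cn2", "O=C(O)c2ccn1c(Cl)ccc1n2"]
QUERY = Chem.MolFromSmiles("c1nccc2n1ccc2")

# Hypothetical lookup table: a label for each holder class
HOLDERS = {
    "object": rdSubstructLibrary.MolHolder,
    "binary": rdSubstructLibrary.CachedMolHolder,
    "smiles": rdSubstructLibrary.CachedSmilesMolHolder,
    "trusted": rdSubstructLibrary.CachedTrustedSmilesMolHolder,
}


def build_library(holder_class):
    """Build a library backed by the given holder class."""
    library = rdSubstructLibrary.SubstructLibrary(holder_class())
    for smi in SMILES:
        library.AddMol(Chem.MolFromSmiles(smi))
    return library


# Record the library size and the matching indices for each holder type
results = {}
for name, holder_class in HOLDERS.items():
    library = build_library(holder_class)
    results[name] = (len(library), list(library.GetMatches(QUERY)))
    print(name, results[name])
```

If the holders differ only in how they store molecules, every line printed should report the same size and the same matching indices.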
After building a molecule library, you may want to save it to a file and reuse it later, so you do not have to rebuild the same library again and again. RDKit offers two methods to write molecule libraries to, and read them from, external files.
l.ToStream(open(f, 'w')) - Writes the molecule library to the file named f, opened in text mode.
l.InitFromStream(open(f, 'rb')) - Reads the molecule library from the file named f, opened in binary mode.
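The two methods above can be combined into a round trip. The sketch below is only an illustration: the two SMILES strings, the temporary directory and the file name example.sslib are arbitrary choices, and the files are opened with "with" blocks so that they are closed as soon as the write or read finishes.

```python
import os
import tempfile

from rdkit import Chem
from rdkit.Chem import rdSubstructLibrary

# Build a small library; the two molecules are arbitrary examples
holder = rdSubstructLibrary.CachedTrustedSmilesMolHolder()
library = rdSubstructLibrary.SubstructLibrary(holder)
for smi in ["CCO", "c1ccccc1O"]:
    library.AddMol(Chem.MolFromSmiles(smi))

# Hypothetical output location inside a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), "example.sslib")

# Write in text mode, as ToStream() expects
with open(path, "w") as out:
    library.ToStream(out)

# Read back in binary mode, as InitFromStream() expects
restored = rdSubstructLibrary.SubstructLibrary()
with open(path, "rb") as src:
    restored.InitFromStream(src)

print(len(library), len(restored))
```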
The short example below builds a molecule library in Trusted SMILES format and writes it to an external file.
herong$ more rdkit_write_library.py
from rdkit import Chem
from rdkit.Chem import rdSubstructLibrary

# Parse the SMILES strings into molecule objects
ms = ["O=C(O)c2cc1ccc(Cl)n1cn2", "O=C(O)c2ccn1c(Cl)ccc1n2"]
ms = list(map(Chem.MolFromSmiles, ms))

# Build a library that stores molecules as Trusted SMILES
f = rdSubstructLibrary.CachedTrustedSmilesMolHolder()
l = rdSubstructLibrary.SubstructLibrary(f)
for m in ms:
    l.AddMol(m)

# Write the library to a file opened in text mode
with open('rdkit.sslib', 'w') as out:
    l.ToStream(out)

herong$ python rdkit_write_library.py

herong$ more rdkit.sslib
22 serialization::archive 18 0 1 0 1 4 1 1 0 0 0 0 0 2 0 23 O=C(O)c1cc2ccc(Cl)n2cn1 23 O=C(O)c1ccn2c(Cl)ccc2n1 0 1 -1
Now you can use the next example to read the molecule library from the file and do a substructure search. The output is the same as in the previous tutorial.
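herong$ more rdkit_search_library.py
from rdkit import Chem
from rdkit.Chem import Draw
from rdkit.Chem import rdSubstructLibrary
# display() comes from IPython; it is predefined only inside a notebook
from IPython.display import display

# Substructure to search for
s = Chem.MolFromSmiles('c1nccc2n1ccc2')

# Restore the library from the file opened in binary mode
l = rdSubstructLibrary.SubstructLibrary()
with open('rdkit.sslib', 'rb') as src:
    l.InitFromStream(src)

# Fetch the matching molecules and draw them in a grid
ms = l.GetMatches(s)
ms = [l.GetMol(m) for m in ms]
i = Draw.MolsToGridImage(ms)
display(i)

herong$ python rdkit_search_library.py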
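When no image viewer is at hand, the search result can also be checked as plain text. The sketch below is one possible way to do that, not part of the original example: it serializes the library in memory with Serialize() and io.BytesIO instead of a file, and it assumes that the indices returned by GetMatches() follow the order in which the molecules were added, so each hit can be mapped back to its input SMILES string.

```python
import io

from rdkit import Chem
from rdkit.Chem import rdSubstructLibrary

smiles = ["O=C(O)c2cc1ccc(Cl)n1cn2", "O=C(O)c2ccn1c(Cl)ccc1n2"]

# Build the same Trusted SMILES library as in the write example
holder = rdSubstructLibrary.CachedTrustedSmilesMolHolder()
library = rdSubstructLibrary.SubstructLibrary(holder)
for smi in smiles:
    library.AddMol(Chem.MolFromSmiles(smi))

# Round trip through an in-memory byte stream instead of a file
restored = rdSubstructLibrary.SubstructLibrary()
restored.InitFromStream(io.BytesIO(library.Serialize()))

# Search the restored library and report each hit as "index SMILES"
query = Chem.MolFromSmiles("c1nccc2n1ccc2")
hits = list(restored.GetMatches(query))
hit_smiles = [smiles[i] for i in hits]
for i, smi in zip(hits, hit_smiles):
    print(i, smi)
```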