Cheminformatics Tutorials - Herong's Tutorial Examples - v2.01, by Herong Yang
UnfoldedRDKFingerprintCountBased() Method in RDKit
This section provides a tutorial example on understanding the fingerprint generation algorithm used in the rdkit.Chem.rdmolops.UnfoldedRDKFingerprintCountBased() function.
The UnfoldedRDKFingerprintCountBased(mol) method is located in the rdkit.Chem.rdmolops module of the RDKit library. The same method is also exposed as rdkit.Chem.AllChem.UnfoldedRDKFingerprintCountBased() for easier access.
The UnfoldedRDKFingerprintCountBased() method provides the same functionality as the RDKFingerprint() method, except that it returns the fingerprint as a collection of integer identifiers (hash values calculated from subgraphs) without folding them into a bit string. The fingerprint also records how many times each identifier occurs, so duplicated subgraphs are counted.
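To make the difference between an unfolded count vector and a folded bit string concrete, here is a minimal pure-Python sketch. It does not call RDKit. The identifiers are the ones reported for "CCC" later in this section, the fold-by-modulo step is a simplifying assumption rather than RDKit's actual folding scheme, and FP_SIZE is a hypothetical fingerprint size chosen for illustration.

```python
from collections import Counter

# One hash identifier per subgraph occurrence, as reported for "CCC" in this tutorial.
subgraph_hashes = [4275705116, 4275705116, 1940446997]

# Unfolded, count-based view: keep each full identifier and how often it occurs.
unfolded_counts = dict(Counter(subgraph_hashes))

# Folded view (simplified assumption): reduce each identifier to a bit position
# by taking it modulo a fixed size. Occurrence counts are lost in this step.
FP_SIZE = 2048  # hypothetical size, for illustration only
folded_bits = sorted({h % FP_SIZE for h in subgraph_hashes})

print(unfolded_counts)
print(folded_bits)
```

The unfolded form keeps the full identifiers and their counts, while the folded form only records which bit positions are set, so two different identifiers could collide on the same bit.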
Here is the definition of the method:
rdkit.Chem.rdmolops.UnfoldedRDKFingerprintCountBased((Mol)mol [, (int)minPath=1 [, (int)maxPath=7 [, (bool)useHs=True [, (bool)branchedPaths=True [, (bool)useBondOrder=True [, (AtomPairsParameters)atomInvariants=0 [, (AtomPairsParameters)fromAtoms=0 [, (AtomPairsParameters)atomBits=None [, (AtomPairsParameters)bitInfo=None]]]]]]]]]) -> ULongSparseIntVect
Descriptions of all arguments:
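As a rough illustration of what minPath and maxPath bound (the minimum and maximum number of bonds in a subgraph), the sketch below enumerates the bond subgraphs of an unbranched heavy-atom chain in plain Python. It is not RDKit code: chain_subgraphs() is a hypothetical helper, and it covers only linear chains, where every connected subgraph is a contiguous run of bonds.

```python
def chain_subgraphs(num_bonds, min_path=1, max_path=7):
    """Enumerate contiguous bond-index runs in an unbranched chain.

    Bonds are numbered 0..num_bonds-1 along the chain. Each returned
    list holds the bond indices of one subgraph.
    """
    paths = []
    for length in range(min_path, max_path + 1):
        # A run of `length` bonds can start at any index that leaves room for it.
        for start in range(0, num_bonds - length + 1):
            paths.append(list(range(start, start + length)))
    return paths

print(chain_subgraphs(1))  # chain with 1 bond, like "CC"
print(chain_subgraphs(2))  # chain with 2 bonds, like "CCC"
```

Under these assumptions, a 1-bond chain yields one subgraph and a 2-bond chain yields three, which matches the subgraph counts used in the tests of this section.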
Now let's verify the algorithm with some simple tests.
1. Molecule with 1 subgraph: "CC" expressed in SMILES.
    from rdkit.Chem import AllChem
    from rdkit.DataStructs import cDataStructs

    mol = AllChem.MolFromSmiles('CC')

    # Containers that the method fills in with per-atom and per-identifier details
    atomBits = []
    bitInfo = {}
    fp = AllChem.UnfoldedRDKFingerprintCountBased(mol, atomBits=atomBits, bitInfo=bitInfo)

    # display() is available in Jupyter/IPython; use print() in a plain script
    display(cDataStructs.ULongSparseIntVect.GetNonzeroElements(fp))
    print(atomBits)
    print(bitInfo)

    # output:
    # {4275705116: 1}
    # [[4275705116], [4275705116]]
    # {4275705116: [[0]]}
Notes on this test:
2. Molecule with 3 subgraphs: "CCC" expressed in SMILES.
    # Reuses the imports from the first test
    mol = AllChem.MolFromSmiles('CCC')

    atomBits = []
    bitInfo = {}
    fp = AllChem.UnfoldedRDKFingerprintCountBased(mol, atomBits=atomBits, bitInfo=bitInfo)

    display(cDataStructs.ULongSparseIntVect.GetNonzeroElements(fp))
    print(atomBits)
    print(bitInfo)

    # output:
    # {1940446997: 1, 4275705116: 2}
    # [[4275705116, 1940446997], [4275705116, 1940446997], [4275705116, 1940446997]]
    # {1940446997: [[0, 1]], 4275705116: [[0], [1]]}
Notes on this test:
As you can see, these simple tests confirm that our understanding of the UnfoldedRDKFingerprintCountBased() method is accurate.
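The reported outputs can also be cross-checked against each other without running RDKit. The sketch below copies the values printed in the two tests and checks that each identifier's count equals the number of paths listed for it in bitInfo. Reading each bitInfo entry as a list of bond-index paths is an interpretation consistent with these outputs, and the helper names are hypothetical.

```python
# Outputs copied from the two tests in this tutorial (not recomputed here).
reported = {
    "CC": {
        "counts": {4275705116: 1},
        "bit_info": {4275705116: [[0]]},
    },
    "CCC": {
        "counts": {1940446997: 1, 4275705116: 2},
        "bit_info": {1940446997: [[0, 1]], 4275705116: [[0], [1]]},
    },
}

def counts_match_bit_info(counts, bit_info):
    """True if every identifier's count equals its number of recorded paths."""
    if counts.keys() != bit_info.keys():
        return False
    return all(counts[h] == len(paths) for h, paths in bit_info.items())

def path_lengths(bit_info):
    """Map each identifier to the sorted bond counts of its recorded paths."""
    return {h: sorted(len(p) for p in paths) for h, paths in bit_info.items()}

for smiles, data in reported.items():
    ok = counts_match_bit_info(data["counts"], data["bit_info"])
    total = sum(data["counts"].values())
    print(smiles, "consistent:", ok, "total subgraphs:", total)
```

For "CCC", the identifier 4275705116 appears twice because the two single-bond subgraphs (bond 0 and bond 1) are equivalent, while 1940446997 corresponds to the single two-bond path.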