RDKFingerprint() Method in RDKit

This section provides a tutorial example that explains the fingerprint generation algorithm used by the rdkit.Chem.rdmolops.RDKFingerprint() function.

The RDKFingerprint(mol) method is located in the rdkit.Chem.rdmolops module of the RDKit library. For easier access, the same method is also available as rdkit.Chem.RDKFingerprint(), which the tests below use, and as rdkit.Chem.AllChem.RDKFingerprint(). It returns the fingerprint as a bit vector (ExplicitBitVect) generated with RDKit's variant of the Daylight fingerprint method.

Here is the definition of the method:

rdkit.Chem.rdmolops.RDKFingerprint((Mol)mol[, 
  (int)minPath=1[, 
  (int)maxPath=7[, 
  (int)fpSize=2048[, 
  (int)nBitsPerHash=2[, 
  (bool)useHs=True[, 
  (float)tgtDensity=0.0[, 
  (int)minSize=128[, 
  (bool)branchedPaths=True[, 
  (bool)useBondOrder=True[, 
  (AtomPairsParameters)atomInvariants=0[, 
  (AtomPairsParameters)fromAtoms=0[, 
  (AtomPairsParameters)atomBits=None[, 
  (AtomPairsParameters)bitInfo=None]]]]]]]]]]]]]) 
-> ExplicitBitVect

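Descriptions of all arguments:

- mol: the molecule to fingerprint.
- minPath: minimum number of bonds in a subgraph. Default: 1.
- maxPath: maximum number of bonds in a subgraph. Default: 7.
- fpSize: number of bits in the fingerprint. Default: 2048.
- nBitsPerHash: number of bits set for each subgraph. Default: 2.
- useHs: if the molecule has explicit hydrogens, include subgraphs that involve them. Default: True.
- tgtDensity: fold the fingerprint until this minimum density of set bits is reached. Default: 0.0, which means no folding.
- minSize: the smallest size the fingerprint will be folded to when trying to reach tgtDensity. Default: 128.
- branchedPaths: if True, both branched and unbranched (linear) subgraphs are used. Default: True.
- useBondOrder: if True, bond orders are used in the subgraph hashes. Default: True.
- atomInvariants: an optional sequence of atom invariants to use in the subgraph hashes.
- fromAtoms: an optional sequence of atom indices. If provided, only subgraphs starting from these atoms are used.
- atomBits: an optional empty list. If provided, it is filled with the list of bits set by each atom.
- bitInfo: an optional empty dict. If provided, it is filled with bit numbers as keys and the bond paths that set them as values.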
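The tgtDensity and minSize arguments control how the fingerprint is folded. The following plain-Python sketch makes two assumptions. First, each fold ORs the second half of the vector onto the first half, so original bit i ends up at bit i % newSize. Second, folding stops once the density target is met or when the next fold would drop below minSize. fold_once() and fold_to_density() are hypothetical helpers, not RDKit functions.

```python
def fold_once(bits):
    """Fold a bit list in half by OR-ing the second half onto the first."""
    half = len(bits) // 2
    return [bits[i] | bits[i + half] for i in range(half)]


def density(bits):
    """Fraction of bits that are set."""
    return sum(bits) / len(bits)


def fold_to_density(bits, tgt_density=0.0, min_size=128):
    """Halve the vector until it reaches tgt_density or would go below min_size."""
    while density(bits) < tgt_density and len(bits) // 2 >= min_size:
        bits = fold_once(bits)
    return bits


# A sparse 2048-bit fingerprint with three bits set
fp = [0] * 2048
for pos in (28, 1052, 2000):
    fp[pos] = 1

folded = fold_to_density(fp, tgt_density=0.01, min_size=128)
print(len(folded), [i for i, b in enumerate(folded) if b])  # 128 [28, 80]
```

With the default tgt_density=0.0, the loop condition is never true, so the fingerprint keeps its full fpSize. Folding also merges bits: positions 28 and 1052 collapse into the same bit after the first fold.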

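Based on my understanding, RDKFingerprint() implements the Daylight fingerprint method with several modifications. Here is RDKit's modified version:

1) Find all subgraphs of the molecule that contain between minPath and maxPath bonds. The original Daylight method uses linear paths only. RDKit also includes branched subgraphs when branchedPaths=True (the default).
2) Calculate a hash for each subgraph from its atoms and bonds, including bond orders when useBondOrder=True.
3) Use the hash to seed a random-number generator.
4) Generate nBitsPerHash random numbers and set the corresponding bits in the fpSize-bit fingerprint.
5) If tgtDensity is greater than 0, fold the fingerprint until the fraction of set bits reaches tgtDensity, without folding below minSize bits.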
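Subgraph enumeration, hashing and bit setting can be illustrated in plain Python on a hand-built molecular graph. This is a toy model and not RDKit's implementation. The subgraph label (a sorted list of bond descriptors), the SHA-256 hash and the use of Python's random module are all assumptions, so the bit positions will not match RDKit's output. The sketch does show two things: how branchedPaths changes the set of subgraphs, and why repeated subgraphs set the same bits.

```python
import hashlib
import random
from itertools import combinations


def is_connected(bond_ids, bonds):
    """True if the chosen bonds form one connected fragment."""
    atoms = {a for b in bond_ids for a in bonds[b][:2]}
    start = next(iter(atoms))
    seen, stack = {start}, [start]
    while stack:
        atom = stack.pop()
        for b in bond_ids:
            a1, a2, _ = bonds[b]
            if atom in (a1, a2):
                other = a2 if atom == a1 else a1
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
    return seen == atoms


def is_linear(bond_ids, bonds):
    """Linear path: no atom carries more than two chosen bonds and there is no ring."""
    degree = {}
    for b in bond_ids:
        for a in bonds[b][:2]:
            degree[a] = degree.get(a, 0) + 1
    return max(degree.values()) <= 2 and len(degree) == len(bond_ids) + 1


def subgraphs(bonds, min_path=1, max_path=7, branched=True):
    """Yield connected bond subsets with min_path..max_path bonds."""
    for size in range(min_path, min(max_path, len(bonds)) + 1):
        for combo in combinations(range(len(bonds)), size):
            if is_connected(combo, bonds) and (branched or is_linear(combo, bonds)):
                yield combo


def label(combo, atoms, bonds):
    """Toy subgraph label: sorted bond descriptors (NOT a true canonical form)."""
    return tuple(sorted((tuple(sorted((atoms[a1], atoms[a2]))), order)
                        for a1, a2, order in (bonds[b] for b in combo)))


def fingerprint(atoms, bonds, fp_size=64, n_bits_per_hash=1, **kw):
    """Hash each subgraph, seed an RNG with the hash, and set n_bits_per_hash bits."""
    on_bits = set()
    for combo in subgraphs(bonds, **kw):
        digest = hashlib.sha256(repr(label(combo, atoms, bonds)).encode()).hexdigest()
        rng = random.Random(int(digest, 16))
        for _ in range(n_bits_per_hash):
            on_bits.add(rng.randrange(fp_size))
    return sorted(on_bits)


# Isobutane, CC(C)C: three single bonds, all attached to atom 1
iso_atoms = ["C", "C", "C", "C"]
iso_bonds = [(0, 1, 1), (1, 2, 1), (1, 3, 1)]
print(len(list(subgraphs(iso_bonds))))                  # 7 (includes the branched 3-bond subgraph)
print(len(list(subgraphs(iso_bonds, branched=False))))  # 6 (linear paths only)
print(fingerprint(iso_atoms, iso_bonds))
```

Each bond is a hypothetical (atom1, atom2, bond_order) tuple. Brute-force enumeration with itertools.combinations is only practical for tiny molecules, but it keeps the definition of "connected subgraph" easy to read.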

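Now let's verify the algorithm with some simple tests. To keep the output readable, each test uses a 64-bit fingerprint (fpSize=64) and sets one bit per subgraph (nBitsPerHash=1).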

1. Molecule with 1 subgraph: "CC" expressed in SMILES.

from rdkit import Chem
atomBits = []
bitInfo = {}
mol = Chem.MolFromSmiles('CC')
fp = Chem.RDKFingerprint(mol, 
  fpSize=64, nBitsPerHash=1, atomBits=atomBits, bitInfo=bitInfo)
print(fp.ToBitString())
print(fp.GetNumOnBits())
print(atomBits)
print(bitInfo)

# output:
0000000000000000000000000000100000000000000000000000000000000000
1
[[28], [28]]
{28: [[0]]}

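Notes on this test:

- "CC" has one bond (bond 0), so with minPath=1 the only subgraph is that single C-C bond.
- With nBitsPerHash=1, this subgraph sets exactly one bit (bit 28 of the 64-bit fingerprint), so GetNumOnBits() returns 1.
- atomBits lists, for each atom, the bits set by subgraphs containing that atom. Both carbon atoms belong to the C-C subgraph, so each gets [28].
- bitInfo maps bit 28 to the bond path [0] that set it.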
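ToBitString() prints bit 0 first, so the position of each "1" character is the bit number. A small helper can recover the positions of the set bits and cross-check them against the keys of bitInfo. on_bits() is a hypothetical helper written for this check, not an RDKit function.

```python
def on_bits(bit_string):
    """Return the positions of the '1' characters, i.e. the indices of the set bits."""
    return [i for i, ch in enumerate(bit_string) if ch == "1"]


# Bit string and bitInfo printed in test 1 ("CC")
test1 = "0000000000000000000000000000100000000000000000000000000000000000"
bit_info = {28: [[0]]}

print(len(test1), on_bits(test1))
print(on_bits(test1) == sorted(bit_info))  # the set bits should be exactly the bitInfo keys
```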

2. Molecule with 2 unique subgraphs: "CCC" expressed in SMILES.

from rdkit import Chem
atomBits = []
bitInfo = {}
mol = Chem.MolFromSmiles('CCC')
fp = Chem.RDKFingerprint(mol, 
  fpSize=64, nBitsPerHash=1, atomBits=atomBits, bitInfo=bitInfo)
print(fp.ToBitString())
print(fp.GetNumOnBits())
print(atomBits)
print(bitInfo)

# output:
0000000000000000000001000000100000000000000000000000000000000000
2
[[28, 21], [28, 21], [28, 21]]
{21: [[0, 1]], 28: [[0], [1]]}

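Notes on this test:

- "CCC" has two bonds, which gives three subgraphs: bond 0 alone, bond 1 alone, and the two-bond path [0, 1].
- Bonds 0 and 1 are both C-C single bonds, so both hash to bit 28, the same bit set in test 1. The C-C-C path sets bit 21. There are three subgraphs but only two unique ones, so 2 bits are on.
- Every atom belongs to at least one C-C subgraph and to the C-C-C path, so each atom lists [28, 21].
- bitInfo records both bond paths that set bit 28: {21: [[0, 1]], 28: [[0], [1]]}.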
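atomBits can be rebuilt from bitInfo if you know which two atoms each bond joins. This sketch assumes the bond-to-atom mapping for "CCC" (bond 0 joins atoms 0 and 1, bond 1 joins atoms 1 and 2). In RDKit you could read that mapping with bond.GetBeginAtomIdx() and bond.GetEndAtomIdx(). atom_bits_from_bit_info() is a hypothetical helper, not an RDKit function.

```python
def atom_bits_from_bit_info(bit_info, bond_atoms, n_atoms):
    """Rebuild per-atom bit lists from bitInfo (bit -> list of bond paths).

    bond_atoms[i] holds the two atom indices joined by bond i.
    """
    per_atom = [set() for _ in range(n_atoms)]
    for bit, paths in bit_info.items():
        for path in paths:
            for bond in path:
                for atom in bond_atoms[bond]:
                    per_atom[atom].add(bit)
    # Sorted for stable output; RDKit's atomBits may list the bits in a different order.
    return [sorted(bits) for bits in per_atom]


bit_info = {21: [[0, 1]], 28: [[0], [1]]}  # bitInfo printed in test 2 ("CCC")
bond_atoms = [(0, 1), (1, 2)]              # assumed: bond 0 = C0-C1, bond 1 = C1-C2
print(atom_bits_from_bit_info(bit_info, bond_atoms, 3))
# [[21, 28], [21, 28], [21, 28]]
```

Apart from ordering, the result agrees with the atomBits output of test 2. This shows that atomBits carries no information beyond bitInfo plus the molecule's bond list.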

3. Molecule with 3 unique subgraphs: "CCO" expressed in SMILES.

from rdkit import Chem
atomBits = []
bitInfo = {}
mol = Chem.MolFromSmiles('CCO')
fp = Chem.RDKFingerprint(mol, 
  fpSize=64, nBitsPerHash=1, atomBits=atomBits, bitInfo=bitInfo)
print(fp.ToBitString())
print(fp.GetNumOnBits())
print(atomBits)
print(bitInfo)

# output:
1000000000000000000000000000100000000000000000000000000000010000
3
[[28, 0], [28, 59, 0], [59, 0]]
{0: [[0, 1]], 28: [[0]], 59: [[1]]}

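Notes on this test:

- "CCO" also has two bonds, but they differ: bond 0 is C-C and bond 1 is C-O. All three of its subgraphs are unique, so 3 bits are on.
- The C-C bond sets bit 28 again, the C-O bond sets bit 59, and the C-C-O path sets bit 0.
- Atom 0 (C) is in the C-C bond and the C-C-O path: [28, 0]. Atom 1 (C) is in all three subgraphs: [28, 59, 0]. Atom 2 (O) is in the C-O bond and the C-C-O path: [59, 0].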
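A quick consistency check across all three tests is to map each subgraph type to the bits it set and confirm that every type always lands on a single bit. The bitInfo dictionaries below are copied from the outputs above. The subgraph labels such as "C-C-O" are hand-written for illustration and are not something RDKit returns. label_to_bits() is a hypothetical helper.

```python
# bitInfo dictionaries copied from the three tests
bit_infos = {
    "CC": {28: [[0]]},
    "CCC": {21: [[0, 1]], 28: [[0], [1]]},
    "CCO": {0: [[0, 1]], 28: [[0]], 59: [[1]]},
}

# Hand-written label for each bond path (illustration only)
path_labels = {
    "CC": {(0,): "C-C"},
    "CCC": {(0,): "C-C", (1,): "C-C", (0, 1): "C-C-C"},
    "CCO": {(0,): "C-C", (1,): "C-O", (0, 1): "C-C-O"},
}


def label_to_bits(bit_infos, path_labels):
    """Collect, for each subgraph label, every bit it set across all molecules."""
    table = {}
    for smiles, bit_info in bit_infos.items():
        for bit, paths in bit_info.items():
            for path in paths:
                table.setdefault(path_labels[smiles][tuple(path)], set()).add(bit)
    return {lab: sorted(bits) for lab, bits in table.items()}


print(label_to_bits(bit_infos, path_labels))
# {'C-C': [28], 'C-C-C': [21], 'C-C-O': [0], 'C-O': [59]}
```

Each label maps to exactly one bit. For example, the C-C bond sets bit 28 in all three molecules, as expected if the hash depends only on the subgraph itself.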

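These simple tests are consistent with our understanding of the RDKFingerprint() method. Each unique subgraph sets its own bit, and identical subgraphs set the same bit in every molecule (for example, the C-C bond always sets bit 28).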
