Impact of 'radius' on GetMorganFingerprint()

This section provides a tutorial example on the impact of the 'radius' option on fingerprint generation with the rdkit.Chem.rdMolDescriptors.GetMorganFingerprint() function.

The 'radius' option in the rdkit.Chem.rdMolDescriptors.GetMorganFingerprint() function call allows you to control the maximum radius of local substructures to be encoded into the fingerprint.

1. For example, the molecule "CCC" produces 3 identifiers in the fingerprint with radius=0, one for each atom. Atom 0 and atom 2 share the same identifier because they have the same atom invariants (for example, each has one heavy-atom neighbor), so the fingerprint holds 2 distinct identifiers with a total count of 3.

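from rdkit.Chem import AllChem
from rdkit.DataStructs.cDataStructs import UIntSparseIntVect

radius = 0
bitInfo = {}  # filled in by RDKit: identifier -> ((atom index, radius), ...)
mol = AllChem.MolFromSmiles('CCC')
fp = AllChem.GetMorganFingerprint(mol, radius, bitInfo=bitInfo)
# print() is used instead of display() so the code also runs outside a notebook
print(UIntSparseIntVect.GetLength(fp))           # size of the identifier space
print(UIntSparseIntVect.GetNonzeroElements(fp))  # identifier -> count
print(UIntSparseIntVect.GetTotalVal(fp))         # sum of all counts
print(bitInfo)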

# output: 
4294967295
{2245384272: 1, 2246728737: 2}
3
{2245384272: ((1, 0),), 2246728737: ((0, 0), (2, 0))}
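The three printed values above are related: the total value is the sum of the per-identifier counts, and each count equals the number of (atom index, radius) entries that bitInfo records for that identifier. The following sketch checks this on the literal values copied from the output, so it runs without RDKit. The variable names are illustrative only.

```python
# Values copied from the radius=0 output for "CCC".
nonzero = {2245384272: 1, 2246728737: 2}
bit_info = {2245384272: ((1, 0),), 2246728737: ((0, 0), (2, 0))}
total_val = 3

# Each count should equal the number of environments listed in bitInfo.
counts_from_bit_info = {ident: len(envs) for ident, envs in bit_info.items()}

# The total value should equal the sum of all counts.
total_from_counts = sum(nonzero.values())

print(counts_from_bit_info == nonzero)
print(total_from_counts == total_val)
```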

2. The same molecule "CCC" produces 6 identifiers with radius=1. 3 of them identify the 3 atoms using atom invariants only, matching the radius=0 output. The other 3 are updated identifiers that combine each atom's radius=0 identifier with information from its neighboring atoms within radius=1 (1 bond away from the atom).

radius = 1
bitInfo = {}  # reset so entries from the previous run are not carried over
mol = AllChem.MolFromSmiles('CCC')
fp = AllChem.GetMorganFingerprint(mol, radius, bitInfo=bitInfo)
print(UIntSparseIntVect.GetLength(fp))           # size of the identifier space
print(UIntSparseIntVect.GetNonzeroElements(fp))  # identifier -> count
print(UIntSparseIntVect.GetTotalVal(fp))         # sum of all counts
print(bitInfo)

# output:
4294967295
{2068133184: 1, 2245384272: 1, 2246728737: 2, 3542456614: 2}
6
{2068133184: ((1, 1),), 
 2245384272: ((1, 0),), 
 2246728737: ((0, 0), (2, 0)), 
 3542456614: ((0, 1), (2, 1))}

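We can read the bitInfo output as follows: each key is an identifier, and each value lists the (atom index, radius) pairs at which that identifier was generated. For example, 3542456614: ((0, 1), (2, 1)) means that the identifier was generated at radius=1 around atom 0 and around atom 2.

As you can see, the Morgan generator keeps not only the updated identifiers from the last iteration of the Morgan algorithm, but also the identifiers from intermediate iterations.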
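To make the per-iteration structure easier to see, the following sketch regroups the radius=1 bitInfo output by radius instead of by identifier. It works on the dictionary copied from the output above, so RDKit is not required. The names by_radius and env_radius are illustrative, not part of the RDKit API.

```python
from collections import defaultdict

# bitInfo copied from the radius=1 output for "CCC".
bit_info = {
    2068133184: ((1, 1),),
    2245384272: ((1, 0),),
    2246728737: ((0, 0), (2, 0)),
    3542456614: ((0, 1), (2, 1)),
}

# Regroup as: radius -> list of (atom index, identifier).
by_radius = defaultdict(list)
for identifier, environments in bit_info.items():
    for atom_idx, env_radius in environments:
        by_radius[env_radius].append((atom_idx, identifier))

# Print one line per iteration, with atoms in index order.
for env_radius in sorted(by_radius):
    print(env_radius, sorted(by_radius[env_radius]))
```

Each radius contributes one entry per atom here, which is why the total count is 6 for this molecule.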

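3. To try radius=2, we need a larger molecule, "CCCC". This time, a total of 9 identifiers are produced: 4 at radius=0, 4 at radius=1, and only 1 at radius=2. The radius=2 environments of the other 3 atoms cover the same set of bonds as an environment that was already recorded, so RDKit drops them as duplicates.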

radius = 2
bitInfo = {}  # reset so entries from the previous run are not carried over
mol = AllChem.MolFromSmiles('CCCC')
fp = AllChem.GetMorganFingerprint(mol, radius, bitInfo=bitInfo)
print(UIntSparseIntVect.GetLength(fp))           # size of the identifier space
print(UIntSparseIntVect.GetNonzeroElements(fp))  # identifier -> count
print(UIntSparseIntVect.GetTotalVal(fp))         # sum of all counts
print(bitInfo)

# output: 
4294967295
{1173125914: 2, 1244535424: 1, 2245384272: 2, 2246728737: 2, 3542456614: 2}
9
{1173125914: ((1, 1), (2, 1)), 
 1244535424: ((1, 2),), 
 2245384272: ((1, 0), (2, 0)), 
 2246728737: ((0, 0), (3, 0)), 
 3542456614: ((0, 1), (3, 1))}

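Conclusion: a higher radius value generally produces more identifiers in the final fingerprint. The total number of identifiers is at most the number of non-hydrogen atoms times (radius+1); it is lower when environments are dropped as duplicates, as in the "CCCC" example, which yields 9 identifiers rather than 4 × 3 = 12.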
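The totals in the three examples (3, 6, and 9) can be reproduced with a simplified model of duplicate removal. The sketch below is not RDKit's implementation: it handles only unbranched chains such as "CCC" and "CCCC", does not compute identifier hashes, and assumes that an environment is kept only when the set of bonds it covers has not been seen before. The helper count_environments() is hypothetical and exists only for this illustration.

```python
def count_environments(num_atoms, max_radius):
    """Count kept environments per radius for a linear chain of atoms.

    Simplified model (an assumption, not RDKit code): atom i is bonded to
    atom i+1 by bond i, and an environment at radius >= 1 is kept only if
    its set of bonds has not been seen before.
    """
    seen = set()
    kept = []
    for radius in range(max_radius + 1):
        if radius == 0:
            kept.append(num_atoms)  # every atom gets a radius-0 identifier
            continue
        kept_here = 0
        for atom in range(num_atoms):
            lo = max(atom - radius, 0)
            hi = min(atom + radius, num_atoms - 1)
            bonds = frozenset(range(lo, hi))  # bonds joining atoms lo..hi
            if bonds and bonds not in seen:
                seen.add(bonds)
                kept_here += 1
        kept.append(kept_here)
    return kept


# Compare the modelled totals with the upper bound atoms * (radius + 1).
for name, n, r in [("CCC", 3, 0), ("CCC", 3, 1), ("CCCC", 4, 2)]:
    kept = count_environments(n, r)
    print(name, "radius =", r, "kept per radius:", kept,
          "total:", sum(kept), "upper bound:", n * (r + 1))
```

Under this model the upper bound is reached for "CCC" at radius 0 and 1, but not for "CCCC" at radius 2, where 3 of the 4 radius=2 environments repeat a bond set that was already recorded.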
