Cheminformatics Tutorials - Herong's Tutorial Examples - v2.01, by Herong Yang
Impact of 'radius' on GetMorganFingerprint()
This section provides a tutorial example on the impact of the 'radius' option on fingerprint generation with the rdkit.Chem.rdMolDescriptors.GetMorganFingerprint() function.
The 'radius' option in the rdkit.Chem.rdMolDescriptors.GetMorganFingerprint() function call allows you to control the maximum radius of local substructures to be encoded into the fingerprint.
1. For example, molecule "CCC" produces 3 identifiers with radius=0, one for each atom node. Atom 0 and atom 2 share the same identifier, since they have the same atom invariants (for example, the same number of connected bonds). So the fingerprint holds 2 distinct identifiers with a total count of 3.
from rdkit.Chem import AllChem
from rdkit.DataStructs.cDataStructs import UIntSparseIntVect

radius = 0
bitInfo = {}
mol = AllChem.MolFromSmiles('CCC')
fp = AllChem.GetMorganFingerprint(mol, radius, bitInfo=bitInfo)
display(UIntSparseIntVect.GetLength(fp))
display(UIntSparseIntVect.GetNonzeroElements(fp))
display(UIntSparseIntVect.GetTotalVal(fp))
print(bitInfo)

# output:
# 4294967295
# {2245384272: 1, 2246728737: 2}
# 3
# {2245384272: ((1, 0),), 2246728737: ((0, 0), (2, 0))}
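The three outputs above are linked: each count returned by GetNonzeroElements() matches the number of (atom index, radius) pairs that bitInfo records for that identifier, and GetTotalVal() is the sum of those counts. Here is a minimal sketch that checks this on the printed values without calling RDKit. The dictionary literals are copied from the output above, and the variable names are chosen only for illustration.

```python
# Values copied from the radius=0 output for "CCC" shown above.
nonzero_elements = {2245384272: 1, 2246728737: 2}
bit_info = {2245384272: ((1, 0),), 2246728737: ((0, 0), (2, 0))}
total_val = 3

# Count how many (atom index, radius) pairs bitInfo lists per identifier.
pairs_per_identifier = {ident: len(pairs) for ident, pairs in bit_info.items()}

print(pairs_per_identifier == nonzero_elements)     # True
print(sum(nonzero_elements.values()) == total_val)  # True
```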
2. But the same molecule "CCC" produces 6 identifiers with radius=1. 3 of them identify the 3 atoms with only atom invariants encoded, matching the radius=0 output. The other 3 are updated identifiers, derived from the radius=0 identifiers by adding information from neighboring nodes within radius=1 (1 bond away from the atom).
radius = 1
bitInfo = {}
mol = AllChem.MolFromSmiles('CCC')
fp = AllChem.GetMorganFingerprint(mol, radius, bitInfo=bitInfo)
display(UIntSparseIntVect.GetLength(fp))
display(UIntSparseIntVect.GetNonzeroElements(fp))
display(UIntSparseIntVect.GetTotalVal(fp))
print(bitInfo)

# output:
# 4294967295
# {2068133184: 1, 2245384272: 1, 2246728737: 2, 3542456614: 2}
# 6
# {2068133184: ((1, 1),), 2245384272: ((1, 0),), 2246728737: ((0, 0), (2, 0)), 3542456614: ((0, 1), (2, 1))}
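We can read the output as follows. Each bitInfo entry maps an identifier to a list of (atom index, radius) pairs: 2245384272 is atom 1 at radius 0, 2246728737 is atoms 0 and 2 at radius 0, 2068133184 is atom 1 at radius 1, and 3542456614 is atoms 0 and 2 at radius 1.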
As you can see, the Morgan generator keeps not only the updated identifiers from the last iteration of the Morgan algorithm, but also the identifiers from intermediate iterations.
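To see the iterations side by side, the bitInfo dictionary can be regrouped by radius. The sketch below works on the printed radius=1 output for "CCC" and does not call RDKit. The helper name group_by_radius() is made up for this example and is not part of RDKit.

```python
from collections import defaultdict

# bitInfo copied from the radius=1 output for "CCC" shown above.
bit_info = {
    2068133184: ((1, 1),),
    2245384272: ((1, 0),),
    2246728737: ((0, 0), (2, 0)),
    3542456614: ((0, 1), (2, 1)),
}

def group_by_radius(bit_info):
    """Return {radius: {identifier: [atom indexes]}} (illustrative helper)."""
    grouped = defaultdict(dict)
    for identifier, pairs in bit_info.items():
        for atom_index, radius in pairs:
            grouped[radius].setdefault(identifier, []).append(atom_index)
    # Order the result by radius so iterations print in sequence.
    return {radius: grouped[radius] for radius in sorted(grouped)}

by_radius = group_by_radius(bit_info)
for radius, identifiers in by_radius.items():
    print(radius, identifiers)
# 0 {2245384272: [1], 2246728737: [0, 2]}
# 1 {2068133184: [1], 3542456614: [0, 2]}
```

The radius 0 group reproduces the identifiers of the radius=0 run, which is how the intermediate iteration shows up in the final fingerprint.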
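3. To try radius=2, we need to use a larger molecule, "CCCC". This time, a total of 9 identifiers are produced: 4 from radius=0, 4 from radius=1, and only 1 from radius=2. The radius=2 substructures around atoms 0, 2 and 3 cover the same bonds as substructures that are already recorded, so they are not added again.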
radius = 2
bitInfo = {}
mol = AllChem.MolFromSmiles('CCCC')
fp = AllChem.GetMorganFingerprint(mol, radius, bitInfo=bitInfo)
display(UIntSparseIntVect.GetLength(fp))
display(UIntSparseIntVect.GetNonzeroElements(fp))
display(UIntSparseIntVect.GetTotalVal(fp))
print(bitInfo)

# output:
# 4294967295
# {1173125914: 2, 1244535424: 1, 2245384272: 2, 2246728737: 2, 3542456614: 2}
# 9
# {1173125914: ((1, 1), (2, 1)), 1244535424: ((1, 2),), 2245384272: ((1, 0), (2, 0)), 2246728737: ((0, 0), (3, 0)), 3542456614: ((0, 1), (3, 1))}
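Conclusion: a higher value of radius generally produces more identifiers in the final fingerprint. The total number of identifiers is at most the number of non-hydrogen atom nodes times radius+1. It can be lower, as in the "CCCC" example (9 instead of 4*3=12), because an identifier is skipped when its substructure duplicates one that is already recorded.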
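As a check on the count, the sketch below tallies the radius=2 output for "CCCC" by radius and compares the total with atoms times (radius+1). It uses only the printed bitInfo values and does not call RDKit. The values of heavy_atoms and max_radius are filled in by hand for this example.

```python
from collections import Counter

# bitInfo copied from the radius=2 output for "CCCC" shown above.
bit_info = {
    1173125914: ((1, 1), (2, 1)),
    1244535424: ((1, 2),),
    2245384272: ((1, 0), (2, 0)),
    2246728737: ((0, 0), (3, 0)),
    3542456614: ((0, 1), (3, 1)),
}
heavy_atoms = 4   # "CCCC" has 4 non-hydrogen atoms (entered by hand)
max_radius = 2    # the radius used in the call above

# Tally the (atom index, radius) pairs by radius.
per_radius = Counter(radius for pairs in bit_info.values() for _, radius in pairs)
total = sum(per_radius.values())

print(dict(sorted(per_radius.items())))       # {0: 4, 1: 4, 2: 1}
print(total, heavy_atoms * (max_radius + 1))  # 9 12
```

For this molecule the product (12) acts as an upper limit rather than the actual count (9), since only one identifier is recorded at radius 2.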