Cheminformatics Tutorials - Herong's Tutorial Examples - v2.01, by Herong Yang
GetMorganGenerator() Method in RDKit
This section provides a quick introduction to the rdkit.Chem.rdFingerprintGenerator.GetMorganGenerator() method in the RDKit library.
The GetMorganGenerator() method is located in the rdkit.Chem.rdFingerprintGenerator module of the RDKit library. It creates a Morgan fingerprint generator, which uses the Morgan algorithm to iteratively update an identifier on each atom node based on its local substructure at increasing radii.
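The iterative update described above can be illustrated with a small pure-Python sketch. It is a toy model, not RDKit's implementation: the function name toy_morgan_counts, the (atomic number, degree) starting invariant, and the CRC32 hashing are all assumptions made for illustration, so its identifiers will not match RDKit's. What it does show is how identifiers grow shell by shell and how environments that cover the same set of bonds are counted only once.

```python
import zlib
from collections import Counter


def toy_morgan_counts(atomic_numbers, bonds, radius):
    """Count toy Morgan-style environment identifiers (illustration only).

    atomic_numbers: one integer per atom
    bonds: list of (atom_i, atom_j, bond_order) tuples
    """
    def h(obj):
        # Deterministic 32-bit hash of a nested tuple (an assumption;
        # RDKit uses its own hashing scheme).
        return zlib.crc32(repr(obj).encode())

    # neighbors[a] holds (bond_index, neighbor_atom, bond_order) entries.
    neighbors = [[] for _ in atomic_numbers]
    for idx, (i, j, order) in enumerate(bonds):
        neighbors[i].append((idx, j, order))
        neighbors[j].append((idx, i, order))

    # Radius 0: each atom starts from (atomic number, degree).
    ids = [h((0, z, len(neighbors[a]))) for a, z in enumerate(atomic_numbers)]
    env = [frozenset() for _ in atomic_numbers]  # bonds covered per atom
    counts = Counter(ids)
    seen = {frozenset()}            # bond sets already reported
    alive = set(range(len(ids)))    # atoms whose environment can still grow

    for layer in range(1, radius + 1):
        new_ids, new_env = list(ids), list(env)
        for a in alive:
            # New identifier: own identifier plus sorted neighbor identifiers.
            nbrs = tuple(sorted((order, ids[n]) for _, n, order in neighbors[a]))
            new_ids[a] = h((layer, ids[a], nbrs))
            # Grow the covered bond set by one shell.
            grown = set(env[a])
            for b, n, _ in neighbors[a]:
                grown.add(b)
                grown |= env[n]
            new_env[a] = frozenset(grown)
        for a in sorted(alive):
            if new_env[a] in seen:
                alive.discard(a)    # duplicate environment: stop growing it
            else:
                seen.add(new_env[a])
                counts[new_ids[a]] += 1
        ids, env = new_ids, new_env
    return dict(counts)


# Butane (SMILES 'CCCC') as a 4-atom chain with three single bonds.
butane_bonds = [(0, 1, 1), (1, 2, 1), (2, 3, 1)]
butane_counts = toy_morgan_counts([6, 6, 6, 6], butane_bonds, radius=2)
print(sorted(butane_counts.values()))
```

For butane the toy yields five distinct identifiers, four occurring twice and one occurring once, which is the same count pattern as the RDKit output for 'CCCC' at radius 2 shown later in this section.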
Here is the definition of the method:
rdkit.Chem.rdFingerprintGenerator.GetMorganGenerator([(int)radius=3 [, (bool)countSimulation=False [, (bool)includeChirality=False [, (bool)useBondTypes=True [, (bool)onlyNonzeroInvariants=False [, (bool)includeRingMembership=True [, (AtomPairsParameters)countBounds=None [, (int)fpSize=2048 [, (AtomPairsParameters)atomInvariantsGenerator=None [, (AtomPairsParameters)bondInvariantsGenerator=None [, (AtomPairsParameters)useCountSimulation=None]]]]]]]]]]]) -> FingerprintGenerator
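Descriptions of the method arguments, as given in the RDKit documentation, are:
radius - The number of iterations used to grow the fingerprint.
countSimulation - If set, use count simulation while generating the fingerprint.
includeChirality - If set, chirality information is added to the generated fingerprint.
useBondTypes - If set, bond types are included as part of the default bond invariants.
countBounds - Boundaries for count simulation; the corresponding bit is set if the count is higher than the number provided for that spot.
fpSize - Size of the generated fingerprint; it does not affect the sparse versions.
atomInvariantsGenerator - Atom invariants to be used during fingerprint generation.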
Once a FingerprintGenerator object is created, you can call its instance methods to generate different types of Morgan fingerprints for a given molecule: GetFingerprint(), GetCountFingerprint(), GetSparseFingerprint(), and GetSparseCountFingerprint().
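As a quick illustration of how the generator arguments interact with these instance methods, the sketch below counts the distinct identifiers produced for butane at several radius values and checks that fpSize sets the length of the folded bit vector. It assumes a reasonably recent RDKit installation; the variable names are illustrative only.

```python
from rdkit import Chem
from rdkit.Chem import rdFingerprintGenerator

mol = Chem.MolFromSmiles('CCCC')

# Count distinct Morgan identifiers for butane at increasing radius.
num_ids = {}
for radius in range(4):
    gen = rdFingerprintGenerator.GetMorganGenerator(radius=radius)
    fp = gen.GetSparseCountFingerprint(mol)
    num_ids[radius] = len(fp.GetNonzeroElements())
    print(radius, num_ids[radius])

# fpSize controls the length of the folded (non-sparse) fingerprints.
small_gen = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=64)
num_bits = small_gen.GetFingerprint(mol).GetNumBits()
print(num_bits)
```

For a molecule as small as butane, the number of distinct identifiers stops growing once the radius is large enough for an environment to cover the whole chain.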
The generator's instance methods provide the same functionality as the Morgan fingerprint generation methods offered in the rdkit.Chem.rdMolDescriptors module, as the following examples show.
1. The GetMorganGenerator().GetFingerprint() method provides the same functionality as the rdkit.Chem.rdMolDescriptors.GetMorganFingerprintAsBitVect() method. For example:

    # display() is built into Jupyter/IPython; use print() elsewhere.
    from rdkit.Chem import AllChem
    from rdkit.Chem import rdFingerprintGenerator
    from rdkit.DataStructs import cDataStructs

    mol = AllChem.MolFromSmiles('CCCC')

    # New generator API: 64-bit folded bit vector
    gen = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=64)
    fp = gen.GetFingerprint(mol)
    display(fp.ToBitString())

    # Older rdMolDescriptors API, same radius and size
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=64)
    display(fp.ToBitString())

    # output:
    # '1000000000000000100000000010000001000010000000000000000000000000'
    # '1000000000000000100000000010000001000010000000000000000000000000'
2. The GetMorganGenerator().GetCountFingerprint() method provides the same functionality as the rdkit.Chem.rdMolDescriptors.GetHashedMorganFingerprint() method. For example:

    from rdkit.Chem import AllChem
    from rdkit.Chem import rdFingerprintGenerator
    from rdkit.DataStructs import cDataStructs

    mol = AllChem.MolFromSmiles('CCCC')

    # New generator API: counts folded into 64 positions
    gen = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=64)
    fp = gen.GetCountFingerprint(mol)
    display(cDataStructs.UIntSparseIntVect.GetNonzeroElements(fp))

    # Older rdMolDescriptors API, same radius and size
    fp = AllChem.GetHashedMorganFingerprint(mol, 2, nBits=64)
    display(cDataStructs.UIntSparseIntVect.GetNonzeroElements(fp))

    # output:
    # {0: 1, 16: 2, 26: 2, 33: 2, 38: 2}
    # {0: 1, 16: 2, 26: 2, 33: 2, 38: 2}
3. The GetMorganGenerator().GetSparseFingerprint() method provides the same functionality as the rdkit.Chem.rdMolDescriptors.GetMorganFingerprint() method, except that GetSparseFingerprint() returns the identifiers in a SparseBitVect object, which records only which identifiers are present and not how often each one occurs. For example:

    from rdkit.Chem import AllChem
    from rdkit.Chem import rdFingerprintGenerator
    from rdkit.DataStructs import cDataStructs

    mol = AllChem.MolFromSmiles('CCCC')

    # New generator API: unfolded identifiers as a sparse bit vector
    gen = rdFingerprintGenerator.GetMorganGenerator(radius=2)
    fp = gen.GetSparseFingerprint(mol)
    display(fp)
    display(fp.GetNumBits())
    display(fp.GetNumOnBits())
    display(list(fp.GetOnBits()))

    # Older rdMolDescriptors API: unfolded identifiers with counts
    fp = AllChem.GetMorganFingerprint(mol, 2)
    display(fp)
    display(cDataStructs.UIntSparseIntVect.GetNonzeroElements(fp))

    # output:
    # <rdkit.DataStructs.cDataStructs.SparseBitVect at 0x7fcbd6275ca0>
    # 4294967295
    # 5
    # [-2049583024, -2048238559, -752510682, 1173125914, 1244535424]
    # <rdkit.DataStructs.cDataStructs.UIntSparseIntVect at 0x7fcbd6275ee0>
    # {1173125914: 2, 1244535424: 1, 2245384272: 2, 2246728737: 2, 3542456614: 2}
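The negative numbers in the GetOnBits() output above are the same identifiers that GetMorganFingerprint() reports, printed as signed 32-bit integers. A minimal sketch of the conversion, using only the values shown in the output above (the variable names are illustrative):

```python
# On-bit indices as printed by list(fp.GetOnBits()) in the output above.
on_bits = [-2049583024, -2048238559, -752510682, 1173125914, 1244535424]

# Identifiers reported by GetMorganFingerprint() in the output above.
legacy_ids = {1173125914: 2, 1244535424: 1, 2245384272: 2,
              2246728737: 2, 3542456614: 2}

# Reinterpret each signed 32-bit value as unsigned by reducing modulo 2**32.
unsigned_ids = sorted(b % 2**32 for b in on_bits)
print(unsigned_ids)
print(unsigned_ids == sorted(legacy_ids))
```

After the conversion, the five on-bit indices are exactly the five keys of the GetMorganFingerprint() result, so the two methods agree on which identifiers are present.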
4. The GetMorganGenerator().GetSparseCountFingerprint() method provides the same functionality as the rdkit.Chem.rdMolDescriptors.GetMorganFingerprint() method, returning each identifier together with its count. For example:

    from rdkit.Chem import AllChem
    from rdkit.Chem import rdFingerprintGenerator
    from rdkit.DataStructs import cDataStructs

    mol = AllChem.MolFromSmiles('CCCC')

    # New generator API: unfolded identifiers with counts
    gen = rdFingerprintGenerator.GetMorganGenerator(radius=2)
    fp = gen.GetSparseCountFingerprint(mol)
    display(cDataStructs.ULongSparseIntVect.GetNonzeroElements(fp))

    # Older rdMolDescriptors API, same radius
    fp = AllChem.GetMorganFingerprint(mol, 2)
    display(cDataStructs.UIntSparseIntVect.GetNonzeroElements(fp))

    # output:
    # {1173125914: 2, 1244535424: 1, 2245384272: 2, 2246728737: 2, 3542456614: 2}
    # {1173125914: 2, 1244535424: 1, 2245384272: 2, 2246728737: 2, 3542456614: 2}
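The outputs in this section also suggest how the fixed-size fingerprints of examples 1 and 2 relate to the sparse ones: each full identifier, reduced modulo the fingerprint size, gives a position reported in the 64-position fingerprint. The sketch below checks this arithmetic on the numbers printed above. It illustrates the relationship observed for this example and is not meant as a description of every RDKit code path; the variable names are illustrative.

```python
# Sparse identifiers and counts for 'CCCC' at radius 2 (from the output above).
sparse_counts = {1173125914: 2, 1244535424: 1, 2245384272: 2,
                 2246728737: 2, 3542456614: 2}
fp_size = 64

# Fold each identifier into a fixed-size vector by taking it modulo fp_size,
# adding counts together if two identifiers land on the same position.
folded = {}
for identifier, count in sparse_counts.items():
    position = identifier % fp_size
    folded[position] = folded.get(position, 0) + count

print(dict(sorted(folded.items())))
# Compare with the GetCountFingerprint() output in example 2:
# {0: 1, 16: 2, 26: 2, 33: 2, 38: 2}
```

The folded positions 0, 16, 26, 33 and 38 are also the positions of the five '1' characters in the bit strings of example 1.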