GetMorganGenerator() Method in RDKit

This section provides a quick introduction to the rdkit.Chem.rdFingerprintGenerator.GetMorganGenerator() method in the RDKit library.

The GetMorganGenerator() method is located in the rdkit.Chem.rdFingerprintGenerator module of the RDKit library. It creates a Morgan fingerprint generator, which uses the Morgan algorithm to iteratively update an identifier on each atom based on its local substructure at increasing radii.
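To make "identifiers for local substructures at different radii" more concrete, the following sketch asks the generator to record which atom and which radius produced each bit. It assumes the AdditionalOutput helper of the same module is available in your RDKit version; the molecule, the radius and the fpSize value are illustrative choices only.

```python
from rdkit import Chem
from rdkit.Chem import rdFingerprintGenerator

mol = Chem.MolFromSmiles('CCCC')
gen = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=64)

# Ask the generator to record, for every bit it sets, the
# (atom index, radius) pairs of the environments behind that bit.
ao = rdFingerprintGenerator.AdditionalOutput()
ao.AllocateBitInfoMap()
fp = gen.GetFingerprint(mol, additionalOutput=ao)

bit_info = ao.GetBitInfoMap()
for bit in sorted(bit_info):
    print(bit, bit_info[bit])
```

Each printed entry maps a bit position to the atom-centered environments that set it, so the radius values should never exceed the radius passed to the generator.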

Here is the definition of the method:

rdkit.Chem.rdFingerprintGenerator.GetMorganGenerator([(int)radius=3 [, 
  (bool)countSimulation=False [, 
  (bool)includeChirality=False [, 
  (bool)useBondTypes=True [, 
  (bool)onlyNonzeroInvariants=False [, 
  (bool)includeRingMembership=True [, 
  (AtomPairsParameters)countBounds=None [, 
  (int)fpSize=2048 [, 
  (AtomPairsParameters)atomInvariantsGenerator=None [, 
  (AtomPairsParameters)bondInvariantsGenerator=None [, 
  (AtomPairsParameters)useCountSimulation=None]]]]]]]]]]]) 
-> FingerprintGenerator

Descriptions of method arguments are:

Once a FingerprintGenerator object is created, you can call its instance methods, GetFingerprint(), GetCountFingerprint(), GetSparseFingerprint() and GetSparseCountFingerprint(), to generate different types of Morgan fingerprints for a given molecule.
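As a minimal sketch, the four fingerprint methods used in the examples of this section can be called on one generator as follows. The comments describe the result types as they appear in those examples; the variable names are illustrative.

```python
from rdkit import Chem
from rdkit.Chem import rdFingerprintGenerator

mol = Chem.MolFromSmiles('CCCC')
gen = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=64)

bit_fp = gen.GetFingerprint(mol)                      # folded bit vector
count_fp = gen.GetCountFingerprint(mol)               # folded count vector
sparse_fp = gen.GetSparseFingerprint(mol)             # unfolded bit vector
sparse_count_fp = gen.GetSparseCountFingerprint(mol)  # unfolded count vector

print(bit_fp.GetNumBits())                   # equals fpSize
print(count_fp.GetNonzeroElements())         # {index: count}
print(sparse_fp.GetNumOnBits())              # number of distinct identifiers
print(sparse_count_fp.GetNonzeroElements())  # {identifier: count}
```

Creating the generator once and reusing it for many molecules keeps the fingerprint settings in a single place.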

These methods provide the same functionality as the Morgan fingerprint generation methods in the rdkit.Chem.rdMolDescriptors module, as shown in the following examples.

1. The GetMorganGenerator().GetFingerprint() method provides the same functionality as the rdkit.Chem.rdMolDescriptors.GetMorganFingerprintAsBitVect() method. For example:

from rdkit.Chem import AllChem
from rdkit.Chem import rdFingerprintGenerator
from rdkit.DataStructs import cDataStructs

# display() is provided by Jupyter/IPython; use print() in a plain script
mol = AllChem.MolFromSmiles('CCCC')

# Folded bit vector from the generator
gen = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=64)
fp = gen.GetFingerprint(mol)
display(fp.ToBitString())

# Equivalent call in rdMolDescriptors (re-exported by AllChem)
fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=64)
display(fp.ToBitString())

# output: 
'1000000000000000100000000010000001000010000000000000000000000000'
'1000000000000000100000000010000001000010000000000000000000000000'

2. The GetMorganGenerator().GetCountFingerprint() method provides the same functionality as the rdkit.Chem.rdMolDescriptors.GetHashedMorganFingerprint() method. For example:

# Reuses the imports from example 1
mol = AllChem.MolFromSmiles('CCCC')

# Folded count vector from the generator
gen = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=64)
fp = gen.GetCountFingerprint(mol)
display(fp.GetNonzeroElements())

# Equivalent call in rdMolDescriptors (re-exported by AllChem)
fp = AllChem.GetHashedMorganFingerprint(mol, 2, nBits=64)
display(fp.GetNonzeroElements())

# output: 
{0: 1, 16: 2, 26: 2, 33: 2, 38: 2}
{0: 1, 16: 2, 26: 2, 33: 2, 38: 2}

3. The GetMorganGenerator().GetSparseFingerprint() method provides the same functionality as the rdkit.Chem.rdMolDescriptors.GetMorganFingerprint() method, except that GetSparseFingerprint() returns the identifiers in a SparseBitVect object, which records which identifiers are present but not how often each occurs. For example:

# Reuses the imports from example 1
mol = AllChem.MolFromSmiles('CCCC')

# Unfolded bit vector from the generator (no fpSize given)
gen = rdFingerprintGenerator.GetMorganGenerator(radius=2)
fp = gen.GetSparseFingerprint(mol)
display(fp)
display(fp.GetNumBits())
display(fp.GetNumOnBits())
display(list(fp.GetOnBits()))

# Unfolded count vector from rdMolDescriptors (re-exported by AllChem)
fp = AllChem.GetMorganFingerprint(mol, 2)
display(fp)
display(fp.GetNonzeroElements())

# output:
<rdkit.DataStructs.cDataStructs.SparseBitVect at 0x7fcbd6275ca0>
4294967295
5
[-2049583024, -2048238559, -752510682, 1173125914, 1244535424]

<rdkit.DataStructs.cDataStructs.UIntSparseIntVect at 0x7fcbd6275ee0>
{1173125914: 2, 1244535424: 1, 2245384272: 2, 2246728737: 2, 3542456614: 2}
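The negative numbers in the SparseBitVect output appear to be the same identifiers as in the GetMorganFingerprint() output, read as signed 32-bit integers. The following sketch checks this using only the values printed above; it is an observation about this output, not a statement about RDKit internals.

```python
# On-bit identifiers printed for the SparseBitVect (signed view).
signed_ids = [-2049583024, -2048238559, -752510682, 1173125914, 1244535424]

# Identifiers printed for the UIntSparseIntVect (unsigned view).
unsigned_ids = [1173125914, 1244535424, 2245384272, 2246728737, 3542456614]

# Masking with 0xFFFFFFFF reinterprets a signed 32-bit value as unsigned.
converted = sorted(i & 0xFFFFFFFF for i in signed_ids)

print(converted)
print(converted == unsigned_ids)  # True
```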

4. The GetMorganGenerator().GetSparseCountFingerprint() method provides the same functionality as the rdkit.Chem.rdMolDescriptors.GetMorganFingerprint() method. For example:

# Reuses the imports from example 1
mol = AllChem.MolFromSmiles('CCCC')

# Unfolded count vector from the generator (ULongSparseIntVect)
gen = rdFingerprintGenerator.GetMorganGenerator(radius=2)
fp = gen.GetSparseCountFingerprint(mol)
display(fp.GetNonzeroElements())

# Equivalent call in rdMolDescriptors (UIntSparseIntVect)
fp = AllChem.GetMorganFingerprint(mol, 2)
display(fp.GetNonzeroElements())

# output: 
{1173125914: 2, 1244535424: 1, 2245384272: 2, 2246728737: 2, 3542456614: 2}
{1173125914: 2, 1244535424: 1, 2245384272: 2, 2246728737: 2, 3542456614: 2}
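The outputs of examples 2 and 4 also suggest how the sparse identifiers relate to the folded indices. For this molecule and fpSize=64, each folded index equals the sparse identifier modulo 64. The sketch below verifies this with the printed values only; treat it as an observation on this data, not as a guarantee about how RDKit folds fingerprints in general.

```python
# Unfolded counts printed in example 4.
sparse_counts = {1173125914: 2, 1244535424: 1, 2245384272: 2,
                 2246728737: 2, 3542456614: 2}

# Folded counts printed in example 2 (fpSize=64).
hashed_counts = {0: 1, 16: 2, 26: 2, 33: 2, 38: 2}

fp_size = 64
folded = {}
for identifier, count in sparse_counts.items():
    index = identifier % fp_size
    # Identifiers that land on the same index have their counts added.
    folded[index] = folded.get(index, 0) + count

print(folded == hashed_counts)  # True
```

With a small fpSize, different identifiers can fold onto the same index, which is why the folded fingerprints lose some information compared with the sparse ones.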
