Cheminformatics Tutorials - Herong's Tutorial Examples - v2.01, by Herong Yang
GetHashedMorganFingerprint() Method in RDKit
This section provides a quick introduction to the rdkit.Chem.rdMolDescriptors.GetHashedMorganFingerprint() method in the RDKit library.
The GetHashedMorganFingerprint(mol, radius) method is located in the rdkit.Chem.rdMolDescriptors module of the RDKit library. The same method is also available as rdkit.Chem.AllChem.GetHashedMorganFingerprint() for easier access.
The GetHashedMorganFingerprint() method provides the same functionality as the GetMorganFingerprint() method, except that it hashes the identifiers in the fingerprint into integers smaller than a given limit, nBits.
Here is the definition of the method:
rdkit.Chem.rdMolDescriptors.GetHashedMorganFingerprint((Mol)mol, (int)radius [, (int)nBits=2048 [, (AtomPairsParameters)invariants=[] [, (AtomPairsParameters)fromAtoms=[] [, (bool)useChirality=False [, (bool)useBondTypes=True [, (bool)useFeatures=False [, (AtomPairsParameters)bitInfo=None [, (bool)includeRedundantEnvironments=False]]]]]]]]) -> UIntSparseIntVect
Descriptions of all arguments:
By default (useFeatures=False), GetHashedMorganFingerprint() implements the ECFP (Extended Connectivity Fingerprint) method, as described in the "ECFP (Extended Connectivity Fingerprint) Method - 2000" section in this book.
Note that GetHashedMorganFingerprint() returns the fingerprint as a collection of integers smaller than nBits. Each identifier generated for an atom environment, at any radius, is divided by nBits, and the remainder is used in its place.
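The remainder step can be sketched in plain Python, without RDKit, to make the mapping concrete. This is a minimal sketch, not RDKit's own code: the helper name fold_counts() is hypothetical, and it assumes that when two different identifiers leave the same remainder, their counts are simply added together. The input values are the unhashed counts that GetMorganFingerprint() prints for "CC" at radius=1 in test 3 of this section.

```python
from collections import Counter


def fold_counts(counts: dict[int, int], n_bits: int) -> dict[int, int]:
    """Hypothetical helper: map each identifier to identifier % n_bits.

    Assumption of this sketch: counts of identifiers that land on the
    same remainder are summed.
    """
    folded = Counter()
    for identifier, count in counts.items():
        folded[identifier % n_bits] += count
    return dict(folded)


# Unhashed identifiers and counts printed for "CC" at radius=1
unhashed = {2246728737: 2, 3545175291: 1}
print(fold_counts(unhashed, 64))  # {33: 2, 59: 1}
```

Using a Counter keeps the sketch short: a missing remainder starts at 0, so collisions need no special case.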
If you want the fingerprint to be returned as a collection of identifiers, you can call the GetMorganFingerprint() method, as described in the "GetMorganFingerprint() Method in RDKit" section in this chapter.
Now let's verify its ECFP implementation with some simple tests.
1. Molecule "C" has a single non-hydrogen atom, so a single identifier is generated at radius=0.
```python
from rdkit.Chem import AllChem
from rdkit.DataStructs.cDataStructs import UIntSparseIntVect

mol = AllChem.MolFromSmiles('C')

# Unhashed fingerprint: raw Morgan identifiers and their counts
fp = AllChem.GetMorganFingerprint(mol, 0)
print(UIntSparseIntVect.GetNonzeroElements(fp))

# Hashed fingerprint: identifiers reduced into integers smaller than nBits;
# bitInfo is filled with (atom index, radius) pairs for each hashed integer
bitInfo = {}
fp = AllChem.GetHashedMorganFingerprint(mol, 0, nBits=64, bitInfo=bitInfo)
print(UIntSparseIntVect.GetNonzeroElements(fp))
print(bitInfo)

# output:
# {2246733040: 1}
# {48: 1}
# {48: ((0, 0),)}
```
As you can see, GetHashedMorganFingerprint() hashes the only identifier 2246733040 into a smaller integer 48, which is the remainder of 2246733040 divided by 64: 2246733040 % 64 = 48.
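The remainder arithmetic can be double-checked in plain Python with the built-in divmod(), which returns the quotient and the remainder together. The identifier below is the one printed by GetMorganFingerprint() for "C" in this test.

```python
identifier, n_bits = 2246733040, 64

# divmod() returns (identifier // n_bits, identifier % n_bits)
quotient, remainder = divmod(identifier, n_bits)
print(quotient, remainder)  # 35105203 48

# The decomposition identifier = quotient * n_bits + remainder always holds
assert identifier == quotient * n_bits + remainder
```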
2. A molecule with 2 identical atoms, "CC" in SMILES.
```python
# Continues from test 1 (same imports)
mol = AllChem.MolFromSmiles('CC')

fp = AllChem.GetMorganFingerprint(mol, 0)
print(UIntSparseIntVect.GetNonzeroElements(fp))

bitInfo = {}
fp = AllChem.GetHashedMorganFingerprint(mol, 0, nBits=64, bitInfo=bitInfo)
print(UIntSparseIntVect.GetNonzeroElements(fp))
print(bitInfo)

# output:
# {2246728737: 2}
# {33: 2}
# {33: ((0, 0), (1, 0))}
```
GetHashedMorganFingerprint() hashes the 2 identical identifiers 2246728737 into 2 identical smaller integers 33, which is the remainder of 2246728737 divided by 64: 2246728737 % 64 = 33.
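The bitInfo dictionary maps each hashed integer to a tuple of (center atom index, radius) pairs, one pair for each atom environment hashed to that integer. The small sketch below turns the bitInfo printed in this test into readable lines; describe_bit_info() is a hypothetical helper that uses only plain Python and the values copied from the output, not an RDKit function.

```python
def describe_bit_info(bit_info: dict[int, tuple]) -> list[str]:
    """Hypothetical helper: list each (atom index, radius) pair per hashed integer."""
    lines = []
    for bit, environments in sorted(bit_info.items()):
        for atom_idx, radius in environments:
            lines.append(f"bit {bit}: atom {atom_idx}, radius {radius}")
    return lines


# bitInfo printed for "CC" at radius=0 with nBits=64
bit_info = {33: ((0, 0), (1, 0))}
for line in describe_bit_info(bit_info):
    print(line)
# bit 33: atom 0, radius 0
# bit 33: atom 1, radius 0
```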
3. Same molecule "CC" with radius=1.
```python
# Continues from test 1 (same imports)
mol = AllChem.MolFromSmiles('CC')

fp = AllChem.GetMorganFingerprint(mol, 1)
print(UIntSparseIntVect.GetNonzeroElements(fp))

bitInfo = {}
fp = AllChem.GetHashedMorganFingerprint(mol, 1, nBits=64, bitInfo=bitInfo)
print(UIntSparseIntVect.GetNonzeroElements(fp))
print(bitInfo)

# output:
# {2246728737: 2, 3545175291: 1}
# {33: 2, 59: 1}
# {33: ((0, 0), (1, 0)), 59: ((0, 1),)}
```
Now GetHashedMorganFingerprint() hashes 3 identifiers, 2246728737 twice and 3545175291 once, resulting in 33 twice and 59 once, respectively.
Notice that when a duplicated identifier is hashed, the resulting smaller integer carries the same count, so the number of duplications is maintained.
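The maintained counts can be cross-checked against bitInfo: in the output of test 3, the count of each hashed integer equals the number of (atom index, radius) pairs recorded for it. The sketch below copies those printed values and compares them. counts_from_bit_info() is a hypothetical helper, and the check is only shown to hold for this example, not claimed for every combination of options.

```python
def counts_from_bit_info(bit_info: dict[int, tuple]) -> dict[int, int]:
    """Hypothetical helper: count the environments recorded for each hashed integer."""
    return {bit: len(environments) for bit, environments in bit_info.items()}


# Values printed for "CC" at radius=1 with nBits=64
hashed_fp = {33: 2, 59: 1}
bit_info = {33: ((0, 0), (1, 0)), 59: ((0, 1),)}

print(counts_from_bit_info(bit_info) == hashed_fp)  # True
```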