GetMorganFingerprintAsBitVect() Method in RDKit

This section provides a quick introduction to the rdkit.Chem.rdMolDescriptors.GetMorganFingerprintAsBitVect() method in the RDKit library.

The GetMorganFingerprintAsBitVect(mol, radius) method is located in the rdkit.Chem.rdMolDescriptors module of the RDKit library. The same method is also exposed as rdkit.Chem.AllChem.GetMorganFingerprintAsBitVect() for easier access.

The GetMorganFingerprintAsBitVect() method provides the same functionality as the GetMorganFingerprint() method, except that it returns the fingerprint as a fixed-length bit vector (ExplicitBitVect) instead of a sparse collection of identifiers with counts.

Here is the full description of the method:

rdkit.Chem.rdMolDescriptors.GetMorganFingerprintAsBitVect((Mol)mol, 
  (int)radius[, 
  (int)nBits=2048[, 
  (AtomPairsParameters)invariants=[][, 
  (AtomPairsParameters)fromAtoms=[][, 
  (bool)useChirality=False[, 
  (bool)useBondTypes=True[, 
  (bool)useFeatures=False[, 
  (AtomPairsParameters)bitInfo=None[, 
  (bool)includeRedundantEnvironments=False]]]]]]]]) 
-> ExplicitBitVect

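Descriptions of all arguments:

mol - The molecule (rdkit.Chem.rdchem.Mol object) to generate the fingerprint for.
radius - The maximum radius of the atom environments. Identifiers are generated for every radius from 0 up to this value.
nBits - The length of the returned bit vector.
invariants - A list of custom initial atom invariants, one per atom, to use instead of the default ones.
fromAtoms - A list of atom indexes. If provided, only environments centered on these atoms are used.
useChirality - If True, chirality information is included in the atom invariants.
useBondTypes - If True, bond types are taken into account when generating identifiers.
useFeatures - If True, feature-based invariants (FCFP-like) are used instead of the default connectivity invariants (ECFP-like).
bitInfo - A dictionary that, if provided, is filled with the mapping from each set bit to the (atom index, radius) pairs that set it.
includeRedundantEnvironments - If True, redundant atom environments are also included in the fingerprint.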

By default, GetMorganFingerprintAsBitVect() uses the "useFeatures=False" option to provide an implementation of the ECFP (Extended Connectivity Fingerprint) method as described in the "ECFP (Extended Connectivity Fingerprint) Method - 2000" section in this book.

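Note that GetMorganFingerprintAsBitVect() returns the fingerprint as a bit vector of length nBits. Each identifier generated for the atom environments at every radius, from 0 up to the given radius, is mapped to a bit position, and the bit at that position is set to 1.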
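The mapping from identifiers to bits can be sketched in plain Python. This is a minimal illustration, assuming (as the test results in this section show) that each identifier sets the bit at position identifier % nBits. The helpers fold_to_bits() and to_bit_string() are hypothetical names, not RDKit functions.

```python
def fold_to_bits(identifiers, n_bits=2048):
    """Hypothetical helper: fold Morgan identifiers into a fixed-length bit list.

    Assumes each identifier sets the bit at position identifier % n_bits,
    which matches the positions RDKit reports in the tests of this section.
    """
    bits = [0] * n_bits
    for identifier in identifiers:
        bits[identifier % n_bits] = 1  # a repeated identifier just sets the same bit again
    return bits


def to_bit_string(bits):
    """Hypothetical helper: render a bit list as a string of '0' and '1'."""
    return "".join(str(b) for b in bits)


# Identifiers taken from the "CC", radius=1 test in this section
bits = fold_to_bits([2246728737, 2246728737, 3545175291], n_bits=64)
print([i for i, b in enumerate(bits) if b])  # [33, 59]
print(to_bit_string(bits))
```

Because the vector length is fixed, the result has the same size no matter how many atoms the molecule has.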

If you want the fingerprint to be returned as a collection of identifiers with their counts, you can call the GetMorganFingerprint() method, as described in the "GetMorganFingerprint() Method in RDKit" section in this chapter.

Now let's verify its ECFP implementation with some simple tests.

1. Molecule "C" has a single non-hydrogen atom, so only one identifier is generated at radius=0.

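from rdkit.Chem import AllChem
from rdkit.DataStructs.cDataStructs import UIntSparseIntVect
# display() is provided by IPython/Jupyter; use print() in a plain Python script
mol = AllChem.MolFromSmiles('C')

# Sparse fingerprint: identifier -> count
fp = AllChem.GetMorganFingerprint(mol, 0)
display(UIntSparseIntVect.GetNonzeroElements(fp))

# Bit-vector fingerprint; bitInfo is filled with bit -> ((atom, radius), ...)
bitInfo = {}
fp = AllChem.GetMorganFingerprintAsBitVect(mol, 0, nBits=64,
  bitInfo=bitInfo)
display(fp.ToBitString())
print(bitInfo)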

# output: 
{2246733040: 1}
'0000000000000000000000000000000000000000000000001000000000000000'
{48: ((0, 0),)}

As you can see, GetMorganFingerprintAsBitVect() encodes the only identifier, 2246733040, into a 64-bit string by setting the bit at position 48, which is the remainder of 2246733040 divided by 64: 2246733040 % 64 = 48.
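The remainder arithmetic can be checked in plain Python. This sketch only reproduces the numbers reported by RDKit above; it assumes the bit position is identifier % nBits, as the text explains, and does not call RDKit itself.

```python
identifier = 2246733040  # the only identifier reported for "C" at radius=0
n_bits = 64              # the nBits value used in the test

position = identifier % n_bits
print(position)  # 48

# Build a 64-character bit string with a single '1' at that position,
# in the same left-to-right order as ToBitString() (bit 0 first)
bit_string = "".join("1" if i == position else "0" for i in range(n_bits))
print(bit_string)
```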

2. Molecule "CC", expressed in SMILES, with 2 identical atoms.

mol = AllChem.MolFromSmiles('CC')
fp = AllChem.GetMorganFingerprint(mol, 0)
display(UIntSparseIntVect.GetNonzeroElements(fp))

bitInfo = {}
fp = AllChem.GetMorganFingerprintAsBitVect(mol, 0, nBits=64,
  bitInfo=bitInfo)
display(fp.ToBitString())
print(bitInfo)

# output: 
{2246728737: 2}
'0000000000000000000000000000000001000000000000000000000000000000'
{33: ((0, 0), (1, 0))}

GetMorganFingerprint() returns the same identifier, 2246728737, twice (once for each carbon atom). GetMorganFingerprintAsBitVect() encodes both occurrences into a single bit set to 1 at position 33, which is the remainder of 2246728737 divided by 64: 2246728737 % 64 = 33.
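Why two identical identifiers end up as one bit can be sketched with a plain Python integer used as a 64-bit mask. This is an illustration under the same identifier % nBits assumption, not RDKit's internal representation of ExplicitBitVect.

```python
n_bits = 64
# GetMorganFingerprint() reports identifier 2246728737 once per carbon atom
identifiers = [2246728737, 2246728737]

fp = 0  # a Python int used as a 64-bit mask
for identifier in identifiers:
    # Setting a bit that is already set changes nothing
    fp |= 1 << (identifier % n_bits)

print(bin(fp).count("1"))  # 1: only one bit is on
print((fp >> 33) & 1)      # 1: that bit is bit 33
```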

3. Same molecule "CC" with radius=1.

mol = AllChem.MolFromSmiles('CC')
fp = AllChem.GetMorganFingerprint(mol, 1)
display(UIntSparseIntVect.GetNonzeroElements(fp))

bitInfo = {}
fp = AllChem.GetMorganFingerprintAsBitVect(mol, 1, nBits=64,
  bitInfo=bitInfo)
display(fp.ToBitString())
print(bitInfo)

# output:
{2246728737: 2, 3545175291: 1}
'0000000000000000000000000000000001000000000000000000000000010000'
{33: ((0, 0), (1, 0)), 59: ((0, 1),)}

Now GetMorganFingerprint() returns 3 identifiers: 2246728737 twice and 3545175291 once. GetMorganFingerprintAsBitVect() encodes them by setting the bits at positions 33 and 59 (3545175291 % 64 = 59).
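The bitInfo dictionary printed above can be reproduced from the individual atom environments. In this sketch the (identifier, atom, radius) triples are assembled by hand from the GetMorganFingerprint() and bitInfo output of this test, and the grouping logic is an illustration, not RDKit's own code.

```python
n_bits = 64
# (identifier, center atom index, radius) for each environment in the
# "CC", radius=1 test; values taken from the output shown above
environments = [
    (2246728737, 0, 0),
    (2246728737, 1, 0),
    (3545175291, 0, 1),
]

bit_info = {}
for identifier, atom, radius in environments:
    bit = identifier % n_bits
    # Append (atom, radius) to the tuple of environments that set this bit
    bit_info[bit] = bit_info.get(bit, ()) + ((atom, radius),)

print(bit_info)  # {33: ((0, 0), (1, 0)), 59: ((0, 1),)}
```

Grouping by bit position is what lets several environments share one entry, which is exactly how the two radius-0 carbon environments appear together under bit 33.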

Notice that when duplicated identifiers are encoded into a bit string, the number of duplicates is lost. So the fingerprint generated by GetMorganFingerprintAsBitVect() carries less information than the one generated by GetMorganFingerprint().
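A small comparison makes this loss concrete. In the sketch below, single_counts is a hypothetical count fingerprint (not produced by any molecule in this section) that contains the "CC" identifier only once; fold() is a hypothetical helper that keeps only the bit positions, assuming identifier % nBits as in the tests above.

```python
# Count fingerprint for "CC" at radius=0, as reported by GetMorganFingerprint()
cc_counts = {2246728737: 2}
# Hypothetical count fingerprint with the same identifier seen only once
single_counts = {2246728737: 1}


def fold(counts, n_bits=64):
    """Hypothetical helper: keep only which bit positions are set."""
    # Counts are ignored; only the presence of each identifier survives
    return {identifier % n_bits for identifier in counts}


print(cc_counts == single_counts)              # False: the counts differ
print(fold(cc_counts) == fold(single_counts))  # True: the bit vectors are identical
```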
