Impact of 'branchedPaths' on RDKFingerprint()

This section provides a tutorial example on the impact of the 'branchedPaths' option on fingerprint generation with the rdkit.Chem.rdmolops.RDKFingerprint() function.

The 'branchedPaths' option in the rdkit.Chem.rdmolops.RDKFingerprint() function call controls whether subgraphs with branched paths are used to turn on bits in the fingerprint. With branchedPaths=True, which is the default, branched subgraphs are included in the fingerprint generation process. With branchedPaths=False, branched subgraphs are excluded, and only linear paths contribute bits.

1. For example, "CC(C)CC" is a molecule with 2 branches, as shown in the 2D structure rendered in Jupyter Notebook by the following code. Bond indices are displayed to help us identify subgraphs.

from rdkit import Chem
from rdkit.Chem.Draw import IPythonConsole
# Label bonds, not atoms, on the drawing
IPythonConsole.drawOptions.addAtomIndices = False
IPythonConsole.drawOptions.addBondIndices = True
mol = Chem.MolFromSmiles('CC(C)CC')
display(mol)
Molecule with 2 Branches Displayed with RDKit

2. If we use the "branchedPaths=True" option, the fingerprint will be generated from 5 unique subgraphs.

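from rdkit import Chem

atomBits = []  # filled with the bits each atom contributes to
bitInfo = {}   # filled with bit -> list of bond-index subgraphs
mol = Chem.MolFromSmiles('CC(C)CC')
fp = Chem.RDKFingerprint(mol,
  fpSize=64, nBitsPerHash=1, branchedPaths=True,
  atomBits=atomBits, bitInfo=bitInfo)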
print(fp.ToBitString())
print(fp.GetNumOnBits())
print(atomBits)
print(bitInfo)

# output:
0000000000000000000001000000100000010000000100000000000000000100
5
[[28, 21, 43, 61, 35], [28, 21, 43, 61, 35], [28, 21, 61, 43, 35], ...]

{ 21: [[0, 2], [0, 1], [1, 2], [2, 3]], 
  28: [[0], [1], [2], [3]], 
  35: [[0, 2, 3, 1]], 
  43: [[0, 2, 3], [1, 2, 3]], 
  61: [[0, 2, 1]]
}

As you can see from the output, 2 branched subgraphs are included in the fingerprint: bit 35 with bonds [0, 2, 3, 1], and bit 61 with bonds [0, 2, 1]. Both contain bonds 0, 1 and 2, which all meet at the same central atom.
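One way to double-check which bits come from branched subgraphs is to count, for each bond set in bitInfo, how many of its bonds touch each atom: a set is branched when some atom is touched by more than 2 of its bonds. The sketch below is plain Python rather than an RDKit call. The BONDS table and the is_branched() helper are illustrative names, and the bond-to-atom mapping is an assumption based on the SMILES atom order.

```python
from collections import Counter

# Assumed bond index -> (atom, atom) mapping for 'CC(C)CC',
# following the SMILES order; not read from RDKit here.
BONDS = {0: (0, 1), 1: (1, 2), 2: (1, 3), 3: (3, 4)}

def is_branched(bond_ids):
    """Return True if any atom is touched by more than 2 of the given bonds."""
    degree = Counter()
    for bond_id in bond_ids:
        atom1, atom2 = BONDS[bond_id]
        degree[atom1] += 1
        degree[atom2] += 1
    return any(count > 2 for count in degree.values())

# bitInfo as printed in the branchedPaths=True output above
bit_info = {
    21: [[0, 2], [0, 1], [1, 2], [2, 3]],
    28: [[0], [1], [2], [3]],
    35: [[0, 2, 3, 1]],
    43: [[0, 2, 3], [1, 2, 3]],
    61: [[0, 2, 1]],
}

branched_bits = sorted(
    bit for bit, subgraphs in bit_info.items()
    if any(is_branched(subgraph) for subgraph in subgraphs)
)
print(branched_bits)  # [35, 61]
```

Under this mapping, only bits 35 and 61 are flagged, which agrees with the reading of the output above.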

3. Now if we use the "branchedPaths=False" option, the fingerprint will be generated from only 3 non-branched unique subgraphs.

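from rdkit import Chem

atomBits = []  # filled with the bits each atom contributes to
bitInfo = {}   # filled with bit -> list of bond-index subgraphs
mol = Chem.MolFromSmiles('CC(C)CC')
fp = Chem.RDKFingerprint(mol,
  fpSize=64, nBitsPerHash=1, branchedPaths=False,
  atomBits=atomBits, bitInfo=bitInfo)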
print(fp.ToBitString())
print(fp.GetNumOnBits())
print(atomBits)
print(bitInfo)

# output:
0000000000000000000001000000100000000000000100000000000000000000
3
[[28, 21, 43], [28, 21, 43], [28, 21, 43], [28, 21, 43], [28, 21, 43]]

{ 21: [[0, 1], [0, 2], [2, 3], [1, 2]], 
  28: [[0], [1], [2], [3]], 
  43: [[0, 2, 3], [1, 2, 3]]
}
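To see exactly which bits the option adds, the two bit strings printed in steps 2 and 3 can be compared position by position. This is a minimal sketch in plain Python that works on the printed strings only; the variable names are illustrative.

```python
# Bit strings copied from the two outputs above
bits_true = "0000000000000000000001000000100000010000000100000000000000000100"
bits_false = "0000000000000000000001000000100000000000000100000000000000000000"

# Positions of the "1" characters, counting from 0 at the left
on_true = {i for i, c in enumerate(bits_true) if c == "1"}
on_false = {i for i, c in enumerate(bits_false) if c == "1"}

# Bits that appear only when branched subgraphs are allowed
only_branched = sorted(on_true - on_false)
print(sorted(on_false))  # [21, 28, 43]
print(only_branched)     # [35, 61]
```

The 3 bits shared by both fingerprints come from linear paths; the 2 extra bits are the branched subgraphs identified in step 2.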

4. Without branched subgraphs, 2 different molecules, "CC(C)CC" and "CCCC", will have identical fingerprints with these settings.

from rdkit import Chem

# atomBits/bitInfo are not needed here: we only compare the bit strings
mol = Chem.MolFromSmiles('CC(C)CC')
fp1 = Chem.RDKFingerprint(mol,
  fpSize=64, nBitsPerHash=1, branchedPaths=False)
print(fp1.ToBitString())

mol = Chem.MolFromSmiles('CCCC')
fp2 = Chem.RDKFingerprint(mol,
  fpSize=64, nBitsPerHash=1, branchedPaths=False)
print(fp2.ToBitString())

# output: 
0000000000000000000001000000100000000000000100000000000000000000
0000000000000000000001000000100000000000000100000000000000000000
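A rough way to see why the two fingerprints collide is to list the linear paths each molecule contains. The sketch below is a simplified model, not RDKit's actual hashing: it assumes that, because every atom is carbon and every bond is single, a linear path is characterized only by its number of bonds. The bond lists and the linear_path_lengths() helper are illustrative.

```python
def linear_path_lengths(bonds):
    """Return the set of bond counts of all simple (non-branched) paths."""
    adjacency = {}
    for a, b in bonds:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    lengths = set()

    def walk(atom, visited):
        # Extend the current path by one unvisited neighbor at a time
        for neighbor in adjacency[atom]:
            if neighbor not in visited:
                # A path over len(visited) + 1 atoms has len(visited) bonds
                lengths.add(len(visited))
                walk(neighbor, visited | {neighbor})

    for start in adjacency:
        walk(start, {start})
    return lengths

# Assumed atom numbering, following the SMILES order
ISOPENTANE = [(0, 1), (1, 2), (1, 3), (3, 4)]  # 'CC(C)CC'
BUTANE = [(0, 1), (1, 2), (2, 3)]              # 'CCCC'

print(sorted(linear_path_lengths(ISOPENTANE)))  # [1, 2, 3]
print(sorted(linear_path_lengths(BUTANE)))      # [1, 2, 3]
```

Both molecules contain linear paths of 1, 2 and 3 bonds and nothing else, which matches the 3 bits they share when branched subgraphs are switched off.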

Conclusion: We should keep the default "branchedPaths=True" option, so that the difference between branched and non-branched subgraphs is reflected in the fingerprint.
