Try RDKit Python API

This section provides a tutorial example on how to use the RDKit Python API. Unfortunately, it does not work yet, because the boost_python library is missing.

Now I am ready to try the RDKit Python API with the Python 2 engine. I hoped it would work with the build I did with "-DRDK_BUILD_PYTHON_WRAPPERS=OFF", even though that option skips building the Python wrappers.

1. Import the "rdkit" package in Python 2. I see an "ImportError: No module named rdBase" error. I do not know where the "rdBase" module is located.

herong$ export PYTHONPATH=/home/herong/rdkit

herong$ python2

Python 2.7.16 (default, Nov 17 2019, 00:07:27)
[GCC 8.3.1 20190507 (Red Hat 8.3.1-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from rdkit import Chem
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/herong/rdkit/rdkit/__init__.py", line 2, in <module>
    from .rdBase import rdkitVersion as __version__
ImportError: No module named rdBase
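
The traceback suggests that "rdBase" is not a plain .py file. The CMake output later in this section lists "rdBase" as a build target under Code/RDBoost/Wrap, so it is a compiled extension module that only exists after the Python wrappers are built. Below is a minimal sketch that checks whether such a file is present. The helper name find_extension is hypothetical, and the search path is simply the source tree used above.

```python
import fnmatch
import os


def find_extension(root, name="rdBase"):
    """Return sorted paths of files named like <name>*.so or <name>*.pyd under root."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            # Compiled Python extensions end in .so on Linux/macOS and .pyd on Windows.
            if (fnmatch.fnmatch(filename, name + "*.so")
                    or fnmatch.fnmatch(filename, name + "*.pyd")):
                matches.append(os.path.join(dirpath, filename))
    return sorted(matches)


if __name__ == "__main__":
    # Assumed location: the rdkit package directory from the traceback above.
    hits = find_extension("/home/herong/rdkit/rdkit")
    print(hits if hits else "no compiled rdBase module found")
```

If the list is empty, the import cannot succeed no matter how PYTHONPATH is set, because the wrapper library was never built.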

2. Reading the RDKit Python documentation, I found this note: "Beginning with the 2019.03 release, the RDKit is no longer supporting Python 2. If you need to continue using Python 2, please stick with a release from the 2018.09 release cycle." So I have to rebuild RDKit with "-DRDK_BUILD_PYTHON_WRAPPERS=ON" and use it with Python 3.

3. Unzip rdkit-master.zip into ~/rdkit and build it again with no options, which uses the default setting of "-DRDK_BUILD_PYTHON_WRAPPERS=ON". I see an error saying that PYTHON_LIBRARY is set to NOTFOUND.

herong$ unzip rdkit-master.zip

herong$ mv rdkit-master rdkit

herong$ cd rdkit

herong$ mkdir build

herong$ cd build

herong$ cmake ..
CMake Error: The following variables are used in this project, but
they are set to NOTFOUND. Please set them or make sure they are set
and tested correctly in the CMake files:
PYTHON_LIBRARY (ADVANCED)
linked by target "RDBoost" in directory /home/herong/rdkit/Code/RDBoost
linked by target "rdBase" in directory /home/herong/rdkit/Code/RDBoost/Wrap
...
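
The PYTHON_LIBRARY error means CMake could not locate the Python development library. A quick way to see what the Python 3 interpreter itself reports is sketched below, using only the standard "sysconfig" module. The helper name python_dev_paths is hypothetical. Passing the reported values to cmake as -DPYTHON_LIBRARY and -DPYTHON_INCLUDE_DIR is a possible workaround that I have not verified for this build.

```python
import os
import sysconfig


def python_dev_paths():
    """Report where this interpreter expects its headers and shared library."""
    include_dir = sysconfig.get_paths()["include"]
    lib_dir = sysconfig.get_config_var("LIBDIR")      # may be None on some platforms
    lib_name = sysconfig.get_config_var("LDLIBRARY")  # may be None on some platforms
    library = os.path.join(lib_dir, lib_name) if lib_dir and lib_name else None
    return {
        "include_dir": include_dir,
        # Python.h is installed by the development package, not the base interpreter.
        "header_found": os.path.isfile(os.path.join(include_dir, "Python.h")),
        "library": library,
        "library_found": bool(library) and os.path.exists(library),
    }


if __name__ == "__main__":
    for key, value in python_dev_paths().items():
        print("%s: %s" % (key, value))
```

If "header_found" is False, the development package for this interpreter is probably not installed.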

4. Install "platform-python-devel" and run "cmake" again. I see the "No Boost libraries were found" error.

herong$ sudo dnf install platform-python-devel

...
Installed:
  platform-python-devel-3.6.8-15.1.el8.x86_64
  python-rpm-macros-3-37.el8.noarch
  python3-rpm-generators-5-4.el8.noarch

herong$ cmake ..
CMake Error at /usr/share/cmake/Modules/FindBoost.cmake:2044 (message):
  Unable to find the requested Boost libraries.
  Boost version: 1.66.0
  Boost include path: /usr/include
  Could not find the following Boost libraries:
          boost_python
  No Boost libraries were found.  You may need to set BOOST_LIBRARYDIR to the
  directory containing Boost libraries or BOOST_ROOT to the location of
  Boost.
...

5. Search for the boost_python library file. None is listed.

herong$ ls -l /usr/lib64/libboost_p*
     35 May 13  2019 /usr/lib64/libboost_prg_exec_monitor.so
                       -> libboost_prg_exec_monitor.so.1.66.0
  89688 May 13  2019 /usr/lib64/libboost_prg_exec_monitor.so.1.66.0
     34 May 13  2019 /usr/lib64/libboost_program_options.so
                       -> libboost_program_options.so.1.66.0
 701288 May 13  2019 /usr/lib64/libboost_program_options.so.1.66.0
...

herong$ dnf info boost
Installed Packages
Name         : boost
Version      : 1.66.0
Release      : 6.el8
Architecture : x86_64
Size         : 1.3 k
Source       : boost-1.66.0-6.el8.src.rpm
Repository   : @System
From repo    : AppStream
Summary      : The free peer-reviewed portable C++ source libraries
URL          : http://www.boost.org

Too bad. The "boost 1.66" package I installed does not include the boost_python library. I am not sure whether I have to install it manually.
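
The "ls" command above only looked in /usr/lib64. A slightly wider search can be sketched in Python as shown below. The helper name find_boost_python and the extra directories are assumptions for illustration. Boost.Python may also be shipped as a separate package with a version suffix in its file name, which is why the pattern ends with a wildcard.

```python
import glob
import os


def find_boost_python(lib_dirs=("/usr/lib64", "/usr/lib", "/usr/local/lib")):
    """Return sorted paths of files whose names start with libboost_python."""
    hits = []
    for lib_dir in lib_dirs:
        # Matches names such as libboost_python.so or version-suffixed variants.
        hits.extend(glob.glob(os.path.join(lib_dir, "libboost_python*")))
    return sorted(hits)


if __name__ == "__main__":
    found = find_boost_python()
    print("\n".join(found) if found else "no boost_python library found")
```

An empty result is consistent with the CMake error: the Boost headers are installed, but the Python binding library is not.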
