Use StoneMIND Collector Web Interface

Cheminformatics Tutorials - Herong's Tutorial Examples

This section provides a tutorial example on how to use the StoneMIND Collector Web Interface to scan an entire patent PDF document of 181 pages and recognize all molecules using 'IUPAC' and 'OCSR' methods.

The StoneMIND Collector Web Interface offers additional functionality for batch extraction of patents or essays. Here is what I did to try it:

1. Go to the StoneMIND Collector website at https://www.stonewise.cn/mol_product.

2. Click the "Web Interface" button. I see the sign-in/sign-up screen in Chinese.

3. Click "Signup", fill in the form, and click "Submit". I see the StoneMIND Collector Web Interface.

4. Click Knowledge Base > Data Extraction. I see an empty project list.

5. Click "Create Project" and enter "Test" as the project name. I see no tasks in the new project.

6. Click "Create Task". I see a task window with 2 options: "Upload PDF" and "Patent Number".

7. Select "Patent Number" and enter "WO2001000214A1". I see a new task created. StoneMIND Collector is smart enough to find the PDF file for the given patent number on the Internet automatically.

8. Click "Extract" on the task and select both "IUPAC" and "OCSR" methods. I see that StoneMIND Collector starts scanning the PDF file and extracting molecule structures.

9. Click "View" after the extraction process is completed, which may take some time. I see 188 molecules extracted by the OCSR method and 189 by the IUPAC method.

10. Go to page 6 in the PDF panel on the left and click the first molecule diagram. I see the resulting molecule structure displayed on the right. It looks more accurate than the result produced by the StoneMIND Collector client. The top ring is complete, and the only errors are the mislabeled X2 and X3 atoms.

11. Inspect each extracted molecule and correct any mistakes you see.

12. Click "Download" to save all extracted molecules to a local file.

Here is a summary of my StoneMIND Collector Web interface task:

Patent Number: WO2001000214A1
Patent Year: 2001
PDF Pages: 181
Molecules by OCSR: 188
Molecules by IUPAC: 189
Extraction Time in Minutes: 74
Seconds per Page: 25
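
The "Patent Year" and "Seconds per Page" values in the summary can be double-checked with a short Python sketch. It is only an illustration: the helper names parse_wo_number() and seconds_per_page() are hypothetical, and the regular expression assumes the normalized number layout seen above ("WO", a 4-digit year, a 6-digit serial number, and a kind code such as "A1"). It does not call any StoneMIND Collector API.

```python
import re


def parse_wo_number(number: str) -> dict:
    """Split a normalized WO publication number like 'WO2001000214A1'.

    Assumed layout (not taken from StoneMIND Collector):
    'WO' + 4-digit year + 6-digit serial number + kind code.
    """
    match = re.fullmatch(r"(WO)(\d{4})(\d{6})([A-Z]\d?)", number)
    if match is None:
        raise ValueError(f"Unexpected patent number format: {number!r}")
    office, year, serial, kind = match.groups()
    return {"office": office, "year": int(year), "serial": serial, "kind": kind}


def seconds_per_page(minutes: float, pages: int) -> float:
    """Average extraction time per PDF page, in seconds."""
    if pages <= 0:
        raise ValueError("pages must be positive")
    return minutes * 60 / pages


# Values copied from the summary above.
info = parse_wo_number("WO2001000214A1")
rate = seconds_per_page(74, 181)

print("Patent Year:", info["year"])
print("Seconds per Page:", round(rate))
```

With 74 minutes and 181 pages, the average works out to about 24.5 seconds per page, which rounds to the 25 reported in the summary.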