Running chemical reactions in a lab is slow and expensive. Even when the products of a reaction are known, chemists still need to know how much of each product will form. Knowing the actual yield of a reaction is necessary for planning a drug synthesis, and it helps decide whether a synthesis path should be preferred or avoided.
This project focused on predicting the yield of Suzuki-class chemical reactions given the actual product. It was an 11-month, end-to-end project that covered everything from acquiring and cleaning data from various sources to deploying deep learning models on servers for client use.
With over 500 machine learning and deep learning models trained, the project involved steps ranging from featurizing molecules into numerical vectors to performing quantum computations.
Structuring the problem
Three main ways of structuring the problem were tried:
- Predicting the yield of reactions
- Classifying reactions into high- and low-yielding reactions
- Ranking a given list of reactions in the order of their yields
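To make the difference between the three formulations concrete, the sketch below derives a regression target, a binary class label, and a rank from the same list of measured yields. It is only an illustration: the function names, the 0-100 percent yield scale, and the 50% cut-off between "high" and "low" are assumptions, not values taken from the project.

```python
from typing import List


def regression_targets(yields_percent: List[float]) -> List[float]:
    """Regression: predict the yield itself, here rescaled to the range 0-1."""
    return [y / 100.0 for y in yields_percent]


def classification_targets(yields_percent: List[float],
                           threshold: float = 50.0) -> List[int]:
    """Classification: 1 for high-yielding, 0 for low-yielding reactions.

    The 50% threshold is an assumed value chosen for illustration.
    """
    return [1 if y >= threshold else 0 for y in yields_percent]


def ranking_targets(yields_percent: List[float]) -> List[int]:
    """Ranking: rank 1 is the highest-yielding reaction in the list."""
    order = sorted(range(len(yields_percent)),
                   key=lambda i: yields_percent[i],
                   reverse=True)
    ranks = [0] * len(yields_percent)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks


# Made-up yields for four reactions, in percent.
example_yields = [82.0, 15.5, 47.0, 91.0]
example_labels = classification_targets(example_yields)
example_ranks = ranking_targets(example_yields)
```

The regression target keeps the most information, while the class label and the rank discard the exact value and keep only what is needed to accept, reject, or order candidate reactions.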
List of descriptors
The following descriptors were tried in different combinations for modelling:
- Occupancy descriptors
- Potential energy in a grid
- Universal Force Field (UFF) parameters
- ECFP features
- Modified ECFP features
- DFT-based properties
- Atom Pair - Bond Pair (AP-BP) descriptors
- Functional group decomposition
- Structural properties
- Experimental conditions
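Trying descriptors "in different combinations" can be pictured as concatenating a chosen subset of descriptor blocks into one feature vector per reaction. The sketch below shows one possible way to do that. The block names, the helper functions, and all numeric values are hypothetical placeholders; real blocks such as ECFP bit vectors or DFT-based properties would be much longer and would come from dedicated chemistry tools.

```python
from itertools import combinations
from typing import Dict, Iterable, List, Sequence, Tuple

# Hypothetical, made-up descriptor blocks for a single reaction.
descriptor_blocks: Dict[str, List[float]] = {
    "ecfp": [1.0, 0.0, 1.0, 0.0],   # stand-in for fingerprint bits
    "dft": [-0.21, 0.05],           # stand-in for computed properties
    "conditions": [80.0, 12.0],     # stand-in for experimental conditions
}


def build_features(blocks: Dict[str, List[float]],
                   selected: Sequence[str]) -> List[float]:
    """Concatenate the selected blocks, in the given order, into one vector."""
    features: List[float] = []
    for name in selected:
        features.extend(blocks[name])
    return features


def enumerate_combinations(names: Iterable[str]) -> List[Tuple[str, ...]]:
    """List every non-empty subset of descriptor names, smallest first."""
    ordered = sorted(names)
    subsets: List[Tuple[str, ...]] = []
    for size in range(1, len(ordered) + 1):
        subsets.extend(combinations(ordered, size))
    return subsets


all_subsets = enumerate_combinations(descriptor_blocks)
ecfp_plus_dft = build_features(descriptor_blocks, ("ecfp", "dft"))
```

With three blocks there are 7 non-empty subsets. The count grows as 2^n - 1, so with the ten descriptor families listed above an exhaustive search would already mean 1023 feature sets per model type.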
Models
Many models were tried for the problem. Some of them include:
- RandomForest
- XGBoost
- Feed Forward Neural Network
- Graph Convolutional Network
- Gated Graph Convolutional Network
- Attention Graph Convolutional Network
- Gated-Attention Graph Convolutional Network
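For readers unfamiliar with graph convolutional networks, the sketch below shows a single, generic graph convolution step on a molecule represented as an adjacency matrix plus per-atom feature vectors. It is a textbook-style, mean-aggregation layer written in plain Python for clarity, not the architecture used in the project; the function names, the toy three-atom molecule, and the identity weight matrix are all assumptions made for illustration. The gated and attention variants listed above add learned gates or neighbour weights on top of this basic idea.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product of a (n x k) and b (k x m)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def graph_conv(adjacency: Matrix, features: Matrix, weights: Matrix) -> Matrix:
    """One mean-aggregation graph convolution step followed by ReLU.

    Each atom's new vector is the average of its own and its neighbours'
    feature vectors, multiplied by a weight matrix.
    """
    n = len(adjacency)
    # Add self-loops so every atom keeps its own features.
    with_self = [[adjacency[i][j] + (1.0 if i == j else 0.0)
                  for j in range(n)] for i in range(n)]
    # Row-normalise so aggregation is a mean rather than a sum.
    normalised = [[value / sum(row) for value in row] for row in with_self]
    aggregated = matmul(normalised, features)
    transformed = matmul(aggregated, weights)
    return [[max(0.0, value) for value in row] for row in transformed]


# Toy three-atom chain (for example C-C-O) with one-hot atom types [C, O].
adjacency = [[0, 1, 0],
             [1, 0, 1],
             [0, 1, 0]]
atom_features = [[1.0, 0.0],
                 [1.0, 0.0],
                 [0.0, 1.0]]
identity_weights = [[1.0, 0.0],
                    [0.0, 1.0]]  # placeholder; a trained layer learns these

updated = graph_conv(adjacency, atom_features, identity_weights)
```

After one step the oxygen atom's vector already mixes in information from its carbon neighbour. Stacking several such layers lets each atom see a larger neighbourhood before the atom vectors are pooled into a molecule-level representation for yield prediction.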
The implementation code is the proprietary property of Aganitha Cognitive Solutions.