Suzuki Reaction Yield Predictor

Running chemical reactions in a lab is slow and expensive. Even after identifying the right products for a reaction, chemists need to know how much of the product will form. Knowing the expected yield of a reaction is essential for drug synthesis planning and helps decide whether a synthetic route should be preferred or avoided.

This project focused on predicting the yield of Suzuki-type reactions when the product of the reaction is already known. It was an 11-month, end-to-end project covering everything from acquiring and cleaning data from various sources to deploying deep learning models on servers for client use.

With roughly 500 machine learning and deep learning models trained, the project involved steps ranging from featurizing molecules into numerical vectors to performing quantum chemical computations.

Structuring the problem

Three main approaches were tried for structuring the problem:

Predicting the yield of reactions
Classifying reactions into high- and low-yielding reactions
Ranking a given list of reactions in the order of their yields
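All three framings can be derived from one table of measured yields. The following Python sketch is illustrative only and is not the project's code. The reaction IDs, the yield values and the 50% cutoff between high- and low-yielding reactions are hypothetical assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical example data: reaction ID -> measured yield in percent.
YIELDS: Dict[str, float] = {
    "rxn_a": 82.0,
    "rxn_b": 15.5,
    "rxn_c": 47.0,
    "rxn_d": 63.2,
}

# Assumed cutoff for the classification framing; the real threshold may differ.
HIGH_YIELD_CUTOFF = 50.0


def regression_targets(yields: Dict[str, float]) -> Dict[str, float]:
    """Regression: predict the yield itself, rescaled here to the range [0, 1]."""
    return {rxn: y / 100.0 for rxn, y in yields.items()}


def classification_labels(yields: Dict[str, float],
                          cutoff: float = HIGH_YIELD_CUTOFF) -> Dict[str, int]:
    """Classification: 1 for high-yielding (>= cutoff), 0 for low-yielding."""
    return {rxn: int(y >= cutoff) for rxn, y in yields.items()}


def ranking_order(yields: Dict[str, float]) -> List[str]:
    """Ranking: order reactions from highest to lowest yield."""
    return sorted(yields, key=yields.get, reverse=True)


def ranking_pairs(yields: Dict[str, float]) -> List[Tuple[str, str]]:
    """Pairwise (higher, lower) training examples for a learning-to-rank model."""
    order = ranking_order(yields)
    return [(order[i], order[j])
            for i in range(len(order))
            for j in range(i + 1, len(order))
            if yields[order[i]] > yields[order[j]]]


if __name__ == "__main__":
    print(regression_targets(YIELDS))
    print(classification_labels(YIELDS))
    print(ranking_order(YIELDS))
```

Pairwise (higher, lower) tuples are one common way to train a ranking model. Alternatively, the outputs of a regression model can simply be sorted to produce a ranking.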

List of descriptors

The following descriptors were tried in different combinations for modelling:

Occupancy descriptors
Potential energy in a grid
Universal Force Field (UFF) parameters
ECFP features
Modified ECFP features
DFT-based properties
Atom Pair - Bond Pair (AP-BP) descriptors
Functional Group decomposition
Structural properties
Experimental conditions
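To illustrate one descriptor from this list, the sketch below computes a simplified ECFP-style (extended-connectivity) fingerprint and concatenates it with experimental-condition values. It is a toy built on several assumptions and is not the project's featurizer. Molecules are hand-built graphs rather than parsed structures, the initial atom invariants are only element and degree, and full ECFP's removal of duplicate environments is omitted. In practice, a cheminformatics toolkit such as RDKit (Morgan fingerprints) would normally be used. The names `ecfp_like_bits` and `combine_descriptors` are hypothetical.

```python
import hashlib
from typing import Dict, List, Sequence, Tuple

# Toy molecule representation: atom index -> element symbol, plus (i, j, bond_order) edges.
Atoms = Dict[int, str]
Bonds = List[Tuple[int, int, int]]


def _stable_hash(*parts: object) -> int:
    """Deterministic 32-bit hash (Python's built-in str hash is salted per run)."""
    digest = hashlib.sha1(repr(parts).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def ecfp_like_bits(atoms: Atoms, bonds: Bonds, radius: int = 2, n_bits: int = 1024) -> List[int]:
    """Simplified ECFP-style fingerprint: iteratively hash each atom's neighbourhood."""
    neighbours: Dict[int, List[Tuple[int, int]]] = {i: [] for i in atoms}
    for i, j, order in bonds:
        neighbours[i].append((j, order))
        neighbours[j].append((i, order))

    # Radius 0: initial identifier from the element and the number of neighbours.
    ids = {i: _stable_hash(sym, len(neighbours[i])) for i, sym in atoms.items()}
    features = set(ids.values())

    for _ in range(radius):
        # New identifier = hash of the old one plus the sorted (bond order, neighbour id) pairs.
        # The comprehension reads the previous `ids` before it is rebound.
        ids = {
            i: _stable_hash(ids[i], sorted((order, ids[j]) for j, order in neighbours[i]))
            for i in atoms
        }
        features.update(ids.values())

    # Fold all collected identifiers into a fixed-length bit vector.
    bits = [0] * n_bits
    for f in features:
        bits[f % n_bits] = 1
    return bits


def combine_descriptors(*blocks: Sequence[float]) -> List[float]:
    """Concatenate descriptor blocks (e.g. a fingerprint plus experimental conditions)."""
    return [float(x) for block in blocks for x in block]
```

For example, `combine_descriptors(ecfp_like_bits({0: "C", 1: "C", 2: "O"}, [(0, 1, 1), (1, 2, 1)]), [80.0, 12.0])` appends two hypothetical condition values, such as temperature and time, to a 1024-bit fingerprint. Because neighbour environments are sorted before hashing, the result does not depend on how the atoms are numbered.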

Models

Many models were tried for the problem, including:

RandomForest
XGBoost
Feed Forward Neural Network
Graph Convolutional Network
Gated Graph Convolutional Network
Attention Graph Convolutional Network
Gated-Attention Graph Convolutional Network
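The graph convolutional models in this list share a message-passing core. The sketch below shows one graph convolution layer in the widely used symmetric-normalization form, H' = ReLU(D^-1/2 (A + I) D^-1/2 H W). A mean readout and a sigmoid output, read as a yield fraction, follow the layer. The source does not say which GCN formulation, readout or output head was used, so all of these are assumptions. Gated and attention variants would change how neighbour messages are weighted and are not shown here. Training of the weights is also omitted.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain product of an (n x k) matrix and a (k x m) matrix."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def normalized_adjacency(adj: Matrix) -> Matrix:
    """Symmetric normalization D^-1/2 (A + I) D^-1/2, with self-loops added."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(adj: Matrix, h: Matrix, w: Matrix) -> Matrix:
    """One graph convolution: ReLU(A_norm @ H @ W) over atom feature rows H."""
    out = matmul(matmul(normalized_adjacency(adj), h), w)
    return [[max(0.0, x) for x in row] for row in out]


def mean_readout(h: Matrix) -> List[float]:
    """Average the atom embeddings into one molecule-level vector."""
    n = len(h)
    return [sum(row[j] for row in h) / n for j in range(len(h[0]))]


def predict_yield(adj: Matrix, h: Matrix, w: Matrix, w_out: List[float]) -> float:
    """Hypothetical head: sigmoid of a linear map, interpreted as a yield fraction in (0, 1)."""
    z = sum(a * b for a, b in zip(mean_readout(gcn_layer(adj, h, w)), w_out))
    return 1.0 / (1.0 + math.exp(-z))
```

The self-loops let each atom keep its own features during aggregation. The symmetric normalization keeps high-degree atoms from dominating the summed messages.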

The implementation code is the proprietary property of Aganitha Cognitive Solutions.
