Figure 3 from the paper. (c) ACS 2020.
I found this miniperspective a very enjoyable read. It covers much more than the title suggests (at least to me): a mini history of deep learning in MedChem, when to use deep learning and when to use other ML techniques such as regression or random forest (see the figure above), and some of the fundamental challenges of using ML and generative models in MedChem, just to name a few.
I found the last topic particularly interesting and include two of my favourite quotes from the paper below, but I really recommend that you read the entire paper.
Critically, small-molecule drug discovery breaks standard assumptions in many technological applications of machine learning. Most machine learning algorithms operate on the assumption that training and testing data are independently and identically distributed (the i.i.d. assumption). For example, we would expect a standard image classifier trained to exclusively distinguish cats from dogs to generalize to new images of cats and dogs. This model will likely produce nonsensical classifications if asked to evaluate pictures of humans. In stark contrast, real-world drug-discovery breaks this standard i.i.d. assumption. The optimization and design of small molecules necessarily explore structural variations drawn from intentionally novel regions of chemical space. Large structural changes to small-molecule hits are typically required to become a lead. For a model to be useful to the practicing medicinal chemist, it must generalize to out-of-distribution examples.
Critically, if generative models are to guide drug design, they cannot merely produce trivial extensions of the training data set. It remains unclear whether the latent spaces of generative models, which effectively interpolate across the chemical space of the training data, are capable of usefully extrapolating into new regions of chemical structure space. Furthermore, current generative models are torn between novelty and accessibility.
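To make the i.i.d. point in the first quote a bit more concrete, here is a minimal sketch of my own (it is not from the paper) of a grouped train/test split. Instead of splitting molecules at random, whole structural groups are held out, so the test set contains only groups the model has never seen. The function `group_split`, the molecule names, and the scaffold labels are all hypothetical placeholders; in practice the groups would have to come from some real structural grouping of your data.

```python
import random
from collections import defaultdict


def group_split(items, groups, test_fraction=0.2, seed=0):
    """Split items so that no group appears in both train and test.

    Hypothetical helper for illustration only: `groups` holds one
    label per item (e.g. a scaffold or cluster ID).
    """
    by_group = defaultdict(list)
    for item, group in zip(items, groups):
        by_group[group].append(item)

    # Shuffle the group IDs (not the items) so whole groups move together.
    group_ids = sorted(by_group)
    random.Random(seed).shuffle(group_ids)

    # Fill the test set group by group until it reaches the target size;
    # every remaining group goes to the training set.
    n_test_target = test_fraction * len(items)
    train, test = [], []
    for gid in group_ids:
        target = test if len(test) < n_test_target else train
        target.extend(by_group[gid])
    return train, test


# Toy data: made-up molecule names with made-up scaffold labels.
molecules = ["mol_a1", "mol_a2", "mol_a3", "mol_b1",
             "mol_b2", "mol_c1", "mol_c2", "mol_d1"]
scaffolds = ["A", "A", "A", "B", "B", "C", "C", "D"]

train, test = group_split(molecules, scaffolds, test_fraction=0.25, seed=0)

scaffold_of = dict(zip(molecules, scaffolds))
train_scaffolds = {scaffold_of[m] for m in train}
test_scaffolds = {scaffold_of[m] for m in test}

print("train scaffolds:", sorted(train_scaffolds))
print("test scaffolds: ", sorted(test_scaffolds))
```

A random split would usually put close analogues of the same scaffold on both sides, which tends to flatter the model. A grouped split like this is only a rough proxy for the out-of-distribution setting the authors describe, but it is arguably closer to how a model would be used in a real design project.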
This work is licensed under a Creative Commons Attribution 4.0 International License.