Kevin McCloskey, Ankur Taly, Federico Monti, Michael P. Brenner, and Lucy J. Colwell (2019)
Highlighted by Jan Jensen
Part of Figure 3. Red indicates atoms that make positive contributions to the predicted values.
Copyright (2019) National Academy of Sciences.
This paper shows that state-of-the-art ML models can easily be fooled even for relatively trivial classification problems.
The authors generate several synthetic classification sets using simple rules, such as the presence of a phenyl group, and train both a graph-convolutional NN and a message-passing NN on them. Not surprisingly, the hold-out performance is near perfect, with AUCs near 1.000.
Then they use a technique called integrated gradients to compute atomic contributions to the predictions and check whether these contributions match the rules used to create the data sets. For example, if the ground truth rule is the presence of a benzene ring, then only benzene ring atoms should make significant positive contributions. For some ground truth rules, this is often not the case!
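To make the idea concrete, here is a minimal sketch of integrated gradients on a toy model. It is not the authors' implementation: the logistic "model", its weights `W` and `B`, the all-zeros baseline, and the three input features (which you can think of as hypothetical per-atom features) are all assumptions made for illustration. IG attributes to feature i the quantity (x_i - x'_i) times the gradient of the model output averaged along the straight line from the baseline x' to the input x, and the attributions sum to the difference in model output between x and x'.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def model(x, w, b):
    """Toy differentiable 'classifier': logistic regression (an assumption)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def model_grad(x, w, b):
    """Analytical gradient of the toy model with respect to its inputs."""
    p = model(x, w, b)
    return [p * (1.0 - p) * wi for wi in w]


def integrated_gradients(x, baseline, w, b, steps=200):
    """Approximate IG with a midpoint Riemann sum along the baseline-to-input path."""
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [bi + alpha * (xi - bi) for xi, bi in zip(x, baseline)]
        grad = model_grad(point, w, b)
        total = [t + g for t, g in zip(total, grad)]
    # Scale the path-averaged gradient by the input-minus-baseline difference.
    return [(xi - bi) * t / steps for xi, bi, t in zip(x, baseline, total)]


# Hypothetical weights: feature 0 helps, feature 1 is irrelevant, feature 2 hurts.
W = [2.0, 0.0, -1.0]
B = -0.5
x = [1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]

attributions = integrated_gradients(x, baseline, W, B)
print(attributions)
print(sum(attributions), model(x, W, B) - model(baseline, W, B))
```

In a real graph network the gradient would come from automatic differentiation rather than a hand-written formula, and the per-feature attributions would be summed over each atom's features to give one number per atom.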
Figure 3A above shows a case where the ground truth rule is the presence of three groups: a phenyl, a primary amine, and an ether. While this molecule is correctly classified, there are significant atomic contributions from some of the fused ring atoms. So either the atomic contributions are mis-assigned by the integrated gradients method or the prediction is correct for the wrong reasons. The authors argue that it is the latter because three atomic changes in and near the fused ring (Figure 3B) result in a molecule that the model mis-classifies.
The authors note:
It is dangerous to trust a model whose predictions one does not understand. A serious issue with neural networks is that, although a held-out test set may suggest that the model has learned to predict perfectly, there is no guarantee that the predictions are made for the right reason. Biases in the training set can easily cause errors in the model’s logic. The solution to this conundrum is to take the model seriously: Analyze it, ask it why it makes the predictions that it does, and avoid relying solely on aggregate accuracy metrics.
The integrated gradient (IG) method is interesting in and of itself, so a few more words on that:
Jiménez-Luna et al. have since shown that the IG approach can be used to extract pharmacophores from models trained on experimental data sets.
IG can only be applied to fully differentiable models such as NNs. Riniker and Landrum, and separately Sheridan, have developed fingerprint-based approaches that can be applied to any ML model, although they are theoretically more ad hoc. The Riniker-Landrum approach is available in RDKit, while Jiménez-Luna et al. provide an implementation of the Sheridan approach and also identify several examples where IG and the Sheridan approach give different interpretations.
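The general idea behind such model-agnostic approaches can be sketched as atom masking: remove the fingerprint bits an atom contributes to, re-run the model, and take the change in prediction as that atom's weight. The sketch below is a toy illustration of that idea only, not the RDKit implementation or Sheridan's method. The atom-to-bit mapping `atom_bits`, the bit weights, and the `predict` function are all hypothetical; `predict` could be replaced by any black-box model that accepts a set of on-bits.

```python
import math


def predict(bits, bit_weights, bias=0.0):
    """Hypothetical black-box model acting on a set of 'on' fingerprint bits."""
    score = bias + sum(bit_weights.get(b, 0.0) for b in bits)
    return 1.0 / (1.0 + math.exp(-score))


def atom_weights(atom_bits, predict_fn):
    """Weight of an atom = full prediction minus prediction with its bits removed."""
    all_bits = set().union(*atom_bits.values())
    full = predict_fn(all_bits)
    return {
        atom: full - predict_fn(all_bits - bits)
        for atom, bits in atom_bits.items()
    }


# Hypothetical mapping from atoms to the fingerprint bits they take part in.
atom_bits = {"C1": {1, 2}, "N2": {2, 3}, "O3": {4}}
# Hypothetical learned importance of each bit.
bit_weights = {1: 0.5, 2: 1.0, 3: 2.0, 4: -1.0}

weights = atom_weights(atom_bits, lambda bits: predict(bits, bit_weights))
for atom, w in weights.items():
    print(atom, round(w, 3))
```

Because only model outputs are needed, this works for random forests and other non-differentiable models, but the result depends on how the masking is defined, which is one sense in which such schemes are more ad hoc than IG.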
This work is licensed under a Creative Commons Attribution 4.0 International License.