Lior Hirschfeld, Kyle Swanson, Kevin Yang, Regina Barzilay, and Connor W. Coley 2020
Highlighted by Jan Jensen
Figure 3 from the paper. (c) American Chemical Society 2020
Given the black-box nature of ML models, it is very important to have some measure of how much to trust their predictions. There are many ways to do this, and this paper shows that "none of the methods we tested is unequivocally superior to all others, and none produces a particularly reliable ranking of errors across multiple data sets."
This conclusion is neatly summarised in the figure shown above for 5 common datasets, 2 different ML methods, and 4 different methods for uncertainty quantification. For each combination, the plot shows the RMSE for the 100, 50, 25, 10, and 5% of the hold-out test set on which the uncertainty quantification method predicted the lowest uncertainty.
Generally, the RMSE drops as expected, but in many cases the drop is decidedly modest past 50%, and in some cases it even increases. In most cases there is very little difference between the different uncertainty quantification methods, but sometimes there is, and it's hard to predict when.
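To make the evaluation concrete, here is a minimal sketch (not code from the paper) of how one might compute this kind of check: given per-molecule predictions and uncertainty estimates on a hold-out set, sort by predicted uncertainty and compute the RMSE on the most-confident fractions.

```python
import numpy as np

def rmse_at_confidence_percentiles(y_true, y_pred, uncertainty,
                                   fractions=(1.0, 0.5, 0.25, 0.10, 0.05)):
    """RMSE on the fraction of the hold-out set with the lowest predicted uncertainty.

    y_true, y_pred, and uncertainty are 1D arrays over the hold-out set.
    A useful uncertainty estimate should give a lower RMSE as the fraction shrinks.
    """
    order = np.argsort(uncertainty)  # most confident predictions first
    results = {}
    for frac in fractions:
        n = max(1, int(round(frac * len(order))))
        idx = order[:n]
        results[frac] = np.sqrt(np.mean((y_true[idx] - y_pred[idx]) ** 2))
    return results
```

If the uncertainty estimates rank errors well, the RMSE should fall monotonically from the 100% entry to the 5% entry; the figure above shows how often that actually happens in practice.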
One thing that struck me when reading this paper is that many studies that include uncertainty quantification, e.g. using the ensemble approach, often just take it for granted that it works and don't present tests like this.
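For context, the ensemble approach typically uses the spread of predictions from several independently trained models as the uncertainty estimate. A rough sketch (with a hypothetical train_fn standing in for whatever model is actually used) might look like this:

```python
import numpy as np

def ensemble_predict(train_fn, X_train, y_train, X_test, n_models=5, seed=0):
    """Train n_models copies with different seeds and use the spread of
    their predictions as a (heuristic) uncertainty estimate."""
    rng = np.random.default_rng(seed)
    preds = []
    for _ in range(n_models):
        # train_fn is a hypothetical trainer returning an object with .predict()
        model = train_fn(X_train, y_train, seed=int(rng.integers(1_000_000)))
        preds.append(model.predict(X_test))
    preds = np.stack(preds)
    y_pred = preds.mean(axis=0)        # ensemble mean prediction
    uncertainty = preds.std(axis=0)    # ensemble spread as uncertainty
    return y_pred, uncertainty
```

The point of the paper is that an estimate like this should then be validated, e.g. by feeding y_pred and uncertainty into a check such as the one sketched above, rather than assumed to rank errors reliably.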
This work is licensed under a Creative Commons Attribution 4.0 International License.