Colin A. Grambow, Lagnajit Pattanaik, William H. Green (2020)
Highlighted by Jan Jensen
This paper describes a new data set of DFT barrier heights for 12,000 diverse chemical reactions and should stimulate a lot of new ML studies on chemical reactivity.
The molecules are sampled from GDB-7, so they are relatively small and contain only H, C, N, and O. Each reaction is generated from a single molecule using the single-ended growing string method (GSM), so reactions with two reactants and two products are not represented in the data set. Other than these limitations, the data set is very diverse:
The reactions span a wide range of both barriers and reaction energies (as seen in Figure 1 of the paper). Reactions with anywhere from 1 to 6 bond changes are represented (though there are only a handful with 6), as are changes to pretty much all bond types (C-H, C-C, C-N, etc.). There are only 8 reaction templates with more than 100 examples, and many templates have only a single reaction example. So, very diverse.
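To make "bond changes" concrete, here is a minimal sketch of one way to count them. It assumes each side of a reaction is given as a mapping from atom-index pairs to bond orders, and it counts every pair whose bond order differs (broken, formed, or changed in order). This convention is an assumption for illustration and may not match how the paper counts. The example reaction (methanol to formaldehyde plus H2) is hand-written and is not taken from the data set.

```python
def bond(i, j):
    """Order-independent key for a bond between atoms i and j."""
    return frozenset((i, j))


def count_bond_changes(reactant_bonds, product_bonds):
    """Count atom pairs whose bond order differs between the two sides.

    A missing pair is treated as bond order 0, so broken and formed
    bonds are counted together with changes in bond order.
    """
    pairs = set(reactant_bonds) | set(product_bonds)
    return sum(
        1
        for pair in pairs
        if reactant_bonds.get(pair, 0) != product_bonds.get(pair, 0)
    )


# Hypothetical example: CH3OH -> CH2=O + H2
# Atom indices: C=1, O=2, H=3,4,5 (on C), H=6 (on O)
reactant = {
    bond(1, 2): 1,  # C-O
    bond(1, 3): 1,  # C-H
    bond(1, 4): 1,  # C-H
    bond(1, 5): 1,  # C-H (breaks)
    bond(2, 6): 1,  # O-H (breaks)
}
product = {
    bond(1, 2): 2,  # C=O (order changes from 1 to 2)
    bond(1, 3): 1,
    bond(1, 4): 1,
    bond(5, 6): 1,  # H-H (forms)
}

n_changes = count_bond_changes(reactant, product)
print(n_changes)
```

Under this convention the example has four bond changes: two bonds broken, one formed, and one change in bond order.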
Best of all, the authors provide atom-mapped reaction SMILES along with the barriers and reaction energies, which makes further benchmarking, analysis, and ML studies very easy. It will be very exciting to see this data being put to good use!
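As a rough illustration of why atom mapping is convenient, the sketch below uses only the Python standard library to pull the map numbers out of a reaction SMILES and check that the same atoms appear on both sides. The regular expression, the function names, and the example string are assumptions made for illustration; the example reaction is hand-written rather than taken from the data set, and a real workflow would more likely use a cheminformatics toolkit.

```python
import re

# Matches a mapped bracket atom such as [C:1], [H:12], [CH3:2] or [N+:4]
# and captures the element symbol and the atom-map number.
MAPPED_ATOM = re.compile(r"\[\d*([A-Za-z][a-z]?)[^\]:]*:(\d+)\]")


def parse_mapped_side(smiles):
    """Return {map_number: element} for all mapped atoms in a SMILES string."""
    return {
        int(number): element.capitalize()
        for element, number in MAPPED_ATOM.findall(smiles)
    }


def atoms_are_conserved(reaction_smiles):
    """Check that both sides carry the same map numbers on the same elements."""
    reactant, product = reaction_smiles.split(">>")
    return parse_mapped_side(reactant) == parse_mapped_side(product)


# Hypothetical example: CH3OH -> CH2=O + H2
example = (
    "[C:1]([H:3])([H:4])([H:5])[O:2][H:6]"
    ">>"
    "[C:1]([H:3])([H:4])=[O:2].[H:5][H:6]"
)

reactant_atoms = parse_mapped_side(example.split(">>")[0])
print(reactant_atoms)
print(atoms_are_conserved(example))
```

Because every atom keeps its map number across the arrow, the two sides can be compared directly, which is the starting point for extracting bond changes or reaction templates.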
This work is licensed under a Creative Commons Attribution 4.0 International License.
Figure 1 from the paper. Reproduced under the CC BY-NC-ND 4.0 licence