Monday, October 31, 2022

Semiempirical Hamiltonians learned from data can have accuracy comparable to Density Functional Theory

Frank Hu, Francis He, David J. Yaron (2022) 
Highlighted by Jan Jensen

Figure 7 from the paper. (c) The authors 2022. Reproduced under the BY-NC-ND licence


This paper uses ML techniques and tools (specifically PyTorch) to fit DFTB parameters, which results in a semiempirical quantum method (SQM) with an accuracy similar to DFT. The advantage of such a physics-based method over a purely ML-based one is that it is likely to be more transferable and to require much less training data. This should make it much easier to extend to other elements and new molecular properties, such as barriers.

Parameterising SQMs is notoriously difficult as the molecular properties depend exponentially on many of the parameters. As a result, most SQMs used today have been parameterised by hand. The paper presents several methodological tricks to automate the fitting.

One is the use of high-order polynomial spline functions to describe how the Hamiltonian elements depend on the fitting parameters. The functions allow the computation not only of the first derivative needed for back propagation, but also of higher-order derivatives, which are used for regularisation to avoid overfitting and keep the parameters physically reasonable. Finally, the SCF and training loops are inverted so that the charge fluctuations needed for the Fock operator are updated based on the current model parameters every 10 epochs. This enables computationally efficient back propagation during training, which is important because the training set is on the order of 100k.
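
The loop inversion can be illustrated with a deliberately tiny toy problem. The sketch below is not the authors' code and does not use DFTB or PyTorch: the model, the self-consistency condition q = 0.5 * tanh(theta * x - q), and the names scf_charge, predict and train are all assumptions made up for illustration. It only shows the pattern of solving the self-consistent quantity once every 10 epochs and treating it as a constant in the gradient steps in between.

```python
import math

# Toy stand-in for a semiempirical model (hypothetical, NOT DFTB).
# Each "molecule" has a descriptor x and a charge-like quantity q that must
# satisfy the made-up self-consistency condition q = 0.5 * tanh(theta * x - q).


def scf_charge(theta, x, tol=1e-12, max_iter=200):
    """Solve q = 0.5 * tanh(theta * x - q) by fixed-point iteration."""
    q = 0.0
    for _ in range(max_iter):
        q_new = 0.5 * math.tanh(theta * x - q)
        if abs(q_new - q) < tol:
            return q_new
        q = q_new
    return q


def predict(theta, x, q):
    """Toy 'energy' that depends on the parameter theta and the charge q."""
    return theta * (x + q)


def train(xs, ys, theta=0.5, epochs=200, refresh_every=10, lr=0.1):
    """Gradient descent on the mean squared error with periodically frozen charges."""
    qs = [0.0] * len(xs)
    for epoch in range(epochs):
        # Inverted loop: the self-consistent charges are re-solved only every
        # `refresh_every` epochs, using the current value of theta.
        if epoch % refresh_every == 0:
            qs = [scf_charge(theta, x) for x in xs]
        # Between refreshes the charges are constants, so the gradient with
        # respect to theta never has to pass through the SCF iterations.
        grad = 0.0
        for x, y, q in zip(xs, ys, qs):
            grad += 2.0 * (predict(theta, x, q) - y) * (x + q)
        theta -= lr * grad / len(xs)
    return theta


if __name__ == "__main__":
    # Synthetic reference data generated with a known parameter value.
    theta_true = 1.5
    xs = [0.2 * k for k in range(1, 11)]
    ys = [predict(theta_true, x, scf_charge(theta_true, x)) for x in xs]
    print("fitted theta:", train(xs, ys))
```

Because the charges are held fixed between refreshes, each epoch is an ordinary gradient step on a cheap, explicit function of the parameter. The price is that the charges lag slightly behind the parameters until the next refresh.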

Another neat feature is that the final model is simply a parameter file (SKF file), which can be read by most DFTB programs, so there is nothing new for the user to implement. However, the implementation currently covers only C, N, H and O.


This work is licensed under a Creative Commons Attribution 4.0 International License.