Category:Machine-learned force fields

From VASP Wiki

Machine-learned force fields used in combination with ab-initio molecular dynamics (MD) make it possible to capture the underlying physics from first principles while still reaching long simulation times at relatively low cost. An ab-initio MD step is generally computationally expensive because the electrons are treated quantum mechanically, e.g., within density-functional theory (DFT). In fully classical MD calculations, force fields are used instead of DFT to evaluate the force acting on each atom. These interatomic potentials are traditionally based on experimental observation and the empirical inclusion of known interactions, such as van der Waals forces and electrostatic interactions. Therefore, the quality of a classical force field depends on how well the interactions in the specific system are known.

VASP offers on-the-fly machine-learned force fields to overcome these two issues: the high computational cost of ab-initio MD and the empirical knowledge needed to construct a force field the traditional way. To learn more about on-the-fly machine learning, read about the theory of on-the-fly machine learning force fields or about the setup of a basic calculation. Also, check out the best practices for constructing, testing, and retraining force fields, as well as the basic tutorial that provides hands-on experience for silicon and the Liquid Si - MLFF tutorial.

Theory

VASP uses a Bayesian-learning algorithm for on-the-fly machine learning. At each time step of the MD simulation, the total energy and forces are predicted by the machine-learned force field. If the Bayesian error estimate exceeds a certain threshold, an ab-initio calculation is performed, in which the electrons are treated quantum mechanically. Atomic environments not yet covered by the training data are then identified and added to a set of so-called local reference configurations. This set serves as a comparison database for future force-field predictions. The force field is updated with the new information, i.e., the total energy and atomic forces obtained from the triggered ab-initio calculation, and the MD simulation continues. In this way, the force field is iteratively improved over the course of the MD simulation. Ideally, it reaches such a high prediction quality that no further ab-initio calculations are required.
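The learning loop can be illustrated with a toy sketch. It is not VASP's implementation: a one-dimensional Gaussian process with a squared-exponential kernel stands in for the machine-learned force field, its posterior variance stands in for the Bayesian error estimate, each atomic environment is reduced to a single scalar descriptor, only energies (no forces) are learned, and `ab_initio` is a hypothetical cheap function (sin) in place of a DFT calculation. The parameters `threshold`, `length`, and `noise` are illustrative and are not INCAR tags.

```python
import math


def rbf(a, b, length=0.5):
    """Squared-exponential kernel between two scalar descriptors."""
    return math.exp(-((a - b) ** 2) / (2.0 * length**2))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def predict(refs, x, noise=1e-6):
    """Gaussian-process posterior mean and variance at descriptor x."""
    xs = [p[0] for p in refs]
    ys = [p[1] for p in refs]
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k_star = [rbf(x, a) for a in xs]
    alpha = solve(K, ys)
    mean = sum(k * w for k, w in zip(k_star, alpha))
    v = solve(K, k_star)
    var = rbf(x, x) - sum(k * w for k, w in zip(k_star, v))
    return mean, max(var, 0.0)


def ab_initio(x):
    """Hypothetical stand-in for an expensive DFT energy calculation."""
    return math.sin(x)


def on_the_fly(trajectory, threshold=1e-3):
    """Predict along a trajectory; call ab_initio only when the error estimate is too large."""
    refs = []      # reference data: (descriptor, ab-initio energy)
    energies = []
    n_ab_initio = 0
    for x in trajectory:
        mean, var = predict(refs, x) if refs else (0.0, float("inf"))
        if var > threshold:
            y = ab_initio(x)     # uncertain: run the expensive calculation
            refs.append((x, y))  # learn from the new environment
            energies.append(y)
            n_ab_initio += 1
        else:
            energies.append(mean)  # confident: trust the model
    return energies, refs, n_ab_initio
```

For example, `on_the_fly([0.1 * (i % 20) for i in range(200)])` revisits the same 20 descriptor values ten times. Because the posterior variance can only shrink as reference data are added, `ab_initio` is called only during the first pass; afterwards every environment is predicted with low uncertainty.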

For details on the algorithm, check out the theory article about machine learning force fields.

How to

All related INCAR tags and input/output files begin with the prefix ML_. The input and output files specific to machine-learned force fields are listed below.

Input

Depending on whether the calculation is training, retraining, or applying the force field (see ML_MODE), VASP may require the following input files in addition to the usual input files (INCAR, POSCAR, etc.):

  • ML_AB: Ab-initio training data.
  • ML_FF: Force-field parameters.
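A small pre-flight check can catch a missing input file before a job is submitted. The sketch below encodes our reading of which ML-specific input file each ML_MODE needs: training continues from ML_AB if it exists, selection and refitting read ML_AB, and running an existing force field reads ML_FF. Treat this mapping as an assumption and verify it against the ML_MODE documentation of your VASP version. The function name `missing_ml_inputs` is hypothetical.

```python
from pathlib import Path

# Assumed mapping of ML_MODE values to required ML-specific input files.
# Check it against the ML_MODE documentation of your VASP version.
REQUIRED_ML_INPUTS = {
    "train": [],          # ML_AB is optional: if present, training continues from it
    "select": ["ML_AB"],
    "refit": ["ML_AB"],
    "run": ["ML_FF"],
}


def missing_ml_inputs(mode, workdir="."):
    """Return the ML input files required by `mode` that are absent in `workdir`."""
    key = mode.lower()
    if key not in REQUIRED_ML_INPUTS:
        raise ValueError(f"unknown ML_MODE: {mode!r}")
    return [name for name in REQUIRED_ML_INPUTS[key]
            if not (Path(workdir) / name).is_file()]
```

Calling `missing_ml_inputs("run", "calc_dir")` returns `["ML_FF"]` if that file has not been copied into `calc_dir` yet, and an empty list once it is present.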

Output

The machine-learned force-field method generates the following output files:

  • ML_LOGFILE: Main output file.
  • ML_ABN: Generated ab-initio training data. Use it as the ML_AB file to restart a calculation.
  • ML_REG: Summary of regression results.
  • ML_HIS: Summary of the histogram data.
  • ML_FFN: Force-field parameters. Use it as the ML_FF file to restart a calculation.
  • ML_EATOM: Local atomic energies.
  • ML_HEAT: Local heat flux.
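Restarting from a previous run, as described for ML_ABN and ML_FFN, amounts to copying these output files to the input names VASP reads. A minimal sketch with the hypothetical helper `prepare_restart`; it checks that all source files exist before copying anything and leaves the other VASP input files (INCAR, POSCAR, etc.) untouched. Whether the next run also needs ML_FF depends on ML_MODE, so copying ML_FFN is optional here.

```python
import shutil
from pathlib import Path


def prepare_restart(workdir=".", copy_force_field=False):
    """Copy ML_ABN -> ML_AB (and optionally ML_FFN -> ML_FF) in `workdir`."""
    wd = Path(workdir)
    pairs = [("ML_ABN", "ML_AB")]
    if copy_force_field:
        pairs.append(("ML_FFN", "ML_FF"))
    # Verify every source first so a failed check leaves the directory unchanged.
    for src, _ in pairs:
        if not (wd / src).is_file():
            raise FileNotFoundError(f"{wd / src} not found; did the previous run finish?")
    for src, dst in pairs:
        shutil.copyfile(wd / src, wd / dst)
    return [dst for _, dst in pairs]
```

Copying rather than renaming keeps ML_ABN and ML_FFN from the finished run, so they can still be inspected or reused if the restarted calculation fails.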

Hyperparameters

Hyperparameters are user-defined parameters of the machine-learned force field (MLFF) that are not optimized during training. Examples are the cutoff radii (ML_RCUT1, ML_RCUT2), the normalization and weighting of the ab-initio training data (ML_IWEIGHT, ML_WTOTEN, ML_WTIFOR, ML_WTSIF), and the threshold and total number of descriptors in the sparsification (ML_EPS_LOW, ML_RDES_SPARSDES). We recommend optimizing them by cross-validation, i.e., by systematically varying the parameters and refitting the force field (ML_MODE=refit) for each setting. In principle, this has to be done for all hyperparameters available in VASP to obtain the best MLFF model. In practice, the default values are usually a good guess: they were optimized on a selected set of bulk materials and on a molecular database.
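Such a scan can be scripted. The sketch below, with the hypothetical helpers `write_refit_scan` and `best_setting`, creates one directory per pair of cutoff radii containing only the scanned INCAR tags (ML_LMLFF = .TRUE. switches the machine-learning method on, and ML_MODE = refit refits from an existing ML_AB). Copying ML_AB, POSCAR, POTCAR, and the remaining INCAR tags into each directory, running VASP, and computing a validation error for each setting (for example, on structures held out from training) are left to the user; `best_setting` only picks the setting with the lowest error you supply.

```python
import itertools
from pathlib import Path


def write_refit_scan(base_dir, rcut1_values, rcut2_values, extra_incar=""):
    """Create one directory per (ML_RCUT1, ML_RCUT2) pair with a refit INCAR.

    Only the scanned tags are written; all other input must be added separately.
    """
    dirs = []
    for r1, r2 in itertools.product(rcut1_values, rcut2_values):
        d = Path(base_dir) / f"rcut1_{r1}_rcut2_{r2}"
        d.mkdir(parents=True, exist_ok=True)
        incar = (
            "ML_LMLFF = .TRUE.\n"
            "ML_MODE = refit\n"
            f"ML_RCUT1 = {r1}\n"
            f"ML_RCUT2 = {r2}\n"
        ) + extra_incar
        (d / "INCAR").write_text(incar)
        dirs.append(d)
    return dirs


def best_setting(errors):
    """Return the key (e.g. a directory name) with the lowest validation error."""
    return min(errors, key=errors.get)
```

For instance, `write_refit_scan("scan", [5.0, 6.0], [4.0, 5.0])` creates four directories; the grid values are arbitrary and should be chosen around the defaults for your system.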