ML IALGO LINREG: Difference between revisions

From VASP Wiki



Revision as of 09:20, 29 March 2023

ML_IALGO_LINREG = [integer]
Default: ML_IALGO_LINREG = 1 

Description: This tag determines the algorithm that is employed to solve the system of linear equations in the ridge regression method for machine learning.


In the ridge regression method for machine learning one needs to solve for the unknown weights minimizing

For more details please see here.
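
For orientation, the textbook form of a ridge regression objective is sketched below. The symbols (design matrix \boldsymbol{\Phi}, target vector \mathbf{y}, weight vector \mathbf{w}, regularization parameter \lambda) are assumed notation and may differ from the expression used on the linked page.

```latex
\mathbf{w}^{*} = \arg\min_{\mathbf{w}}
  \left( \lVert \boldsymbol{\Phi}\mathbf{w} - \mathbf{y} \rVert^{2}
       + \lambda \lVert \mathbf{w} \rVert^{2} \right),
\qquad
\mathbf{w}^{*} = \left( \boldsymbol{\Phi}^{T}\boldsymbol{\Phi} + \lambda \mathbf{I} \right)^{-1}
                 \boldsymbol{\Phi}^{T}\mathbf{y}
```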

The following options are available to solve for the weights:

For on-the-fly learning (ML_MODE = TRAIN) and for reselection of local reference configurations (ML_MODE = SELECT), Bayesian regression (ML_IALGO_LINREG=1) must be used, since uncertainty estimates are only available for Bayesian regression.

Refitting: Although the above modes result in an ML_FFN file that could be used for production runs, we strongly advise refitting from the ML_ABN file. To do so, copy the ML_ABN file to ML_AB and set ML_MODE = REFIT (if Bayesian error estimation is required during production runs, ML_MODE = REFITBAYESIAN is an option). The REFIT mode employs ML_IALGO_LINREG=4 by default.
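
The refitting steps above can be sketched in a few lines of Python. The helper name prepare_refit and its arguments are hypothetical and not part of VASP; it only copies ML_ABN to ML_AB and returns the INCAR lines that would have to be merged into an existing INCAR by hand.

```python
import shutil
from pathlib import Path


def prepare_refit(run_dir, bayesian=False):
    """Copy ML_ABN to ML_AB in run_dir and return INCAR lines for a refit.

    Hypothetical helper for illustration only; it is not shipped with VASP.
    """
    run_dir = Path(run_dir)
    source = run_dir / "ML_ABN"
    if not source.is_file():
        raise FileNotFoundError(f"no ML_ABN file found in {run_dir}")

    # The refit reads its training data from ML_AB.
    shutil.copyfile(source, run_dir / "ML_AB")

    # REFIT uses ML_IALGO_LINREG=4 by default, so the tag is not set here.
    mode = "REFITBAYESIAN" if bayesian else "REFIT"
    return ["ML_LMLFF = .TRUE.", f"ML_MODE = {mode}"]
```

Calling prepare_refit("path/to/run") and adding the returned lines to the INCAR of that directory would set up a refit under these assumptions; all other INCAR tags are left to the user.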

Among the options with ML_IALGO_LINREG>1, ML_IALGO_LINREG=3 and 4 are the most thoroughly tested, and we routinely use ML_IALGO_LINREG=4 before employing a machine-learned force field. ML_IALGO_LINREG=4 gives more stable force fields and better fitting accuracy than ML_IALGO_LINREG=3 because of the regularization term it employs.

ML_IALGO_LINREG=4 dramatically improves the condition number of the fit compared to ML_IALGO_LINREG=1, since it works directly with the design matrix. In contrast, ML_IALGO_LINREG=1 requires the covariance matrix (the square of the design matrix), which effectively squares the condition number. However, ML_IALGO_LINREG=4 needs significantly more memory than ML_IALGO_LINREG=1 (at least twice as much). Please always monitor the memory estimates in ML_LOGFILE! ML_IALGO_LINREG=4 is also computationally somewhat more demanding than ML_IALGO_LINREG=1, typically requiring between a few minutes and an hour, so the extra cost is usually negligible compared to the original training.
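
The effect on the condition number can be illustrated with a small NumPy sketch. This is not VASP's implementation: the design matrix Phi, the targets y, the regularization value lam and both solver functions are made-up stand-ins, chosen only to compare a solve that works on the design matrix (here via a singular value decomposition with a Tikhonov term) with one that first forms the square of the design matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical design matrix (200 equations, 20 unknown weights) built from
# prescribed singular values, so its condition number is about 1e4 by design.
n_rows, n_cols = 200, 20
U, _ = np.linalg.qr(rng.standard_normal((n_rows, n_cols)))
V, _ = np.linalg.qr(rng.standard_normal((n_cols, n_cols)))
singular_values = np.logspace(0, -4, n_cols)
Phi = (U * singular_values) @ V.T

cond_design = np.linalg.cond(Phi)          # condition number of the design matrix
cond_normal = np.linalg.cond(Phi.T @ Phi)  # condition number of its square


def ridge_via_svd(Phi, y, lam):
    """Regularized least squares using only the SVD of the design matrix."""
    U, s, Vt = np.linalg.svd(Phi, full_matrices=False)
    return Vt.T @ ((s / (s**2 + lam)) * (U.T @ y))


def ridge_via_normal_equations(Phi, y, lam):
    """Same minimization, but solved through the squared design matrix."""
    A = Phi.T @ Phi + lam * np.eye(Phi.shape[1])
    return np.linalg.solve(A, Phi.T @ y)


# Synthetic targets: exact linear model plus a little noise.
w_true = rng.standard_normal(n_cols)
y = Phi @ w_true + 1e-3 * rng.standard_normal(n_rows)

lam = 1e-6
w_svd = ridge_via_svd(Phi, y, lam)
w_normal = ridge_via_normal_equations(Phi, y, lam)

print(f"cond(Phi)       = {cond_design:.3e}")
print(f"cond(Phi^T Phi) = {cond_normal:.3e}")
print(f"relative difference of the two solutions = "
      f"{np.linalg.norm(w_svd - w_normal) / np.linalg.norm(w_svd):.3e}")
```

For this mildly ill-conditioned example both routes give the same weights. The point of the sketch is that the squared matrix has the squared condition number, so a route that avoids forming it loses fewer digits as the design matrix becomes more ill-conditioned.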

Related tags and articles

ML_LMLFF, ML_MODE, ML_W1, ML_WTOTEN, ML_WTIFOR, ML_WTSIF

Examples that use this tag