ML ICRITERIA: Difference between revisions

From VASP Wiki
(29 intermediate revisions by 4 users not shown)
Latest revision as of 09:09, 14 April 2023

ML_ICRITERIA = [integer] 

Default: ML_ICRITERIA = 3 for ML_MODE = SELECT
         ML_ICRITERIA = 1 else


Description: Decides whether (ML_ICRITERIA>0) and how the Bayesian error threshold (ML_CTIFOR) is updated within the machine learning force field method. ML_CTIFOR determines whether a first-principles calculation is performed.


The use of this tag in combination with the learning algorithms is described in the article "Machine learning force field calculations: Basics", section "Threshold for error of forces".

The following options are possible for ML_ICRITERIA:

  • ML_ICRITERIA = 0: The threshold ML_CTIFOR is not updated. This mode is only recommended for refining an existing force field. For example, if you know that ML_CTIFOR has taken a value of 0.03 in previous runs, you can continue to collect training data by fixing the threshold at ML_CTIFOR=0.03, so that the regions of the potential energy surface where first-principles data are still missing are captured. To achieve extremely robust force fields, it is recommended to run NSW=100000 steps in this mode at a temperature slightly above the highest temperature to be considered.
  • ML_ICRITERIA = 1: Set ML_CTIFOR to a value proportional to the average of the Bayesian errors of the ML_MHIS steps stored in the history. For ML_ICRITERIA = 1, the average is calculated only from errors obtained after an update of the force field. Such updates are quite rare, so updates of ML_CTIFOR are also quite rare in this mode. Furthermore, since first-principles calculations are only performed for configurations with large Bayesian errors ("outliers"), the force field is updated only after the outliers have been taken into account. Therefore, the Bayesian errors included in the averaging in this mode are typically larger than the overall average Bayesian error. It is therefore recommended to set ML_CX to 0 (default) in this mode.
  • ML_ICRITERIA = 2: Update the threshold using the moving average of all previous Bayesian errors. This mode averages the errors of all previous predictions (i.e., all previously considered MD steps), whereas ML_ICRITERIA = 1 averages only the errors of the predictions immediately following a retraining. The length of the history in this mode is currently hard-coded to 400 steps (or ML_MHIS x 50 in the newer version). This mode tends to continue sampling and is therefore somewhat prone to oversampling: as the Bayesian errors decrease, the threshold is steadily lowered and additional first-principles calculations are initiated. The recommended values for ML_CX in this mode are approximately 0.1 to 0.3. With ML_CX = 0.2, a first-principles calculation is typically performed every 50 steps. This means that if the number of ionic steps is, say, NSW=50000, about 1000 first-principles calculations are performed. For many materials, this results in a reasonably good and robust ML database.
  • ML_ICRITERIA = 3: This mode is the default for reselecting local reference configurations from an existing ML_AB file (ML_MODE = SELECT), and it is only available when ML_MODE = SELECT is set. The ML_AB file must contain an ML_CTIFOR value for each structure stored in it. VASP uses these values as Bayesian error thresholds for structure selection, which also means that the tags ML_CTIFOR, ML_CX, ML_CSLOPE, ML_CSIG and ML_MHIS set in INCAR are ignored. If any of these values is absent, VASP raises an error and reports that some ML_CTIFOR values are missing from the ML_AB file.
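
The threshold logic of the modes above can be illustrated with a short Python sketch. It is not VASP code: the function names are hypothetical, the error values are made up, and the proportionality factor (1 + ML_CX) is an assumption, since this page only states that the threshold is "proportional" to the average. The modes ML_ICRITERIA = 1 and 2 differ in which errors enter the history, which here is simply whatever list is passed in.

```python
from statistics import mean


def updated_threshold(history, ml_cx=0.0):
    """Return a new threshold from a history of Bayesian errors.

    Assumption (not stated on this page): the proportionality
    factor applied to the average is (1 + ML_CX).
    """
    if not history:
        raise ValueError("history of Bayesian errors is empty")
    return mean(history) * (1.0 + ml_cx)


def needs_first_principles(bayesian_error, ml_ctifor):
    """Assumed decision rule: an error above the threshold
    triggers a first-principles calculation."""
    return bayesian_error > ml_ctifor


def expected_first_principles_calls(nsw, steps_per_call=50):
    """Rough estimate for ML_ICRITERIA = 2 with ML_CX = 0.2, using the
    'one calculation every 50 steps' rule of thumb from the text."""
    return nsw // steps_per_call


# Hypothetical Bayesian errors, for illustration only.
errors = [0.030, 0.028, 0.026, 0.024]

threshold_cx0 = updated_threshold(errors, ml_cx=0.0)  # as recommended for mode 1
threshold_cx2 = updated_threshold(errors, ml_cx=0.2)  # as recommended for mode 2

print(f"threshold with ML_CX = 0.0: {threshold_cx0:.4f}")
print(f"threshold with ML_CX = 0.2: {threshold_cx2:.4f}")
print("first-principles step needed:", needs_first_principles(0.040, threshold_cx2))
print("estimated calculations for NSW=50000:", expected_first_principles_calls(50000))
```

A larger ML_CX raises the threshold relative to the average error, so fewer steps exceed it. This is consistent with the recommendation to use a positive ML_CX in the mode that is prone to oversampling.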

As mentioned above, the ML_CX tag can be used to fine-tune the update of ML_CTIFOR. Whether to use ML_ICRITERIA = 1 or ML_ICRITERIA = 2 is a matter of taste. Just remember that ML_CX must be set differently in the two modes: for ML_ICRITERIA = 1, ML_CX = 0.0 is a good default, whereas for ML_ICRITERIA = 2, ML_CX = 0.2 is a good default. Most of our force fields use ML_ICRITERIA = 1, but this mode sometimes stagnates and stops performing first-principles calculations. On the other hand, and as already mentioned, ML_ICRITERIA = 2 is prone to oversampling, i.e., it may perform too many first-principles calculations.
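
The pairing of mode and ML_CX described above can be written down as a small Python helper that emits the corresponding INCAR lines. This is only a sketch: the helper name and the lookup table are hypothetical, and the values are the defaults suggested in the text, not settings validated for any particular system.

```python
# Suggested ML_CX defaults per mode, taken from the paragraph above.
RECOMMENDED_ML_CX = {1: 0.0, 2: 0.2}


def incar_lines(ml_icriteria):
    """Return INCAR lines pairing ML_ICRITERIA with its suggested ML_CX.

    Only the modes 1 and 2 are covered, because ML_CX is not used
    when the threshold is fixed (0) or read from ML_AB (3).
    """
    if ml_icriteria not in RECOMMENDED_ML_CX:
        raise ValueError("ML_CX suggestion only defined for ML_ICRITERIA = 1 or 2")
    return [
        f"ML_ICRITERIA = {ml_icriteria}",
        f"ML_CX = {RECOMMENDED_ML_CX[ml_icriteria]}",
    ]


for mode in (1, 2):
    print("\n".join(incar_lines(mode)))
    print()
```

The printed lines can be pasted into an INCAR file alongside the other machine-learning tags of the calculation.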

Related tags and articles

ML_LMLFF, ML_CTIFOR, ML_CSLOPE, ML_CSIG, ML_MHIS, ML_CX

Examples that use this tag