ML ICRITERIA: Difference between revisions

Latest revision as of 09:09, 14 April 2023

Default: ML_ICRITERIA	= 3	for ML_MODE = SELECT
	= 1	else

Description: Decides whether (ML_ICRITERIA > 0) and how the Bayesian error threshold (ML_CTIFOR) is updated within the machine learning force field method. ML_CTIFOR determines whether a first-principles calculation is performed.

The use of this tag in combination with the learning algorithms is described in "Machine learning force field calculations: Basics", section "Threshold for error of forces".

The following options are possible for ML_ICRITERIA:

ML_ICRITERIA = 0: The threshold ML_CTIFOR is not updated. This mode is recommended only for refining an existing force field. For example, if you know that ML_CTIFOR took a value of 0.03 in previous runs, you can continue to collect training data with the threshold now fixed at ML_CTIFOR=0.03, in order to capture all outliers and areas of the potential energy surface where first-principles data are still missing. To obtain highly robust force fields, it is recommended to run, say, NSW=100000 steps in this mode at the highest temperature to be considered, or slightly above it.
ML_ICRITERIA = 1: Set ML_CTIFOR to a value proportional to the average of the Bayesian errors over ML_MHIS steps. For ML_ICRITERIA = 1, the average is calculated only over the errors obtained after updates of the force field. Such updates are rare, so updates of ML_CTIFOR are also rare in this mode. Furthermore, since first-principles calculations are performed only for configurations with large Bayesian errors ("outliers"), the force field is updated only after outliers have been considered. Therefore, the Bayesian errors that enter the averaging are typically larger than the average Bayesian error. It is therefore recommended to set ML_CX to 0 (default) in this mode.
ML_ICRITERIA = 2: Update the threshold using the moving average of all previous Bayesian errors. This mode averages the errors of all previous predictions (i.e. all previously considered MD steps), whereas ML_ICRITERIA = 1 averages only over the predictions immediately following retraining. The length of the history in this mode is currently hard-coded and set to 400 steps (or ML_MHIS x 50 in the newer version). This mode tends to continue sampling and is therefore somewhat prone to oversampling: as the Bayesian errors decrease, the threshold is steadily lowered and additional first-principles calculations are initiated. Recommended values for ML_CX in this mode are approximately 0.1 to 0.3. For a value around ML_CX = 0.2, a first-principles calculation is typically performed every 50 steps. This means that if the number of ionic steps is set to, say, NSW=50000, about 1000 first-principles calculations are performed. For many materials, this results in a reasonably good and robust ML database.
ML_ICRITERIA = 3: This mode is the default for reselecting local reference configurations from an existing ML_AB file (ML_MODE = SELECT), and it is only available when ML_MODE=SELECT is set. The ML_AB file must contain an ML_CTIFOR value for each structure stored in it. These values are used by VASP as Bayesian error thresholds for structure selection. This also means that the tags ML_CTIFOR, ML_CX, ML_CSLOPE, ML_CSIG and ML_MHIS set in INCAR are ignored. If any ML_CTIFOR values are missing from the ML_AB file, VASP stops with an error and reports the missing values to the user.
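The difference between modes 0, 1 and 2 can be illustrated with a small toy model. This is a minimal sketch and not the VASP implementation: the class name `ThresholdTracker`, the `record` method, and the exact update rule (mean of the stored errors scaled by `1 + ML_CX`) are assumptions made for illustration, as is the choice to update mode 1 only once its history holds ML_MHIS entries. The text above only states that the new threshold is proportional to the average error.

```python
from collections import deque
from statistics import mean


class ThresholdTracker:
    """Toy model of the ML_CTIFOR update for ML_ICRITERIA = 0, 1, 2 (hypothetical)."""

    def __init__(self, icriteria, ctifor, ml_cx=0.0, ml_mhis=10):
        if icriteria not in (0, 1, 2):
            raise ValueError("only modes 0, 1 and 2 are modelled here")
        self.icriteria = icriteria
        self.ctifor = ctifor
        self.ml_cx = ml_cx
        # History length: ML_MHIS entries for mode 1, ML_MHIS x 50 for mode 2
        # (the text quotes the latter for the newer version).
        self.history = deque(maxlen=ml_mhis if icriteria == 1 else ml_mhis * 50)

    def record(self, bayesian_error, after_retraining=False):
        """Feed the Bayesian error of one MD step; return the current threshold."""
        if self.icriteria == 0:
            # Mode 0: the threshold stays fixed.
            return self.ctifor
        if self.icriteria == 1:
            # Mode 1: only errors obtained right after a force-field update count.
            if not after_retraining:
                return self.ctifor
            self.history.append(bayesian_error)
            # Assumption: wait until ML_MHIS such errors have been collected.
            if len(self.history) < self.history.maxlen:
                return self.ctifor
        else:
            # Mode 2: every prediction enters the moving average.
            self.history.append(bayesian_error)
        # Assumed update rule: average error scaled by (1 + ML_CX).
        self.ctifor = mean(self.history) * (1.0 + self.ml_cx)
        return self.ctifor
```

In this sketch, mode 2 lowers the threshold as soon as the recorded errors drop, which mirrors the oversampling tendency described above, while mode 1 leaves the threshold untouched between retraining events.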

As mentioned above, the ML_CX tag can be used to fine-tune the update of ML_CTIFOR. Whether to use ML_ICRITERIA = 1 or ML_ICRITERIA = 2 is a matter of taste. Just remember that ML_CX must be set differently for the two modes: a good default is ML_CX = 0.0 for ML_ICRITERIA = 1 and ML_CX = 0.2 for ML_ICRITERIA = 2. Most of our force fields were generated using ML_ICRITERIA = 1, but this mode sometimes stagnates and stops performing first-principles calculations. On the other hand, as already mentioned, ML_ICRITERIA = 2 is prone to oversampling, i.e. it may perform too many first-principles calculations.
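The pairing of mode and ML_CX default can be written down as a small helper. This is only a sketch: the function names `incar_lines` and `estimated_first_principles_calls` are hypothetical and not part of VASP or any VASP tooling, and the estimate simply applies the rule of thumb quoted above for ML_ICRITERIA = 2 with ML_CX = 0.2 (roughly one first-principles calculation every 50 steps).

```python
# Suggested ML_CX defaults per mode, taken from the text above.
DEFAULT_ML_CX = {1: 0.0, 2: 0.2}


def incar_lines(icriteria, nsw):
    """Return INCAR lines pairing ML_ICRITERIA with its suggested ML_CX."""
    if icriteria not in DEFAULT_ML_CX:
        raise ValueError("a suggested ML_CX is only given for modes 1 and 2")
    return [
        f"ML_ICRITERIA = {icriteria}",
        f"ML_CX = {DEFAULT_ML_CX[icriteria]}",
        f"NSW = {nsw}",
    ]


def estimated_first_principles_calls(nsw, steps_per_call=50):
    """Rough count, assuming one first-principles calculation per 50 MD steps."""
    return nsw // steps_per_call


if __name__ == "__main__":
    print("\n".join(incar_lines(2, 50000)))
    print(estimated_first_principles_calls(50000))
```

The actual number of first-principles calculations depends on the material and on how the Bayesian errors evolve, so the estimate is a planning figure at best.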

@@ Line 1: / Line 1: @@
 {{DISPLAYTITLE:ML_ICRITERIA}}
 {{TAGDEF|ML_ICRITERIA|[integer]}}
-{{DEF|ML_ICRITERIA|0|for {{TAG|ML_MODE}} {{=}} SELECT|1|else}}
+{{DEF|ML_ICRITERIA|3|for {{TAG|ML_MODE}} {{=}} SELECT|1|else}}
-{{TAGDEF|ML_ICRITERIA|[integer]|1}}
-Description: Decides whether ({{TAG|ML_ICRITERIA}}>0) or how the Bayesian error threshold ({{TAG|ML_CTIFOR}}) is updated within the machine learning force field method. {{TAG|ML_CTIFOR}} determines whether a first principles calculations is performed.
+Description: Decides whether ({{TAG|ML_ICRITERIA}}>0) or how the Bayesian error threshold ({{TAG|ML_CTIFOR}}) is updated within the machine learning force field method. {{TAG|ML_CTIFOR}} determines whether a first-principles calculation is performed.
 ----
 The use of this tag in combination with the learning algorithms is described here: [[Machine learning force field calculations: Basics#Threshold for error of forces|here]].
 The following options are possible for {{TAG|ML_ICRITERIA}}:
-* {{TAG|ML_ICRITERIA}} = 0: No update of the threshold {{TAG|ML_CTIFOR}} is performed. This mode is the default to reselect local reference configurations from an existing {{TAG|ML_AB}} file ({{TAG|ML_MODE}} = ''SELECT''). Otherwise, we recommend to use this mode only to refine an existing force field. For instance, if you know that in previous runs {{TAG|ML_CTIFOR}} was taking a value of 0.03, you might continue acquiring training data with the threshold now fixed to {{TAG|ML_CTIFOR}}=0.03, in order to catch all outliners and areas of the potential energy surface, where first principle data are still missing. To obtain highly robust force fields, we recommend to run for say {{TAG|NSW}}=100000 (one hundred thousand steps) in this mode at the highest temperature to be considered (or slightly above the highest considered temperature).
+* {{TAG|ML_ICRITERIA}} = 0: The threshold {{TAG|ML_CTIFOR}} is not updated. This mode is recommended only for refining an existing force field. For example, if you know that {{TAG|ML_CTIFOR}} took a value of 0.03 in previous runs, you can continue to collect training data with the threshold now fixed at {{TAG|ML_CTIFOR}}=0.03, in order to capture all outliers and areas of the potential energy surface where first-principles data are still missing. To obtain highly robust force fields, it is recommended to run, say, {{TAG|NSW}}=100000 steps in this mode at the highest temperature to be considered, or slightly above it.
-* {{TAG|ML_ICRITERIA}} = 1: Set {{TAG|ML_CTIFOR}} to a value proportional to the  average Bayesian errors of {{TAG|ML_MHIS}} steps. For {{TAG|ML_ICRITERIA}} = 1, the average is calculated only over the errors after  updates of the force field. Such updates occur only rather rarely, hence updates of {{TAG|ML_CTIFOR}} are also fairly seldom in this mode. Furthermore, since first principles calculations are only performed for configurations with large Bayesian errors ("outliers"), also updates of the force fields occur only after outliners have been considered. Hence the Bayesian errors that enter the averaging are also typically larger than the average Bayesian error in this mode.  It is thus recommended to set {{TAG|ML_CX}} to 0 in this mode (default).
+* {{TAG|ML_ICRITERIA}} = 1: Set {{TAG|ML_CTIFOR}} to a value proportional to the average of the Bayesian errors over {{TAG|ML_MHIS}} steps. For {{TAG|ML_ICRITERIA}} = 1, the average is calculated only over the errors obtained after updates of the force field. Such updates are rare, so updates of {{TAG|ML_CTIFOR}} are also rare in this mode. Furthermore, since first-principles calculations are performed only for configurations with large Bayesian errors ("outliers"), the force field is updated only after outliers have been considered. Therefore, the Bayesian errors that enter the averaging are typically larger than the average Bayesian error. It is therefore recommended to set {{TAG|ML_CX}} to 0 (default) in this mode.
-* {{TAG|ML_ICRITERIA}} = 2: Update of criteria using gliding average of all previous Bayesian errors. This mode averages the error over all previous predictions (that is every previously considered MD step), whereas the {{TAG|ML_ICRITERIA}} = 1 averages only over predictions immediately after re-training. The history length in this mode is currently hard coded and set to 400 steps (or {{TAG|ML_MHIS}} x 50 in newer version). This mode tends to continue sampling, and it is thus somewhat prone to oversampling: as the Bayesian errors decrease, also the threshold will be continuously lowered and further first principles calculations are initiated. Recommended values for {{TAG|ML_CX}} are about 0.1- 0.3 in this mode. For a value around {{TAG|ML_CX}} = 0.2, typically every 50 steps a first principles calculation is performed. This means that if the number of ionic steps is set to say {{TAG|NSW}}=50000, about 1000 first principles calculations are performed. This results in a fairly good and robust data base for ML for many materials.
+* {{TAG|ML_ICRITERIA}} = 2: Update the threshold using the moving average of all previous Bayesian errors. This mode averages the errors of all previous predictions (i.e. all previously considered MD steps), whereas {{TAG|ML_ICRITERIA}} = 1 averages only over the predictions immediately following retraining. The length of the history in this mode is currently hard-coded and set to 400 steps (or {{TAG|ML_MHIS}} x 50 in the newer version). This mode tends to continue sampling and is therefore somewhat prone to oversampling: as the Bayesian errors decrease, the threshold is steadily lowered and additional first-principles calculations are initiated. Recommended values for {{TAG|ML_CX}} in this mode are approximately 0.1 to 0.3. For a value around {{TAG|ML_CX}} = 0.2, a first-principles calculation is typically performed every 50 steps. This means that if the number of ionic steps is set to, say, {{TAG|NSW}}=50000, about 1000 first-principles calculations are performed. For many materials, this results in a reasonably good and robust ML database.
+*{{TAG|ML_ICRITERIA}}=3: This mode is the default for reselecting local reference configurations from an existing {{TAG|ML_AB}} file ({{TAG|ML_MODE}} = ''SELECT''), and it is only available when {{TAG|ML_MODE}}=''SELECT'' is set. The {{FILE|ML_AB}} file must contain an {{TAG|ML_CTIFOR}} value for each structure stored in it. These values are used by {{VASP}} as Bayesian error thresholds for structure selection. This also means that the tags {{TAG|ML_CTIFOR}}, {{TAG|ML_CX}}, {{TAG|ML_CSLOPE}}, {{TAG|ML_CSIG}} and {{TAG|ML_MHIS}} set in {{FILE|INCAR}} are ignored. If any {{TAG|ML_CTIFOR}} values are missing from the {{FILE|ML_AB}} file, {{VASP}} stops with an error and reports the missing values to the user.
-As already hinted above, the tag {{TAG|ML_CX}} allows to fine tune the update of {{TAG|ML_CTIFOR}}.
+As mentioned above, the {{TAG|ML_CX}} tag can be used to fine-tune the update of {{TAG|ML_CTIFOR}}.
-Whether to use  {{TAG|ML_ICRITERIA}} = 1 or {{TAG|ML_ICRITERIA}} = 2, is a matter of taste. Just recall that {{TAG|ML_CX}} must be set differently for both modes.  Whereas a good default for {{TAG|ML_ICRITERIA}} = 1 is {{TAG|ML_CX}} = 0.0, a sensible default for {{TAG|ML_ICRITERIA}} = 2 is {{TAG|ML_CX}} = 0.2.
+Whether to use {{TAG|ML_ICRITERIA}} = 1 or {{TAG|ML_ICRITERIA}} = 2 is a matter of taste. Just remember that {{TAG|ML_CX}} must be set differently for the two modes: a good default is {{TAG|ML_CX}} = 0.0 for {{TAG|ML_ICRITERIA}} = 1 and {{TAG|ML_CX}} = 0.2 for {{TAG|ML_ICRITERIA}} = 2.
-Most of our force-fields have been generated using {{TAG|ML_ICRITERIA}} = 1, but this mode sometimes stagnates and stops performing first principles calculations.
+Most of our force fields were generated using {{TAG|ML_ICRITERIA}} = 1, but this mode sometimes stagnates and stops performing first-principles calculations.
-On the other hand and as already mentioned, {{TAG|ML_ICRITERIA}} = 2 tends to over-sample, that is, it can  perform too many first principles calculations.
+On the other hand, as already mentioned, {{TAG|ML_ICRITERIA}} = 2 is prone to oversampling, i.e. it may perform too many first-principles calculations.
 == Related tags and articles ==