ML_AB

From VASP Wiki
{{DISPLAYTITLE:ML_AB}}
This file is used as input (with file name {{FILE|ML_AB}}) and output ({{FILE|ML_ABN}}) within the machine learning force field method. It contains the collection of ab initio data from previous calculations: Bravais matrices, atom positions, energies, forces, and stress tensors. Depending on the mode of operation it is used in the following ways:


* '''{{TAGO|ML_MODE|train}}, starting from scratch:''' A complete {{FILE|ML_ABN}} file containing all ab initio data and the list of current local reference configurations is written whenever a learning step is performed (check the line <code>STATUS</code> in the log file {{FILE|ML_LOGFILE}} for entries <code>learning</code> and <code>critical</code>).
* '''{{TAGO|ML_MODE|train}}, continuation run:''' Same {{FILE|ML_ABN}} output as above. In addition, upon start-up, the user-provided {{FILE|ML_AB}} file is read and an initial machine-learned force field is generated from the contained data.
* '''{{TAGO|ML_MODE|select}}, reselection of local reference configurations:''' Same {{FILE|ML_ABN}} output as for {{TAGO|ML_MODE|train}}. The {{FILE|ML_AB}} file is read and the contained structures are fed sequentially to the on-the-fly training algorithm. The list of local reference configurations in the {{FILE|ML_AB}} file is ignored; however, a dummy section must still be present (see below).
{{NB|tip|The {{FILE|ML_AB}} file is not required for {{TAGO|ML_MODE|run}} (prediction only) because all necessary data (e.g. descriptors of local reference configurations) are already stored in the {{FILE|ML_FF}} file.}}
An {{FILE|ML_ABN}} output file from {{TAGO|ML_MODE|train, select}} can always be reused as input for {{TAGO|ML_MODE|train, select}} by just renaming (copying) it to {{FILE|ML_AB}}.
 
== Example ==
 
As an example, here is a shortened version of an actual {{FILE|ML_AB}} file:
<pre>
  1.0 Version
...
     N      8
     H      48
==================================================
    CTIFOR
--------------------------------------------------
  7.2153124269575984E-003
==================================================
     Primitive lattice vectors (ang.)
...
   0.000000000000000E+000  0.000000000000000E+000  12.6322002000000
==================================================
     Atomic positions (ang.)
--------------------------------------------------
   3.53104385888580        2.84086367297985        2.90622172474177
...
   10.2580847101708        12.3062955711284        3.18366035907868
   3.82895321819843        12.3181255490181        2.42031967883849
...
...
...
==================================================
     Total energy (eV)
...
   1.081052495208487E-002 -0.454162570762754      -2.885905409516716E-002
   5.233785861238309E-002 -4.907001101287316E-002  0.357709899123724
  ...
  ...
  ...
==================================================
     Stress (kbar)
...
</pre>


== General format remarks ==
 
{{NB|important|All element-dependent quantities must follow the order of the element entries given in the header entry named <code>The atom types in the data file</code>.}}
*All element-type-dependent information is written with at most 3 entries per line. For more than 3 types, the entries continue on additional lines.
*The order of the entries in both the header and the data section is fixed.
*The ledger (separator) lines cannot be omitted. The header uses <code>*****</code> and <code>-----</code> lines. The data section uses <code>*****</code>, <code>-----</code> and <code>=====</code> lines.
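The following minimal Python sketch illustrates the 3-entries-per-line rule. It assumes the entries are whitespace-separated numbers and does not reproduce the exact column widths written by {{VASP}}. The helper names <code>format_per_type</code> and <code>read_per_type</code> are hypothetical.

```python
def format_per_type(values, per_line=3):
    """Format element-dependent entries with at most `per_line` entries per line.

    Assumption: entries are whitespace-separated; VASP's exact column widths
    are not reproduced here.
    """
    lines = []
    for start in range(0, len(values), per_line):
        chunk = values[start:start + per_line]
        lines.append("   " + "   ".join(f"{v:.15g}" for v in chunk))
    return lines


def read_per_type(lines, n_types, convert=float):
    """Collect exactly n_types entries that may be spread over several lines."""
    tokens = [token for line in lines for token in line.split()]
    if len(tokens) != n_types:
        raise ValueError(f"expected {n_types} entries, found {len(tokens)}")
    return [convert(token) for token in tokens]
```

For the five reference energies of the example header, <code>format_per_type</code> produces two lines: three entries on the first and two on the second. Reading works the same way for integer entries such as <code>The numbers of basis sets per atom type</code>, by passing <code>convert=int</code>.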


== Header specification ==
*<code>1.0 Version</code>: This entry, at the very beginning of the header, specifies the version of the {{FILE|ML_AB}} file. If the contents of the file are changed or extended in the future, the version number will ensure I/O compatibility. Unless stated otherwise, use <code>1.0 Version</code>.
*<code>The number of configurations</code>: Total number of training structures stored in this {{FILE|ML_AB}} file.
*<code>The maximum number of atom type</code>: Total number of unique types listed in all structures (e.g. if the file contains some ab initio data for H<sub>2</sub>O, some data for MgO and some data for NaCl, then the total number of types is 5).
*<code>The atom types in the data file</code>: Listing of all atom types (two characters for each type as in {{VASP}}) appearing in all structures. Multiple lines for more than 3 element types. Maximum 3 entries per line.  
*<code>The maximum number of atoms per system</code>: The largest number of atoms within one structure among all training structures.
*<code>The maximum number of atoms per atom type</code>: The largest number of atoms of a single element within one structure, taken over all elements and all training structures.
*<code>Reference atomic energy (eV)</code>: Reference atomic energies used in the calculation for each element type. Multiple lines for more than 3 element types. Maximum 3 entries per line. This entry is only important for {{TAGO|ML_ISCALE_TOTEN|1}}.
*<code>Atomic mass</code>: Atomic mass of each element type (in u). Multiple lines for more than 3 element types. Maximum 3 entries per line.
*<code>The numbers of basis sets per atom type</code>: Number of local reference configurations for each type. Multiple lines for more than 3 element types. Maximum 3 entries per line.
*<code>Basis set for X</code>: List of local reference configurations for type X. This line is followed by a block with two columns. The first column gives the number of the training structure from which the local reference configuration is taken. The second column is the index of the atom in that training structure that is chosen as a local reference configuration. This whole block (together with the title line) is repeated for each element type in the force field. For {{TAGO|ML_MODE|select}}, this section is ignored and a new list of local reference configurations is written to {{FILE|ML_ABN}}. However, when the {{FILE|ML_AB}} file is read in, a dummy block (e.g. a single line <code>1 1</code>) must still be present for each type. In this case, also set <code>The numbers of basis sets per atom type</code> to 1.
{{NB|warning|The maximum number of the training structures {{TAGO|ML_MCONF}} and the maximum number of the local reference configurations {{TAGO|ML_MB}} in the {{FILE|INCAR}} file have to be set larger than the entries <code>The number of configurations</code> and <code>The numbers of basis sets per atom type</code> in the {{FILE|ML_AB}} file, respectively.}}
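The derived header entries can be computed from the training structures, as in the following Python sketch. It assumes a hypothetical input format, with one dict per structure mapping element symbol to atom count. It lists types in order of first appearance. That order is a choice of this sketch, and whatever order is chosen must then be used for all element-dependent entries. The second helper checks the {{TAGO|ML_MCONF}} and {{TAGO|ML_MB}} condition from the warning above, using a strict "larger than" as stated there. Both function names are hypothetical.

```python
def header_stats(structures):
    """Derive header counts from a list of structures.

    Each structure is a dict mapping element symbol -> number of atoms
    (hypothetical input format). Types are listed in order of first
    appearance, which is an assumption of this sketch.
    """
    types = []
    for counts in structures:
        for element in counts:
            if element not in types:
                types.append(element)
    return {
        "The number of configurations": len(structures),
        "The maximum number of atom type": len(types),
        "The atom types in the data file": types,
        "The maximum number of atoms per system": max(
            sum(counts.values()) for counts in structures
        ),
        "The maximum number of atoms per atom type": max(
            n for counts in structures for n in counts.values()
        ),
    }


def check_incar_limits(n_configurations, basis_per_type, ml_mconf, ml_mb):
    """List INCAR limits that are not larger than the ML_AB header entries."""
    problems = []
    if ml_mconf <= n_configurations:
        problems.append(f"ML_MCONF = {ml_mconf} is not larger than {n_configurations}")
    largest = max(basis_per_type)
    if ml_mb <= largest:
        problems.append(f"ML_MB = {ml_mb} is not larger than {largest}")
    return problems
```

For the H<sub>2</sub>O, MgO and NaCl case described above, <code>header_stats</code> reports 5 atom types. For the example header (299 configurations, basis set counts 130, 1202, 128, 125 and 790), the warning implies that {{TAGO|ML_MB}} must be larger than 1202.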


== Training structure data format ==
*<code>Configuration num.      n</code>: Denotes the beginning of a structure in the training data. Training structures have to be numbered consecutively starting with 1.
*<code>System name</code>: Name of the structure, taken from the {{FILE|POSCAR}} file which was used to start the {{TAGO|ML_MODE|train}} run. Copied from the input {{FILE|ML_AB}} file in case of {{TAGO|ML_MODE|select}}. The length of system names is limited to 40 characters.
*<code>The number of atom types</code>: The number of atom types in the structure. The types in this structure must be a subset of all types appearing in the {{FILE|ML_AB}} file. Therefore, this number must be smaller than or equal to the number of types listed in the header entry <code>The atom types in the data file</code>.
*<code>The number of atoms</code>: Number of atoms in the structure.
*<code>Atom types and atom numbers</code>: Atom types and the number of atoms per type in the structure. Each type is written on a separate line.
*<code>CTIFOR</code> (''optional''): Value of {{TAGO|ML_CTIFOR}} used while sampling this structure. Depending on {{TAGO|ML_ICRITERIA}} the value may change between structures. This line is always present if the {{FILE|ML_ABN}} file was created by {{VASP}} with {{TAGO|ML_MODE|train}}. Then, also continuation and re-selection runs with {{TAGO|ML_MODE|train, select}} will write out current <code>CTIFOR</code> values in {{FILE|ML_ABN}} files. On the other hand, if {{FILE|ML_AB}} files are created from external training data this section may be omitted. In this case {{TAGO|ML_MODE|train, select}} runs will also not include <code>CTIFOR</code> sections. {{NB|warning|Training structures with a value for <code>CTIFOR</code> and without must not be combined. Either <code>CTIFOR</code> is provided for all structures or none of them.|:}}
*<code>Primitive lattice vectors (ang.)</code>: Bravais matrix of the structure. Each line corresponds to one lattice vector. The unit of length is Angstrom.
*<code>Atomic positions (ang.)</code>: Ionic positions in Cartesian coordinates (given in Angstrom). Note that the order of atoms needs to correspond to the atom types list in <code>Atom types and atom numbers</code>.
*<code>Total energy (eV)</code>: Total energy (in eV) of the structure.
*<code>Forces (eV ang.^-1)</code>: Forces (in eV/Angstrom) for each atom in the structure.
*<code>Stress (kbar)</code>: 6 entries for the stress tensor (in kbar) of the structure.
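Several of the rules above can be checked programmatically before a hand-made or externally generated {{FILE|ML_AB}} file is used. The sketch below works on a hypothetical in-memory representation. <code>TrainingStructure</code> and <code>validate</code> are illustrative names, not {{VASP}} functionality. The sketch checks:
* consecutive numbering;
* the 40-character name limit;
* that all types appear in the header;
* that atom counts match the number of positions and forces;
* that there are 3 lattice vectors and 6 stress entries;
* the all-or-none rule for <code>CTIFOR</code>.
It does not parse or write the file itself.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TrainingStructure:
    """Hypothetical in-memory form of one 'Configuration num.' block."""

    number: int
    system_name: str
    atoms_per_type: dict  # element -> atom count, in the order used in the file
    lattice: List[List[float]]  # 3 lattice vectors, Angstrom
    positions: List[List[float]]  # Cartesian, Angstrom, ordered by type
    energy: float  # eV
    forces: List[List[float]]  # eV/Angstrom, one row per atom
    stress: List[float]  # 6 entries, kbar
    ctifor: Optional[float] = None  # optional CTIFOR entry


def validate(structures, header_types):
    """Check some of the format rules; returns a list of messages."""
    errors = []
    for expected, s in enumerate(structures, start=1):
        tag = f"configuration {s.number}"
        if s.number != expected:
            errors.append(f"{tag}: expected consecutive number {expected}")
        if len(s.system_name) > 40:
            errors.append(f"{tag}: system name longer than 40 characters")
        unknown = set(s.atoms_per_type) - set(header_types)
        if unknown:
            errors.append(f"{tag}: types {sorted(unknown)} missing from header")
        n_atoms = sum(s.atoms_per_type.values())
        if len(s.positions) != n_atoms or len(s.forces) != n_atoms:
            errors.append(f"{tag}: expected {n_atoms} positions and forces")
        if len(s.lattice) != 3 or len(s.stress) != 6:
            errors.append(f"{tag}: need 3 lattice vectors and 6 stress entries")
    # CTIFOR must be present for all structures or for none of them.
    with_ctifor = sum(s.ctifor is not None for s in structures)
    if 0 < with_ctifor < len(structures):
        errors.append("CTIFOR must be given for all structures or for none")
    return errors
```

An empty list means that none of these particular checks failed. It does not guarantee that {{VASP}} will accept the file.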


== Merging different ML_AB files ==


Multiple {{FILE|ML_AB}} files may be merged by hand, keeping the following restrictions and tips in mind:
*The training structure data can be simply concatenated, i.e., by just adding more structure sections starting with <code>Configuration num.      n</code> at the end of the file. However, the structure numbering needs to be updated in such a way that they are enumerated continuously starting from 1.
*We strongly advise grouping structures with the same number of elements and atoms per element together in the training data. Otherwise the code will automatically reorder the data so that such structures are adjacent. If one relies on the automatic reordering, it will not be possible to easily "diff" the input {{FILE|ML_AB}} file and its corresponding {{FILE|ML_ABN}} output file.
*The header must be adjusted to reflect the combined number of element types, the maximum number of atoms, etc.  
 
*The lists of local reference configurations cannot be easily merged (renumbering would be required). Instead, it is recommended to recalculate them using {{TAGO|ML_MODE|select}}. However, to start with a valid {{FILE|ML_AB}} file first manually set <code>The numbers of basis sets per atom type</code> to 1 for each species. Also, set the block <code>Basis set for X</code> with dummy value <code>1 1</code> for each species. After running with {{TAGO|ML_MODE|select}} the output {{FILE|ML_ABN}} will contain the selected new local reference configurations for the combined training data. {{NB|tip|If calculations for {{TAGO|ML_MODE|select}} are too time consuming using the default settings it is useful to increase {{TAGO|ML_MCONF_NEW}} to values around 10-16 and set {{TAGO|ML_CDOUB|4}}. This often accelerates the calculations by a factor of 2-4.|:}}
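Two of the merging steps can be scripted: renumbering the concatenated structures, and writing placeholder local reference configurations for {{TAGO|ML_MODE|select}}. The following Python sketch uses hypothetical helper names. The width of the ledger lines is taken from the example and assumed not to matter. The remaining header entries (number of configurations, atom types, maxima) still have to be updated separately.

```python
import re

CONFIG_LINE = re.compile(r"^(\s*Configuration num\.\s+)(\d+)\s*$")
STARS = "*" * 50  # ledger width as in the example; assumed not significant
DASHES = "-" * 50


def renumber_configurations(data_lines):
    """Renumber 'Configuration num.' lines consecutively from 1.

    data_lines: the concatenated training-structure sections of several
    ML_AB files. Returns the new lines and the total number of structures.
    """
    out, count = [], 0
    for line in data_lines:
        match = CONFIG_LINE.match(line)
        if match:
            count += 1
            line = f"{match.group(1)}{count}"
        out.append(line)
    return out, count


def dummy_basis_section(types):
    """Placeholder basis-set header entries: one '1 1' line per type."""
    lines = [STARS, "     The numbers of basis sets per atom type", DASHES]
    for start in range(0, len(types), 3):  # at most 3 entries per line
        lines.append("       " + "  ".join("1" for _ in types[start:start + 3]))
    for element in types:
        lines += [STARS, f"     Basis set for {element}", DASHES, "          1      1"]
    return lines
```

Use the output of <code>dummy_basis_section</code> in place of the original basis-set entries. Then set <code>The number of configurations</code> to the count returned by <code>renumber_configurations</code>, and run with {{TAGO|ML_MODE|select}}.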


----
[[Category:Files]][[Category:Machine-learned force fields]][[Category:Input files]]

Latest revision as of 08:32, 20 October 2023

This file is used as input (with file name ML_AB) and output (ML_ABN) within the machine learning force field method. It contains the collection of ab initio data from previous calculations: Bravais matrices, atom positions, energies, forces, and stress tensors. Depending on the mode of operation it is used in the following ways:

  • ML_MODE = train, starting from scratch: A complete ML_ABN file containing all ab initio data and the list of current local reference configurations is written whenever a learning step is performed (check the line STATUS in the log file ML_LOGFILE for entries learning and critical).
  • ML_MODE = train, continuation run: Same ML_ABN output as above. In addition, upon start-up, the user-provided ML_AB file is read and an initial machine-learned force field is generated from the contained data.
  • ML_MODE = select, reselection of local reference configurations: Same ML_ABN output as for ML_MODE = train. The ML_AB file is read and the contained structures are fed sequentially to the on-the-fly training algorithm. The list of local reference configurations in the ML_AB file is ignored; however, a dummy section must still be present (see below).
Tip: The ML_AB file is not required for ML_MODE = run (prediction only) because all necessary data (e.g. descriptors of local reference configurations) are already stored in the ML_FF file.

An ML_ABN output file from ML_MODE = train, select can always be reused as input for ML_MODE = train, select by just renaming (copying) it to ML_AB.

Example

As an example, here is a shortened version of an actual ML_AB file:

 1.0 Version
**************************************************
     The number of configurations
--------------------------------------------------
        299
**************************************************
     The maximum number of atom type
--------------------------------------------------
       5
**************************************************
     The atom types in the data file
--------------------------------------------------
     Pb I  C
     N  H
**************************************************
     The maximum number of atoms per system
--------------------------------------------------
             96
**************************************************
     The maximum number of atoms per atom type
--------------------------------------------------
             48
**************************************************
     Reference atomic energy (eV)
--------------------------------------------------
  -72.5297190000000       -35.4081430000000       -2.39269120000000
  -4.60003440000000       -1.12020270000000
**************************************************
     Atomic mass
--------------------------------------------------
   20.0000000000000        20.0000000000000        12.0110000000000
   14.0010000000000        8.00000000000000
**************************************************
     The numbers of basis sets per atom type
--------------------------------------------------
       130  1202   128
       125   790
**************************************************
     Basis set for Pb
--------------------------------------------------
          1      1
        100      8
          1      3
        100      4
          1      5
          1      6
 ...
 ...
 ...
**************************************************
     Basis set for I
--------------------------------------------------
          1      9
          1     10
        100     32
        100     31
          1     13
        100     29
          1     15
          1     16
 ...
 ...
 ...
**************************************************
     Basis set for C
--------------------------------------------------
        100     39
        101     40
        104     40
        101     39
        101     38
        108     40
        101     37
 ...
 ...
 ...
**************************************************
     Basis set for N
--------------------------------------------------
          1     41
        100     47
          1     43
          1     44
        100     45
          1     46
 ...
 ...
 ...
**************************************************
     Basis set for H
--------------------------------------------------
        101     96
        108     96
        101     95
        101     94
        108     95
        101     93
        101     92
 ...
 ...
 ...
**************************************************
     Configuration num.      1
==================================================
     System name
--------------------------------------------------
     Optimal
==================================================
     The number of atom types
--------------------------------------------------
       5
==================================================
     The number of atoms
--------------------------------------------------
         96
**************************************************
     Atom types and atom numbers
--------------------------------------------------
     Pb      8
     I      24
     C       8
     N       8
     H      48
==================================================
     CTIFOR
--------------------------------------------------
   7.2153124269575984E-003
==================================================
     Primitive lattice vectors (ang.)
--------------------------------------------------
   12.6230002000000       0.000000000000000E+000  0.000000000000000E+000
  0.000000000000000E+000   12.6230002000000       0.000000000000000E+000
  0.000000000000000E+000  0.000000000000000E+000   12.6322002000000
==================================================
     Atomic positions (ang.)
--------------------------------------------------
   3.53104385888580        2.84086367297985        2.90622172474177
   9.81419124013876        2.65432768009571        3.05638374363947
   3.26003769786731        9.08189602171279        2.78238128942769
   9.68338433877730        9.01798419847282        3.33422943250601
   3.97567522985842        2.30549969401587        9.43194287333753
   10.2367187113626        2.60925731212548        9.47119538915201
   3.14970369394084        8.58643640964228        9.24921780934012
   9.89456550951183        9.28033187172892        9.29623786496524
   10.2580847101708        12.3062955711284        3.18366035907868
   3.82895321819843        12.3181255490181        2.42031967883849
 ...
 ...
 ...
==================================================
     Total energy (eV)
--------------------------------------------------
  -1844.06244866897
==================================================
     Forces (eV ang.^-1)
--------------------------------------------------
  2.660349497586850E-002 -4.547882666592111E-003  0.190783123263071
  0.130884508367191       0.299290099652476       1.596358887670635E-002
  3.408685056302496E-002 -4.091615555857331E-002  0.178271772476586
 -8.681206662816165E-002 -2.646077052932483E-002 -0.627496783708147
 -2.387963973365542E-002  0.272206550808848      -0.188554040851596
 -0.349175317569579       0.372666466514608       9.810640873955712E-002
  0.508292852334109       2.851700722091148E-002 -0.297636066674050
 -0.477466544993604      -0.767209034380190       0.537092981997701
  1.081052495208487E-002 -0.454162570762754      -2.885905409516716E-002
  5.233785861238309E-002 -4.907001101287316E-002  0.357709899123724
 ...
 ...
 ...
==================================================
     Stress (kbar)
--------------------------------------------------
     XX YY ZZ
--------------------------------------------------
  -12.6559383536223       -8.82753684858342       -13.1791695209263
--------------------------------------------------
     XY YZ ZX
--------------------------------------------------
  -1.91691819690402        2.12274173946129       0.103818583636094
**************************************************
     Configuration num.      2
==================================================
 ...
 ...
 ...

General format remarks

Important: All element-dependent quantities must follow the order of the element entries given in the header entry named The atom types in the data file.
  • All element-type-dependent information is limited to 3 entries per line. If there are more than 3 types, the entries continue on additional lines (e.g., 5 types are written as 3 + 2 entries, as in the example above).
  • The order of the entries in both the header and the data section is fixed.
  • The ledger (separator) lines cannot be omitted: the header uses ***** and ----- lines, and the training structure data uses *****, ----- and ===== lines.
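The three-entries-per-line rule can be sketched in Python as follows. The helpers `write_per_type` and `read_per_type` are hypothetical, and the exact indentation and column widths used by VASP are not reproduced; the sketch only assumes that entries on a line are separated by whitespace.

```python
def write_per_type(values, per_line=3):
    """Format one element-dependent entry with at most `per_line` values per line."""
    lines = []
    for start in range(0, len(values), per_line):
        chunk = values[start:start + per_line]
        # Indentation and spacing are assumptions; only the 3-per-line split follows the docs.
        lines.append("     " + "  ".join(str(v) for v in chunk))
    return lines


def read_per_type(value_lines, n_types):
    """Flatten a (possibly multi-line) element-dependent entry into n_types tokens."""
    tokens = " ".join(value_lines).split()
    if len(tokens) != n_types:
        raise ValueError(f"expected {n_types} entries, found {len(tokens)}")
    return tokens
```

For the five types of the example, `write_per_type(["Pb", "I", "C", "N", "H"])` yields two lines, the first with Pb, I, C and the second with N, H.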

Header specification

  • 1.0 Version: This entry at the very beginning of the header specifies the version of the ML_AB file. If the contents of the file are changed or extended in the future, the version number will ensure I/O compatibility. Unless stated otherwise, use 1.0 Version.
  • The number of configurations: Total number of training structures stored in this ML_AB file.
  • The maximum number of atom type: Total number of unique types listed in all structures (e.g. if the file contains some ab initio data for H2O, some data for MgO and some data for NaCl, then the total number of types is 5).
  • The atom types in the data file: List of all atom types (two characters for each type, as in VASP) appearing in any of the structures. Multiple lines for more than 3 element types. Maximum 3 entries per line.
  • The maximum number of atoms per system: The largest number of atoms within one structure among all training structures.
  • The maximum number of atoms per atom type: The largest number of atoms of a single element within one structure, taken over all elements and all training structures.
  • Reference atomic energy (eV): Reference atomic energies used in the calculation for each element type. Multiple lines for more than 3 element types. Maximum 3 entries per line. This entry is only important for ML_ISCALE_TOTEN = 1.
  • Atomic mass: Atomic mass of each element type (in u). Multiple lines for more than 3 element types. Maximum 3 entries per line.
  • The numbers of basis sets per atom type: Number of local reference configurations for each type. Multiple lines for more than 3 element types. Maximum 3 entries per line.
  • Basis set for X: List of local reference configurations for each type. This line is followed by a block with two columns. The first column denotes from which training structure the local reference configuration is taken. The second column is the index of the atom in the given training structure that is chosen as a local reference configuration. This whole block (together with the title line) is repeated for each element type in the force field. For ML_MODE = select this section is ignored and a new list of local reference configurations will be written to ML_ABN. However, upon reading in the ML_AB file a dummy line (e.g. only one line with 1 1) for each type still needs to be present (also set The numbers of basis sets per atom type to 1 in this case).
Warning: The maximum number of training structures (ML_MCONF) and the maximum number of local reference configurations (ML_MB) in the INCAR file must be set larger than the entries The number of configurations and The numbers of basis sets per atom type in the ML_AB file, respectively.
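As a sketch of how the header can be checked against these INCAR limits, the Python snippet below reads the relevant header entries and compares them with given values of ML_MCONF and ML_MB. All function names are hypothetical. The sketch assumes the layout of the example above (title line, ----- line, value lines up to the next ***** or ===== line), follows the wording of the warning literally (strictly larger), and compares ML_MB with the largest per-type count on the assumption that ML_MB limits the local reference configurations per element type.

```python
def _is_block_end(line):
    """True for the ***** and ===== ledger lines that close a value block."""
    s = line.strip()
    return bool(s) and (set(s) == {"*"} or set(s) == {"="})


def header_entry(lines, title):
    """Value lines below `title`: skip its ----- line, stop at the next ***** or =====."""
    for i, line in enumerate(lines):
        if line.strip() == title:
            values = []
            for nxt in lines[i + 2:]:
                if _is_block_end(nxt):
                    break
                values.append(nxt)
            return values
    raise KeyError(title)


def read_header(text):
    """Return (number of configurations, atom types, basis-set counts per type)."""
    lines = text.splitlines()
    n_conf = int(header_entry(lines, "The number of configurations")[0])
    types = " ".join(header_entry(lines, "The atom types in the data file")).split()
    counts = " ".join(header_entry(lines, "The numbers of basis sets per atom type"))
    basis = [int(tok) for tok in counts.split()]
    if len(basis) != len(types):
        raise ValueError("expected one basis-set count per atom type")
    return n_conf, types, basis


def check_incar_limits(text, ml_mconf, ml_mb):
    """List of problems; an empty list means ML_MCONF and ML_MB are large enough."""
    n_conf, _, basis = read_header(text)
    problems = []
    if ml_mconf <= n_conf:
        problems.append(f"ML_MCONF = {ml_mconf} is not larger than {n_conf} configurations")
    largest = max(basis)
    if ml_mb <= largest:
        problems.append(f"ML_MB = {ml_mb} is not larger than {largest} reference configurations")
    return problems
```

For the example header (299 configurations, at most 1202 local reference configurations for I), ML_MCONF = 1500 and ML_MB = 1500 would pass this check.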

Training structure data format

  • Configuration num. n: Denotes the beginning of a structure in the training data. Training structures have to be numbered consecutively starting with 1.
  • System name: Name of the structure, taken from the POSCAR file which was used to start the ML_MODE = train run. Copied from the input ML_AB file in case of ML_MODE = select. The length of system names is limited to 40 characters.
  • The number of atom types: The number of atom types in the structure. Because the types in this structure must be a subset of all types appearing in the ML_AB file, this number must be smaller than or equal to the number of types listed in the header entry The atom types in the data file.
  • The number of atoms: Number of atoms in the structure.
  • Atom types and atom numbers: Atom types and the number of atoms per type in the structure. Each type is written on a separate line.
  • CTIFOR (optional): Value of ML_CTIFOR used while sampling this structure. Depending on ML_ICRITERIA, the value may change between structures. This section is always present if the ML_ABN file was created by VASP with ML_MODE = train; continuation and reselection runs with ML_MODE = train or select then also write the current CTIFOR values to their ML_ABN files. If, on the other hand, the ML_AB file is created from external training data, this section may be omitted. In this case, runs with ML_MODE = train or select will not write CTIFOR sections either.
Warning: Training structures with and without a CTIFOR value must not be combined. Either CTIFOR is provided for all structures or for none of them.
  • Primitive lattice vectors (ang.): Bravais matrix of the structure; each line corresponds to one lattice vector. The length unit is Angstrom.
  • Atomic positions (ang.): Ionic positions in Cartesian coordinates (given in Angstrom). Note that the order of atoms needs to correspond to the atom types list in Atom types and atom numbers.
  • Total energy (eV): Total energy (in eV) of the structure.
  • Forces (eV ang.^-1): Forces (in eV/Angstrom) for each atom in the structure.
  • Stress (kbar): The 6 independent components of the stress tensor (in kbar) of the structure, written as XX YY ZZ followed by XY YZ ZX.
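The following Python sketch splits an ML_AB file into its configuration sections, checks the CTIFOR all-or-none rule, and assembles the stress tensor of one section. The function names are hypothetical, and the parsing assumes exactly the layout of the example above: each section opens with a ***** line followed by Configuration num. n, and each row of stress values sits two lines below its label. Six components suffice because the stress tensor is symmetric.

```python
def split_sections(text):
    """Return (header_lines, [configuration_lines, ...]).

    Assumes each configuration opens with a ***** line that is directly
    followed by a 'Configuration num. n' line.
    """
    lines = text.splitlines()
    starts = [i - 1 for i, line in enumerate(lines)
              if line.strip().startswith("Configuration num.")]
    if not starts:
        return lines, []
    bounds = starts + [len(lines)]
    sections = [lines[a:b] for a, b in zip(bounds, bounds[1:])]
    return lines[:starts[0]], sections


def ctifor_consistent(text):
    """True if either every configuration has a CTIFOR block or none has one."""
    _, sections = split_sections(text)
    flags = [any(line.strip() == "CTIFOR" for line in sec) for sec in sections]
    return all(flags) or not any(flags)


def read_stress(section_lines):
    """Symmetric 3x3 stress tensor (kbar) of one configuration section."""
    stripped = [line.strip() for line in section_lines]
    # Each row of values follows its label and a ----- line.
    diag = stripped[stripped.index("XX YY ZZ") + 2].split()
    offd = stripped[stripped.index("XY YZ ZX") + 2].split()
    xx, yy, zz = (float(v) for v in diag)
    xy, yz, zx = (float(v) for v in offd)
    return [[xx, xy, zx],
            [xy, yy, yz],
            [zx, yz, zz]]
```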

Merging different ML_AB files

Multiple ML_AB files may be merged by hand, keeping the following restrictions and tips in mind:

  • The training structure data can simply be concatenated, i.e., more structure sections starting with Configuration num. n can be added at the end of the file. However, the structures must then be renumbered so that they are numbered consecutively starting from 1.
  • We strongly advise grouping structures with the same number of elements and atoms per element together in the training data. Otherwise, the code automatically reorders the data so that such structures are contiguous, and it will no longer be possible to easily "diff" the input ML_AB file against its corresponding ML_ABN output file.
  • The header must be adjusted to reflect the combined number of element types, the maximum number of atoms, etc.
  • The lists of local reference configurations cannot be easily merged (renumbering would be required). Instead, it is recommended to recalculate them using ML_MODE = select. To start from a valid ML_AB file, first manually set The numbers of basis sets per atom type to 1 for each species, and fill the block Basis set for X with the dummy value 1 1 for each species. After the ML_MODE = select run, the output ML_ABN file will contain the newly selected local reference configurations for the combined training data.
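The merging steps can be sketched in Python as shown below, under the same layout assumptions as the example file. The helper names are hypothetical. The sketch only handles the configuration sections and the dummy basis-set block; the remaining header entries (number of configurations, atom types, maximum atom counts, reference energies, masses) still have to be adjusted by hand as described above. Compositions are compared as ordered (type, count) lists, so the same chemistry listed in a different element order is treated as a different group.

```python
def configuration_sections(text):
    """Configuration sections, each starting at the ***** line before 'Configuration num.'."""
    lines = text.splitlines()
    starts = [i - 1 for i, line in enumerate(lines)
              if line.strip().startswith("Configuration num.")]
    bounds = starts + [len(lines)]
    return [lines[a:b] for a, b in zip(bounds, bounds[1:])]


def composition(section):
    """Ordered tuple of (type, count) pairs from 'Atom types and atom numbers'."""
    stripped = [line.strip() for line in section]
    i = stripped.index("Atom types and atom numbers") + 2  # skip the ----- line
    pairs = []
    while i < len(stripped) and stripped[i][:1].isalpha():
        name, count = stripped[i].split()
        pairs.append((name, int(count)))
        i += 1
    return tuple(pairs)


def merge_sections(*texts):
    """Concatenate configurations, keep equal compositions together, renumber from 1."""
    groups = {}  # dicts preserve insertion order, so groups appear in first-seen order
    for text in texts:
        for sec in configuration_sections(text):
            groups.setdefault(composition(sec), []).append(sec)
    ordered = [sec for group in groups.values() for sec in group]
    merged = []
    for number, sec in enumerate(ordered, start=1):
        sec = list(sec)
        sec[1] = f"     Configuration num.{number:7d}"  # line right after the ***** line
        merged.append(sec)
    return merged


def dummy_basis_lines(types):
    """Header block with one dummy local reference configuration (1 1) per type."""
    star, dash = "*" * 50, "-" * 50
    lines = [star, "     The numbers of basis sets per atom type", dash]
    for k in range(0, len(types), 3):  # at most 3 entries per line
        lines.append("       " + "  ".join("1" for _ in types[k:k + 3]))
    for t in types:
        lines += [star, f"     Basis set for {t}", dash, "          1      1"]
    return lines
```

A complete input file would then consist of the hand-adjusted header entries, followed by the lines from `dummy_basis_lines(types)` and the lines of all merged sections, joined with newlines.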
Tip: If ML_MODE = select calculations are too time-consuming with the default settings, it can help to increase ML_MCONF_NEW to values around 10-16 and set ML_CDOUB = 4. This often accelerates the calculations by a factor of 2-4.