Liquid Si - MLFF

From VASP Wiki

Task

Generating a machine learning force field for liquid Si.

Input

POSCAR

  • In this example we start from a 64 atom super cell of diamond-fcc Si (the same as in this example: Liquid Si - Standard MD:
Si cubic diamond 2x2x2 super cell of conventional cell
     5.43090000000000
    2.00000000   0.00000000   0.00000000
    0.00000000   2.00000000   0.00000000
    0.00000000   0.00000000   2.00000000
   Si
   64
Direct
   0.00000000   0.00000000   0.00000000
   0.50000000   0.00000000   0.00000000
   0.00000000   0.50000000   0.00000000
   0.50000000   0.50000000   0.00000000
   0.00000000   0.00000000   0.50000000
   0.50000000   0.00000000   0.50000000
   0.00000000   0.50000000   0.50000000
   0.50000000   0.50000000   0.50000000
   0.37500000   0.12500000   0.37500000
   0.87500000   0.12500000   0.37500000
   0.37500000   0.62500000   0.37500000
   0.87500000   0.62500000   0.37500000
   0.37500000   0.12500000   0.87500000
   0.87500000   0.12500000   0.87500000
   0.37500000   0.62500000   0.87500000
   0.87500000   0.62500000   0.87500000
   0.00000000   0.25000000   0.25000000
   0.50000000   0.25000000   0.25000000
   0.00000000   0.75000000   0.25000000
   0.50000000   0.75000000   0.25000000
   0.00000000   0.25000000   0.75000000
   0.50000000   0.25000000   0.75000000
   0.00000000   0.75000000   0.75000000
   0.50000000   0.75000000   0.75000000
   0.37500000   0.37500000   0.12500000
   0.87500000   0.37500000   0.12500000
   0.37500000   0.87500000   0.12500000
   0.87500000   0.87500000   0.12500000
   0.37500000   0.37500000   0.62500000
   0.87500000   0.37500000   0.62500000
   0.37500000   0.87500000   0.62500000
   0.87500000   0.87500000   0.62500000
   0.25000000   0.00000000   0.25000000
   0.75000000   0.00000000   0.25000000
   0.25000000   0.50000000   0.25000000
   0.75000000   0.50000000   0.25000000
   0.25000000   0.00000000   0.75000000
   0.75000000   0.00000000   0.75000000 
   0.25000000   0.50000000   0.75000000
   0.75000000   0.50000000   0.75000000
   0.12500000   0.12500000   0.12500000
   0.62500000   0.12500000   0.12500000
   0.12500000   0.62500000   0.12500000
   0.62500000   0.62500000   0.12500000
   0.12500000   0.12500000   0.62500000
   0.62500000   0.12500000   0.62500000
   0.12500000   0.62500000   0.62500000
   0.62500000   0.62500000   0.62500000
   0.25000000   0.25000000   0.00000000
   0.75000000   0.25000000   0.00000000
   0.25000000   0.75000000   0.00000000
   0.75000000   0.75000000   0.00000000
   0.25000000   0.25000000   0.50000000
   0.75000000   0.25000000   0.50000000
   0.25000000   0.75000000   0.50000000
   0.75000000   0.75000000   0.50000000
   0.12500000   0.37500000   0.37500000
   0.62500000   0.37500000   0.37500000
   0.12500000   0.87500000   0.37500000
   0.62500000   0.87500000   0.37500000
   0.12500000   0.37500000   0.87500000
   0.62500000   0.37500000   0.87500000
   0.12500000   0.87500000   0.87500000
   0.62500000   0.87500000   0.87500000

KPOINTS

  • We will start with a single k point in this example:
K-Points
 0
Gamma
 1  1  1
 0  0  0

INCAR

#Basic parameters
ISMEAR = 0
SIGMA = 0.1
LREAL = Auto
ALGO = FAST
ISYM = -1
NELM = 100
EDIFF = 1E-4
LWAVE = .FALSE.
LCHARG = .FALSE.

#Parallelization of ab initio calculations
NCORE = 2

#MD
IBRION = 0
MDALGO = 2
ISIF = 2
SMASS = 1.0
TEBEG = 2000
NSW = 10000
POTIM = 3.0
#Parameters for initialization of wavefunctions and charge density
INIWAV = 1
IWAVPR = 1
ISTART = 0
ICHARG = 0

#Machine learning paramters
ML_FF_LMLFF = .TRUE.
ML_FF_ISTART = 0
ML_FF_NWRITE = 2
ML_FF_EATOM = -.70128086E+00
  • The user should be familiar at this step how to run a basic molecular dynamics calculations. If not please go through the example here: Liquid Si - Standard MD.
  • Machine learning is switched on by setting the following tag: ML_FF_LMLFF=.TRUE..
  • By setting the tag ML_FF_ISTART to zero learning from scratch is selected.
  • The flag ML_FF_NWRITE=2 selects a more verbose output where the error on energies, forces and stress are output to the ML_LOGFILE file. This setting is very handy to check the accuracy of the force field.
  • The tag ML_FF_EATOM=-.70128086E+00 sets the atomic reference energy for each species. How to obtain that energy is explained below.
  • Since in the on the fly learning several hundreds or thousands of molecular dynamics steps can lie between two ab initio calculations the usual extrapolation of the wave functions and charge densities from the history of previous wave functions and charge densities IWAVPR=2 is not advisable, hence we will set IWAVPR=1 for which the charge densities are extrapolated from atomic charge densities.

Calculation

Reference energy

Before the force field for liquid Si can be calculated, the atomic energy of a single Si atom in a large enough box has to be calculated. For that the following steps have to be done:

  • Create a new directory Si_ATOM by typing mkdir Si_ATOM and go to that directory cd Si_ATOM.
  • Create an INCAR file with the following parameters:
#Basic parameters
ISMEAR = 0
SIGMA = 0.1
LREAL = Auto
ALGO = FAST
ISYM = 0
NELM = 100
EDIFF = 1E-4
LWAVE = .FALSE.
LCHARG = .FALSE.
ISPIN = 2
  • Create a POSCAR file with a single atom in a large enough box (the box should be orthorombic to have enough degrees of freedom for electronic relaxation):
Si atom
     1.00090000000000
    12.00000000   0.00000000   0.00000000
     0.00000000  12.01000000   0.00000000
     0.00000000   0.00000000  12.02000000
   Si
   1
Direct
   0.00000000   0.00000000   0.00000000
  • Create a KPOINTS file with a single k point (or copy the one delivered with this example):
test
0 0 0
Gamma
 1 1 1
 0 0 0
  • Copy the POTCAR from the previous directory to this directory (cp ../POTCAR).
  • Run the calculation and look at the total energy in the OUTCAR file (or OSZICAR). AFter that switch back to the original directory. That energy will be used for the ML_FF_EATOM tag. If multiple atom types are present in the structure than this step has to be repeated for each atom type separately and the reference energies are provided as a list after the ML_FF_EATOM tag, where the ordering of the energies corresponds to the ordering of the elements in the POTCAR file.

Creating the liquid structure

We will start this example from a perfect super cell of crystalline silicon (fcc diamond structure) containing 64 atoms. The temperature is set to 2000 K so that when an MD is run the melting should occur relatively fast. We will do the melting with on-the-fly learning. This will greatly accelerate the melting since after some time most of the ab initio calculations are skipped and the very fast force field takes over. This calculation will be executed for 10000 steps with a step size of 3 fs.

Please run now the calculation.

After running the calculation we should obtain a fairly good liquid in the CONTCAR file.

As a side effect we have also learned a force field, but with maybe a quite bad trajectory at the beginning. Usually it is more systematic to learn on the pure structures. In our case this means to use the CONTCAR file obtained after the melting. Also the force field was only learned using a single k point. To obtain a better accuracy we will use more k points. So after this step we will start the learning from scratch using the CONTCAR obtained so and using more accurate parameters.

But before we do that we will take a look at the accuracy of the force obtained now.

The main output files for the machine learning are:

  • ML_LOGFILE: This contains the main output of the machine learning. Since we use ML_FF_NWRITE=2 the error on energy, forces and stress of the force field compared to ab initio is also written out on this file for every step.
  • ML_ABNCAR: This contains the ab initio data used for the learning. It will be needed for continuation runs as ML_ABCAR.
  • ML_FFNCAR: This contains the regression results (weights, parameters, etc.). It will be needed for continuation runs as ML_FFCAR.

In the ML_LOGFILE we will look for the last entry on the error of energy, forces and stress, Bayesian error and spilling factor, which should look similar to this (since we run a molecular dynamics calculation in parallel on different computers actual results can deviate a little):

====================================================================================================
     Information on error estimations
----------------------------------------------------------------------------------------------------
              Error in energy (eV atom^-1):     0.016994
             Error in force  (eV Angst^-1):     0.247664
                      Error in stress (kB):     5.368218
               Bayesian error (eV Angst-1):     0.064904    0.127195
                       Spilling factor (-):     0.000690    0.020000
====================================================================================================

The first entry of the Bayesian error is the estimated error from our model. The second entry is the error threshold. In our case it is newly determined during the calculations. The first and second entry for the spilling factor are the calculated spilling factor and the threshold, respectively. The threshold for the spilling factor is usually always kept constant during the calculations.

Next we will look at the accuracy of structural properties of the force field. For the we first run a 3000 fs molecular dynamics calculation with and without the force field starting from the new CONTCAR file. First we save that new CONTCAR file to

cp CONTCAR POSCAR.T2000_relaxed

Now do the following steps to run the force field calculation:

Now do the following steps to run the ab initio reference calculation:

  • Copy POSCAR.T2000_relaxed to POSCAR.
  • Change ML_FF_LMLFF=.TRUE. to {TAG|ML_FF_LMLFF}}=.FALSE.: This will turn off the machine learning completely.
  • Run caclulation.
  • Copy XDATCAR to XDATCAR.AI_3ps.

To analyze the pair correlation function use the PERL script pair_correlation_function.pl:

#!/usr/bin/perl

use strict;
use warnings;

#configuration for which ensemble average is to be calculated
my $confmin=1;            #starting index of configurations in XDATCAR file for pair correlation function
my $confmax=20000;           #last index of configurations in XDATCAR file for pair correlation function
my $confskip=1;           #stepsize for configuration loop
my $species_1 = 1;        #species 1 for which pair correlation function is going to be calculated
my $species_2 = 1;        #species 2 for which pair correlation function is going to be calculated
#setting radial grid 
my $rmin=0.0;             #minimal value of radial grid
my $rmax=10.0;            #maximum value of radial grid
my $nr=300;                #number of equidistant steps in radial grid
my $dr=($rmax-$rmin)/$nr; #stepsize in radial grid
my $tol=0.0000000001;     #tolerance limit for r->0 
 
my $z=0;                  #counter
my $numelem;              #number of elements
my @elements;             #number of atoms per element saved in list/array
my $lattscale;            #scaling factor for lattice
my @b;                    #Bravais matrix
my $nconf=0;              #number of configurations in XDATCAR file
my @cart;                 #Cartesian coordinates for each atom and configuration
my $atmin_1=0;            #first index of species one
my $atmax_1;              #last index of species one
my $atmin_2=0;            #first index of species two
my $atmax_2;              #last index of species two
my @vol;                  #volume of cell (determinant of Bravais matrix)
my $pi=4*atan2(1, 1);     #constant pi
my $natom=0;              #total number of atoms in cell
my @pcf;                  #pair correlation function (list/array)
my $mult_x=1;             #periodic repetition of cells in x dimension
my $mult_y=1;             #periodic repetition of cells in y dimension
my $mult_z=1;             #periodic repetition of cells in z dimension
my @cart_super;           #Cartesian cells over multiple cells
my @vec_len;              #Length of lattice vectors in 3 spatial coordinates
#my $ensemble_type="NpT";  #Set Npt or NVT. Needs to be set since both have different XDATCAR file.
my $ensemble_type="NVT";  #Set Npt or NVT. Needs to be set since both have different XDATCAR file.
my $av_vol=0;               #Average volume in cell

#reading in XDATCAR file
while (<>)
{
   chomp;
   $_=~s/^/ /;
   my @help=split(/[\t,\s]+/);
   $z++;
   if ($z==2)
   {
      $lattscale = $help[1];
   }
   if ($z==3)
   {
      $b[$nconf+1][1][1]=$help[1]*$lattscale;
      $b[$nconf+1][1][2]=$help[2]*$lattscale;
      $b[$nconf+1][1][3]=$help[3]*$lattscale;
   }
   if ($z==4)
   {
      $b[$nconf+1][2][1]=$help[1]*$lattscale;
      $b[$nconf+1][2][2]=$help[2]*$lattscale;
      $b[$nconf+1][2][3]=$help[3]*$lattscale;
   }
   if ($z==5)
   {
      $b[$nconf+1][3][1]=$help[1]*$lattscale;
      $b[$nconf+1][3][2]=$help[2]*$lattscale;
      $b[$nconf+1][3][3]=$help[3]*$lattscale;
   }
   if ($z==7)
   {
      if ($nconf==0)
      {
         $numelem=@help-1;
         for (my $i=1;$i<=$numelem;$i++)
         {
            $elements[$i]=$help[$i];
            $natom=$natom+$help[$i];
         }
      }
   }
   if ($_=~m/Direct/)
   {
      $nconf=$nconf+1;
      #for NVT ensemble only one Bravais matrix exists, so it has to be copied
      if ($ensemble_type eq "NVT")
      {
         for (my $i=1;$i<=3;$i++)
         { 
            for (my $j=1;$j<=3;$j++)
            {
               $b[$nconf][$i][$j]=$b[1][$i][$j];
            }
         }
      }
      for (my $i=1;$i<=$natom;$i++)
      {
         $_=<>;
         chomp;
         $_=~s/^/ /;
         my @helpat=split(/[\t,\s]+/);
         $cart[$nconf][$i][1]=$b[1][1][1]*$helpat[1]+$b[1][1][2]*$helpat[2]+$b[1][1][3]*$helpat[3];
         $cart[$nconf][$i][2]=$b[1][2][1]*$helpat[1]+$b[1][2][2]*$helpat[2]+$b[1][2][3]*$helpat[3];
         $cart[$nconf][$i][3]=$b[1][3][1]*$helpat[1]+$b[1][3][2]*$helpat[2]+$b[1][3][3]*$helpat[3];
      }
      if ($ensemble_type eq "NpT")
      {
         $z=0;
      }
   } 
   last if eof;
}

if ($confmin>$nconf)
{
   print "Error, confmin larger than number of configurations. Exiting...\n";
   exit;
}
if ($confmax>$nconf)
{
   $confmax=$nconf;
}
 
for (my $i=1;$i<=$nconf;$i++)
{
   #calculate lattice vector lengths
   $vec_len[$i][1]=($b[$i][1][1]*$b[$i][1][1]+$b[$i][1][2]*$b[$i][1][2]+$b[$i][1][3]*$b[$i][1][3])**0.5;
   $vec_len[$i][2]=($b[$i][2][1]*$b[$i][2][1]+$b[$i][2][2]*$b[$i][2][2]+$b[$i][2][3]*$b[$i][2][3])**0.5;
   $vec_len[$i][3]=($b[$i][3][1]*$b[$i][3][1]+$b[$i][3][2]*$b[$i][3][2]+$b[$i][3][3]*$b[$i][3][3])**0.5;
   #calculate volume of cell
   $vol[$i]=$b[$i][1][1]*$b[$i][2][2]*$b[$i][3][3]+$b[$i][1][2]*$b[$i][2][3]*$b[$i][3][1]+$b[$i][1][3]*$b[$i][2][1]*$b[$i][3][2]-$b[$i][3][1]*$b[$i][2][2]*$b[$i][1][3]-$b[$i][3][2]*$b[$i][2][3]*$b[$i][1][1]-$b[$i][3][3]*$b[$i][2][1]*$b[$i][1][2];
   $av_vol=$av_vol+$vol[$i];
}
$av_vol=$av_vol/$nconf;

#choose species 1 for which pair correlation function is going to be calculated
$atmin_1=1;
if ($species_1>1)
{
   for (my $i=1;$i<$species_1;$i++)
   {
     $atmin_1=$atmin_1+$elements[$i];
   }
}
$atmax_1=$atmin_1+$elements[$species_1]-1;
#choose species 2 to which paircorrelation function is calculated to
$atmin_2=1;
if ($species_2>1)
{
   for (my $i=1;$i<$species_2;$i++)
   {
     $atmin_2=$atmin_2+$elements[$i];
   }
}
$atmax_2=$atmin_2+$elements[$species_2]-1;
#initialize pair correlation function
for (my $i=0;$i<=($nr-1);$i++)
{
   $pcf[$i]=0.0;
}
# loop over configurations, make histogram of pair correlation function 
for (my $j=$confmin;$j<=$confmax;$j=$j+$confskip)
{
   for (my $k=$atmin_1;$k<=$atmax_1;$k++)
   {
       for (my $l=$atmin_2;$l<=$atmax_2;$l++)
       {
          if ($k==$l) {next};
          for (my $g_x=-$mult_x;$g_x<=$mult_x;$g_x++)
          {
             for (my $g_y=-$mult_y;$g_y<=$mult_y;$g_y++)
             {
                for (my $g_z=-$mult_y;$g_z<=$mult_z;$g_z++)
                {
                   my $at2_x=$cart[$j][$l][1]+$vec_len[$j][1]*$g_x;
                   my $at2_y=$cart[$j][$l][2]+$vec_len[$j][2]*$g_y;
                   my $at2_z=$cart[$j][$l][3]+$vec_len[$j][3]*$g_z;
                   my $dist=($cart[$j][$k][1]-$at2_x)**2.0+($cart[$j][$k][2]-$at2_y)**2.0+($cart[$j][$k][3]-$at2_z)**2.0;
                   $dist=$dist**0.5;
                   #determine integer multiple 
                   my $zz=int(($dist-$rmin)/$dr+0.5);
                   if ($zz<$nr)
                   {
                      $pcf[$zz]=$pcf[$zz]+1.0;
                   }
                }
             }
          }
       }
   }
}
 
#make ensemble average, rescale functions and print
for (my $i=0;$i<=($nr-1);$i++)
{
   my $r=$rmin+$i*$dr;
   if ($r<$tol)
   {
      $pcf[$i]=0.0;
   }
   else
   {
      $pcf[$i]=$pcf[$i]*$av_vol/(4*$pi*$r*$r*$dr*(($confmax-$confmin)/$confskip)*($atmax_2-$atmin_2+1)*($atmax_1-$atmin_1+1));#*((2.0*$mult_x+1.0)*(2.0*$mult_y+1.0)*(2.0*$mult_z+1.0)));
   }
   print $r," ",$pcf[$i],"\n";
}

Obtain the pair correlation function from the previously saved XDATCAR files:

perl pair_correlation_function.pl XDATCAR.MLFF_3ps > pair_MLFF_3ps.dat
perl pair_correlation_function.pl XDATCAR.AI_3ps > pair_AI_3ps.dat

Plot the pair correlation functions using the following command:

gnuplot -e "set terminal jpeg; set xlabel 'r(Ang)'; set ylabel 'PCF'; set style data lines; plot 'pair_MLFF_3ps.dat', 'pair_AI_3ps.dat' " > PC_MLFF_vs_AI_3ps.jpg

The pair correlation functions obtained that way should look similar like that:

We see that pair correlation is quite well reproduced although the error on the force is a little bit large. Nevertheless we just used this learning to very create a liquid structure very fast, since creating a liquid from a perfect crystal can be a time consuming task.

Obtaining more accurate force field

After learning how to get a good starting structure for the the liquid we will restart the learning but with better ab initio parameters using the CONTCAR from the previous subsection. It is important in most calculations to have quite accurate ab initio calculations since bad ab initio parameters can really limit the accuracy of the learning or even inhibit a proper learning.

We do the following steps to restart learning:

  • Copy POSCAR.T2000_relaxed to POSCAR.
  • Change INCAR tags:
  • Increase the k mesh in the KPOINTS file. The new file looks like:
test
0 0 0
Gamma
 2 2 2
 0 0 0
  • Run the calculation using the standard version of VASP (usually vasp_std).

After running the calculation we will again look at the error in the ML_LOGFILE, where the obtained values should be close to:

====================================================================================================
     Information on error estimations
----------------------------------------------------------------------------------------------------
              Error in energy (eV atom^-1):     0.006079
             Error in force  (eV Angst^-1):     0.165678
                      Error in stress (kB):     2.553148
               Bayesian error (eV Angst-1):     0.044695    0.101357
                       Spilling factor (-):     0.002285    0.020000
====================================================================================================

We immediately see that the errors are significantly lower than in the training with one k point.

Next we try to look at how well the force field is learned and try to monitor how much training data is added to a structure after each learning cycle.

The ML_ABNCAR file contains the training data. The number of reference structures and the basis-set size are written at the beginning of this file. The values in this example at this step should be close to the following:

**************************************************
     The number of configurations
--------------------------------------------------
         55

and

**************************************************
     The numbers of basis sets per atom type
--------------------------------------------------
       396

for the training-set size and basis-set size, respectively.

Now we want to further continue the learning to see how much data is added and how the accuracy of the force field changes. Do the following steps for a continuation run with on-the-fly learning:

  • Copy ML_ABNCAR to ML_ABCAR.
  • Copy CONTCAR to POSCAR.
  • Modify the INCAR file in the following way:
    • Change ML_FF_ISTART=0 to ML_FF_ISTART=1.
    • Add ML_FF_CTIFOR=x, where x is the last criterion for the threshold of the Bayesian error obtained from the ML_LOGFILE (from the entry above it would be 0.101357, but please use the value that you obtained in your calculation). By setting already a good guess for the threshold several ab initio steps can be skipped from the beginning which would be required to determine the Bayesian error criterion. Mind: When continuation runs are performed where the crystal structure is changed, the last obtained error criterion for the Bayesian error can be too large and then many ab initio steps can be skipped wrongly (this can especially happen when going from first a liquid to a solid phase). In that case it is safer to use the default of ML_FF_CTIFOR=10^{-16}</math>}}.

After learning we again look at the last entry of the errors in the ML_LOGFILE, which should look like:

====================================================================================================
     Information on error estimations
----------------------------------------------------------------------------------------------------
              Error in energy (eV atom^-1):     0.006252
             Error in force  (eV Angst^-1):     0.168549
                      Error in stress (kB):     2.606332
               Bayesian error (eV Angst-1):     0.037569    0.101357
                       Spilling factor (-):     0.001114    0.020000
====================================================================================================

We see that the accuracy has not significantly changed. Also the threshold for the Bayesian error has not change (by typing grep "Bayesian error (eV Angst-1):" ML_LOGFILE we see that it was the same the whole time). By looking at the actual errors we see that that the threshold was overwritten only very few times. To see how often the threshold was overstepped type the following command on the command line:

grep "Bayesian error (eV Angst-1):" ML_LOGFILE |awk '{if ($5>0.101357) {print $5}}'

Consequently only very few structures were added to the ab initio data in the ML_ABNCAR file. The entry for the reference structures and basis sets should look like:

**************************************************
     The number of configurations
--------------------------------------------------
         65

and

**************************************************
     The numbers of basis sets per atom type
--------------------------------------------------
       407

This means that either the force field is already accurate enough or that the threshold is set too high and ab initio calculations are skipped because the force field is too accurate for that criterion.

Next we will test the learning with the default value for ML_FF_CTIFOR (which is practically zero), to see if the force field changes significantly. For that the following steps have to be done:

After rerunning the calculation we see that many structures have been added in the ML_ABNCAR file:

 **************************************************
     The number of configurations
--------------------------------------------------
        160
...
**************************************************
     The numbers of basis sets per atom type
--------------------------------------------------
       726

Nevertheless the accuracy of the force field did not improve significantly in the ML_LOGFILE:

====================================================================================================
     Information on error estimations
----------------------------------------------------------------------------------------------------
              Error in energy (eV atom^-1):     0.005950
             Error in force  (eV Angst^-1):     0.162064
                      Error in stress (kB):     2.628567
               Bayesian error (eV Angst-1):     0.029723    0.061348
                       Spilling factor (-):     0.000406    0.020000
====================================================================================================

Ideally one should continue learning until no structures need to be added to the training data and basis set any more. Practically this is hard to achieve, since the prediction of the Bayesian error has some inaccuracies and learning can be executed even if the force field is accurate enough. Nevertheless for practical calculations one has to stop the calculation at a given point. In this example we stop here because the accuracy on test structures compared to the ab initio data is not improving in the ML_LOGFILE by further addition of training data and this example is only a tutorial example. For high quality production calculations suited for publications we would advise to learn until almost no data is added to the training structures and basis set. In many cases this involves a length of 100 ps for learning.

User task: The user should continue learning and observe the convergence of the number of reference structures. Most likely the maximum allowed basis set size ML_FF_MB_MB needs to be increased. The user should also look at the pair correlation function with respect to the ab initio data. Is it still ok?

Download