Running machine-learned force fields in LAMMPS

From VASP Wiki

Revision as of 05:58, 17 June 2024

## Quick How-To for experienced **VASP**/**LAMMPS** users

1. Just like in **VASP**, pick a template from the `arch` directory and copy it to the base directory, e.g.

   ```
   cp arch/makefile.include.gnu makefile.include
   ```

2. Modify the build settings in `makefile.include` according to your system.

3. Compile a patched version of **LAMMPS** with support for **VASP** machine-learned force fields:

   ```
   make lammps -j
   ```

4. Switch to the `examples/lammps/CsPbBr3` directory and try to run the example MD simulation, e.g. with

   ```
   mpirun -np 4 ../../../bin/lmp_mpi -in in.lmp
   ```

5. Inspect the **LAMMPS** input script `in.lmp` and modify it to your needs.

If unsure, consult the detailed documentation sections below:

- [Build instructions](#build-instructions)

   - [Prerequisites](#prerequisites)
   - [Build library and applications](#build-library-and-applications)
   - [Automatic patching and compilations of **LAMMPS**](#automatic-patching-and-compilations-of-lammps)

- [Running **LAMMPS** with **VASP** machine-learned force field](#running-lammps-with-vasp-machine-learned-force-field)

   - [**LAMMPS** input scripts](#lammps-input-scripts)
   - [**LAMMPS** input data file](#lammps-input-data-file)
   - [Example directory](#example-directory)

## Build instructions

In the future, the source of **VASPml** will be distributed as part of the official **VASP** release, and the **VASP** build process will include the steps necessary to compile and link the **VASPml** library and interfaces. However, at this point (and most likely even when integrated into **VASP**) it is possible to build **VASPml** completely independently of **VASP**. The following sections describe the details of such an independent build.

### Prerequisites

1. **VASPml** requires a C++ compiler conforming to the C++17 language standard, for example compilers which are part of:

   - GNU Compiler Collection
   - Intel oneAPI Base Toolkit
   - NVIDIA HPC SDK
   - NEC SDK

2. Numerical libraries: LAPACK and BLAS, which are distributed for example as part of:

   - OpenBLAS
   - Intel oneAPI Math Kernel Library (part of Base Toolkit)
   - NVIDIA HPC SDK
   - NEC NLC (NEC Numeric Library Collection)

3. An MPI (Message Passing Interface) implementation, e.g. in

   - OpenMPI
   - Intel MPI (part of Intel oneAPI HPC Toolkit)
   - NVIDIA HPC SDK (OpenMPI)
   - NEC MPI

### Build library and applications

Like **VASP**, **VASPml** requires compiler details and library paths to be entered in a file named `makefile.include` before the build process can be started. Templates for this file can be found in the `arch` subdirectory, and it is usually convenient to start from one of them. First copy a template to the base directory and rename it to `makefile.include`, e.g.

```
cp arch/makefile.include.gnu makefile.include
```

Then modify the contents to reflect the compiler and library settings on your machine; for details see the [compiler and linker options](#compiler-and-linker-options) section below. Once done, the **VASPml** library can be built by executing this command in the top directory:

```
make -j
```

This will automatically build the two _targets_ `libvaspml` and `applications`, and is equivalent to running two make commands explicitly in this order:

```
make libvaspml -j
make applications -j
```

The `libvaspml` target builds the library of the same name and places it in the `lib` folder. The `applications` target compiles and links the standalone applications present in `src/applications`. At the moment there is only one application, `vaspml-predict`, which predicts energy, forces and stress for one `POSCAR` file with a given `ML_FF` force field file. The executable will be copied to the `bin` directory.

With the `-j` flag present, `make` runs the build process in parallel, starting as many parallel jobs as possible. You can limit the load the build process is allowed to cause with the `-l` flag; see the documentation of GNU `make` for details.

#### Compiler and linker options

The following compiler and linker options in `makefile.include` should be reviewed and, if necessary, modified before starting the build process:

- `CXX`: This should be a C++17-compatible C++ compiler with MPI support.
- `CXXFLAGS`: Specifies the flags for the C++ compiler.
- `INCLUDE`: Paths in which to look for headers of required libraries. Here the include directory of BLAS should be listed.
- `FC` (deprecated)
- `FFLAGS` (deprecated)

#### Compile-time options

- `-DVASPML_DEBUG_LEVEL`: If set to 1, 2 or 3, enables various sanity checks during runtime with low, medium and high impact on performance, respectively.
- `-DVASPML_USE_CBLAS`: Use CBLAS (C interface for BLAS routines) for linear algebra. This is the default and should always be used.
- `-DVASPML_USE_MKL`: Use Intel MKL for linear algebra.
- `-DVASPML_FORTRAN_MATH` (deprecated): Enable legacy Fortran math routines.

#### Makefile options

- `--no-color`: Disables colored output of makefiles.
- `--no-logo`: Disables the **VASPml** logo output.

### Automatic patching and compilations of **LAMMPS**

```
make lammps -j
```

#### Technical details

From a technical standpoint **LAMMPS** and **VASPml** interact in the following way: on the **LAMMPS** side a new class `PairVASP` (inheriting from `Pair`) is implemented in `pair_vasp.cpp/h`. Its purpose is to transfer the neighbor lists to **VASPml**, trigger processing, and receive back the energy and force contributions. **VASPml** enters the received neighbor list data into its own structures and computes energy and force predictions according to the pre-trained machine-learned force field. A typical build for the combination of the two codes requires first compiling the `libvaspml` library. Then, **LAMMPS** is patched with the additional `pair_vasp.cpp/h` files, which are automatically compiled during the **LAMMPS** build. In the final stage, **LAMMPS** is linked to the `libvaspml` library, resulting in a patched executable. This can be done manually, but **VASPml** also offers a convenient automated way covering all steps (`make lammps`).

## Running **LAMMPS** with **VASP** machine-learned force field

### **LAMMPS** input scripts

**LAMMPS** comes with its own powerful script language which allows the user to specify all relevant MD simulation parameters in a single file. Please consult the [LAMMPS documentation](https://docs.lammps.org/Commands_input.html) for details. Within the **LAMMPS** script language the commands `pair_style` and `pair_coeff` are responsible for selecting a force field. The patch provided by **VASPml** introduces a new `pair_style` called `vasp`. The `pair_style vasp` command does not take any additional arguments; all configurable settings are given as arguments to the `pair_coeff` command in this format:

```
pair_style vasp
pair_coeff * * file types
```

The `pair_coeff` command must be followed by `* *`, then by the name of the **VASP** force field file, typically `ML_FF`. Finally, there comes a mapping from **LAMMPS** atom types to **VASP** force field types, e.g., `H O Na Cl` means that **LAMMPS** types `1`, `2`, `3` and `4` are mapped to **VASP** types `H`, `O`, `Na` and `Cl`, respectively. A valid example may look like this:

```
pair_style vasp
pair_coeff * * ML_FF Pb Br Cs
```

This will map the **LAMMPS** atom types `1`, `2` and `3` in the [input data file](#lammps-input-data-file) to the types `Pb`, `Br` and `Cs`, for which a pre-trained machine-learned force field should be present in the `ML_FF` file in the execution directory. A summary of the type mapping is provided in the screen output and the `log.lammps` file, e.g. for the example above it looks like this:

```
  LAMMPS       pair_coeff      VASP      |             VASP force field
   types       names           subtypes  |     types       names        subtypes
-----------------------------------------|-------------------------------------
       1 <---> Pb        <---> 0         |         0 <---> Pb     <---> 0
       2 <---> Br        <---> 1         |         1 <---> Br     <---> 1
       3 <---> Cs        <---> 2         |         2 <---> Cs     <---> 2
```

The left side shows the mapping; the right side gives an overview of the types present in the force field file. In this example there is a one-to-one mapping, so the table is straightforward and somewhat redundant. However, it is also possible to leave out a mapping for specified **LAMMPS** types by supplying `NULL` instead of a valid **VASP** type name. This can be helpful when multiple force fields should be combined, see [`pair_style hybrid`](https://docs.lammps.org/pair_hybrid.html). Furthermore, multiple **LAMMPS** types may be mapped to the same **VASP** type. Finally, the force field file may contain types which are not used in the current MD simulation. Therefore, a more complicated example, written here as it would appear under `pair_style hybrid` where the sub-style name `vasp` follows `* *`, may look like this:

```
pair_coeff * * vasp ML_FF NULL Cs NULL Br Pb Br
```

and the corresponding table could contain this information:

```
  LAMMPS       pair_coeff      VASP      |             VASP force field
   types       names           subtypes  |     types       names        subtypes
-----------------------------------------|-------------------------------------
       1 <---> unmapped! <---> unmapped! |         0 <---> Ca     <---> unused!
       2 <---> Cs        <---> 2         |         1 <---> Pb     <---> 0
       3 <---> unmapped! <---> unmapped! |         2 <---> O      <---> unused!
       4 <---> Br        <---> 1         |         3 <---> Br     <---> 1
       5 <---> Pb        <---> 0         |         4 <---> Cs     <---> 2
       6 <---> Br        <---> 1         |
```

It is important to always ensure that the type mapping is correctly set up because mixed-up types may not immediately result in errors. An MD simulation may still run and only post-processing may ultimately reveal inconsistencies which can be tedious to trace back to type-mapping mistakes.
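Because a wrong mapping can go unnoticed, it may help to cross-check the `pair_coeff` line against the element each **LAMMPS** type is meant to represent before starting a run. The following Python sketch is not part of **VASPml**; the function names `parse_pair_coeff` and `check_mapping` are hypothetical, and it assumes a single-line `pair_coeff * *` command in which an optional leading `vasp` token is the hybrid sub-style name rather than a file name.

```python
def parse_pair_coeff(line):
    """Split a `pair_coeff * * ...` line into (force field file, type mapping).

    The mapping is {LAMMPS type: element name, or None for NULL}.
    """
    tokens = line.split()
    if tokens[:3] != ["pair_coeff", "*", "*"]:
        raise ValueError("expected a line starting with 'pair_coeff * *'")
    args = tokens[3:]
    # Assumption: a leading 'vasp' is the sub-style name used with pair_style hybrid.
    if args and args[0] == "vasp":
        args = args[1:]
    if len(args) < 2:
        raise ValueError("expected a file name followed by at least one type name")
    ff_file, names = args[0], args[1:]
    # LAMMPS atom types are numbered from 1; NULL marks an unmapped type.
    mapping = {
        lammps_type: (None if name == "NULL" else name)
        for lammps_type, name in enumerate(names, start=1)
    }
    return ff_file, mapping


def check_mapping(mapping, expected):
    """Return the sorted LAMMPS types whose mapped element differs from `expected`."""
    return sorted(t for t in mapping if mapping[t] != expected.get(t))


if __name__ == "__main__":
    ff_file, mapping = parse_pair_coeff("pair_coeff * * ML_FF Pb Br Cs")
    # Elements the user believes the data file contains, keyed by LAMMPS type.
    intended = {1: "Pb", 2: "Br", 3: "Cs"}
    print(ff_file, mapping, "mismatches:", check_mapping(mapping, intended))
```

The `intended` dictionary has to come from the user's own knowledge of the data file (for example from the masses listed there); the sketch only compares the two and does not read any **LAMMPS** or **VASP** files.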

`pair_style vasp` expects input coordinates in Ångström and returns energies in eV and forces in eV/Å. Hence, it is only compatible with the **LAMMPS** setting `units metal` in the input script; otherwise an error will occur.
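A quick pre-flight check of the input script can catch a missing or wrong `units` command before a job is submitted. The Python sketch below is an illustration only: the function name `units_before_pair_style` is hypothetical, and the parsing is deliberately simple (it ignores `&` line continuations, variables and included files). It relies on the fact that **LAMMPS** falls back to `units lj` when no `units` command is given.

```python
def units_before_pair_style(script_text):
    """Return the unit style in effect at the first `pair_style` command.

    Returns None if the script contains no `pair_style` command.
    """
    units = "lj"  # LAMMPS default when no `units` command has been issued
    for raw_line in script_text.splitlines():
        # Drop trailing comments, then split the command into words.
        words = raw_line.split("#", 1)[0].split()
        if not words:
            continue
        if words[0] == "units" and len(words) > 1:
            units = words[1]
        elif words[0] == "pair_style":
            return units
    return None


if __name__ == "__main__":
    script = "units metal\npair_style vasp\npair_coeff * * ML_FF Pb Br Cs\n"
    style = units_before_pair_style(script)
    print("unit style at pair_style:", style)
    if style != "metal":
        print("warning: pair_style vasp requires 'units metal'")
```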

### **LAMMPS** input data file

Where **VASP** uses POSCAR files to define the input structure (lattice and ion positions), **LAMMPS** uses [its own file format](https://docs.lammps.org/read_data.html) to start MD simulations from. For simple cubic or orthorhombic systems the files can be converted manually with little effort. However, this becomes more cumbersome with triclinic simulation cells because **LAMMPS** originally only supported _restricted triclinic_ boxes. Here, the first lattice vector is _restricted_ to lie along the x-axis of the Cartesian coordinate system and the second vector must lie in the xy-plane. The third lattice vector can be arbitrary as long as it points out of the xy-plane and the three vectors form a right-handed system. Note that any set of lattice vectors can be transformed (rotated and/or mirrored) to fulfill these conditions without changing the physical situation. These restrictions do not apply to **VASP** POSCAR files, and therefore _general_ triclinic lattices need to be [transformed](https://docs.lammps.org/Howto_triclinic.html) to create a valid **LAMMPS** input file. This task can be performed by the Python script `poscar2lammps_data.py`, which is located in the `res` directory relative to the base folder. It takes up to two command-line arguments:

```
poscar2lammps_data.py <in> <out>
```

where `<in>` is the POSCAR input file and `<out>` is the resulting **LAMMPS** data file in restricted triclinic form. If `<out>` is omitted, the output file is written to `lammps.data`.

Please be aware that this script is not heavily tested and its results should be checked for consistency. Alternatively, ASE (the Atomic Simulation Environment) can be used to convert POSCAR files to **LAMMPS** data files: a POSCAR file can be read into an ASE `Atoms` object and then written to a file in the **LAMMPS** data format. Also, recent versions of **LAMMPS** (e.g. `patch_17Apr2024`) do support general triclinic lattices for convenience, see remarks [here](https://docs.lammps.org/Howto_triclinic.html#general-triclinic-simulation-boxes-in-lammps).
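When checking the output of a conversion, it can be useful to compute the restricted triclinic lattice vectors independently. The Python sketch below follows the standard rotation formulas given in the **LAMMPS** triclinic how-to; it is not the implementation used by `poscar2lammps_data.py`, and the function name `restricted_triclinic` is hypothetical. It handles only right-handed input (a left-handed set would additionally need mirroring) and uses only the standard library.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cross(u, v):
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )


def restricted_triclinic(a, b, c):
    """Rotate general lattice vectors a, b, c into restricted triclinic form.

    Returns (ax, 0, 0), (bx, by, 0), (cx, cy, cz) with ax, by, cz > 0.
    """
    volume = dot(cross(a, b), c)
    if volume <= 0.0:
        raise ValueError("lattice vectors must form a right-handed system")
    ax = math.sqrt(dot(a, a))
    a_hat = tuple(x / ax for x in a)
    bx = dot(b, a_hat)                      # projection of b onto a
    n = cross(a_hat, b)
    by = math.sqrt(dot(n, n))               # component of b perpendicular to a
    cx = dot(c, a_hat)
    cy = (dot(b, c) - bx * cx) / by
    cz = volume / (ax * by)                 # cell volume is ax * by * cz
    return (ax, 0.0, 0.0), (bx, by, 0.0), (cx, cy, cz)


if __name__ == "__main__":
    # A cell whose first vector points along y instead of x.
    for vec in restricted_triclinic((0.0, 3.0, 0.0), (-3.0, 0.0, 0.0), (0.0, 0.0, 4.0)):
        print(vec)
```

In a **LAMMPS** data file the tilt factors `xy`, `xz` and `yz` correspond to `bx`, `cx` and `cy`, respectively. Fractional atom coordinates are unchanged by the rotation, so Cartesian positions in the new frame can be rebuilt from them and the rotated lattice vectors.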

### Example directory

An example **LAMMPS** MD simulation of cesium lead bromide (CsPbBr3) can be found in the following directory relative to the **VASPml** base directory:

```
examples/lammps/CsPbBr3
```

To execute, first [compile the patched **LAMMPS** executable](#automatic-patching-and-compilations-of-lammps), then change into the directory above and run a parallel MD simulation with this command:

```
mpirun -np 4 ../../../bin/lmp_mpi -in in.lmp
```

Here, `lmp_mpi` is the patched **LAMMPS** executable and `-in in.lmp` is one of its command-line arguments, specifying that the **LAMMPS** commands should be read from a script file called `in.lmp`. This file is present in the example directory and contains a fairly complete MD setup for a simulation of 100 time steps sampling the NpT ensemble. `in.lmp` also specifies that the output trajectory is written to `out.dump` and that global thermodynamic properties (e.g. potential energy, pressure, ...) are written to `out.prop`. The example **LAMMPS** script file can easily be altered to sample the NVE or NVT ensembles instead. Many other simulation parameters can also be modified by changing the variable values at the beginning of the file. Please have a look at the comments in `in.lmp` and visit the [**LAMMPS** documentation](https://docs.lammps.org/Manual.html) for more information.