Category:Bethe-Salpeter equations: Difference between revisions

From VASP Wiki
 
(10 intermediate revisions by 2 users not shown)
Line 74: Line 74:


Although the dielectric function is frequency-dependent, the static approximation <math>W_{\mathbf{G}, \mathbf{G}^{\prime}}(\mathbf{q}, \omega=0)</math> is considered a standard for practical BSE calculations.  
Although the dielectric function is frequency-dependent, the static approximation <math>W_{\mathbf{G}, \mathbf{G}^{\prime}}(\mathbf{q}, \omega=0)</math> is considered a standard for practical BSE calculations.  


== Scaling ==
== Scaling ==
The scaling of the BSE equation strongly limits its application for large systems. The main limiting factor is the diagonalization of the BSE Hamiltonian. The rank of the Hamiltonian is
The steep scaling of BSE with the system size can be a limiting factor for its application in large systems. This should be considered when performing BSE calculations.
 
=== Building matrix ===
As a first step, the {{TAGO|ALGO|BSE/TDHF}} algorithm requires building the Hamiltonian of rank
:<math>N_{\rm rank} = N_k\times N_c\times N_v</math>,
:<math>N_{\rm rank} = N_k\times N_c\times N_v</math>,
where <math>N_k</math> is the number of k-points in the full Brillouin zone and <math>N_c</math> and <math>N_v</math> are the number of conduction and valence bands, respectively. This computation scales as


where <math>N_k</math> is the number of k-points in the Brillouin zone and <math>N_c</math> and <math>N_v</math> are the number of conduction and valence bands, respectively.  The diagonalization of the matrix scales cubically with the matrix rank, i.e.,  <math>N_{\rm rank}^3</math>.
:<math>N_k\times N_q\times (N_v\times N_v\times N_G\times N_c\times N_c)</math>,


Despite the fact that this matrix diagonalization is usually the bottleneck for bigger systems, the construction of the BSE Hamiltonian also scales unfavorably and can play a dominant role in big systems, i.e.,
where <math>N_q</math> is the number of q-points and <math>N_G</math> is the number of G-vectors. As a rough estimate, this step scales as <math>N^4</math> to <math>N^5</math> with the system size <math>N</math>.


:<math>N_k\times N_q\times (N_v\times N_v\times N_G\times N_c\times N_c)</math>,
=== Solving equation ===
 
In the second step, the equation has to be solved. VASP provides different methods for doing that.
where <math>N_q</math> is the number of q-points and <math>N_G</math> number of G-vectors.
==== Exact diagonalization ====
The exact diagonalization algorithm ({{TAGO|IBSE|2}}) scales cubically with the matrix rank <math>N_{\rm rank}^3</math>
or as <math>N^6</math> with the system size.
==== Iterative solution ====
Iterative solvers, such as the time-evolution ({{TAGO|IBSE|1}}) or Lanczos ({{TAGO|IBSE|3}}) algorithms, do not require diagonalizing the full matrix. Instead, they compute a matrix-vector product in each of <math>m</math> steps or iterations. Thus, solving the equation via the time-evolution or Lanczos algorithm scales as <math>N_{\rm rank}^2\times m</math>, or as <math>N^4</math> with the system size. The number of iterations depends on the algorithm and the required precision, which can be selected via {{TAG|BSEPREC}}.


== Exact diagonalization ==
== Exact diagonalization ==
Line 126: Line 133:
* Calculating the dielectric function
* Calculating the dielectric function
* Calculations beyond the Tamm-Dancoff approximation
* Calculations beyond the Tamm-Dancoff approximation
<!--
 
 
==Lanczos algorithm==
==Lanczos algorithm==
The expression for the dielectric function can be re-written as a continued fraction  
The expression for the dielectric function can be re-written as a continued fraction  
Line 133: Line 141:
     - \cfrac{b_2^2}{...}}},
     - \cfrac{b_2^2}{...}}},
</math>
</math>
where <math>|u_0\rangle</math> is an initial guess vector computed from the dipole moments, <math>|u_0\rangle = \sum_{cv\mathbf{k}} \langle c\mathbf{k}|r_\alpha|v\mathbf{k}\rangle \langle v\mathbf{k}|r_\beta|c\mathbf{k}\rangle</math>. The <math>a</math> and <math>b</math> coefficients are evaluated iteratively, with the iterative algorithm stopping once the difference between <math>\epsilon(\omega)</math> from two consecutive iterations is below a certain threshold, set by {{TAG|LANCZOSTHR}} in the {{TAG|INCAR}}. By default, {{TAG|LANCZOSTHR}}=<math>10^{-3}</math>.
where <math>|u_0\rangle</math> is an initial guess vector computed from the dipole moments, <math>|u_0\rangle = \sum_{cv\mathbf{k}} \langle c\mathbf{k}|r_\alpha|v\mathbf{k}\rangle \langle v\mathbf{k}|r_\beta|c\mathbf{k}\rangle</math>. The <math>a</math> and <math>b</math> coefficients are evaluated iteratively, with the iterative algorithm stopping once the difference between <math>\epsilon(\omega)</math> from two consecutive iterations is below a certain threshold selected by {{TAG|BSEPREC}}.


Using the dipole moments as the starting point means that the iterative algorithm is sensitive only to optically active transitions, i.e. <math>v\to c</math> transitions with non-zero dipole moment. As such, the algorithm will ignore optically inactive transitions and can reach convergence faster than other methods for larger matrices.
Using the dipole moments as the starting point means that the iterative algorithm is sensitive only to optically active transitions, i.e. <math>v\to c</math> transitions with non-zero dipole moment. As such, the algorithm will ignore optically inactive transitions and can reach convergence faster than other methods for larger matrices.
Line 139: Line 147:
The following features are currently supported:
The following features are currently supported:
* Calculating the dielectric function
* Calculating the dielectric function
<!--
* Calculating the eigenvalues of bright excitonic states
* Calculating the eigenvalues of bright excitonic states
-->
 
<!--
expression with the u_0 vector explicitly written  
expression with the u_0 vector explicitly written  
<math\delta_{\alpha\beta} - \frac{4\pi}{\Omega}\sum_{cv\mathbf{k}} \langle|c\mathbf{k}|r_\alpha|v\mathbf{k}\rangle
<math\delta_{\alpha\beta} - \frac{4\pi}{\Omega}\sum_{cv\mathbf{k}} \langle|c\mathbf{k}|r_\alpha|v\mathbf{k}\rangle
Line 147: Line 155:
     \cfrac{1}{(\omega - a_1 + \mathrm i\eta) - \cfrac{b_1^2}{(\omega -a_2 + \mathrm i\eta) - \cfrac{b_2^2}{...}}}
     \cfrac{1}{(\omega - a_1 + \mathrm i\eta) - \cfrac{b_1^2}{(\omega -a_2 + \mathrm i\eta) - \cfrac{b_2^2}{...}}}
</math>
</math>
->
-->
== Performing BSE calculations on GPU ==
As of VASP 6.5, the BSE/TDHF calculations with {{TAGO|IBSE|1}} or {{TAGO|IBSE|2}} can be fully run on NVIDIA GPUs.
To be able to offload the BSE calculations to GPUs one has to compile VASP with the [https://docs.nvidia.com/cuda/cusolvermp cuSOLVERMp] and [https://docs.nvidia.com/cuda/cublasmp cuBLASMp] libraries provided with NVHPC-SDK 24.7 or newer.
To be able to use these libraries VASP has to be compiled with HPC-X (MPI shipped with NVHPC-SDK), which can be loaded via
 
module load nvhpc-hpcx-cuda12/24.7
 
To enable these libraries in VASP, make sure to include the following lines in your <code>makefile.include</code>
 
CPP_OPTIONS+= -DCUSOLVERMP -DCUBLASMP
LLIBS      += -cudalib=cusolvermp,cublasmp -lnvhpcwrapcal
 
To perform the BSE calculation on GPUs, VASP must store the full BSE Hamiltonian in GPU memory, which is often the limiting factor. The memory required to store the BSE Hamiltonian can be estimated as <math>N_{\rm rank}^2\times 16\cdot 10^{-9}</math> GB for {{TAGO|ANTIRES|0}}. In the case of exact diagonalization ({{TAGO|IBSE|2}}), the eigensolver requires additional scratch space.
{{NB|mind|When running BSE calculations on GPUs, we recommend not setting {{TAG|OMEGAMAX}} or setting it to a larger value so that all the bands selected in {{TAG|NBANDSV}} and {{TAG|NBANDSO}} are included in the kernel. Otherwise, additional data transfers between CPU and GPU might be required, which leads to a serious performance degradation on GPUs.|}}


== How to ==
== How to ==

Latest revision as of 13:54, 20 December 2024

The formalism of the Bethe-Salpeter equation (BSE) allows for calculating the polarizability with the electron-hole interaction and constitutes the state of the art for calculating absorption spectra in solids.

Theory

Bethe-Salpeter equation

In the BSE, the excitation energies correspond to the eigenvalues of the following linear problem[1]


The matrices <math>A</math> and <math>A^*</math> describe the resonant and anti-resonant transitions between the occupied and unoccupied states

The energies and orbitals of these states are usually obtained in a <math>GW</math> calculation, but DFT and hybrid-functional calculations can be used as well. The electron-electron interaction and electron-hole interaction are described via the bare Coulomb potential <math>V</math> and the screened potential <math>W</math>.

The coupling between resonant and anti-resonant terms is described via the terms <math>B</math> and <math>B^*</math>

Due to the presence of this coupling, the Bethe-Salpeter Hamiltonian is non-Hermitian.

Tamm-Dancoff approximation

A common approximation to the BSE is the Tamm-Dancoff approximation (TDA), which neglects the coupling between resonant and anti-resonant terms, i.e., <math>B = 0</math> and <math>B^* = 0</math>. Hence, the TDA reduces the BSE to a Hermitian problem

In reciprocal space, the matrix is written as

where is the cell volume, is the bare Coulomb potential without the long-range part

and the screened Coulomb potential

Here, the dielectric function describes the screening in <math>W</math> within the random-phase approximation (RPA)

Although the dielectric function is frequency-dependent, the static approximation <math>W_{\mathbf{G}, \mathbf{G}^{\prime}}(\mathbf{q}, \omega=0)</math> is considered a standard for practical BSE calculations.

Scaling

The steep scaling of BSE with the system size can be a limiting factor for its application in large systems. This should be considered when performing BSE calculations.

Building matrix

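As a first step, the ALGO = BSE/TDHF algorithm requires building the Hamiltonian of rank

:<math>N_{\rm rank} = N_k\times N_c\times N_v</math>,

where <math>N_k</math> is the number of k-points in the full Brillouin zone and <math>N_c</math> and <math>N_v</math> are the number of conduction and valence bands, respectively. This computation scales as

:<math>N_k\times N_q\times (N_v\times N_v\times N_G\times N_c\times N_c)</math>,

where <math>N_q</math> is the number of q-points and <math>N_G</math> is the number of G-vectors. As a rough estimate, this step scales as <math>N^4</math> to <math>N^5</math> with the system size <math>N</math>.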
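The two expressions above can be evaluated directly to get a feeling for the numbers involved. The following is a minimal sketch, not part of VASP: the function names and the example values are hypothetical, and the operation count is only the product of the factors quoted above, without any prefactor.

```python
def bse_rank(n_k, n_c, n_v):
    """Rank of the BSE Hamiltonian: N_rank = N_k * N_c * N_v."""
    return n_k * n_c * n_v


def build_cost(n_k, n_q, n_v, n_c, n_g):
    """Operation-count estimate for building the Hamiltonian:
    N_k * N_q * (N_v * N_v * N_G * N_c * N_c). Prefactors are ignored."""
    return n_k * n_q * (n_v * n_v * n_g * n_c * n_c)


if __name__ == "__main__":
    # Hypothetical example: 4x4x4 k-mesh (64 k-points and q-points),
    # 4 valence bands, 4 conduction bands, 100 G-vectors.
    print("N_rank     =", bse_rank(64, 4, 4))
    print("build cost =", build_cost(64, 64, 4, 4, 100))
```

Such an estimate is only useful for comparing setups against each other, for example to see how much more expensive a denser k-mesh or additional bands would be.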

Solving equation

In the second step, the equation has to be solved. VASP provides different methods for doing that.

Exact diagonalization

The exact diagonalization algorithm (IBSE = 2) scales cubically with the matrix rank, <math>N_{\rm rank}^3</math>, or as <math>N^6</math> with the system size.

Iterative solution

Iterative solvers, such as the time-evolution (IBSE = 1) or Lanczos (IBSE = 3) algorithms, do not require diagonalizing the full matrix. Instead, they compute a matrix-vector product in each of <math>m</math> steps or iterations. Thus, solving the equation via the time-evolution or Lanczos algorithm scales as <math>N_{\rm rank}^2\times m</math>, or as <math>N^4</math> with the system size. The number of iterations depends on the algorithm and the required precision, which can be selected via BSEPREC.
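To illustrate the difference between the two solver classes, the following minimal sketch compares the cubic cost of exact diagonalization with the cost of an iterative solver that performs one matrix-vector product per step. The function name and the example numbers are hypothetical, and prefactors are ignored, so only the ratio is meaningful.

```python
def solver_costs(n_rank, m):
    """Return (exact, iterative) operation-count estimates.

    exact     ~ N_rank^3      (full diagonalization)
    iterative ~ N_rank^2 * m  (m matrix-vector products)
    """
    exact = n_rank ** 3
    iterative = n_rank ** 2 * m
    return exact, iterative


if __name__ == "__main__":
    # Hypothetical example: rank 1024 and 100 iterations.
    exact, iterative = solver_costs(1024, 100)
    print("exact diagonalization:", exact)
    print("iterative solution   :", iterative)
    print("ratio exact/iterative:", exact / iterative)
```

The ratio of the two estimates is simply <math>N_{\rm rank}/m</math>, so in this simplified picture the iterative solvers pay off once the matrix rank exceeds the number of iterations.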

Exact diagonalization

The diagonalization of the BSE Hamiltonian can be performed using various eigensolvers provided in the ScaLAPACK, ELPA, and cuSolver libraries. The advantage of this approach is that the eigenvectors are obtained directly and can be used for the analysis of the excitons. Using the eigenvalues and eigenvectors of the BSE Hamiltonian, the macroscopic dielectric function, which accounts for the excitonic effects, can be found

The following features are currently supported:

Time evolution

Alternatively, it is possible to use the time-evolution algorithm which applies a short Dirac delta pulse of electric field and then follows the evolution of the dipole moments. The dielectric function is found via a Fourier transform [2]

,

where and are the dipole moments.

The solution found this way is strictly equivalent to that of the exact diagonalization and can be used for obtaining the absorption spectrum, but it does not yield the eigenvectors, which can be limiting for the analysis of the excitons. The advantage of this approach is the quadratic scaling with the size of the BSE Hamiltonian, <math>N_{\rm rank}^2</math>.

The time-evolution algorithm can be selected by setting IBSE = 1 in a BSE calculation. The required number of steps in the time-evolution calculation depends on the broadening CSHIFT and the maximum energy OMEGAMAX. The precision can be selected via the BSEPREC tag.

Mind: The required number of steps does not depend on the size of the Hamiltonian.

The following features are currently supported:

  • Calculating the dielectric function
  • Calculations beyond the Tamm-Dancoff approximation


Lanczos algorithm

The expression for the dielectric function can be re-written as a continued fraction

where <math>|u_0\rangle</math> is an initial guess vector computed from the dipole moments, <math>|u_0\rangle = \sum_{cv\mathbf{k}} \langle c\mathbf{k}|r_\alpha|v\mathbf{k}\rangle \langle v\mathbf{k}|r_\beta|c\mathbf{k}\rangle</math>. The <math>a</math> and <math>b</math> coefficients are evaluated iteratively; the iteration stops once the difference between <math>\epsilon(\omega)</math> from two consecutive iterations falls below a threshold selected by BSEPREC.

Using the dipole moments as the starting point means that the iterative algorithm is sensitive only to optically active transitions, i.e., <math>v\to c</math> transitions with non-zero dipole moment. As such, the algorithm will ignore optically inactive transitions and can reach convergence faster than other methods for larger matrices.

The following features are currently supported:

  • Calculating the dielectric function
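The continued-fraction idea can be illustrated with a small, generic NumPy example. This is a minimal sketch under stated assumptions and not the VASP implementation: it uses a plain Lanczos recursion without reorthogonalization on a small real symmetric test matrix, and the function names are hypothetical. It evaluates <math>\langle u_0|(\omega + \mathrm i\eta - H)^{-1}|u_0\rangle / \langle u_0|u_0\rangle</math> from the <math>a</math> and <math>b</math> coefficients.

```python
import numpy as np


def lanczos_coefficients(h, u0, n_steps):
    """Run n_steps of the Lanczos recursion for a Hermitian matrix h,
    starting from u0. Returns the diagonal (a) and off-diagonal (b)
    coefficients of the resulting tridiagonal matrix."""
    q_prev = np.zeros_like(u0)
    q = u0 / np.linalg.norm(u0)
    a, b = [], []
    beta = 0.0
    for _ in range(n_steps):
        w = h @ q - beta * q_prev
        alpha = np.vdot(q, w).real
        w = w - alpha * q
        a.append(alpha)
        beta = np.linalg.norm(w)
        if beta < 1e-12:  # Krylov space exhausted
            break
        b.append(beta)
        q_prev, q = q, w / beta
    return np.array(a), np.array(b)


def continued_fraction(a, b, omega, eta):
    """Evaluate 1 / (z - a_1 - b_1^2 / (z - a_2 - ...)), z = omega + i*eta,
    from the innermost level outwards."""
    z = omega + 1j * eta
    value = z - a[-1]
    for k in range(len(a) - 2, -1, -1):
        value = z - a[k] - b[k] ** 2 / value
    return 1.0 / value


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    m = rng.standard_normal((6, 6))
    h = 0.5 * (m + m.T)          # small symmetric test matrix
    u0 = rng.standard_normal(6)  # stands in for the dipole vector
    a, b = lanczos_coefficients(h, u0, 6)
    print(continued_fraction(a, b, omega=0.3, eta=0.1))
```

Only matrix-vector products with the Hamiltonian are needed, which is the origin of the quadratic scaling of the iterative solvers. In practice the recursion would be truncated after far fewer steps than the matrix rank.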

Performing BSE calculations on GPU

As of VASP 6.5, BSE/TDHF calculations with IBSE = 1 or IBSE = 2 can run fully on NVIDIA GPUs. Offloading the BSE calculations to GPUs requires compiling VASP with the cuSOLVERMp and cuBLASMp libraries provided with NVHPC-SDK 24.7 or newer. These libraries in turn require compiling VASP with HPC-X (the MPI shipped with NVHPC-SDK), which can be loaded via

module load nvhpc-hpcx-cuda12/24.7

To enable these libraries in VASP, make sure to include the following lines in your makefile.include

CPP_OPTIONS+= -DCUSOLVERMP -DCUBLASMP
LLIBS      += -cudalib=cusolvermp,cublasmp -lnvhpcwrapcal

To perform the BSE calculation on GPUs, VASP must store the full BSE Hamiltonian in GPU memory, which is often the limiting factor. The memory required to store the BSE Hamiltonian can be estimated as <math>N_{\rm rank}^2\times 16\cdot 10^{-9}</math> GB for ANTIRES = 0. In the case of exact diagonalization (IBSE = 2), the eigensolver requires additional scratch space.
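The memory estimate above is easy to evaluate before starting a calculation. The following is a minimal sketch with hypothetical function names; it covers only the Hamiltonian itself (16 bytes per matrix element) and ignores the additional scratch space of the eigensolver, so it should be read as a lower bound.

```python
import math


def bse_memory_gb(n_rank):
    """Memory needed to store the BSE Hamiltonian in GB:
    N_rank^2 * 16e-9 (16 bytes per matrix element)."""
    return n_rank ** 2 * 16e-9


def max_rank_for_memory(memory_gb):
    """Largest N_rank whose Hamiltonian fits into memory_gb,
    ignoring eigensolver scratch space."""
    return int(math.sqrt(memory_gb / 16e-9))


if __name__ == "__main__":
    # Hypothetical examples: a rank of 100000 and a memory budget of 80 GB.
    print("memory for N_rank = 100000:", bse_memory_gb(100_000), "GB")
    print("largest N_rank in 80 GB   :", max_rank_for_memory(80.0))
```

Because the memory grows quadratically with the rank, doubling the number of k-points or bands included in the Hamiltonian quadruples the requirement.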

Mind: When running BSE calculations on GPUs, we recommend not setting OMEGAMAX or setting it to a larger value so that all the bands selected in NBANDSV and NBANDSO are included in the kernel. Otherwise, additional data transfers between CPU and GPU might be required, which leads to a serious performance degradation on GPUs.

How to

  • Practical guide for solving the Bethe-Salpeter equation via diagonalization: BSE calculations
  • Practical guide for solving the Casida equation via diagonalization: TDDFT calculations

References