VASP 5.3.3 terminating
Posted: Mon Jun 10, 2013 3:38 am
Hi Everyone,
I purchased my copy of VASP 5.3.3 through Materials Design, but my question applies even when I submit jobs to VASP exclusively from the command line. My jobs are terminating prematurely, before the calculation finishes, and I think this might be related to a hangup. I am running Red Hat Linux on a 2-processor machine with 12 cores in total and 64 GB of RAM.
When I initially had everything set up, the jobs ran and completed as expected. Within a few days, however, submitting a job produced the following error:
mpiexec_server.org: cannot connect to local mpd (/tmp/mpd2.console_user); possible causes:
1. no mpd is running on this host
2. an mpd is running but was started without a "console" (-n option)
I believe this error is telling me that the mpd daemon is not running. I added mpdboot and mpd to my submission script and the error went away, but about 3 minutes into the run the job terminated abruptly. In an effort to fix this, I added a walltime request and put nohup in front of the executables I run. This allowed the job to run for 30-60 minutes, but it still stops before the calculation is done. The problem with this termination is that I don't see any errors in the error file or in OUTCAR.
Here is the script I use to submit to a PBS job queue. The walltime directive and the two nohup lines (mpdboot and mpd) are what I added to try to address this early termination issue.
#PBS
#PBS -l walltime=168:00:00
#PBS -q mainq
#PBS -l nodes=1:ppn=1
#PBS -o VASP.out
#PBS -e stderror
cd /data1/opt/MD/2.0/TaskServer/Tasks/task00141
cp VASP_INCAR INCAR
cp VASP_KPOINTS KPOINTS
export PATH=/data1/opt/MD/Linux-x86_64/IntelMPI/bin:$PATH
export LD_LIBRARY_PATH=/data1/opt/MD/Linux-x86_64/IntelMPI/lib:/data1/opt/MD/Linux-x86_64/IntelMPI/bin:/data1/opt/MD/Linux-x86_64/IntelMKL/lib
nohup mpdboot -n 1 -f ~/mpd.hosts -r ssh
nohup mpd &
/data1/opt/MD/Linux-x86_64/IntelMPI/bin/mpiexec -n 6 /data1/opt/MD/2.0/TaskServer/Tools/vasp5.3.3/Linux-x86_64/vasp_parallel
touch finished
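Since neither the error file nor OUTCAR shows anything, one option is to record the exit status of the launch command itself. The following is only a minimal sketch: the run_logged helper and the run_status.log file name are hypothetical and not part of VASP, PBS, or Intel MPI, and the "status above 128 means killed by a signal" rule is the usual shell convention, which mpiexec may or may not pass through unchanged.

```shell
#!/bin/sh
# Hypothetical helper (not part of VASP, PBS or Intel MPI):
# run a command and append its start time, end time and exit status to a log.
# The log file name is an assumption; override it by setting LOGFILE.
LOGFILE=${LOGFILE:-run_status.log}

run_logged() {
    echo "start $(date '+%Y-%m-%d %H:%M:%S'): $*" >> "$LOGFILE"
    "$@"
    status=$?
    echo "end   $(date '+%Y-%m-%d %H:%M:%S'): exit status $status" >> "$LOGFILE"
    # By shell convention, a status above 128 usually means the command
    # was terminated by signal number (status - 128), e.g. 129 for SIGHUP.
    if [ "$status" -gt 128 ]; then
        echo "note: possibly killed by signal $((status - 128))" >> "$LOGFILE"
    fi
    return "$status"
}
```

In the job script, the mpiexec line would then be prefixed with the helper, for example `run_logged /data1/opt/MD/Linux-x86_64/IntelMPI/bin/mpiexec -n 6 /data1/opt/MD/2.0/TaskServer/Tools/vasp5.3.3/Linux-x86_64/vasp_parallel`, so that run_status.log shows when the run ended and with what status.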
Does anyone know why I need to add mpdboot and mpd to get this to work? I thought the mpd daemon was taken care of for me so that I didn't have to start it myself. I am using mpiexec/mpirun version 1.6.4 and mpd version 4.1. I know that the termination is not related to a queue timeout, because I was able to submit day-long jobs before, and other people routinely run other (non-VASP) jobs that last for days.
Thanks so much for your help!
Stephen