Compiling VASP


VASP is the Vienna Ab-initio Simulation Package[1] for performing ab-initio quantum mechanical molecular dynamics simulations.

Installed Version

There is no generally installed version as VASP is licensed to research groups directly.

General Guidelines

There are some general guidelines, as well as information specific to compiling on Proteus, collected below.

About the makefiles below:

  • Makefiles require a TAB character at the start of each rule's action line, so copying and pasting from this page will not work as-is; see the note after this list for one way to repair a pasted makefile.
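
If you do paste a makefile from this page, one way to repair it is to convert the leading spaces of each action line back to a single TAB. A sketch using GNU sed, assuming the paste produced 8 leading spaces (adjust the count to match what your paste actually contains):

    sed -i 's/^        /\t/' makefile.vasp5lib.proteus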

Memory Issues

If the preprocessor flag -Davoidalloc is given, VASP allocates memory on the stack[2] as opposed to using dynamic memory on the heap.[3] A common problem that arises is that system limits on stack size cause VASP to crash.[4]

The fix is to modify the source code, adding a new routine that raises the stack size limit at run time on every node involved in a computation.

Adding a limit or ulimit statement to the job script does not work, because that statement is executed only on the "master" node of a multi-node MPI job: the default limit remains in effect on all the worker nodes.
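
A sketch of the ineffective approach, for illustration ($MPI_RUN and $VASPEXE are as in the example job script at the end of this page):

# Raises the stack limit only in this shell on the master node;
# MPI ranks launched on the other nodes keep the default limit.
ulimit -s unlimited
$MPI_RUN $VASPEXE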

Alternatively, delete the "-Davoidalloc" option from the preprocessor flags. This compiles in the code that allocates memory on the heap instead.
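
For example, if the preprocessor options in a makefile read

 CPPFLAGS = -DHOST=\"LinuxIFC\" -DCACHE_SIZE=12000 -DNGXhalf -Davoidalloc

then drop the -Davoidalloc token (a sketch; the other flags will vary per makefile):

 CPPFLAGS = -DHOST=\"LinuxIFC\" -DCACHE_SIZE=12000 -DNGXhalf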

Add a new file stacksize.c to the source

This file defines a C function which increases the stack limit at run time:

#include <sys/time.h> 
#include <sys/resource.h> 
#include <stdio.h> 

/* NOTE there is an underscore at the end of the function name */
void stacksize_() 
{ 
    int res; 
    struct rlimit rlim; 
  
    getrlimit(RLIMIT_STACK, &rlim); 
    printf("Before: cur=%d,hard=%d\n",(int)rlim.rlim_cur,(int)rlim.rlim_max); 
    
    rlim.rlim_cur=RLIM_INFINITY; 
    rlim.rlim_max=RLIM_INFINITY; 
    res=setrlimit(RLIMIT_STACK, &rlim); 
   
    getrlimit(RLIMIT_STACK, &rlim); 
    printf("After: res=%d,cur=%d,hard=%d\n",res,(int)rlim.rlim_cur,(int)rlim.rlim_max); 
}

Add stacksize.o to the variable SOURCE

In the Makefile, add "stacksize.o" to the end of the line defining the "SOURCE" variable.
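
Using the vasp makefile shown later on this page as an example, the final line of the SOURCE definition changes from

          dmatrix.o

to

          dmatrix.o  stacksize.o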

Add call to stacksize() in main.F

In the file main.F, after the section

 !=========================================== 
 ! initialise / set constants and parameters ... 
 !===========================================

add a call to the stacksize() routine defined above -- NB there is no underscore at the end of the function name here:

       CALL stacksize()

The business with the underscore is due to Fortran name mangling:[5] ifort appends a trailing underscore to external symbol names, so the Fortran statement CALL stacksize() resolves to the C symbol stacksize_.
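
For illustration only (this is not part of the patch described above): modern Fortran can make the binding explicit rather than relying on the compiler's default mangling. A minimal sketch using Fortran 2003 C interoperability:

! Binds the Fortran name stacksize to the C symbol "stacksize_"
INTERFACE
   SUBROUTINE stacksize() BIND(C, NAME='stacksize_')
   END SUBROUTINE stacksize
END INTERFACE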

Compiling with Intel Composer XE 2015 + MKL + OpenMPI 1.8.1 - multi-node MPI parallel

It seems that only MPI-sequential works, i.e. MPI and OpenMP cannot both be enabled. "MPI-sequential" refers to the settings in the Intel MKL Link Line Advisor: select the "sequential" threading layer and the MPICH2 MPI library.

  • The original makefiles have been modified to use newer pattern rules rather than suffix rules.
  • We do not use the ALLOC-avoiding code (-Davoidalloc), which means there is no need to add a new source file or to modify main.F. No testing has been done to see how this affects speed.

NOTE Copying and pasting the makefile contents from this wiki page will result in a broken makefile: makefiles require a TAB character at the start of each "action" line. Alternatively, clone the makefiles from GitHub:

    https://github.com/prehensilecode/vasp_makefiles_proteus
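
For example, with git installed:

    git clone https://github.com/prehensilecode/vasp_makefiles_proteus.git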

Environment

The following modules must be loaded to build VASP 5.3.5 using these makefiles and instructions:

 1) shared                                       4) sge/univa
 2) proteus                                      5) intel/composerxe/2015.1.133
 3) gcc/4.8.1                                    6) proteus-openmpi/intel/2015/1.8.1-mlnx-ofed

Makefile for vasp.5.lib using MPI sequential

###
### makefile.vasp5lib.proteus
### -- 2015-05-05
###
### Makefile by D. Chin <dwc62@drexel.edu> for vasp.5.lib 
### on URCF Proteus using Intel Composer XE 2015 + MKL 11.2
###
### proteus-specific comments start with "###"

.NOTPARALLEL : clean

###
### Replace old suffix rules with pattern rules. 
###

### cancel producing .o from .F and .s
%.o : %.F

%.o : %.s

%.o : %.f
        $(FC) $(FFLAGS) -free $(OPT) -c $<

%.f : %.F
        $(FPP) $(FPPFLAGS) $< $@

%.o : %.c
        $(CC) $(CFLAGS) $(OPT) -c $<

# preprocessor
FPP      = fpp
FPPFLAGS = -P -C -free -f_com=no -w0

CPP      = icc -E
CPPFLAGS = -P -C

CC = icc
FC = ifort

OPT    = -O3 -xHost
NOOPT  = -O0 -xHost
CFLAGS = -mkl
FFLAGS = -names lowercase -mkl

DOBJ    = preclib.o timing_.o derrf_.o dclock_.o diolib.o dlexlib.o drdatab.o

### Select appropriate precision linpack (see bottom of makefile)
LINPACKOBJ = linpack_double.o


#-----------------------------------------------------------------------
# general rules
#-----------------------------------------------------------------------

all: libdmy.a

clean:
        -/bin/rm -f libdmy.* *.o

libdmy.a: $(DOBJ) $(LINPACKOBJ)
        ar ruv $@ $^

###
### These next 3 are a bundled subset of LAPACK.
### Comment out as we use MKL's LAPACK.
###
# files which do not require autodouble 
#lapack_min.o: lapack_min.f
#       $(FC) $(FFLAGS) -c $^

#lapack_double.o: lapack_double.f
#       $(FC) $(FFLAGS) -c $^

#lapack_single.o: lapack_single.f
#       $(FC) $(FFLAGS) -c $^

### We use LAPACK MKL, rather than LAPACK ATLAS
#lapack_atlas.o: lapack_atlas.f
#       $(FC) $(FFLAGS) -c $^

linpack_double.o: linpack_double.f
        $(FC) $(FFLAGS) $(OPT) -nofree -c $<

#linpack_single.o: linpack_single.f
#       $(FC) $(FFLAGS) -c $^

Makefile for vasp executable using MPI sequential

### 
### makefile.vasp53.mpiseq.proteus
### -- 2015-05-05
###
### Makefile by D. Chin <dwc62@drexel.edu> for Proteus URCF 
### Intel Composer XE 2015 + MKL 11.2 + OpenMPI 1.8.1
### 
### NB proteus-specific comments use "###"

### preprocessor
FPP      = fpp
FPPFLAGS = -free -f_com=no -w0

#-----------------------------------------------------------------------
# possible options for CPP:
# NGXhalf             charge density   reduced in X direction
# wNGXhalf            gamma point only reduced in X direction
# avoidalloc          avoid ALLOCATE if possible
# PGF90               work around some PGF90 / IFC bugs
# CACHE_SIZE          1000 for PII,PIII, 5000 for Athlon, 8000-12000 P4, PD
# RPROMU_DGEMV        use DGEMV instead of DGEMM in RPRO (depends on used BLAS)
# RACCMU_DGEMV        use DGEMV instead of DGEMM in RACC (depends on used BLAS)
# tbdyn                 MD package of Tomas  Bucko
#-----------------------------------------------------------------------

### Using "avoidalloc" requires adding a new source file to raise the stack
### size limit, and modifying main.F -- see the Memory Issues section above.
###
### DGEMM uses cache tiling and data reuse, i.e. it is generally faster than DGEMV.
### See http://stackoverflow.com/questions/18410162/dgemm-or-dgemv-for-matrix-multiplication
### That contradicts the advice in the original VASP makefile; however, a single
### test here indicated that DGEMV is actually faster than DGEMM.

# setting -DRPROMU_DGEMV  -DRACCMU_DGEMV in the CPP lines usually speeds up program execution
CPPFLAGS = -DHOST=\"LinuxIFC\" \
           -DCACHE_SIZE=12000 \
           -DNGXhalf \
           -DRPROMU_DGEMV -DRACCMU_DGEMV

###
### pattern rules to replace old suffix rules
###

### The old macro "SUFFIX=.f90" has been removed.

### cancel %.o dependency of %.F since we require pre-processed .f90
%.o : %.F

### rule to generate a .f90 from a .F
%.f90 : %.F
    $(FPP) $(FPPFLAGS) $(CPPFLAGS) $< $@

### %.f90 files are compiled into %.o and %.mod module files
%.o %.mod : %.f90
    $(FC) $(FFLAGS) $(OFLAG) $(INCS) -c $<
    

#-----------------------------------------------------------------------
# fortran compiler and linker
#-----------------------------------------------------------------------
FC = mpifort

# fortran linker
FCL = $(FC)

#-----------------------------------------------------------------------
# general fortran flags  (there must be a trailing blank on this line)
# byterecl is strictly required for ifc, since otherwise
# the WAVECAR file becomes huge
#-----------------------------------------------------------------------

### Check the Intel Link Line Advisor for MKL-specific includes
FFLAGS = -free -names lowercase -assume byterecl \
         -I$(MKLROOT)/include/intel64/lp64 -I$(MKLROOT)/include 

#-----------------------------------------------------------------------
# optimization
# we have tested whether higher optimisation improves performance
# -axK  SSE1 optimization,  but also generate code executable on all mach.
#       xK improves performance somewhat on XP, and a is required in order
#       to run the code on older Athlons as well
# -xW   SSE2 optimization
# -axW  SSE2 optimization,  but also generate code executable on all mach.
# -tpp6 P3 optimization
# -tpp7 P4 optimization
#-----------------------------------------------------------------------

### -xHost may be replaced with the appropriate -xFEATURE flag if your 
### compilation host differs from your execute host
OFLAG=-O3 -xHost -mkl

OFLAG_HIGH = $(OFLAG)
DEBUG  = -free -O0
INLINE = $(OFLAG)

#-----------------------------------------------------------------------
# the following lines specify the position of BLAS and LAPACK
# we recommend using MKL, which is simple and most likely the
# fastest on Intel-based machines
#-----------------------------------------------------------------------

#-----------------------------------------------------------------------

# options for linking, nothing is required (usually)
LINK =

#-----------------------------------------------------------------------
# fft libraries:
# VASP.5.2 can use fftw.3.1.X (http://www.fftw.org)
# since this version is faster on P4 machines, we recommend using it
#-----------------------------------------------------------------------

# you may also try to use the fftw wrapper to mkl (but the path might vary a lot)
# it seems this is best for AMD based systems

### FFTW3 interface from MKL. libfftw3xf_intel.a must be compiled
### by the end user from Intel-provided source. See the MKL Reference
### Manual for instructions.
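### A sketch of building that wrapper (assuming the interface sources
### bundled with MKL 11.2; the path and make target may differ between
### MKL versions):
###     cd $MKLROOT/interfaces/fftw3xf
###     make libintel64 compiler=intel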
FFT3D   = fftw3d.o fft3dlib.o $(MKLROOT)/lib/intel64/libfftw3xf_intel.a
INCS    = -I$(MKLROOT)/include/fftw 

### Check the Intel Link Line Advisor for details
### Combine all linalg stuff into the BLAS macro
### MKL sequential dynamic
BLAS = $(MKLROOT)/lib/intel64/libmkl_blas95_lp64.a \
       $(MKLROOT)/lib/intel64/libmkl_lapack95_lp64.a \
       -Wl,-rpath,$(MKLROOT)/lib/intel64 -L$(MKLROOT)/lib/intel64 \
       -lmkl_scalapack_lp64 \
       -lmkl_intel_lp64 \
       -lmkl_core \
       -lmkl_sequential \
       -lmkl_blacs_intelmpi_lp64 \
       -lpthread -lm
    

#-----------------------------------------------------------------------
# libraries
#-----------------------------------------------------------------------

### libdmy.a makefile was fixed to include linpack_double.o properly
LIB   = -L../vasp.5.lib -ldmy  \
      $(LAPACK) $(BLAS)

#-----------------------------------------------------------------------
# general rules and compile lines
#-----------------------------------------------------------------------
BASIC=   symmetry.o symlib.o   lattlib.o  random.o   

### the order of .o files here matters
SOURCE=  base.o     mpi.o      smart_allocate.o      xml.o  \
         constant.o jacobi.o   main_mpi.o  scala.o   \
         asa.o      lattice.o  poscar.o   ini.o  mgrid.o  xclib.o  vdw_nl.o  xclib_grad.o \
         radial.o   pseudo.o   gridq.o     ebs.o  \
         mkpoints.o wave.o     wave_mpi.o  wave_high.o  spinsym.o \
         $(BASIC)   nonl.o     nonlr.o    nonl_high.o dfast.o    choleski2.o \
         mix.o      hamil.o    xcgrad.o   xcspin.o    potex1.o   potex2.o  \
         constrmag.o cl_shift.o relativistic.o LDApU.o \
         paw_base.o metagga.o  egrad.o    pawsym.o   pawfock.o  pawlhf.o   rhfatm.o  hyperfine.o paw.o   \
         mkpoints_full.o       charge.o   Lebedev-Laikov.o  stockholder.o dipol.o    pot.o \
         dos.o      elf.o      tet.o      tetweight.o hamil_rot.o \
         chain.o    dyna.o     k-proj.o    sphpro.o    us.o  core_rel.o \
         aedens.o   wavpre.o   wavpre_noio.o broyden.o \
         dynbr.o    hamil_high.o  rmm-diis.o reader.o   writer.o   tutor.o xml_writer.o \
         brent.o    stufak.o   fileio.o   opergrid.o stepver.o  \
         chgloc.o   fast_aug.o fock_multipole.o  fock.o  mkpoints_change.o sym_grad.o \
         mymath.o   internals.o npt_dynamics.o   dynconstr.o dimer_heyden.o dvvtrajectory.o subdftd3.o \
         vdwforcefield.o nmr.o      pead.o     subrot.o   subrot_scf.o  paircorrection.o \
         force.o    pwlhf.o    gw_model.o optreal.o  steep.o    davidson.o  david_inner.o \
         electron.o rot.o  electron_all.o shm.o    pardens.o  \
         optics.o   constr_cell_relax.o   stm.o    finite_diff.o elpol.o    \
         hamil_lr.o rmm-diis_lr.o  subrot_cluster.o subrot_lr.o \
         lr_helper.o hamil_lrf.o   elinear_response.o ilinear_response.o \
         linear_optics.o \
         setlocalpp.o  wannier.o electron_OEP.o electron_lhf.o twoelectron4o.o \
         gauss_quad.o m_unirnk.o minimax_tabs.o minimax.o \
         mlwf.o     ratpol.o screened_2e.o wave_cacher.o chi_base.o wpot.o \
         local_field.o ump2.o ump2kpar.o fcidump.o ump2no.o \
         bse_te.o bse.o acfdt.o chi.o sydmat.o \
         lcao_bare.o wnpr.o dmft.o \
         rmm-diis_mlr.o  linear_response_NMR.o wannier_interpol.o linear_response.o  auger.o getshmem.o \
         dmatrix.o

### the order of dependencies here matters
### and the order of linking also matters - must be the same order as the dependencies
vasp: $(SOURCE) $(FFT3D) main.o
    $(FCL) $(FFLAGS) -o $@ $^ $(LIB) $(LINK)

makeparam: $(SOURCE) $(FFT3D) makeparam.o main.F
    $(FCL) -o $@  $(LINK) makeparam.o $(SOURCE) $(FFT3D) $(LIB)

zgemmtest: zgemmtest.o base.o random.o
    $(FCL) -o $@ $(LINK) zgemmtest.o random.o base.o $(LIB)

dgemmtest: dgemmtest.o base.o random.o
    $(FCL) -o $@ $(LINK) dgemmtest.o random.o base.o $(LIB) 

ffttest: base.o smart_allocate.o mpi.o mgrid.o random.o ffttest.o $(FFT3D)
    $(FCL) -o $@ $(LINK) ffttest.o mpi.o mgrid.o random.o smart_allocate.o base.o $(FFT3D) $(LIB)

kpoints: $(SOURCE) $(FFT3D) makekpoints.o main.F
    $(FCL) -o $@ $(LINK) makekpoints.o $(SOURCE) $(FFT3D) $(LIB)

clean:  
    -rm -f *.g *.f *.o *.L *.mod

#main.o: main.f90
#   $(FC) $(FFLAGS) $(OFLAG) $(INCS) -c $<

xcgrad.o: xcgrad.f90
    $(FC) $(FFLAGS) $(INLINE) $(INCS) -c $<

xcspin.o: xcspin.f90
    $(FC) $(FFLAGS) $(INLINE) $(INCS) -c $<

makeparam.o: makeparam.f90
    $(FC) $(FFLAGS) $(DEBUG)  $(INCS) -c $<

makeparam.f90: makeparam.F main.F 

#
# MIND: I do not have a full dependency list for the include
# and MODULES: here are only the minimal basic dependencies
# if one structure is changed then touch_dep must be called
# with the corresponding name of the structure
#
base.o: base.inc base.F
mgrid.o: mgrid.inc mgrid.F
constant.o: constant.inc constant.F
lattice.o: lattice.inc lattice.F
setex.o: setexm.inc setex.F
pseudo.o: pseudo.inc pseudo.F
mkpoints.o: mkpoints.inc mkpoints.F
wave.o: wave.F
nonl.o: nonl.inc nonl.F
nonlr.o: nonlr.inc nonlr.F

# special rules
#-----------------------------------------------------------------------
# these special rules have been tested for ifc.11 and ifc.12 only

fft3dlib.o : fft3dlib.f90
    $(FC) $(FFLAGS) -O2 -c $<

fft3dfurth.o : fft3dfurth.f90
    $(FC) $(FFLAGS) -O1 -c $<

fftw3d.o : fftw3d.f90
    $(FC) $(FFLAGS) -O1 $(INCS) -c $<

fftmpi.o : fftmpi.f90
    $(FC) $(FFLAGS) -O1 -c $<

fftmpiw.o : fftmpiw.f90
    $(FC) $(FFLAGS) -O1 $(INCS) -c $<

wave_high.o : wave_high.f90
    $(FC) $(FFLAGS) -O1 -c $<

# the following rules are probably no longer required (-O3 seems to work)
#wave.o : wave.F
#   $(CPP)
#   $(FC) $(FFLAGS) -O2 -c $*$(SUFFIX)

#paw.o : paw.F
#   $(CPP)
#   $(FC) $(FFLAGS) -O1 -c $*$(SUFFIX)

#cl_shift.o : cl_shift.F
#   $(CPP)
#   $(FC) $(FFLAGS) -O2 -c $*$(SUFFIX)

#us.o : us.F
#   $(CPP)
#   $(FC) $(FFLAGS) -O1 -c $*$(SUFFIX)

#LDApU.o : LDApU.F
#   $(CPP)
#   $(FC) $(FFLAGS) -O2 -c $*$(SUFFIX)

Example job script snippet

Since Open MPI has Grid Engine integration, the number of MPI processes need not be passed to mpirun; it is taken from the parallel environment requested in the job script.

#!/bin/bash
#$ -S /bin/bash
#$ -cwd
#$ -pe fixed16 128
#$ -l vendor=intel
...

. /etc/profile.d/modules.sh
module load shared
module load proteus
module load sge/univa
module load gcc/4.8.1
module load intel/composerxe/2015.1.133
module load proteus-openmpi/intel/2015/1.8.1-mlnx-ofed

### this job script assumes that it lives in the same directory as the inputs

export VASPEXE=/mnt/HA/groups/myrsrchGrp/bin/vasp_5.3.5

### do not specify no. of processes: openmpi integrates with Grid Engine and 
### pulls that information from the environment
$MPI_RUN $VASPEXE

Test Suite

A third party test suite, authored by Peter Larsson at Linköping University, is available: https://www.nsc.liu.se/~pla/vasptest/

While the official VASP documentation[6] mentions a test suite, the link leads to a nonexistent article.

References

  1. VASP official website
  2. Stack-based memory allocation article on Wikipedia
  3. Heap (programming) article on Wikipedia
  4. VASP Support Forums: Bugreports: (SOLVED) VASP 5 crashes when using several computing nodes (large memory)
  5. Name mangling article on Wikipedia
  6. VASP Manual