Molcas Forum

Support and discussions for Molcas and OpenMolcas users and developers


#1 2017-07-20 09:35:59

gilbert
Member
From: Rostock
Registered: 2016-10-12
Posts: 5

error in numerical_gradient

Dear MOLCAS Users,

I am currently trying to obtain a CASPT2-optimized geometry of the [Fe(H2O)6]2+ complex.

Due to the size of the system, running the numerical gradient calculation in parallel over multiple processes is obligatory.
However, if I run the calculation using the following input file:

 >export MOLCASMEM=1500
 &GATEWAY
Title= [Fe(H2O)6]2+ initial geometry for caspt2 geometry optimization
Basis set
Fe.ano-rcc...4s2p1d..
Spherical all
 Fe     0.000000000      0.000000000      0.000000000     /Angstrom
End of basis set
Basis set
O.ano-rcc...2s1p..
Spherical all
 O1     0.000000000      2.244015000      0.000000000      /Angstrom
 O2    -2.241856000      0.000000000      0.000000000      /Angstrom
 O3     2.231856000      0.000000000      0.000000000      /Angstrom
 O4     0.000000000      0.000000000      2.235921000      /Angstrom
 O5     0.000000000      0.000000000     -2.135921000      /Angstrom
 O6     0.000000000     -2.544015000      0.000000000      /Angstrom.
End of basis set
Basis set
H.ano-rcc...1s..
Spherical all
 H1    -0.684075000     -2.624840000      0.000000000      /Angstrom
 H2     0.784075000      2.724840000      0.000000000      /Angstrom
 H3    -0.884075000      2.824840000      0.000000000      /Angstrom
 H4    -2.926742000      0.000000000     -0.679604000      /Angstrom
 H5    -2.726742000      0.000000000      0.979604000      /Angstrom
 H6     2.826742000      0.000000000     -0.779604000      /Angstrom
 H7     2.886742000      0.000000000      0.729604000      /Angstrom
 H8     0.000000000      0.885178000      2.815072000      /Angstrom
 H9     0.000000000     -0.985178000      2.715072000      /Angstrom
 H10    0.000000000      0.785178000     -2.615072000      /Angstrom
 H11    0.000000000     -0.685178000     -2.415072000      /Angstrom
 H12    0.784075000     -2.524840000      0.000000000      /Angstrom
End of basis set.
End of input

** start the optimization
>>>  Do while

* generate integrals
 &SEWARD
 &RASSCF; Nactel=6 0 0; Inactive=39; Ras2=5;  Spin=5; CIroot=5 5 1
 &CASPT2; Imaginary=0.1; NoMult; Multistate = 1 1.
*>> export MOLCAS_PRINT=4
 &SLAPAF &END

>>>  EndDo
** end of the optimization loop

I always get MPI errors as soon as the numerical gradient module is invoked:

()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()
                            MOLCAS executing module NUMERICAL_GRADIENT with 1500 MB of memory
                                              at 10:18:19 Thu Jul 20 2017
                                Parallel run using   4 nodes, running replicate-data mode
()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()
             


     Gradient is rotational variant!
  
 Root to use:                      1
 Number of internal degrees                                54
 Number of constraints                                      0
 Number of "hard" constraints                               0
 Number of displacements                                  108
 Effective number of displacements are                    108
  
MOLCAS error: Terminating!, code = 112
MOLCAS error: Terminating!, code = 112
MOLCAS error: Terminating!, code = 112
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 3 in communicator MPI_COMM_WORLD
with errorcode 112.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.

The calculation runs without errors if I use MOLCAS in serial mode, i.e. MOLCAS_CPUS=1, albeit very slowly.

Switching to MOLCAS 8.2 did not really fix the issue. The calculations no longer abort on entering the numerical gradient module, but instead give the following error messages:

()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()
                   
                                        &NUMERICAL_GRADIENT
              
             launched 6 MPI processes, running in PARALLEL mode (work-sharing enabled)
                       available to each process: 2.5 GB of memory, 1 thread
()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()
 
[ process      0]: xquit (rc =    112): _INPUT_ERROR_
[ process      0]: xquit (rc =    112): _INPUT_ERROR_
[ process      0]: xquit (rc =    112): _INPUT_ERROR_
 Root to use:                      1
 Number of internal degrees                                51
 Number of constraints                                      0
 Number of displacements                                  102
 Effective number of displacements                        102
 
[ process      0]: xquit (rc =    112): _INPUT_ERROR_
[ process      0]: xquit (rc =    112): _INPUT_ERROR_

The calculation continues, but only one of the 6 MPI processes is actually used, so I again end up with an effectively serial calculation.


Note, however, that the following input file for a water optimization:

>>> export MOLCAS_MEM=2500
  &GATEWAY
Title=  H2O  caspt2  minimum  optimization
Basis  set
O.ANO-rcc...3s2p1d..
O  0.000000  0.000000  0.000000  Angstrom
End  of  basis
Basis  set
H.ANO-rcc...2s1p..
H1  0.000000  0.758602  0.504284  Angstrom
H2  0.000000  -0.758602  0.504284  Angstrom
End  of  basis

>>>  EXPORT  MOLCAS_MAXITER=100
>>>  Do  while

  &SEWARD
  &RASSCF;  Symmetry=1; MaxOrb=1; nActEl=8  0  0;  Inactive=1;  Ras2=6; CIroot=1 1 1
  &CASPT2;  Imaginary=0.1; MaxIt=80; Frozen=1; noMult; noProp;  MultiState= 1 1
  &SLAPAF  &END

>>>  EndDo.

produces a converged geometry with both MOLCAS 8.0 and MOLCAS 8.2, without any problems in the numerical gradient module.

Can you suggest anything that might help with the optimization of the iron complex?

Note1: I have already tried different compiler versions (GNU/Intel) and different versions of the Open MPI library (1.6.5, 1.8.8, 2.0.1) with MOLCAS 8.0, all to no avail. MOLCAS 8.2 is compiled with a gcc/ifort mixture and Open MPI 2.0.1.
These installations work flawlessly in our daily routine.

Note2: The above input files run very quickly, since the basis sets are quite small, so if you can afford the time I encourage you to try them out on your machines to see whether you can reproduce the error. The error does not depend on the basis set: I tried a larger basis set (DZP) as well, with the same result.

Note3: I also tried different initial geometries, with no different result. The geometry above was created by distorting an otherwise symmetric (D2h) geometry.

Note4: The release numbers of MOLCAS 8.0 and 8.2 are "Molcas 8 service pack 1" and "170401", respectively.


Thank you very much,
Gilbert Grell


#2 2017-07-20 16:45:00

Ignacio
Administrator
From: Uppsala
Registered: 2015-11-03
Posts: 1,085

Re: error in numerical_gradient

I can't reproduce it with intel tools:

module load intel/17.0.4.196 mkl/2017.u3 impi/5.1.3

./configure -compiler intel \
            -blas MKL \
                  -blas_lib BEGINLIST -Wl,--start-group -Nmkl -lmkl_intel_ilp64 -lmkl_sequential -lmkl_core -Wl,--end-group ENDLIST \
            -parallel \
            -mpirun `which mpprun`

Your input file runs smoothly and I get a gradient in less than 20 minutes (MOLCAS_NPROCS=4).

This part is suspicious:

[ process      0]: xquit (rc =    112): _INPUT_ERROR_
[ process      0]: xquit (rc =    112): _INPUT_ERROR_
[ process      0]: xquit (rc =    112): _INPUT_ERROR_

I wouldn't expect the same process to fail several times... How are you running the calculation? Are you by any chance running something like "mpirun -n 4 molcas ..."?
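For reference, a minimal sketch of the two launch styles (file names are placeholders; it assumes the stock molcas driver script, which reads MOLCAS_NPROCS and invokes mpirun itself through the RUNBINARY command in the Symbols file):

# intended: let the driver spawn the MPI processes for each parallel module
export MOLCAS_NPROCS=4
molcas test.input > test.output 2>&1

# problematic: this starts 4 independent copies of the whole driver script,
# each running its own copy of the calculation
# mpirun -n 4 molcas test.input > test.output 2>&1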


#3 2017-07-21 07:51:41

gilbert
Member
From: Rostock
Registered: 2016-10-12
Posts: 5

Re: error in numerical_gradient

Dear Ignacio,

Thank you for the response and for putting the input files in code tags; I forgot to do that.
It is interesting that you cannot reproduce it with your setup.
In our group we lack Intel MPI, so I have tried it on the university cluster using Intel Parallel Studio 2015 and Intel MPI, but the result is still the same.

I am currently compiling a version that also uses the MKL libraries, to match your setup.
I will edit this post as soon as I have the results from the MOLCAS 8.2 version compiled with MKL and all Intel tools.
The execution command in the slurm script is:

nohup molcas geom_Fe_H2O6_2+_5.inp > geom_Fe_H2O6_2+_5.out 2>&1

However, the Symbols file contains something like:

RUNBINARY='/cluster/intel/impi/4.1.3.049//bin/mpirun -np $MOLCAS_NPROCS $program'

EDIT:

So I managed to compile MOLCAS 8.2 using Intel Parallel Studio 2015, MKL and Intel MPI version 4:

module load intel-parallel-studio-2015 intel/4.1.3.049 intel/mkl/2015.2.164

The result, however, is still the same:

()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()

                                        &NUMERICAL_GRADIENT
   
             launched 4 MPI processes, running in PARALLEL mode (work-sharing enabled)
                       available to each process: 1.5 GB of memory, 1 thread
()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()
  
 Root to use:                      1
 Number of internal degrees                                51
 Number of constraints                                      0
 Number of displacements                                  102
 Effective number of displacements                        102
  
[ process      0]: xquit (rc =    112): _INPUT_ERROR_
[ process      0]: xquit (rc =    112): _INPUT_ERROR_
[ process      0]: xquit (rc =    112): _INPUT_ERROR_

I must admit, however, that on this cluster all numerical gradient processes still run at 100% CPU load.
On our group cluster, only one process was running at 100% while the others were idling at 5%.

I can upload the Symbols file if you want to further diagnose the _INPUT_ERROR_ messages. At the moment I think it is actually calculating a gradient, even though it has printed these _INPUT_ERROR_ messages.

Thank you very much,

Gilbert

Last edited by gilbert (2017-07-21 09:31:18)


#4 2017-07-21 12:39:56

Ignacio
Administrator
From: Uppsala
Registered: 2015-11-03
Posts: 1,085

Re: error in numerical_gradient

With some gcc versions I got "_INPUT_ERROR_", but in CASPT2, not in NUMERICAL_GRADIENT, due to the extra period after "Multistate = 1 1". Try removing it and see if it helps. Still, the error was from processes 0, 1, 2, 3 (not just 0). Other than that, gcc 4.7.2 and openmpi 1.6.5 (no MKL) finished all right as well.

module load gcc/4.7.2

export PATH=$PATH:/software/mpi/openmpi/1.6.5/g472/bin

./configure -compiler gf \
            -parallel \
            -mpirun `which mpprun`


#5 2017-07-26 22:34:31

gilbert
Member
From: Rostock
Registered: 2016-10-12
Posts: 5

Re: error in numerical_gradient

Dear Ignacio,

The dot in the CASPT2 input was an artifact, since my editor shows spaces as dots.
I removed it, and it did not change the result.

I devoted some time to this subject again over the last three days; I still had no luck, but I discovered some new behaviour.

I tried to get a MOLCAS 8.2 version running with gcc/4.7.2 and openmpi 1.6.5, but this was not possible.
I had to compile both gcc and openmpi from scratch myself, as neither version is present on the university cluster.
MOLCAS 8.2 won't compile past this point:

/home/gg114/lib/openmpi1.6.5/bin/mpif77 -c -O2 -Wuninitialized -fdefault-integer-8 -D_I8_ -DEXT_INT -I. -I../../src/Include -D_GNU_ -D_LINUX_ -D_MOLCAS_MPP_ -D_HYPER_ -D_MSYM_ -D_MOLCAS_ -I../../src/dga_util/ -I../../src/Include -cpp xabort.f
xabort.f:4.9:

      use mpi                                                           
         1
Fatal Error: Can't open module file 'mpi.mod' for reading at (1): No such file or directory
gmake[1]: *** [../../lib/libsystem_util.a(xabort.o)] Error 1
gmake[1]: Leaving directory `/home/gg114/bin/molcas_82/molcas82_gnu47_ompi165/src/system_util'
make: *** [src/system_util] Error 2

The file mpi.mod is located in /home/gg114/lib/openmpi1.6.5/lib, but somehow the compiler does not find it, although the LD_LIBRARY_PATH variable is set correctly and the correct wrappers are used.

./configure had produced the following Symbols file:

# ./configure options, DO ONLY CHANGE BY RERUNNING CONFIGURE.
OS='Linux-x86_64'
PLATFORM='LINUX64'
COMPILER='gf'
SPEED='safe'
TASKFARM='no'
PARALLEL='yes'
SHARED='no'
CONNECT=''
LUSTRE=''
MPI_ROOT=''
MPI_LAUNCHER='/home/gg114/lib/openmpi1.6.5/bin/mpirun'
MPI_LAUNCHER_ARGS=''
ADRMODE='64'
USEOMP=''
USEDFLAGS='-compiler gf -parallel -mpirun which mpirun'

# Machine.
HW='x86_64'

# Standard commands.
SH='/usr/bin/sh'
MAKE='/usr/bin/gmake'
CP='/usr/bin/cp'
MV='/usr/bin/mv'
RM='/usr/bin/rm'
LS='/usr/bin/ls'
TR='/usr/bin/tr'
AWK='/usr/bin/awk'
SED='/usr/bin/sed'
GREP='/usr/bin/grep'
HEAD='/usr/bin/head'
CHMOD='/usr/bin/chmod'
FIND='/usr/bin/find'
MKDIR='/usr/bin/mkdir'
LN='/usr/bin/ln'
SOFTLINK='-L'
WC='/usr/bin/wc'
MORE='/usr/bin/more'
CAT='/usr/bin/cat'
MKAR='/usr/bin/ar crU'
UUENCODE=''
PERL='/usr/bin/perl'
UNALIAS='unalias -a'
RANLIB='/usr/bin/ranlib'

# Compilers.
PPFLAGS='-D_GNU_ -D_LINUX_ -D_MOLCAS_MPP_  -D_HYPER_ -D_MSYM_ -D_MOLCAS_  -I../../src/dga_util/ -I../../src/Include'
CPP='/home/gg114/compiler/gcc47/bin/cpp'
CPPFLAGS='-P -C -D_GNU_ -D_LINUX_ -D_MOLCAS_MPP_  -D_HYPER_ -D_MSYM_ -D_MOLCAS_  -I../../src/dga_util/ -I../../src/Include'
F77='/home/gg114/lib/openmpi1.6.5/bin/mpif77'
F77FLAGS=' -O2 -Wuninitialized  -fdefault-integer-8 -D_I8_ -DEXT_INT  -I. -I../../src/Include -D_GNU_ -D_LINUX_ -D_MOLCAS_MPP_  -D_HYPER_ -D_MSYM_
F77NOWARN=''
F77STATIC='-fno-automatic -finit-local-zero'
F90='/home/gg114/lib/openmpi1.6.5/bin/mpif77'
F90FLAGS=' -O2 -Wuninitialized  -fdefault-integer-8 -D_I8_ -DEXT_INT  -I. -I../../src/Include -D_GNU_ -D_LINUX_ -D_MOLCAS_MPP_  -D_HYPER_ -D_MSYM_
F90MOD='-J'
F90ENABLE=''
FPREPROC='f'
PCOMPILER='/home/gg114/lib/openmpi1.6.5/bin/mpicc'
CC='/home/gg114/lib/openmpi1.6.5/bin/mpicc'
CFLAGS=' -std=gnu99 -O2 -Wuninitialized -D_I8_ -m64  -D_GNU_ -D_LINUX_ -D_MOLCAS_MPP_  -D_HYPER_ -D_MSYM_ -D_MOLCAS_  -I../../src/dga_util/ -I../.
SFLAGS=''
LDFLAGS=' '
CLDFLAGS=''
OGLFLAGS='-lGL -lglut -lGLU -lm'
PLDFLAGS=''
PCFLAGS=''
DEMO=''
GARBLE=''
BOUND=''
GCOV=''
MOLCASWIN32='no'

# External libraries.
XLIB=''

# Molcas.
INCDIR='../../src/Include'
EXTRAINCDIR=''
PRGM_LIST='  src/alaska src/averd src/caspt2 src/ccsdt src/chcc src/check src/cht3 src/cmocorr src/cpf src/dynamix src/embq src/expbas src/falcon
SUPER_MOD='   src/casvb src/espf src/last_energy src/loprop src/numerical_gradient  '
UTIL_LIST='  src/alaska_util src/amfi_util src/aniso_util src/blas_util src/casvb_util src/ccsd_util src/ccsort_util src/cct3_util src/cholesky_ut
UTIL_LIBS=' -lalaska_util -lamfi_util -laniso_util -lblas_util -lcasvb_util -lccsd_util -lccsort_util -lcct3_util -lcholesky_util -ldft_util -ldga
EXTERNAL_LIST=''
MANUALS='manual'
MOLCASDRIVER='/users/gg114/bin'
DEFMOLCASMEM='2048'
DEFMOLCASDISK='20000'

# Global arrays.
GAINC='../../src/dga_util/'
GALIB=' '

# Commands for running executables.
RUNSCRIPT='$program  $input'
RUNBINARY='/home/gg114/lib/openmpi1.6.5/bin/mpirun -np $MOLCAS_NPROCS $program'

# Quietness.
QUIET='no'

Interestingly, a CMake build worked, but then the execution hangs in CASPT2 after three iterations and stops producing any output; after about an hour I aborted the run, since it should converge in a matter of seconds.

I also tried to compile MOLCAS 8.0 with the aforementioned setup, which worked fine and gave me the following Symbols file:

# Molcas build symbols generated by ./configure on Tue Jul 25 11:40:37 CEST 2017 for Molcas version 8.0 patch level 15-06-18.
# ./configure options, DO ONLY CHANGE BY RERUNNING CONFIGURE.
OS='Linux-x86_64'
PLATFORM='LINUX64'
COMPILER='gf'
SPEED='safe'
PARALLEL='yes'
SHARED='no'
MSGPASS='ompi'
CONNECT=''
LUSTRE=''
PAR_ROOT=''
PAR_LIB='/home/gg114/lib/openmpi1.6.5/lib/'
PAR_INC='/home/gg114/lib/openmpi1.6.5/include/'
PAR_RUN='/home/gg114/lib/openmpi1.6.5/bin/mpirun'
PAR_ARGS=' '
ADRMODE='64'
USEOMP=''
USEDFLAGS='-parallel ompi -par_run /home/gg114/lib/openmpi1.6.5/bin/mpirun -par_inc /home/gg114/lib/openmpi1.6.5/include/ -par_lib /home/gg114/lib

# Machine.
HW='x86_64'

# Standard commands.
SH='/usr/bin/sh'
MAKE='/usr/bin/gmake'
CP='/usr/bin/cp'
MV='/usr/bin/mv'
RM='/usr/bin/rm'
LS='/usr/bin/ls'
TR='/usr/bin/tr'
AWK='/usr/bin/awk'
SED='/usr/bin/sed'
GREP='/usr/bin/grep'
HEAD='/usr/bin/head'
CHMOD='/usr/bin/chmod'
FIND='/usr/bin/find'
MKDIR='/usr/bin/mkdir'
LN='/usr/bin/ln'
SOFTLINK='-L'
WC='/usr/bin/wc'
MORE='/usr/bin/more'
CAT='/usr/bin/cat'
AR='/usr/bin/ar'
UUENCODE=''
PERL='/usr/bin/perl'
UNALIAS='unalias -a'
RANLIB='/usr/bin/ranlib'

# Compilers.
PPFLAGS='-cpp -D_GNU_ -D_LINUX_ -D_MOLCAS_MPP_  -D_MOLCAS_ -I${GAINC} -I${INCDIR} -I/home/gg114/lib/openmpi1.6.5/include/'
CPP='/home/gg114/compiler/gcc47/bin//cpp'
CPPFLAGS='-P -C -cpp -D_GNU_ -D_LINUX_ -D_MOLCAS_MPP_  -D_MOLCAS_ -I${GAINC} -I${INCDIR} -I/home/gg114/lib/openmpi1.6.5/include/'
F77='/home/gg114/compiler/gcc47/bin//gfortran'
F77FLAGS=' -I/home/gg114/lib/openmpi1.6.5/include/ -O2 -Wuninitialized  -fdefault-integer-8 -D_I8_ -DEXT_INT  -I. -I../Include -cpp -D_GNU_ -D_LIN
F77NOWARN=''
F77STATIC='-fno-automatic -finit-local-zero'
F90='/home/gg114/compiler/gcc47/bin//gfortran'
F90FLAGS=' -I/home/gg114/lib/openmpi1.6.5/include/ -O2 -Wuninitialized  -fdefault-integer-8 -D_I8_ -DEXT_INT  -I. -I../Include -cpp -D_GNU_ -D_LIN
F90MOD='-I'
F90ENABLE='YES'
FPREPROC='f'
PCOMPILER='/home/gg114/compiler/gcc47/bin//gcc'
CC='/home/gg114/compiler/gcc47/bin//gcc'
CFLAGS=' -I/home/gg114/lib/openmpi1.6.5/include/ -O2 -Wuninitialized -D_I8_ -m64  -cpp -D_GNU_ -D_LINUX_ -D_MOLCAS_MPP_  -D_MOLCAS_ -I${GAINC} -I$
SFLAGS=''
LDFLAGS=' '
CLDFLAGS=''
OGLFLAGS='-lGL -lglut -lGLU -lm'
PLDFLAGS=''
PCFLAGS=''
DEMO=''
GARBLE=''
BOUND=''
MOLCASWIN32='no'

# External libraries.
XLIB='-L/home/gg114/lib/openmpi1.6.5/lib/ -lmpi -lmpi_f77 '

# Molcas.
INCDIR='../Include'
PRGM_LIST='  gateway seward scf rasscf check  alaska averd caspt2 ccsdt chcc cht3 ciiscmng cmocorr cpf dimerpert dynamix embq expbas falcon ffpt f
SUPER_MOD='   casvb espf loprop numerical_gradient last_energy  '
UTIL_LIST='  alaska_util amfi_util blas_util casvb_util ccsd_util ccsort_util cct3_util cholesky_util clones_util dft_util dkh_old_util dkh_util e
UTIL_LIBS=' -lalaska_util -lamfi_util -lblas_util -lcasvb_util -lccsd_util -lccsort_util -lcct3_util -lcholesky_util -lclones_util -ldft_util -ldk
MANUALS='manual'
MOLCASDRIVER='/users/gg114/bin'
DEFMOLCASMEM='2048'
DEFMOLCASDISK='20000'

# Global arrays.
GAINC='../../src/mpi_util/'
GALIB=' '

# Commands for running executables.
RUNSCRIPT='$program  $input'
RUNBINARY='/home/gg114/lib/openmpi1.6.5/bin/mpirun -np $CPUS $program'

# Quietness.
QUIET='no'

This MOLCAS 8.0 installation showed the same behaviour as before, i.e. MOLCAS error 112 and termination of the MPI processes.
In case it is useful, here is the slurm submission script that I used to run the calculations:

#!/bin/bash
#SBATCH --partition=compute
#SBATCH -J geom_Fe_H2O6_2+_5.$SLURM_JOB_ID
#SBATCH -o geom_Fe_H2O6_2+_5.info
#SBATCH -e geom_Fe_H2O6_2+_5.err
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=16
#SBATCH --cpus-per-task=1
#SBATCH --mem=64000
#SBATCH --time=0-1:0:0
#module load intel-parallel-studio-2015 intel/4.1.3.049
export PATH=/home/gg114/compiler/gcc47/bin:$PATH
source /home/gg114/gcc-4.7.2-ompi-1.6.5.sh
export CALC_DIR=/scratch/$USER-$SLURM_JOB_ID
#export MOLCAS=/home/gg114/bin/molcas_80/molcas80_SP1_gcc472_ompi_165
export MOLCAS=/home/gg114/bin/molcas_82/molcas82_gnu47_ompi165_build
#export MOLCAS=/home/gg114/bin/molcas_82/molcas82_intel
export WorkDir=$CALC_DIR
export MolcasCurrDir=$SLURM_SUBMIT_DIR
export MOLCAS_NPROCS=4
#export MOLCAS_CPUS=4
echo "Job was started on Node: $SLURM_NODELIST"
echo "JobID was: $SLURM_JOB_ID"
echo "User id used is: $USER"
echo "Submit Directory is $SLURM_SUBMIT_DIR"
# perform calculations
molcas geom_Fe_H2O6_2+_5.inp > geom_Fe_H2O6_2+_5.out 2>&1
mkdir /data/$USER/tmp/Molcas_tmp_$SLURM_JOB_ID
cp -rp $CALC_DIR/* /data/$USER/tmp/Molcas_tmp_$SLURM_JOB_ID
#end-run-section

I have the impression that at some point I must be doing something very different from you when installing or running MOLCAS, as this error persists across different versions, compilers and machines on my side, while it does not show up at all on yours.
Can you suggest any further things to try or investigate?

Thanks in advance,

Gilbert

Last edited by gilbert (2017-07-26 22:49:18)


#6 2017-07-27 13:25:59

Ignacio
Administrator
From: Uppsala
Registered: 2015-11-03
Posts: 1,085

Re: error in numerical_gradient

gilbert wrote:

The file mpi.mod is located in /home/gg114/lib/openmpi1.6.5/lib, but somehow the compiler does not find it although the LD_LIBRARY_PATH variable is set correctly and the correct wrappers are used.

I don't think LD_LIBRARY_PATH helps in finding module files at compile time. Run "/home/gg114/lib/openmpi1.6.5/bin/mpif77 -showme" and see if it has "/home/gg114/lib/openmpi1.6.5/lib" in an -I or -J flag.
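For example, a rough sketch of the check (the -showme:compile variant, if your Open MPI build supports it, prints only the compile-time flags):

/home/gg114/lib/openmpi1.6.5/bin/mpif77 -showme:compile
/home/gg114/lib/openmpi1.6.5/bin/mpif90 -showme:compile
# mpi.mod lives under /home/gg114/lib/openmpi1.6.5/lib, so the reported flags
# must include -I/home/gg114/lib/openmpi1.6.5/lib (or a -J pointing there)
# for "use mpi" to compile.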

Can you suggest any further things to try or investigate?

Well, this is my slurm script:

#!/bin/bash
#SBATCH -t 0-10:00:00
#SBATCH -J "test"
#SBATCH -n 4
##SBATCH --tasks-per-node=1
#SBATCH --mem-per-cpu=4000

module load gcc/4.7.2
export MOLCAS=/home/x_ignfe/molcas-8.2

[ -n "$SLURM_SUBMIT_DIR" ] || SLURM_SUBMIT_DIR=$PWD
cd $SLURM_SUBMIT_DIR
export OMP_NUM_THREADS=1

export MOLCAS_OUTPUT=Save
export MOLCAS_REDUCE_PRT=NO
export MOLCAS_MOLDEN=ON
export WorkDir=$SNIC_TMP
export MOLCAS_NPROCS=$SLURM_NTASKS
export MOLCAS_MEM=$SLURM_MEM_PER_CPU

export Project=Fe

molcas Fe.input >& Fe.output

Do you get, in your $WorkDir (you may need to check while the job is running), subdirectories named tmp_*? If so, try to save all the contents and see if there's anything in the stdout files that should be there. If not, somehow your parallel environment is failing very early.
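Something along these lines should do it (a rough sketch; it assumes the per-process output file inside each tmp_* subdirectory is literally called stdout, and the debug directory name is just a placeholder):

# peek at the tail of each worker's output while the job is still running
for d in "$WorkDir"/tmp_*; do
  echo "== $d =="
  tail -n 40 "$d"/stdout
done

# or save everything for later inspection
mkdir -p "$SLURM_SUBMIT_DIR"/numgrad_debug
cp -rp "$WorkDir"/tmp_* "$SLURM_SUBMIT_DIR"/numgrad_debug/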


#7 2017-07-27 15:18:52

gilbert
Member
From: Rostock
Registered: 2016-10-12
Posts: 5

Re: error in numerical_gradient

Investigating the stdout files was a very good hint!


I looked into the tmp_* directories and inspected the stdout files for MOLCAS 8.0 and MOLCAS 8.2.
The behaviour of the calculations was as described before, i.e. MOLCAS 8.0 aborted while MOLCAS 8.2 continued but printed errors.
Here are the last lines of the stdout outputs taken from the $WorkDir/tmp_* directories; everything looks fine up to this point.

For MOLCAS 8.0:

()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()
                            MOLCAS executing module NUMERICAL_GRADIENT with 2048 MB of memory
                                              at 23:33:41 Wed Jul 26 2017
                                Parallel run using   4 nodes, running replicate-data mode
()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()
      
      
 Root to use:                     1
 Number of internal degrees                               51
 Number of constraints                                     0
 Number of "hard" constraints                              0
 Number of displacements                                 102
 Effective number of displacements are                   102
             
EOF reached for file=fort.17
      
 ###############################################################################
 ###############################################################################
 ###                                                                         ###
 ###                                                                         ###
 ###    Error in Get_Ln                                                      ###
 ###                                                                         ###
 ###                                                                         ###
 ###############################################################################
 ###############################################################################

and for MOLCAS 8.2, compiled with Intel, without MKL:

()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()

                                        &NUMERICAL_GRADIENT

             launched 4 MPI processes, running in PARALLEL mode (work-sharing enabled)
                       available to each process: 1.5 GB of memory, 1 thread
()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()

 Root to use:                      1
 Number of internal degrees                                51
 Number of constraints                                      0
 Number of displacements                                  102
 Effective number of displacements                        102

Error reading unit=      17
Line:
H   3   1   1.007825   2   2.014102   3   3.016049   1
EOF reached for unit=      17
 ###############################################################################
 ###############################################################################
 ###                                                                         ###
 ###                                                                         ###
 ###    Error in Get_Ln                                                      ###
 ###                                                                         ###
 ###                                                                         ###
 ###############################################################################
 ###############################################################################

I have also discovered that I need to launch the Intel-compiled MOLCAS 8.2 behind the nohup command, as otherwise CASPT2 would not converge. This seems odd.

Edit: The error in the numerical gradient module is not affected by this behaviour; if I drop nohup and increase the imaginary shift to 0.2, it converges again and results in the same error.

When I compare the slurm scripts I find no difference that seems inherently important. But I will also try setting OMP_NUM_THREADS=1.

Edit concerning the compiler flags:

Indeed, the path is missing for mpif77; I must admit I do not understand why.
mpif77 -showme
gfortran -I/home/gg114/lib/openmpi1.6.5/include -pthread -L/home/gg114/lib/openmpi1.6.5/lib -lmpi_f77 -lmpi -ldl -lm -Wl,--export-dynamic -lrt -lnsl -lutil -lm -ldl

Interestingly, it is present for mpif90.
mpif90 -showme
gfortran -I/home/gg114/lib/openmpi1.6.5/include -pthread -I/home/gg114/lib/openmpi1.6.5/lib -L/home/gg114/lib/openmpi1.6.5/lib -lmpi_f90 -lmpi_f77 -lmpi -ldl -lm -Wl,--export-dynamic -lrt -lnsl -lutil -lm -ldl

I will add the missing include flag to the compiler flags in the Symbols file; that should help.
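Roughly like this (an untested sketch; the "..." stands for the rest of the existing flags, which stay unchanged):

# Symbols: prepend the directory that actually contains mpi.mod
F77FLAGS='-I/home/gg114/lib/openmpi1.6.5/lib -O2 -Wuninitialized -fdefault-integer-8 ...'
F90FLAGS='-I/home/gg114/lib/openmpi1.6.5/lib -O2 -Wuninitialized -fdefault-integer-8 ...'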

Best, Gilbert

Last edited by gilbert (2017-07-27 15:40:32)


#8 2017-07-27 16:18:41

Ignacio
Administrator
From: Uppsala
Registered: 2015-11-03
Posts: 1,085

Re: error in numerical_gradient

gilbert wrote:

Indeed, the path is missing for mpif77; I must admit I do not understand why.
mpif77 -showme
gfortran -I/home/gg114/lib/openmpi1.6.5/include -pthread -L/home/gg114/lib/openmpi1.6.5/lib -lmpi_f77 -lmpi -ldl -lm -Wl,--export-dynamic -lrt -lnsl -lutil -lm -ldl

Interestingly, it is present for mpif90.
mpif90 -showme
gfortran -I/home/gg114/lib/openmpi1.6.5/include -pthread -I/home/gg114/lib/openmpi1.6.5/lib -L/home/gg114/lib/openmpi1.6.5/lib -lmpi_f90 -lmpi_f77 -lmpi -ldl -lm -Wl,--export-dynamic -lrt -lnsl -lutil -lm -ldl

Doh! I had actually seen that a few days ago due to some other problem, but failed to make the connection. The reason seems to be that modules are a Fortran 90 feature, not a Fortran 77 one... As a workaround you could modify the cfg/gf.comp file and add "mpif90" before "mpif77", so that the mpif90 wrapper is used instead.
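If gf.comp is simply a list of candidate compiler names (an assumption on my part), something like this might do it:

# hypothetical one-liner: put mpif90 in front of mpif77 in the compiler list
sed -i 's/mpif77/mpif90 mpif77/' cfg/gf.comp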

For the rest... I'm clueless at the moment.

