Molcas Forum

Support and discussions for Molcas and OpenMolcas users and developers


#1 2019-02-26 09:53:24

LucaBabetto
Member
Registered: 2018-11-21
Posts: 31

Data not defined in ChoVec Address

Hello,

I'm running OpenMolcas on a cluster with Open MPI 2.1.5 for parallelisation.

I'm having problems running a series of CASSCF calculations. I'm studying Eu(III) complexes: I first run a GUESSORB calculation to obtain guess orbitals for the CASSCF step, rotating them appropriately so that the 4f electrons end up in the active space. This is the generalised input file for the GUESSORB calculation:

&GATEWAY
  Title = Eu1T | Guess
  Coord = $Project.xyz
  Basis = ANO-RCC-VTZP
  Douglas-Kroll
  AMFI
  ANGMOM; 8.358049    9.499589    5.761545
&SEWARD
  Cholesky
&GUESSORB
  PrMO = 3
  PrPopulation

After that, I take the GssOrb file and run the CASSCF calculation with an input file of this form:

&GATEWAY
  Title = Eu1T | CAS(6,7) | Septuplets
  Coord = $Project.xyz
  Basis = ANO-RCC-VTZP
  Douglas-Kroll
  AMFI
  ANGMOM; 8.358049    9.499589    5.761545
&SEWARD
  Cholesky
> COPY $CurrDir/$Project.GssOrb GUESS
&RASSCF
  Title = CAS(6,7) | SEPTUPLETS
  FileOrb = GUESS 
  Alter = 7
  1 347 357
  1 340 358
  1 343 359
  1 345 360
  1 342 361
  1 346 362 
  1 348 363  
  Charge = 0
  Spin = 7
  nActEl = 6
  RAS2 = 7
  CIRoot = 7 7 1
> COPY $Project.JobIph $CurrDir/CASSCF_SEPTUPLETS

However, the program always fails. These are the last lines I see in the output file:

()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()

                                              &RASSCF

             launched 8 MPI processes, running in PARALLEL mode (work-sharing enabled)
                        available to each process: 10 GB of memory, 1 thread
()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()

--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 5 in communicator MPI_COMM_WORLD
with errorcode 128.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
--- Stop Module: rasscf at Tue Feb 26 01:59:17 2019 /rc=_RC_INTERNAL_ERROR_ ---
*** files: xmldump
    saved to directory /local/home/carlotto/luca/Eu2T/molcas/cas/septuplets
--- Module rasscf spent 9 seconds ---

If I look at the stdout.* files, in SOME of them (not all) I see this error:

 ###############################################################################
 ###############################################################################
 ###                                                                         ###
 ###                                                                         ###
 ###    Location: get_iScalar                                                ###
 ###                                                                         ###
 ###                                                                         ###
 ###    Data not defined ChoVec Address                                      ###
 ###                                                                         ###
 ###                                                                         ###
 ###############################################################################
 ###############################################################################

I also tried adding the RICD keyword in GATEWAY, but the same problem occurs.

Does anyone have any idea what the problem could be? I need the Cholesky decomposition; otherwise the program needs too much RAM and I can't run the calculation.


#2 2019-02-26 14:42:17

Ignacio
Administrator
From: Uppsala
Registered: 2015-11-03
Posts: 1,085

Re: Data not defined in ChoVec Address

LucaBabetto wrote: "I also tried adding the RICD keyword in GATEWAY, but the same problem occurs."

Did you try with RICD in GATEWAY without Cholesky in SEWARD?
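
For concreteness, a sketch of that variant based on the GATEWAY/SEWARD section posted above (only the placement of the decomposition request changes; everything else is copied from the original input):

&GATEWAY
  Title = Eu1T | CAS(6,7) | Septuplets
  Coord = $Project.xyz
  Basis = ANO-RCC-VTZP
  Douglas-Kroll
  AMFI
  ANGMOM; 8.358049    9.499589    5.761545
  RICD
&SEWARD
* no Cholesky keyword here: the decomposition is requested via RICD in GATEWAY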


#3 2019-02-27 14:32:04

LucaBabetto
Member
Registered: 2018-11-21
Posts: 31

Re: Data not defined in ChoVec Address

Yes, I tried RICD without Cholesky and it gives the same error, which is strange, since in that case it shouldn't produce any ChoVec files, should it?


#4 2019-02-27 16:07:11

Ignacio
Administrator
From: Uppsala
Registered: 2015-11-03
Posts: 1,085

Re: Data not defined in ChoVec Address

No, RICD still generates Cholesky vectors (and accesses the "ChoVec Address" field). Are the MPI processes running on a single node or on separate nodes? Is the WorkDir in a shared directory, or is it local to each node? Other than a bug or limitation (which is always possible), I'd look for a latency/timing/cache problem; maybe the RunFile is not fully up to date for all processes when RASSCF starts. Does it work with fewer processes?


#5 2019-02-27 16:33:58

LucaBabetto
Member
Registered: 2018-11-21
Posts: 31

Re: Data not defined in ChoVec Address

The MPI processes are running on a single node, and the WorkDir is the local scratch directory of the node. I will try running the same calculation with 1 process and 8 threads instead of 8 processes with 1 thread each. The fact that this error only shows up in certain stdout.X files makes me think it might indeed be a timing issue where the MPI processes cannot properly communicate with each other.


#6 2019-03-03 17:15:15

LucaBabetto
Member
Registered: 2018-11-21
Posts: 31

Re: Data not defined in ChoVec Address

I did run the job with only 1 process, but now the problem is that I don't have enough disk space: the scratch directory ends up containing a 260 GB purge.RVec00 file, so I exceed my reserved disk space.

Is there a way to reduce the disk usage further? Some other approximation I might consider?


#7 2019-03-04 09:26:07

Ignacio
Administrator
From: Uppsala
Registered: 2015-11-03
Posts: 1,085

Re: Data not defined in ChoVec Address

Maybe the problem was the same with 8 processes. Parallelization does not reduce the use of resources, quite the opposite. What could help is running processes on different nodes, with local scratch directories, such that even if the total resource needs increase, the use per node decreases.


#8 2019-03-04 09:40:47

LucaBabetto
Member
Registered: 2018-11-21
Posts: 31

Re: Data not defined in ChoVec Address

Unfortunately, due to the way our cluster is set up, it's hard to request more than 1 node per job, so I can't really apply your solution. I noticed that if I run several processes with 1 thread each, a number of large tmp folders are created, so using a single process with multiple threads does help with disk usage. Obviously the calculation time takes a big hit, since most of the time the program runs single-threaded (judging from top), but at least multithreading gives me a bit of a boost.

I ultimately need to run CASPT2 calculations for the septuplet-quintuplet transitions on the lanthanide centre, so my workflow is (a rough input sketch for the last two steps follows the list):

- GUESSORB run to obtain starting orbitals, which I rotate appropriately to get the 4f orbitals into the active space
- CAS(6,7) calculating 7 state-averaged septuplet states (7F term)
- CAS(6,7) calculating 5 quintuplets and 3 triplets (5D and 3P terms, which are the ones that I saw significantly mix in test runs with smaller systems)
- CASPT2 run on all the states
- SO-RASSI to calculate the transition energies
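
Here is a rough sketch of how the last two steps could be chained, assuming multi-state CASPT2 on each saved JobIph followed by a RASSI run that reads the resulting JobMix files. The names PT2_SEPTUPLETS and PT2_QUINTETS are hypothetical, the triplet states are left out for brevity (they would come from their own RASSCF/CASPT2 run, since Spin is fixed per RASSCF), and the NrOfJobIphs/EJob/SpinOrbit keyword formats should be double-checked against the CASPT2 and RASSI manual sections:

* multi-state CASPT2 on the 7 septuplet states saved from the RASSCF run above
> COPY $CurrDir/CASSCF_SEPTUPLETS $Project.JobIph
&CASPT2
  Multistate = 7 1 2 3 4 5 6 7
> COPY $Project.JobMix $CurrDir/PT2_SEPTUPLETS

* ... repeat for the quintuplet JobIph, saving e.g. PT2_QUINTETS ...

* SO-RASSI over both sets of CASPT2-corrected states
> COPY $CurrDir/PT2_SEPTUPLETS JOB001
> COPY $CurrDir/PT2_QUINTETS JOB002
&RASSI
  NrOfJobIphs = 2 7 5
  1 2 3 4 5 6 7
  1 2 3 4 5
  SpinOrbit
  EJob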

In the literature, ANO-RCC-VTZP is the basis set I've always seen used. Would it help with disk space to use a smaller basis set for the ligands and keep the VTZP set for the lanthanide? Are there other approximations I might consider?
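
On the mixed-basis idea, a minimal GATEWAY sketch, assuming the element-wise basis syntax of the XYZ Coord input (a bare label sets the default and an Element.Label entry overrides it; ANO-RCC-VDZP for the ligands is only an illustrative choice, and the rest is copied from the inputs above):

&GATEWAY
  Title = Eu1T | mixed basis
  Coord = $Project.xyz
* default basis for all atoms, larger set on the lanthanide only
  Basis = ANO-RCC-VDZP, Eu.ANO-RCC-VTZP
  Douglas-Kroll
  AMFI
  ANGMOM; 8.358049    9.499589    5.761545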

