Support and discussions for Molcas and OpenMolcas users and developers
Hello,
I compiled and installed the MPI version of Molcas 8.0. It works well, but when I try to restart or continue a calculation using a different number of CPUs, the program stops with an I/O error. In particular, I ran a CASSCF with 8 CPUs, then tried a CASPT2 with 2 CPUs, and I got:
()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()
MOLCAS executing module CASPT2 with 28000 MB of memory
at 12:07:26 Mon Jan 11 2016
Parallel run using 2 nodes, running replicate-data mode
()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()
++ I/O STATISTICS
I. General I/O information
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Unit Name Flsize Write/Read MBytes Write/Read
(MBytes) Calls In/Out Time, sec.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
1 RUNFILE 15.44 . 21/ 66 . 0.1/ 0.5 . 0/ 0
2 LUSOLV 0.00 . 0/ 0 . 0.0/ 0.0 . 0/ 0
3 LUSBT 0.00 . 0/ 0 . 0.0/ 0.0 . 0/ 0
4 LUHLF1 0.00 . 0/ 0 . 0.0/ 0.0 . 0/ 0
5 LUHLF2 0.00 . 0/ 0 . 0.0/ 0.0 . 0/ 0
6 LUHLF3 0.00 . 0/ 0 . 0.0/ 0.0 . 0/ 0
7 DRARR 0.00 . 0/ 0 . 0.0/ 0.0 . 0/ 0
8 DRARRT 0.00 . 0/ 0 . 0.0/ 0.0 . 0/ 0
9 RHS_01 0.00 . 0/ 0 . 0.0/ 0.0 . 0/ 0
10 RHS_02 0.00 . 0/ 0 . 0.0/ 0.0 . 0/ 0
11 RHS_03 0.00 . 0/ 0 . 0.0/ 0.0 . 0/ 0
12 RHS_04 0.00 . 0/ 0 . 0.0/ 0.0 . 0/ 0
13 RHS_05 0.00 . 0/ 0 . 0.0/ 0.0 . 0/ 0
14 RHS_06 0.00 . 0/ 0 . 0.0/ 0.0 . 0/ 0
15 H0T_01 0.00 . 0/ 0 . 0.0/ 0.0 . 0/ 0
16 H0T_02 0.00 . 0/ 0 . 0.0/ 0.0 . 0/ 0
17 H0T_03 0.00 . 0/ 0 . 0.0/ 0.0 . 0/ 0
18 H0T_04 0.00 . 0/ 0 . 0.0/ 0.0 . 0/ 0
19 LUDMAT 0.00 . 0/ 0 . 0.0/ 0.0 . 0/ 0
20 JOBIPH 0.00 . 0/ 0 . 0.0/ 0.0 . 0/ 0
21 JOBMIX 0.00 . 0/ 0 . 0.0/ 0.0 . 0/ 0
22 LUCIEX 0.00 . 0/ 0 . 0.0/ 0.0 . 0/ 0
23 MOLONE 0.00 . 0/ 0 . 0.0/ 0.0 . 0/ 0
24 MOLINT 0.00 . 0/ 0 . 0.0/ 0.0 . 0/ 0
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
* TOTAL 15.44 . 21/ 66 . 0.1/ 0.5 . 0/ 0
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
II. I/O Access Patterns
- - - - - - - - - - - - - - - - - - - -
Unit Name % of random
Write/Read calls
- - - - - - - - - - - - - - - - - - - -
1 RUNFILE 28.6/ 7.6
2 LUSOLV 0.0/ 0.0
3 LUSBT 0.0/ 0.0
4 LUHLF1 0.0/ 0.0
5 LUHLF2 0.0/ 0.0
6 LUHLF3 0.0/ 0.0
7 DRARR 0.0/ 0.0
8 DRARRT 0.0/ 0.0
9 RHS_01 0.0/ 0.0
10 RHS_02 0.0/ 0.0
11 RHS_03 0.0/ 0.0
12 RHS_04 0.0/ 0.0
13 RHS_05 0.0/ 0.0
14 RHS_06 0.0/ 0.0
15 H0T_01 0.0/ 0.0
16 H0T_02 0.0/ 0.0
17 H0T_03 0.0/ 0.0
18 H0T_04 0.0/ 0.0
19 LUDMAT 0.0/ 0.0
20 JOBIPH 0.0/ 0.0
21 JOBMIX 0.0/ 0.0
22 LUCIEX 0.0/ 0.0
23 MOLONE 0.0/ 0.0
24 MOLINT 0.0/ 0.0
- - - - - - - - - - - - - - - - - - - -
--
###############################################################################
###############################################################################
### ###
### ###
### Location: AixRd ###
### File: JOBIPH ###
### ###
### ###
### Premature abort while reading buffer from disk ###
### End of file reached ###
### ###
### ###
### ###
### ###
###############################################################################
###############################################################################
This does not happen when I use the same number of CPUs as in the CASSCF calculation, i.e. 8 CPUs. How can I solve this?
Thanks
Francesco
In a parallel run, Molcas creates temporary files in specific subdirectories, $WorkDir/tmp_001, tmp_002, etc. I would say that in order to "continue" a calculation, one needs an identical directory structure - not only the same total number of subdirectories, but also the same order.
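Schematically (the project name and file names below are just placeholders following the usual $Project.* pattern, not taken from your run), the scratch area of a parallel run looks something like this:
$WorkDir/                # shared scratch directory of the run
  myproject.RunFile      # run file
  myproject.JobIph       # CASSCF wave function
  tmp_001/               # per-process scratch of the first MPI process
  tmp_002/               # ... one subdirectory per process, same count and same order
  ...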
In parallel, certain data is spread over the different processes, which means you need the same data layout to continue a calculation.
In practice, the only data that is spread out is the integrals from seward, so if you want to run a CASSCF and a CASPT2 with a different number of processes, you need to rerun gateway/seward.
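As a rough, untested sketch (the paths, the project name, and the MOLCAS_NPROCS variable are assumptions about your setup, not taken from this thread), such a restart could look like:
export Project=myjob                              # hypothetical project name
export WorkDir=/scratch/myjob_pt2                 # fresh scratch directory for the restart
export MOLCAS_NPROCS=2                            # assuming this is how your installation selects the process count
mkdir -p $WorkDir
cp /scratch/myjob_cas/$Project.JobIph $WorkDir/   # reuse the CASSCF wave function
molcas restart.input                              # input that reruns &GATEWAY and &SEWARD before &CASPT2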
Always check the orbitals.
I did a test: I copied the .JobIph file from the previous CASSCF calculation into the scratch directory and reran gateway/seward, but I got the same error in the tmp_1/stderr._______1 file. Here is my input.
&GATEWAY
BASLIB = /users/p0880/talotta/rupy4clno
coord= geom.xyz
Basis= VTZP,Ru.ECP.STUTTGART.8s7p6d1f.6s5p3d1f.
Group= nosym
&SEWARD
LOW Cholesky
&CASPT2
Title
Multi State CASPT2 starting from CASSCF(16,13) stv3 wavefunction
Multistate
3 1 2 3
Maybe I'm missing something?
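Roughly, the copy step I mention above was of this kind (the old scratch path is only a placeholder, and I assumed the usual $Project.JobIph naming in the new scratch directory):
cp /scratch/old_casscf_run/$Project.JobIph $WorkDir/$Project.JobIph   # placeholder paths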
I did, not exactly the same, but like this:
- First, a parallel calculation with 8 CPUs.
- Later, I restarted the calculation in serial mode, just for the CASPT2.
To restart the calculation, I copied the work directory and used this input:
&gateway
coord=$HomeDir/koko.xyz
basis=ano-s-vdzp
group=nosymm
ricd
&seward
&caspt2
multi=1 1
nomulti
ipea=0.00
And it worked fine.
I supposed that we don't need all the files, only the JobIph, RasOrb and Cholesky ones ... but just in case I copied all of them.
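If only those files are needed, the copy would be something like this (untested; $OldWorkDir stands for the work directory of the first run, and the file names just follow the usual $Project.* pattern):
cp $OldWorkDir/$Project.JobIph $WorkDir/   # CASSCF wave function
cp $OldWorkDir/$Project.RasOrb $WorkDir/   # CASSCF orbitals
# ...plus the Cholesky vector files; their exact names depend on the run,
# which is why I just copied the whole work directory to be safe.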
C U
Vicente.