Support and discussions for Molcas and OpenMolcas users and developers
Hello,
I compiled and installed the MPI version of Molcas 8.0. It works well, but when I try to restart or continue a calculation using a different number of CPUs, the program stops with an I/O error. In particular, I ran a CASSCF with 8 CPUs, then tried a CASPT2 with 2 CPUs, and I got:
()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()
MOLCAS executing module CASPT2 with 28000 MB of memory
at 12:07:26 Mon Jan 11 2016
Parallel run using 2 nodes, running replicate-data mode
()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()
++ I/O STATISTICS
I. General I/O information
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Unit Name Flsize Write/Read MBytes Write/Read
(MBytes) Calls In/Out Time, sec.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
1 RUNFILE 15.44 . 21/ 66 . 0.1/ 0.5 . 0/ 0
2 LUSOLV 0.00 . 0/ 0 . 0.0/ 0.0 . 0/ 0
3 LUSBT 0.00 . 0/ 0 . 0.0/ 0.0 . 0/ 0
4 LUHLF1 0.00 . 0/ 0 . 0.0/ 0.0 . 0/ 0
5 LUHLF2 0.00 . 0/ 0 . 0.0/ 0.0 . 0/ 0
6 LUHLF3 0.00 . 0/ 0 . 0.0/ 0.0 . 0/ 0
7 DRARR 0.00 . 0/ 0 . 0.0/ 0.0 . 0/ 0
8 DRARRT 0.00 . 0/ 0 . 0.0/ 0.0 . 0/ 0
9 RHS_01 0.00 . 0/ 0 . 0.0/ 0.0 . 0/ 0
10 RHS_02 0.00 . 0/ 0 . 0.0/ 0.0 . 0/ 0
11 RHS_03 0.00 . 0/ 0 . 0.0/ 0.0 . 0/ 0
12 RHS_04 0.00 . 0/ 0 . 0.0/ 0.0 . 0/ 0
13 RHS_05 0.00 . 0/ 0 . 0.0/ 0.0 . 0/ 0
14 RHS_06 0.00 . 0/ 0 . 0.0/ 0.0 . 0/ 0
15 H0T_01 0.00 . 0/ 0 . 0.0/ 0.0 . 0/ 0
16 H0T_02 0.00 . 0/ 0 . 0.0/ 0.0 . 0/ 0
17 H0T_03 0.00 . 0/ 0 . 0.0/ 0.0 . 0/ 0
18 H0T_04 0.00 . 0/ 0 . 0.0/ 0.0 . 0/ 0
19 LUDMAT 0.00 . 0/ 0 . 0.0/ 0.0 . 0/ 0
20 JOBIPH 0.00 . 0/ 0 . 0.0/ 0.0 . 0/ 0
21 JOBMIX 0.00 . 0/ 0 . 0.0/ 0.0 . 0/ 0
22 LUCIEX 0.00 . 0/ 0 . 0.0/ 0.0 . 0/ 0
23 MOLONE 0.00 . 0/ 0 . 0.0/ 0.0 . 0/ 0
24 MOLINT 0.00 . 0/ 0 . 0.0/ 0.0 . 0/ 0
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
* TOTAL 15.44 . 21/ 66 . 0.1/ 0.5 . 0/ 0
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
II. I/O Access Patterns
- - - - - - - - - - - - - - - - - - - -
Unit Name % of random
Write/Read calls
- - - - - - - - - - - - - - - - - - - -
1 RUNFILE 28.6/ 7.6
2 LUSOLV 0.0/ 0.0
3 LUSBT 0.0/ 0.0
4 LUHLF1 0.0/ 0.0
5 LUHLF2 0.0/ 0.0
6 LUHLF3 0.0/ 0.0
7 DRARR 0.0/ 0.0
8 DRARRT 0.0/ 0.0
9 RHS_01 0.0/ 0.0
10 RHS_02 0.0/ 0.0
11 RHS_03 0.0/ 0.0
12 RHS_04 0.0/ 0.0
13 RHS_05 0.0/ 0.0
14 RHS_06 0.0/ 0.0
15 H0T_01 0.0/ 0.0
16 H0T_02 0.0/ 0.0
17 H0T_03 0.0/ 0.0
18 H0T_04 0.0/ 0.0
19 LUDMAT 0.0/ 0.0
20 JOBIPH 0.0/ 0.0
21 JOBMIX 0.0/ 0.0
22 LUCIEX 0.0/ 0.0
23 MOLONE 0.0/ 0.0
24 MOLINT 0.0/ 0.0
- - - - - - - - - - - - - - - - - - - -
--
###############################################################################
###############################################################################
### ###
### ###
### Location: AixRd ###
### File: JOBIPH ###
### ###
### ###
### Premature abort while reading buffer from disk ###
### End of file reached ###
### ###
### ###
### ###
### ###
###############################################################################
###############################################################################
This does not happen when I use the same number of CPUs as in the CASSCF calculation, i.e. 8 CPUs. How can I solve this?
Thanks
Francesco
In a parallel run, Molcas creates temporary files in specific subdirectories, $WorkDir/tmp_001, tmp_002, etc. I would say that in order to "continue" a calculation, one needs an identical directory structure - not only the same total number of subdirectories, but also the same order.
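Schematically (the project name and file names below are just placeholders following the usual $Project.* pattern, not taken from your run), the scratch area of a parallel run looks something like this:
$WorkDir/                # shared scratch directory of the run
  myproject.RunFile      # run file
  myproject.JobIph       # CASSCF wave function
  tmp_001/               # per-process scratch of the first MPI process
  tmp_002/               # ... one subdirectory per process, same count and same order
  ...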
In parallel, certain data is spread over the different processes, which means you need the same data layout to continue a calculation.
In practice, the only data that is spread out is the integrals from seward, so if you want to run a CASSCF and a CASPT2 with a different number of processes, you need to rerun gateway/seward.
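As a rough, untested sketch (the paths, the project name, and the MOLCAS_NPROCS variable are assumptions about your setup, not taken from this thread), such a restart could look like:
export Project=myjob                              # hypothetical project name
export WorkDir=/scratch/myjob_pt2                 # fresh scratch directory for the restart
export MOLCAS_NPROCS=2                            # assuming this is how your installation selects the process count
mkdir -p $WorkDir
cp /scratch/myjob_cas/$Project.JobIph $WorkDir/   # reuse the CASSCF wave function
molcas restart.input                              # input that reruns &GATEWAY and &SEWARD before &CASPT2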
Always check the orbitals.
I did a test: I copied the .JobIph file from the previous CASSCF calculation into the scratch directory and reran gateway/seward, but I got the same error in the tmp_1/stderr._______1 file. Here is my input.
&GATEWAY
BASLIB = /users/p0880/talotta/rupy4clno
coord= geom.xyz
Basis= VTZP,Ru.ECP.STUTTGART.8s7p6d1f.6s5p3d1f.
Group= nosym
&SEWARD
LOW Cholesky
&CASPT2
Title
Multi State CASPT2 starting from CASSCF(16,13) stv3 wavefunction
Multistate
3 1 2 3
Maybe I'm missing something?
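Roughly, the copy step I mention above was of this kind (the old scratch path is only a placeholder, and I assumed the usual $Project.JobIph naming in the new scratch directory):
cp /scratch/old_casscf_run/$Project.JobIph $WorkDir/$Project.JobIph   # placeholder paths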
I did, not exactly the same, but like this:
- First, a parallel calculation with 8 CPUs.
- Later, I restarted the calculation in serial mode, just for the CASPT2.
To restart the calculation, I copied the work directory and used this input:
&gateway
coord=$HomeDir/koko.xyz
basis=ano-s-vdzp
group=nosymm
ricd
&seward
&caspt2
multi=1 1
nomulti
ipea=0.00
And it worked fine.
I supposed that we don't need all the files, only the JobIph, RasOrb and Cholesky ones ... but just in case I copied all of them.
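If only those files are needed, the copy would be something like this (untested; $OldWorkDir stands for the work directory of the first run, and the file names just follow the usual $Project.* pattern):
cp $OldWorkDir/$Project.JobIph $WorkDir/   # CASSCF wave function
cp $OldWorkDir/$Project.RasOrb $WorkDir/   # CASSCF orbitals
# ...plus the Cholesky vector files; their exact names depend on the run,
# which is why I just copied the whole work directory to be safe.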
C U
Vicente.