Molcas Forum

Support and discussions for Molcas and OpenMolcas users and developers

#1 2018-11-13 17:35:22

drw
Member
Registered: 2018-11-13
Posts: 6

Parnell Issues

Our users (I am a cluster admin, not a chemist!) have been running OpenMolcas in parallel using shared storage for the work directory, but this causes big problems for the entire cluster.

I've tried to rewrite the job scripts so that local storage on each node is used for MOLCAS_WORKDIR.

The job doesn't get anywhere because parnell.exe just seems to lock up.

I see something like "parnell.exe base /local/storage" in the process table, but I'm not sure how this is controlled or specified in the user's configuration.
Any advice on how to troubleshoot this would be very much appreciated.

Dan


#2 2018-11-13 18:19:01

Ignacio
Administrator
From: Uppsala
Registered: 2015-11-03
Posts: 1,011

Re: Parnell Issues

In my configuration I set WorkDir to the local storage, not MOLCAS_WORKDIR. I'm not sure why, but it could be that I tried it and got the same problem as you.

A limitation of using local storage is that it must still be the same path name for all processes: you cannot have /local/storage on one node and /mystorage on another, for example (I don't think that's your problem, but just in case).

As for parnell.exe, it is a very simple program, and I think "parnell.exe base" just creates the WorkDir (and a process-specific subdirectory inside), so it should be relatively easy to debug.
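
If it helps, here is roughly how I would poke at a hung parnell.exe. This is a generic Linux debugging sketch, nothing Molcas-specific, and <pid> and the path are placeholders:

# find the stuck parnell process and its full command line
pgrep -af parnell.exe
# attach and see which system call it is blocked in
strace -f -p <pid>
# check that the target path exists and is writable on every node
ls -ld /local/storage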


#3 2018-11-14 10:12:08

drw
Member
Registered: 2018-11-13
Posts: 6

Re: Parnell Issues

I'm going to try this. I had assumed that "MOLCAS_WORKDIR" was a synonym for "WorkDir", but I'm interpreting your response to mean that these are two different things! Thanks.


#4 2018-11-14 11:00:36

drw
Member
Registered: 2018-11-13
Posts: 6

Re: Parnell Issues

I'm getting nowhere fast, I'm afraid.
I've got this in my job script:

export MOLCAS_WORKDIR=/tmp/molcas
export WorkDir=/tmp/molcas
export MYPROJECTDIR=/data/home/aaw945/weekly/molcas/CI_modified_2_Node_TMPDIR
export MOLCAS_PROJECT=QMMM_TPC_CI_modified

All nodes have a /tmp, so "/tmp/molcas" can be created on each of them.

Yet on rank 0 it's stuck at: /share/apps/centos7/openmolcas/18.0/bin/parnell.exe base /tmp/molcas
No processes have been started on rank 1 at all! (Presumably because rank 0 won't start them until the never-completing parnell step completes.)

My user's job does not seem to include an "rte" file; could this be a problem?

If I run on a single node, no problems... it happily goes to town writing to MOLCAS_WORKDIR.

Last edited by drw (2018-11-14 11:01:15)


#5 2018-11-15 12:46:25

drw
Member
Registered: 2018-11-13
Posts: 6

Re: Parnell Issues

I've got this working!
My next question: since I cannot seem to use UGE's $TMPDIR (maybe because it already exists on MPI rank 0 before parnell tries to create it), is there a way to get parnell to clean up the scratch directory on all nodes when the job completes?


#6 2018-11-15 15:47:02

Ignacio
Administrator
From: Uppsala
Registered: 2015-11-03
Posts: 1,011

Re: Parnell Issues

Hmm... Are you by any chance calling pymolcas through mpirun/mpiexec? If that's the case, you shouldn't. You should instead call "pymolcas -np" or set the MOLCAS_NPROCS environment variable. The molcas.rte file should be in the $MOLCAS directory, and contain something like:

RUNBINARY='/usr/bin/mpiexec -n $MOLCAS_NPROCS $program'

which is how the different programs, including parnell.exe, are run under the hood. You could run this manually, replacing "$program" with "/share/apps/centos7/openmolcas/18.0/bin/parnell.exe base /tmp/molcas", and see how it goes...
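
For example, with MOLCAS_NPROCS=2 (the process count here is only an illustration), the substituted command would be:

/usr/bin/mpiexec -n 2 /share/apps/centos7/openmolcas/18.0/bin/parnell.exe base /tmp/molcas

If that hangs as well, the problem is likely in the MPI launch rather than in Molcas itself.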


#7 2018-11-15 17:46:57

drw
Member
Registered: 2018-11-13
Posts: 6

Re: Parnell Issues

I'm calling it using "pymolcas -np ${NUMSLOTS}" as you suggest.
Furthermore, as you suggested, I'm using something like:

WorkDir=/tmp/molcas

This lets parnell work beautifully: it creates a /tmp/molcas folder on local disk on each node, and I currently have a 120-core job chugging away happily.
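
For the record, the relevant fragment of the now-working job script looks roughly like this (the input file name is just a placeholder):

export WorkDir=/tmp/molcas
export MOLCAS_PROJECT=QMMM_TPC_CI_modified
pymolcas -np ${NUMSLOTS} input.inp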

When that job completes, I'll have loads of stuff left in /tmp/molcas on each node, and I want to use (I think) EMIL commands to delete it at that point.

This is something we want to teach our users how to do and document for them.


#8 2018-11-15 17:54:55

Ignacio
Administrator
From: Uppsala
Registered: 2015-11-03
Posts: 1,011

Re: Parnell Issues

The way it works on my cluster is that when the queueing system starts a job, it creates a local directory on each node and removes it at the end. I set WorkDir to that directory, so I don't have to care about removing it (rather the opposite: I have to make sure I copy anything I may need later). There is a MOLCAS_KEEP_WORKDIR variable and a -clean argument to pymolcas that could be used to remove the WorkDir after a calculation, but I don't think that works in parallel (yet).
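
If your queueing system does not do that for you, one workaround is a final step in the job script that removes the scratch directory on every node once pymolcas has finished. A sketch, assuming Open MPI's mpiexec; the one-process-per-node flag is spelled differently in other MPI implementations:

# after pymolcas has finished, run one rm per node
mpiexec -npernode 1 rm -rf /tmp/molcas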


#9 2018-11-16 10:04:41

drw
Member
Registered: 2018-11-13
Posts: 6

Re: Parnell Issues

Our cluster seems to work that way for single-node jobs, but maybe not for parallel ones.
In any case, I'm beginning to doubt that this user's job is sane. :)

My 5 nodes are currently in a peculiar state: the master has a load of qrsh and parnell processes and is still running pymolcas (but doesn't seem to be doing much at all).

I can see qrsh_starter processes on some of the slaves; other slaves seem to be doing nothing.
The scheduler thinks the 120-core job is still running, and the status file says "Happy landing".

Last edited by drw (2018-11-16 10:11:47)


#10 2019-01-17 13:15:59

valera
Administrator
Registered: 2015-11-03
Posts: 124

Re: Parnell Issues

drw,

Setting

export MOLCAS_WORKDIR=/tmp/molcas
export WorkDir=/tmp/molcas

is a bad idea: MOLCAS_WORKDIR is the parent directory for all WorkDirs. If you do not define a Project, it is taken from the name of your input file, so

export MOLCAS_WORKDIR=/scratch/molcas
molcas -f a.inp
molcas -f b.inp

will use /scratch/molcas/a as the WorkDir for the first run and /scratch/molcas/b for the second.
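
To pin the scratch location down explicitly, you can define the project yourself. A sketch, where "myjob" is a hypothetical project name:

export MOLCAS_WORKDIR=/scratch/molcas
export MOLCAS_PROJECT=myjob
molcas -f a.inp
# WorkDir is then /scratch/molcas/myjob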
