Molcas Forum

Support and discussions for Molcas and OpenMolcas users and developers

#1 2020-02-24 16:33:47

lijingbai2009
Member
Registered: 2019-08-28
Posts: 29

Some issue of running Molcas with Slurm and on AMD CPU.

Hi Molcas developers,

I have noticed that MCLR.exe processes always get stuck in state D (uninterruptible sleep) when I run multiple Molcas calculations (such as 28) through Slurm on our cluster. This usually happens only when I run several Molcas jobs on the same machine, and when it does, all of them run slowly. Our system administrators suspect a conflict between Molcas and the GPFS file system caused by I/O, because all my calculations were running in the /scratch folder rather than on the local machine. For now, I can move my calculations to a local folder such as /tmp, and then Molcas runs normally. Are there any useful hints for running Molcas without conflicts with GPFS?

The second issue happens when we run Molcas on our new 64-thread AMD CPUs (EPYC 7452 32-Core Processor). A single Molcas calculation runs well. But when I run multiple calculations, for instance by wrapping one Molcas calculation in a shell function and calling that function 64 times in a loop in the background, the pymolcas process shows up and disappears, gateway.exe never starts, and all my calculations hang. This also halts the machine, and we have to reboot it to recover. Later, I added a 5-second delay in the loop, which seems to make Molcas work again. Does anyone know what happened?

Thanks!

Last edited by lijingbai2009 (2020-02-24 16:38:09)

#2 2020-02-27 10:08:53

valera
Administrator
Registered: 2015-11-03
Posts: 124

Re: Some issue of running Molcas with Slurm and on AMD CPU.

lijingbai2009,
when it comes to parallelization, Molcas and OpenMolcas are quite different. So please be specific about which code and which version you use.

For your first question, I would just like to double check: do you use WorkDir? If not, your description is quite correct. Please read the documentation.

For the second question, make sure that your Molcas inputs are in separate directories. If the problem still occurs, please report a bug.
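
The advice about separate directories, together with the 5-second delay mentioned in the first post, can be sketched roughly as follows. This is only an illustration: launch_staggered is a hypothetical helper (not part of Molcas), and the directory names and the input file name in the commented example are assumptions.

```shell
#!/bin/sh
# Sketch: start one command per input directory, each from inside its own
# directory, pausing between launches. Hypothetical helper, not part of Molcas.
launch_staggered() {
    # $1 = delay in seconds, $2 = command to run, remaining args = directories
    delay="$1"; cmd="$2"; shift 2
    for dir in "$@"; do
        # Subshell so the cd does not affect the caller; run in the background.
        ( cd "$dir" && $cmd > run.log 2>&1 ) &
        sleep "$delay"
    done
    wait   # block until every background job has finished
}

# Example (commented out): 5 s between launches, as in the first post.
# job01, job02, job03 and input.inp are assumed names.
# launch_staggered 5 "pymolcas input.inp" job01 job02 job03
```

Each job writes its output to run.log inside its own directory, so nothing is shared between runs apart from the launching script.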

#3 2020-05-19 16:26:23

lijingbai2009
Member
Registered: 2019-08-28
Posts: 29

Re: Some issue of running Molcas with Slurm and on AMD CPU.

Hi Valera,

Thanks for your response and sorry for my late reply.

1. I haven't used WorkDir, but I have used MOLCAS_PROJECT and MOLCAS_WORKDIR. Could that cause any trouble?

2. Yes, all Molcas inputs are in separate directories together with their orbital files. They were just launched by one script.

#4 2020-05-28 08:54:08

valera
Administrator
Registered: 2015-11-03
Posts: 124

Re: Some issue of running Molcas with Slurm and on AMD CPU.

Hi,
if you use MOLCAS_WORKDIR, WorkDir is created as $MOLCAS_WORKDIR/$Project. So if MOLCAS_WORKDIR points to a shared (non-local) disk, all temporary files are written over the network. That is not a good idea for performance.
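
To illustrate the point about $MOLCAS_WORKDIR/$Project, here is a minimal sketch of pointing MOLCAS_WORKDIR at a node-local directory inside a Slurm job. It assumes that /tmp is on a local disk of the compute node, which depends on the cluster. The two helper functions and the project name are hypothetical.

```shell
#!/bin/sh
# Sketch: choose a node-local parent directory for the Molcas WorkDir.
# Assumption: /tmp is a local disk on the compute node; adjust for your cluster.

# Print the WorkDir that would result, i.e. $MOLCAS_WORKDIR/$Project.
molcas_workdir_path() {
    # $1 = parent directory (MOLCAS_WORKDIR), $2 = project name
    printf '%s/%s\n' "$1" "$2"
}

# Build a per-job parent directory under a local base (default /tmp).
# SLURM_JOB_ID is set by Slurm inside a job; fall back to the shell PID.
local_scratch_parent() {
    base="${1:-/tmp}"
    printf '%s/molcas_%s\n' "$base" "${SLURM_JOB_ID:-$$}"
}

# Example use in a job script (commented out so the sketch has no side effects):
# export MOLCAS_WORKDIR="$(local_scratch_parent /tmp)"
# mkdir -p "$MOLCAS_WORKDIR"
# export MOLCAS_PROJECT=myproject      # hypothetical project name
# pymolcas "$MOLCAS_PROJECT.inp" > "$MOLCAS_PROJECT.log"
```

Using the job ID in the path keeps the temporary files of different jobs on the same node apart. Any results needed later would still have to be copied back from the local directory before the job ends.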
