Molcas Forum

Support and discussions for Molcas and OpenMolcas users and developers

#1 2020-02-19 16:08:53

andrewshyichuk
Member
Registered: 2020-02-13
Posts: 80

Single-threaded RASSCF on Ryzen hexacore

Dear Users,

I've moved this topic from the "Running Calculations" section.
The question below is less a Molcas question than a general performance question.
However, some of you may have had a similar problem, or may just happen to know the solution.
Any comments on possible optimizations are welcome.

Thank you in advance.

So.

I've noticed a peculiar behaviour of OpenMolcas RASSCF (compiled without Open MPI, single-threaded).

I run a group of similar tasks on an AMD Ryzen 5 2600X (6 cores, 12 threads, 8192K L3 cache).
I use two NVMe disks in RAID 0 as scratch.
The system sometimes reads at 1600 MB/s, while the rasscf.exe processes read at 200-300 MB/s. I start them after a random sleep, so they do not all read simultaneously.
I do not use disk swap.
OpenMolcas is compiled with GCC 7.4 and uses OpenBLAS.

When I run 9 jobs at the same time, they take about 20-24 min per iteration.
With 7 or 8 jobs, the time is about 13-18 min per iteration.
With 5 or 6 jobs at the same time, they take 10-12 min per iteration.
With 3 jobs, it's 6-7 min per iteration; with 4 jobs, 7-9 min.
A single job takes 4-5 min per iteration, and two jobs take the same.
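
To compare these numbers, it may help to convert them into aggregate throughput (concurrent jobs divided by minutes per iteration). The sketch below does that with awk. The minutes are midpoints of the ranges reported above, which is an assumption, and the function name throughput_table is made up for illustration.

```shell
#!/bin/sh
# Aggregate throughput = concurrent jobs / minutes per iteration.
# Input lines: "<jobs> <minutes per iteration>".
throughput_table() {
    awk '{ printf "%d jobs: %.2f iterations/min\n", $1, $1 / $2 }'
}

# Midpoints of the reported ranges (assumed, not measured values).
throughput_table <<'EOF'
1 4.5
2 4.5
3 6.5
4 8
6 11
9 22
EOF
```

Under these midpoint assumptions, the throughput roughly doubles from one job to two and then stays between about 0.4 and 0.55 iterations/min, so adding jobs beyond two buys very little.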

The jobs hop between hardware threads like crazy, with barely seconds between switches, as seen using ps -mo pid,tid,fname,user,psr -p <pid>
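
The hopping can be put into numbers by sampling the psr column once per second and counting how often it changes. This is only a sketch: the function names sample_psr and count_migrations are made up, and the PID in the usage comment is a placeholder.

```shell
#!/bin/sh
# Count how often a stream of core numbers (one per line) changes value.
count_migrations() {
    awk 'NR > 1 && $1 != prev { n++ } { prev = $1 } END { print n + 0 }'
}

# Print the core (psr) a process last ran on, once per second.
# $1 = PID, $2 = number of samples. Stops early if the process exits.
sample_psr() {
    pid=$1; samples=$2; i=0
    while [ "$i" -lt "$samples" ]; do
        ps -o psr= -p "$pid" || break
        sleep 1
        i=$((i + 1))
    done
}

# Usage (placeholder PID): sample_psr 12345 60 | count_migrations
```

Running it once with and once without taskset should show whether the binding actually took effect.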

Apparently, my setup is not optimized for many threads.

Hypotheses:
1. Hyperthreads are inefficient, 1 job per physical core should fix it.
Result: nope, as seen in the time per iteration above. There is no rapid decrease in performance when switching from 6 to 7 jobs.
The efficiency gradually decreases with the number of jobs.

2. Threadhopping causes overheads.
Result: nope. I used taskset to bind processes, with exactly zero effect, or maybe even some slowdown.

3. Disk reads are not fast enough.
Result: maybe, but rather not. I gave some processes a realtime priority using ionice -c 1, which did not change much. Maybe it makes sense to reduce priority for others?

4. RAM is thrashed, kswapd0 takes way too much CPU (10-30-70%).
Result: well yes, but actually no.
With 9 jobs at 7 GB RAM per job, they take basically the whole RAM and need 20-24 min per iteration.
Once I had 4 jobs running and started another one with 20 GB RAM, just to cause thrashing, and I did get some decrease in performance.
With MOLCAS_MEM=4000 (MB) and each job using about 4100 MB RAM, half of the RAM stays empty, and I get a stable 19 min per iteration.
Most of the performance loss must thus originate from somewhere else.

5. Motherboard not fast enough.

6. Resource management is lousy.
The system is Fedora Server, without any tweaks.

7. Other options?
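
For reference, hypotheses 1 and 2 can be tested together by pinning each job to one logical CPU of a different physical core. A minimal sketch, assuming lscpu and taskset from util-linux are available; the helper name first_cpu_per_core and the job_N directory layout in the usage comment are made up for illustration.

```shell
#!/bin/sh
# Read "CPU,CORE" pairs, as printed by `lscpu -p=CPU,CORE`, and print
# the first logical CPU of each physical core, one per line.
# Comment lines starting with '#' are skipped.
first_cpu_per_core() {
    awk -F, '!/^#/ && !seen[$2]++ { print $1 }'
}

# Usage sketch (hypothetical job directories, one per selected CPU):
#   for cpu in $(lscpu -p=CPU,CORE | first_cpu_per_core); do
#       (cd "job_$cpu" && taskset -c "$cpu" pymolcas rasscf.input) &
#   done
#   wait
```

The CPU affinity set by taskset is inherited by child processes, so it also applies to the rasscf.exe started by pymolcas.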

#2 2020-02-20 16:54:56

andrewshyichuk
Member
Registered: 2020-02-13
Posts: 80

Re: Single-threaded RASSCF on Ryzen hexacore

An update.

After some observation and thinking, the current hypothesis is the following.
rasscf.exe reads a lot of data from disk, resulting in a large disk cache in memory (see https://www.linuxatemyram.com).
In my case, after dropping caches, rasscf.exe recreates something like 4 GB of memory cache in about 10 s.
Many instances of rasscf.exe then cause overhead in memory management: empty the cache to make room for new data, cache the new data, repeat.
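
The cache refill described above can be watched through /proc/meminfo. The sketch below only extracts the Cached and Dirty fields and converts them from kB to MiB; the function name page_cache_mib is made up, and the drop_caches line in the comment needs root.

```shell
#!/bin/sh
# Print the Cached and Dirty fields (converted from kB to MiB) from
# /proc/meminfo-style input. SwapCached is deliberately not matched.
page_cache_mib() {
    awk '$1 == "Cached:" || $1 == "Dirty:" { printf "%s %d MiB\n", $1, $2 / 1024 }'
}

# Usage:
#   page_cache_mib < /proc/meminfo
# To watch the cache refill after dropping it (as root):
#   sync; echo 3 > /proc/sys/vm/drop_caches
#   watch -n 1 'grep -E "^(Cached|Dirty):" /proc/meminfo'
```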

Solution: prevent caching for rasscf.exe altogether!
For that, there is a neat program, nocache: https://github.com/Feh/nocache.

However, it does not work with child processes, i.e. "nocache pymolcas rasscf.input" still results in caching for the rasscf.exe instance.

Thus, how can I tweak pymolcas to make it call "nocache rasscf.exe" instead of just "rasscf.exe"?

Thank you.
Andrew

#3 2020-02-21 09:05:11

Ignacio
Administrator
From: Uppsala
Registered: 2015-11-03
Posts: 1,080

Re: Single-threaded RASSCF on Ryzen hexacore

andrewshyichuk wrote:

Thus, how can I tweak pymolcas to make it call "nocache rasscf.exe" instead of just "rasscf.exe"?

You could try with MOLCAS_DEBUGGER=nocache, or modify molcas.rte: RUNBINARY='nocache $program'
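
As a sketch of the first option, the variable can be set for a single run instead of being exported globally. This assumes nocache is on the PATH; the wrapper name with_nocache is made up, and whether the driver then really prefixes rasscf.exe with nocache is something to check on your own installation.

```shell
#!/bin/sh
# Run a command with MOLCAS_DEBUGGER=nocache set only for that command.
with_nocache() {
    MOLCAS_DEBUGGER=nocache "$@"
}

# Usage (input name taken from the question above):
#   with_nocache pymolcas rasscf.input
#
# The second option is the one-line change in molcas.rte quoted above:
#   RUNBINARY='nocache $program'
```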

#4 2020-02-21 11:09:08

andrewshyichuk
Member
Registered: 2020-02-13
Posts: 80

Re: Single-threaded RASSCF on Ryzen hexacore

Ignacio wrote:

... modify molcas.rte: RUNBINARY='nocache $program'

Dear Ignacio,

Thank you.
I did that, and am waiting for the result.

Is it enough to change molcas.rte in the install path, or should I change it in the build directory and run "make install" again?

Last edited by andrewshyichuk (2020-02-21 11:09:35)

#5 2020-02-21 11:13:49

Ignacio
Administrator
From: Uppsala
Registered: 2015-11-03
Posts: 1,080

Re: Single-threaded RASSCF on Ryzen hexacore

Just in the install path is enough.

#6 2020-06-18 12:41:13

shuoshuo
Member
From: Beijing Normal University
Registered: 2017-10-18
Posts: 35

Re: Single-threaded RASSCF on Ryzen hexacore

1. Hyper-Threading (SMT on AMD CPUs) increases the maximum computing speed by only about 30%. When you run more jobs than the 6 physical cores, a drop in the speed of each single task is inevitable.
2. With 6 tasks, the CPU most likely runs at its base frequency. With a single task, the CPU can run at a higher frequency, so the task time is naturally shorter.
Here are my suggestions:
When running tasks, use the command "cat /proc/cpuinfo" to check the CPU frequency. You will find that the more tasks there are, the lower the CPU frequency.
Turn off hyper-threading and use liquid nitrogen to cool the CPU to make your life easier.
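
The frequency check suggested above is easier to compare between runs if the per-CPU values are averaged. A small sketch, assuming the "cpu MHz" lines that Linux prints in /proc/cpuinfo on x86; the function name mean_cpu_mhz is made up.

```shell
#!/bin/sh
# Average the "cpu MHz" fields of /proc/cpuinfo-style input.
# Prints nothing if no such lines are present.
mean_cpu_mhz() {
    awk -F: '/^cpu MHz/ { sum += $2; n++ }
             END { if (n) printf "%.0f MHz over %d CPUs\n", sum / n, n }'
}

# Usage: mean_cpu_mhz < /proc/cpuinfo
```

Running it with 1, 6 and 9 jobs active would show how much of the slowdown can be attributed to lower clocks.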
