vmc memory usage

Post by **Katharina Doblhoff** » Wed Mar 02, 2016 3:24 pm

Dear Casino users and developers,

I am running a vmc calculation with shared memory.

Unfortunately it crashes, because it tries to take more memory than is available on the node. However, I do not understand why it wants that much memory at all. My blip file is about 4G and my nodes have 65G of memory for 24 procs. I can run the identical calculation using a smaller blip multiplicity without problems.

When logging into the node while the job is running and typing top, I can see one process taking about 6% of the memory - this is most likely the one holding the blip file. But then during the vmc equilibration (or, if equilibration is short, a bit later), I see the memory use by the other nodes increasing and increasing until at some point the whole thing crashes. I understand that each walker (i.e. proc) needs to store some stuff (its config, the energies obtained so far,...) but that should not depend on the size of the blip file, should it? But while for the smaller blip multiplicity the memory of the procs not carrying the blip only increases up to 2.3% and then stays there stably, for the larger blip multiplicity it increases up to about 5% - and then the program crashes, since 24*5>100...

Why is more memory necessary for information other than the blips when using the larger blip expansion with shm? Is there anything I can do other than using a smaller blip file, a different cluster with more memory, or fewer procs per node?

Thanks for your help,
Katharina

Post by **Mike Towler** » Wed Mar 02, 2016 4:01 pm

Hi Katharina,

Can you describe exactly how you are running the calculation (i.e. the exact runqmc command, or whatever..). Can you also post your out file (you'll need to gzip it to attach it to a post..).

Best,
Mike

PS: haven't forgotten the other stuff you asked me privately. Hoping to get round to it tonight.

Post by **Katharina Doblhoff** » Thu Mar 03, 2016 8:21 am

Hi Mike!

So here comes my submitscript:

Code:

#!/bin/bash                                                                                  
#SBATCH -p normal
#SBATCH --constraint=haswell
#SBATCH -n 24
#SBATCH -c 1
#SBATCH -t 0-4:00:00
#SBATCH -N 1                                                                                 
#SBATCH -J casino                                                                            
module load mpi/impi; module load blas/mkl; module load mkl; module load fortran/intel; module load c/intel
export OMP_NUM_THREADS=1
export CASINO_NUMABLK=24

srun -n 24 /home/kdd/Software/CASINO_beta_2014-11-11/CASINO/bin_qmc/linuxpc-intel-slurm-parallel.cartesius/Shm/opt/casino 

 node=0 ; while ((node<192-1)) ; do node=$((node+1))
  if [ -s ".out_node$node" ] ; then
   echo >> "out"
   echo "--Output from node #$node--" >> "out"
   echo >> "out"
   cat ".out_node$node" >> "out"
   echo >> "out"
  fi
  rm -f ".out_node$node" >& /dev/null
 done
 if [ -s ".err" ] ; then
  echo >> "out"
  echo "--Job's stderr--" >> "out"
  echo >> "out"
  cat ".err" >> "out"
 fi
 rm -f ".err" >& /dev/null
 echo >> "out"
 echo "Job finished: $(date)" >> "out"

rm .runlockAutogen

The out-file is attached. But here is where it stops (I can shift this point by using a short equilibration):

Code:

Running VMC equilibration (50000 moves).
  Performing timestep optimization.
  [CPU time: 1m elapsed, 2m15s remaining]

Job finished: Thu Mar 3 09:07:19 CET 2016

The error I get in the job-out file is:

Code:

slurmstepd: Step 2014942.0 exceeded memory limit (69143672 > 65561600), being killed
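
To put the two numbers in that message into more familiar units, here is a minimal sketch. It assumes the values are in kilobytes (the message does not state the unit, but 65561600 kB is exactly 64025 MiB, which is plausible for a node of this size). The helper names `parse_limit` and `kb_to_mib` are made up for illustration.

```bash
#!/bin/sh
# Sketch only: pull the "(used > limit)" pair out of a slurmstepd message
# and convert it to MiB. ASSUMPTION: the numbers are kilobytes.

msg='slurmstepd: Step 2014942.0 exceeded memory limit (69143672 > 65561600), being killed'

# parse_limit MESSAGE: print "used limit" exactly as given in the message.
parse_limit() {
  printf '%s\n' "$1" |
    sed -n 's/.*(\([0-9][0-9]*\) > \([0-9][0-9]*\)).*/\1 \2/p'
}

# kb_to_mib KB: integer conversion from kB to MiB (rounds down).
kb_to_mib() {
  echo $(( $1 / 1024 ))
}

# Split the two numbers into separate variables.
read -r used_kb limit_kb <<EOF
$(parse_limit "$msg")
EOF

echo "used:   $(kb_to_mib "$used_kb") MiB"
echo "limit:  $(kb_to_mib "$limit_kb") MiB"
echo "excess: $(kb_to_mib $(( used_kb - limit_kb ))) MiB"
```

Under that assumption the step was charged with about 67523 MiB against a limit of 64025 MiB, i.e. roughly 3.4 GiB over.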

Thanks for helping.
Katharina

Post by **Mike Towler** » Thu Mar 03, 2016 2:16 pm

Hi Katharina,

OK - I'm going to need to run it myself. Can you mail me all the input files I need to do the calculation (no need to send the large bwfn.data file - I'll recreate it myself from your PWSCF input).

Best,
Mike

Post by **Mike Towler** » Fri Mar 04, 2016 2:27 pm

OK - I ran your calc with 24 processes in Shm mode using the current 2.13.561 version of CASINO on my personal machine (which has 12 physical cores and 24 GB of memory, i.e. much less memory than yours). The job ran normally: total energy = -1864.8(2). So there are a number of possibilities for the issue you're seeing:

(1) something was wrong with the 2014 version of CASINO that you're using (2.13.429) that has since been fixed. Can you run with the latest version of the code to see if you have the same issue?

(2) this is a compiler error. What compilers (with version numbers) are you using? I'm using Intel ifort 14.0.1 and gcc 4.8.1 (the low-level Shm stuff in alloc_shm.c is one of the few CASINO routines written in C). Do you get the same error with a different set of compilers?

(3) Some Shm issue with your Cartesius machine, which is uh.. a bullx cluster ( https://userinfo.surfsara.nl/systems/ca ... escription ). The alloc_shm.c stuff can sometimes be a bit flaky as different architectures/machines have different ideas of how shared memory should be implemented - see e.g. the extended rant about Blue Gene machines in section 39.4 of the CASINO manual.. The shalloc_smp.f90 routine uses non-standard fortran95 (Cray pointers) which can cause issues with certain compilers. I have never had access to a Bullx machine so I've never had to officially check whether it works or not.

(4) Some problem with the arch file setup on this machine (you wrote this, yes?). You probably said or implied this already but have you run *any* calculations on Cartesius that would have been impossible if it were *not* actually running in Shm mode? i.e. any calculations where the blip file (or more accurately, the number given for "Maximum shared memory required" in the CASINO out file) is more than 1/24 of the total memory available on the node.
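
The test in (4) comes down to simple arithmetic: without Shm every process holds its own copy of the blip data, so a successful run only shows that Shm is active if one copy per process would not fit on the node. A rough sketch follows. The function name `needs_shm` is hypothetical, and the example numbers (4 GB shared, 65 GB node, 24 processes) are just the round figures quoted in this thread; the value to plug in is the "Maximum shared memory required" number from the out file.

```bash
#!/bin/sh
# needs_shm SHARED_MB NODE_MB NPROCS
# Succeeds (exit status 0) if NPROCS private copies of the shared data
# would exceed node memory, i.e. the run is only possible with Shm.
# Equivalent to "shared data > 1/NPROCS of node memory", written as a
# multiplication to avoid integer-division rounding.
needs_shm() {
  [ $(( $1 * $3 )) -gt "$2" ]
}

# Round figures from the thread: ~4 GB blip data, 65 GB node, 24 procs.
if needs_shm 4096 66560 24; then
  echo "would not fit without Shm"
else
  echo "fits even without Shm, so a successful run proves nothing"
fi
```

This ignores the private memory each process needs on top of the shared data, so it is a lower bound on what a non-Shm run would require.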

Cheers,
Mike

PS: do you not use runqmc at all? Highly recommended. Detects lots of errors.

PPS: if this turns out to be (3) then I can investigate further if someone could give me a temporary account on Cartesius. However, they almost certainly won't.. Dutch national security and all that.

PPPS: which column of the output of top are you looking at when you talk about the 'memory use of the other nodes increasing'?

Post by **Katharina Doblhoff** » Fri Mar 04, 2016 4:33 pm

Hi Mike,

thanks for looking into it.
Here come a few answers:
1.) I tried running the code from 2016-03-01 with only a few minor fixes (the ones I sent you - I never touched anything memory-related. I may have introduced a new integer somewhere, but nothing more) -> did not help
2.) I used mpiifort, mpiicc and mpiicpc, but I cannot figure out which versions are running on our cluster. I will try the gnu compilers as soon as possible.
3.) Possible - but the support team said it should work
4.) Very possible, but since it was me doing it, I will have a hard time finding what I did wrong, because I think I did it right,...
PS.) I do run runqmc to check once before I then submit with my script. The reason I do this is that I have an automatic jobchain gathering all my files and linking the right stuff together and it was kind of easier to treat casino in the same way I treat all other codes.
PPS.) Trying to get you an account on Cartesius...
PPPS.) I am looking at the %MEM column and the memory usage in the header. I am aware of the fact that that may double count in some cases. But since sacct did not catch the problem, I just tried to do something...
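
On the double counting: the %MEM column in top is based on resident memory, which counts a shared page once for every process that has it mapped. On Linux, the `Pss` lines in `/proc/<pid>/smaps` divide shared pages among the processes sharing them, so summing `Pss` over all processes gives a total that does not double count. A minimal sketch, assuming a Linux node and that the processes are named `casino`; the helper name `sum_smaps_field` and the `pgrep` pattern are assumptions, not part of CASINO.

```bash
#!/bin/sh
# sum_smaps_field FIELD FILE...
# Add up every "FIELD: <n> kB" line in the given smaps-format files.
# The exact match on "FIELD:" skips related lines such as "Pss_Anon:".
sum_smaps_field() {
  field=$1
  shift
  awk -v key="$field:" '$1 == key { total += $2 } END { printf "%d\n", total }' "$@"
}

# Hypothetical usage on a compute node while the job is running
# (you can normally only read smaps for your own processes):
#
# for pid in $(pgrep -x casino); do
#   echo "$pid Rss=$(sum_smaps_field Rss /proc/$pid/smaps) kB" \
#        "Pss=$(sum_smaps_field Pss /proc/$pid/smaps) kB"
# done
```

If Shm is working, the per-process Rss values should add up to far more than the node actually uses, while the Pss values should add up to roughly the real footprint.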

thank you and best regards,
Katharina

Post by **Mike Towler** » Fri Apr 08, 2016 11:11 am

Just in case anyone was following this thread, I'm updating it with the final solution
that was found.

Katharina's memory problem on Cartesius turned out to be an issue with the
SLURM job scheduler - see below.

Essentially it looked at how much shared memory was required, then
multiplied that by the number of processes per node(!), and if that was
greater than the amount of memory available on the node it automatically
killed the job, i.e. like a warning light on your car coming on because the warning
light is broken.

Note that this seems to imply that shared memory didn't work at all on the machine
except in the case where you didn't actually need it. You'd think someone
would have noticed..
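
To make the accounting error concrete, here is a toy model of the behaviour described above. It is not SLURM's actual code; the function names and the 500 MB of private memory per process are invented for illustration. Only the 24 processes and the roughly 4 GB of shared blip data come from this thread.

```bash
#!/bin/sh
# Arguments for both functions: NPROCS SHARED_MB PRIVATE_MB

# Memory really needed on the node: one shared copy plus each
# process's private data.
true_usage_mb() {
  echo $(( $2 + $1 * $3 ))
}

# What the accounting sees if it charges the shared segment to
# every process that maps it.
overcounted_usage_mb() {
  echo $(( $1 * ($2 + $3) ))
}

echo "actual:      $(true_usage_mb 24 4096 500) MB"
echo "overcounted: $(overcounted_usage_mb 24 4096 500) MB"
```

With these made-up numbers the job needs 16096 MB but is charged 110304 MB, well over a 65 GB node, which is the kind of mismatch that would get a healthy job killed.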

Mike

Dear Katharina, Mike,

Sorry for the delay in coming back to you.

We created a standalone job, reproducing your problem and we reported this
problem to ATOS/BULL and the SLURM developers to see if they could come up
with a solution.

It is highly likely that Mike's analysis is correct.

The suggested solution in the link that Mike provided is for a higher SLURM
version than we are currently running, but we may have a workaround for the
current SLURM version. We are currently investigating whether we can apply
this workaround in a live SLURM configuration or whether we have to do this
during maintenance.

Met vriendelijke groeten/ Kind regards,

Wim

Dear Katharina, Mike,

My colleague just informed me that the workaround has been configured in the
current SLURM implementation.

Our test case is working fine now.

Could you test if you can run your CASINO application now?

Met vriendelijke groeten/ Kind regards,

Wim

Dear Wim,
I tested and the job seems to run. That is great!
Thank you,
Best regards,
Katharina
