CASINO on scratch directory

Katharina Doblhoff
Posts: 84
Joined: Tue Jun 17, 2014 6:50 am

Post by Katharina Doblhoff »

Does CASINO benefit from being run directly in the scratch directory rather than in the home directory? That is, does it make sense to add something like
homedir=$(pwd)
cp * /scratch/
cd /scratch/
srun ...
rm /scratch/*wfn.data
rm /scratch/*pp.data
cp /scratch/* "$homedir"
cd "$homedir"

to the arch file?
Mike Towler
Posts: 239
Joined: Thu May 30, 2013 11:03 pm
Location: Florence

Re: CASINO on scratch directory

Post by Mike Towler »

Hi Katharina,

It depends on the nature of the disks in question. Here in the Cambridge TCM group, the /home directories live on a central file server and are NFS-exported to individual machines, whilst the /scratch disk is always a local disk. Reading from and writing to /home is therefore a lot slower, because the data has to be sent over the network. However, in that case your proposed solution (copy from home to scratch, do the calculation, copy from scratch back to home) already incurs that network cost twice before you consider anything that CASINO does, so it perhaps isn't really a good idea; in this case, just put the files on the scratch disk from the beginning.
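Whether /home and /scratch are network or local disks on a given machine can be checked from the shell. The following is a minimal sketch, assuming GNU coreutils `stat` is available; the function names `classify_fs` and `fs_type` are hypothetical, and the list of network filesystem types is an assumption that is certainly not exhaustive.

```shell
#!/bin/sh
# classify_fs TYPE: print "network" for some common network filesystem
# types and "local" for anything else. The list is an assumption.
classify_fs() {
    case "$1" in
        nfs*|cifs|smb*|lustre|gpfs|beegfs|afs|ceph|fuse.sshfs) echo network ;;
        *) echo local ;;
    esac
}

# fs_type DIR: print the filesystem type of DIR using GNU stat,
# or "unknown" if stat fails (e.g. on a non-GNU system).
fs_type() {
    stat -f -c %T "$1" 2>/dev/null || echo unknown
}

# Report on the two directories discussed in this thread, if they exist.
for dir in "$HOME" /scratch; do
    if [ -d "$dir" ]; then
        echo "$dir: $(classify_fs "$(fs_type "$dir")")"
    fi
done
```

If both directories report the same kind of disk, copying files back and forth is unlikely to gain anything.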

That said, it won't really make much difference, as CASINO doesn't spend much time doing disk I/O anyway. For really massive blip (bwfn.data) files, it can take a minute or so to read them (always make sure you use the unformatted binary kind, i.e. bwfn.data.b1 or bwfn.data.bin), and obviously you don't want to send multi-GB files over any sort of network if you can avoid it. The main disk I/O during the calculation itself is reading and writing the config.in/config.out files at the end of each block. If this is causing trouble, you can use longer blocks (the block_time keyword is useful), and the amount of writing can also be significantly reduced using the checkpoint keyword. Config writes are only absolutely necessary if the job has a time limit and you're not sure how long the job will take, or if you want to use DMC catastrophe recovery (dmc_trip_weight).
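On the point about blip files, a job script could check before launching whether a binary blip file is present. This is only a sketch: the helper name `blip_status` is hypothetical, and it looks solely for the file names mentioned above (bwfn.data, bwfn.data.bin, bwfn.data.b1).

```shell
#!/bin/sh
# blip_status [DIR]: report which kind of blip file is present in DIR
# (default: current directory).
#   binary          bwfn.data.bin or bwfn.data.b1 exists
#   formatted-only  only the formatted bwfn.data exists
#   none            no blip file found
blip_status() {
    dir=${1:-.}
    for f in "$dir"/bwfn.data.bin "$dir"/bwfn.data.b1; do
        if [ -e "$f" ]; then
            echo binary
            return 0
        fi
    done
    if [ -e "$dir/bwfn.data" ]; then
        echo formatted-only
    else
        echo none
    fi
}

# Example: warn if only the slow-to-read formatted file is available.
if [ "$(blip_status .)" = "formatted-only" ]; then
    echo "Warning: only formatted bwfn.data found; reading may be slow." >&2
fi
```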


% casinohelp checkpoint
 CASINO HELP SYSTEM
 ==================

 Keyword : checkpoint
 Title   : Checkpointing level
 Type    : Integer
 Level   : Expert

 DESCRIPTION
 -----------
 This integer-valued keyword determines how much CASINO should worry about
 saving checkpoint data to config.* files (which can take a  significant
 amount of time, especially with large systems done on many cores and can
 reduce the parallel efficiency - since the slower blocking redistribution
 algorithm must be used at the end of every block when we write out a config
 file). CHECKPOINT can take four values:
 
 '2' : save data after every block in both VMC and DMC, and save the state
 of the random number generator in OPT runs.
 
 '1' [default] : as '2', but  save data in VMC only after the last block when
 RUNTYPE=vmc_opt, opt_vmc or vmc_dmc (still after every block if RUNTYPE=vmc).
 
 '0' : only save data at the end of the run, for continuation purposes. This
 is safe only if used in conjunction with the MAX_CPU_TIME keyword (since then
 the config file will be automatically written out if CASINO sees the job is
 about to run into an imposed time limit, even if we have not completed the
 full number of requested blocks).
 
 '-1' : do not write config file at all, ever. Note this value should be
 chosen only if you *know* that the job will fit in any imposed time limit ,
 and that such a run will be long enough to give an acceptably small error
 bar, since it will be impossible to subsequently continue the run.
 
 CHECKPOINT=0 or -1 clashes with the DMC catastrophe-recovery facility, for
 which each DMC block needs to be checkpointed. The value of CHECKPOINT is
 thus set to 1 regardless of the input value if DMC_TRIP_WEIGHT > 0 .
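
The choice between the four values can be summarised as a small decision helper. This is one reading of the help text above, not part of CASINO; the function name `suggest_checkpoint` and its yes/no arguments are hypothetical.

```shell
#!/bin/sh
# suggest_checkpoint RECOVERY LIMIT_HANDLED CONTINUE
#   RECOVERY       yes if DMC catastrophe recovery is wanted
#                  (dmc_trip_weight > 0)
#   LIMIT_HANDLED  yes if max_cpu_time is set in the input
#   CONTINUE       yes if the run may need to be continued later
# Prints a suggested value for the checkpoint keyword.
suggest_checkpoint() {
    recovery=$1
    limit_handled=$2
    cont=$3
    if [ "$recovery" = yes ]; then
        printf '%s\n' 1     # CASINO sets checkpoint to 1 in this case anyway
    elif [ "$cont" != yes ]; then
        printf '%s\n' -1    # never write a config file
    elif [ "$limit_handled" = yes ]; then
        printf '%s\n' 0     # write only at the end of the run
    else
        printf '%s\n' 1     # fall back to the default
    fi
}

# Example: no catastrophe recovery, max_cpu_time set, continuation wanted.
suggest_checkpoint no yes yes   # prints 0
```

The value -1 is suggested only when continuation is explicitly not needed, since the run then cannot be continued afterwards.
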

Cheers,
Mike