Does CASINO benefit from being run directly in the scratch directory rather than in the home directory? I.e., does it make sense to add something like
homedir=$(pwd)
cp * /scratch/
cd /scratch/
srun ...
rm /scratch/*wfn.data
rm /scratch/*pp.data
cp /scratch/* "$homedir"
cd "$homedir"
to the arch file?
CASINO on scratch directory
Re: CASINO on scratch directory
Hi Katharina,
It depends on the nature of the disks in question. Here in the Cambridge TCM group, the /home directories live on a central file server and are NFS-exported to individual machines, whilst the /scratch disk is always a local disk. Thus, reading and writing to /home will be a lot slower because the data has to be sent over the network. However, in that case your proposed solution (copy from home to scratch, do the calculation, copy from scratch back to home) already sends the files over the network twice before you consider anything that CASINO does, so it perhaps isn't really a good idea (in this case just put the files on the scratch disk from the beginning).
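To see which situation applies on a given machine, you can ask the system what kind of filesystem each directory sits on. The following is only a minimal sketch: it assumes GNU coreutils `stat` (the flags differ on BSD/macOS), the helper names `fs_type`, `is_network_fs_type` and `report_fs` are made up for illustration, and the list of network filesystem names is not exhaustive.

```shell
# Print the filesystem type backing a directory.
# Assumes GNU coreutils stat; BSD/macOS stat uses different flags.
fs_type() {
    stat -f -c %T "$1"
}

# Hypothetical helper: succeed if a type name looks like a network filesystem.
# The list of names is illustrative, not exhaustive.
is_network_fs_type() {
    case "$1" in
        nfs*|lustre|gpfs|cifs|smb*|ceph) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: report, for each directory given, whether it appears
# to be on network or local storage.
report_fs() {
    for d in "$@"; do
        t=$(fs_type "$d") || continue
        if is_network_fs_type "$t"; then
            echo "$d: $t (network filesystem)"
        else
            echo "$d: $t (local or unrecognised filesystem)"
        fi
    done
}
```

For the setup described above one would call `report_fs "$HOME" /scratch`. If both directories turn out to be local, or both are on the network, there is no reason to expect a difference between them.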
That said, it won't really make that much difference, as CASINO doesn't spend much time doing disk I/O anyway. For really massive blip (bwfn.data) files, it can take a minute or so to read them (always make sure you use the unformatted binary kind, i.e. bwfn.data.b1 or bwfn.data.bin) - and obviously you don't want to send multi-GB files over any sort of network if you can avoid it. The main disk I/O during the calculation itself is reading/writing the config.in/config.out files at the end of each block. If you're having trouble with this, you can use longer blocks (the keyword block_time is useful), and the amount of writing can also be significantly reduced using the checkpoint keyword. Config writes are only absolutely necessary if the job has a time limit and you're not sure how long the job will take, or if you want to use DMC catastrophe recovery (dmc_trip_weight).
Cheers,
Mike
% casinohelp checkpoint
CASINO HELP SYSTEM
==================
Keyword : checkpoint
Title : Checkpointing level
Type : Integer
Level : Expert
DESCRIPTION
-----------
This integer-valued keyword determines how much CASINO should worry about
saving checkpoint data to config.* files (which can take a significant
amount of time, especially with large systems done on many cores and can
reduce the parallel efficiency - since the slower blocking redistribution
algorithm must be used at the end of every block when we write out a config
file). CHECKPOINT can take four values:
'2' : save data after every block in both VMC and DMC, and save the state
of the random number generator in OPT runs.
'1' [default] : as '2', but save data in VMC only after the last block when
RUNTYPE=vmc_opt, opt_vmc or vmc_dmc (still after every block if RUNTYPE=vmc).
'0' : only save data at the end of the run, for continuation purposes. This
is safe only if used in conjunction with the MAX_CPU_TIME keyword (since then
the config file will be automatically written out if CASINO sees the job is
about to run into an imposed time limit, even if we have not completed the
full number of requested blocks).
'-1' : do not write config file at all, ever. Note this value should be
chosen only if you *know* that the job will fit in any imposed time limit ,
and that such a run will be long enough to give an acceptably small error
bar, since it will be impossible to subsequently continue the run.
CHECKPOINT=0 or -1 clashes with the DMC catastrophe-recovery facility, for
which each DMC block needs to be checkpointed. The value of CHECKPOINT is
thus set to 1 regardless of the input value if DMC_TRIP_WEIGHT > 0 .
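The override rule in the last paragraph of the help text is easy to miss, so here it is restated as a small shell function. This is only a sketch of the rule as the help text words it, not CASINO code: the function name `effective_checkpoint` is hypothetical, and it follows the text literally in returning 1 for any input value once DMC_TRIP_WEIGHT is positive.

```shell
# Hypothetical helper, not part of CASINO: restates the override rule from
# the help text above.
# Usage: effective_checkpoint <checkpoint> <dmc_trip_weight>
effective_checkpoint() {
    checkpoint=$1
    trip_weight=$2
    # awk is used for the comparison because the weight may be non-integer
    # (e.g. 0.5), which the shell's own [ -gt ] test cannot handle.
    if awk -v w="$trip_weight" 'BEGIN { exit !(w > 0) }'; then
        # Catastrophe recovery is on: every DMC block must be checkpointed.
        printf '%s\n' 1
    else
        printf '%s\n' "$checkpoint"
    fi
}
```

For example, `effective_checkpoint 0 0.5` prints 1, whereas `effective_checkpoint -1 0` prints -1. In other words, asking for fewer config writes has no effect while DMC catastrophe recovery is switched on.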