VMC energies explode

General discussion of the Cambridge quantum Monte Carlo code CASINO; how to install and setup; how to use it; what it does; applications.
Christopher_Eames
Posts: 8
Joined: Sun Dec 22, 2013 1:36 pm


Post by Christopher_Eames »

I'm looking at lithium peroxide to calculate its formation energy. O2 and Li metal run fine with VMC, but with Li2O2 the energies keep exploding during vmc_opt:

Reblocked VMC energy: -62.087003136444 +/- 0.011315534317

--
Reblocked VMC energy: 2300.389442556838 +/- 0.000000000000
Reblocking not converged. Too few data points? Using unreblocked standard
--
Reblocked VMC energy: ********************* +/- *********************

I've tried various choices of nblock, npoints and nconfig_write, but nothing seems to work. It's been doing this for days, so I'm stuck. I think the trial wavefunction could be wrong somehow, but I can't spot anything.

If anyone can offer any help I would be most grateful.

Sincerely

Chris Eames.
Attachment: input.tgz (399.38 KiB)
Mike Towler
Posts: 239
Joined: Thu May 30, 2013 11:03 pm
Location: Florence

Re: VMC energies explode

Post by Mike Towler »

Hi Chris,

Before I answer: why are you using version 2.6 of CASINO (from about 5 years ago), especially since you're a new user?
You really need to install the current version and verify that you still have the problem with it. All sorts of bugs have been fixed and improvements made since then.

You don't seem to be using the runqmc script to run the code either. We strongly recommend it, as it does extensive error detection at the moment your job is submitted (and adds lots of informative material to the output file that helps us).

Anyway, the basic problem with your input file is that you're trying to do a variance minimization with only 100 configs (vmc_nconfig_write : 100). You really need 30000 to 100000, or it will bugger up spectacularly.

Also there's no need to use 1000 VMC blocks. Just use 1 (look up the definition of vmc_nblock to see why).
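To make that advice concrete, here is a minimal Python sketch of a pre-flight check one might run on the VMC settings before a variance minimization. The function check_varmin_settings is hypothetical (it is not a CASINO utility); its 30000 threshold simply encodes the 30000-100000 range suggested above, and the vmc_nstep value in the usage line is an assumption, since it isn't quoted in this thread.

```python
def check_varmin_settings(vmc_nstep, vmc_nconfig_write, vmc_nblock):
    """Return warnings for a VMC run feeding variance minimization (empty if OK).

    Hypothetical helper: the thresholds encode the advice in this thread,
    not checks that CASINO itself performs.
    """
    warnings = []
    if vmc_nconfig_write < 30000:
        warnings.append(f"vmc_nconfig_write={vmc_nconfig_write}: variance "
                        "minimization typically needs 30000-100000 configs")
    if vmc_nconfig_write > vmc_nstep:
        warnings.append("vmc_nconfig_write must not exceed vmc_nstep")
    if vmc_nblock != 1:
        warnings.append(f"vmc_nblock={vmc_nblock}: blocks only set how often "
                        "data are written to disk; 1 is normally enough")
    return warnings


# Settings from the original post; vmc_nstep=100000 is an assumed value.
for w in check_varmin_settings(vmc_nstep=100000, vmc_nconfig_write=100,
                               vmc_nblock=1000):
    print("WARNING:", w)
```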

But seriously, you really need to install an up-to-date version of the code. CASINO 2.6 is like Windows 98 or something.
You've got a CASINO Login ID - use it! :D

http://vallico.net/casinoqmc/update-casino/

Mike
Mike Towler

Re: VMC energies explode

Post by Mike Towler »

Chris,

I think one of the reasons you're confused is that 5 years ago the keyword definitions built into CASINO 2.6 were terse one- or two-sentence affairs; I later greatly expanded them to make it absolutely clear what is going on. Perhaps it would help if you read the modern definitions:

vmc_nstep


CASINO HELP SYSTEM
 ==================

 Keyword : vmc_nstep
 Title   : Number of (main) VMC steps
 Type    : Integer
 Level   : Basic

 DESCRIPTION
 -----------
 Total number of VMC steps summed over all processors; this corresponds to
 the total number of particle configurations for which the energy (and other
 quantities to be averaged) are calculated.
 
 Note that because adjacent moves are likely to be serially correlated, there
 is also an inner decorrelation loop of length VMC_DECORR_PERIOD, so the total
 number of configuration moves attempted in a VMC run following equilibration
 is  VMC_NSTEP*VMC_DECORR_PERIOD.
 
 On parallel machines, each core will do the same number of steps and for
 each step the energy is averaged over the cores and written to the vmc.hist
 file (which will ultimately contain VMC_NSTEP/NCORES lines - though
 VMC_AVE_PERIOD adjacent lines may be averaged over to reduce the file size).
 This means that if VMC_NSTEP is not divisible by the number of cores then it
 will internally be rounded up to the nearest multiple of the number of cores
 (example: on a 12-core machine, given VMC_NSTEP=20 in input, CASINO will
 round up VMC_NSTEP to 24; each core will then do two steps and a total of two
 records will be written to vmc.hist, each of which is an average of 12
 energies). On a single-core machine with VMC_NSTEP=20, CASINO will move the
 single config 20 times, and 20 records will be written to vmc.hist.
 
 Note the VMC_NBLOCK keyword may be used to vary the frequency with which
 checkpointing is done i.e. how often we write the data to disk; it does not
 affect the total number of VMC steps and expectation values such as average
 energy should be independent of it.
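The rounding rule in the vmc_nstep description can be sketched in a few lines of Python. This only illustrates the arithmetic stated in the help text; the function name vmc_step_layout and the decorr_period default of 1 are assumptions made for the example, not CASINO defaults. It reproduces the 12-core and single-core examples given in the description.

```python
def vmc_step_layout(vmc_nstep, ncores, decorr_period=1):
    """Illustrate how VMC_NSTEP is distributed over cores (per the help text).

    decorr_period stands in for VMC_DECORR_PERIOD; its default of 1 is an
    assumption for this example only.
    """
    steps_per_core = -(-vmc_nstep // ncores)       # ceiling division
    effective_nstep = steps_per_core * ncores      # rounded up to a multiple of ncores
    hist_records = steps_per_core                  # one core-averaged record per step
                                                   # (before any VMC_AVE_PERIOD averaging)
    total_moves = effective_nstep * decorr_period  # moves attempted after equilibration
    return effective_nstep, steps_per_core, hist_records, total_moves


print(vmc_step_layout(20, 12))  # (24, 2, 2, 24)
print(vmc_step_layout(20, 1))   # (20, 20, 20, 20)
```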
vmc_nblock


CASINO HELP SYSTEM
 ==================

 Keyword : vmc_nblock
 Title   : Number of blocks in VMC
 Type    : Integer
 Level   : Intermediate

 DESCRIPTION
 -----------
 Number of blocks into which the total VMC run is divided
 (post-equilibration). The purpose of VMC_NBLOCK is to determine how often the
 output, history and configuration/checkpoint files are written to. More
 specifically, at the end of each block:
 
 (1) the node- and block-averaged energies and a short 'report' are written
 to out.
 
 (2) the node-averaged energies for each step in the current block are
 appended to vmc.hist (and other quantities to expval.data).
 
 (3) the current VMC state plus any accumulated configs are written to the
 config.out file (this latter only if the CHECKPOINT input keyword is
 increased to 2 from its default value of 1 - otherwise config.out is only
 written after the end of the final block).
 
 Note that the total energy and error bar should be effectively independent
 of VMC_NBLOCK (provided it is ensured that the random number sequence is
 independent of the number of blocks - which it has not been at various
 periods in CASINO's history, though it should be now). Note also that the
 value of VMC_NBLOCK is ignored if VMC_NTWIST>0. The default value of
 VMC_NBLOCK is 1.
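The statement that the average energy does not depend on VMC_NBLOCK can be checked with a toy calculation: if the same sequence of energies is cut into equal-sized blocks, the mean of the block means equals the overall mean. The sketch uses synthetic Gaussian "energies" (an assumption for illustration, not CASINO output) and a hypothetical block_means helper; it says nothing about error bars, only about the mean.

```python
import random


def block_means(energies, nblock):
    """Cut a run into nblock equal blocks and return the mean of each block."""
    if len(energies) % nblock:
        raise ValueError("this sketch assumes the steps divide evenly into blocks")
    size = len(energies) // nblock
    return [sum(energies[i * size:(i + 1) * size]) / size for i in range(nblock)]


rng = random.Random(1)
# Synthetic stand-in for a VMC energy trace (not real CASINO output).
energies = [-62.0 + rng.gauss(0.0, 0.5) for _ in range(1200)]
overall = sum(energies) / len(energies)

for nblock in (1, 4, 12):
    means = block_means(energies, nblock)
    # Equal-sized blocks: the mean of the block means equals the overall mean.
    print(nblock, sum(means) / nblock, overall)
```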
vmc_nconfig_write


CASINO HELP SYSTEM
 ==================

 Keyword : vmc_nconfig_write
 Title   : Number of configs to write in VMC
 Type    : Integer
 Level   : Basic

 DESCRIPTION
 -----------
 Total number of configurations to be written out in VMC for later use
 (wave-function optimization or DMC). This number must be <= VMC_NSTEP (though
 you may want to set VMC_NSTEP to be significantly greater than
 VMC_NCONFIG_WRITE to get an acceptable error bar on the energy; this is
 useful for e.g. judging the success of an optimization after each stage).
 Since each processor always does the same number of steps, then
 VMC_NCONFIG_WRITE (and VMC_NSTEP) will be rounded up to the nearest multiple
 of the number of processors (e.g. VMC_NCONFIG_WRITE=20 will be rounded up
 internally to 24 on 12 cores, and 24 configs will be written to config.out -
 2 from each core). Note that the config.out file will still be written even
 if VMC_NCONFIG_WRITE is zero, since this file is used to store the current
 state of the system at the end of every VMC block (equivalent to writing one
 config, though of course multiple cores write multiple configs to save the
 state). Writing of config.out may be suppressed completely with an
 appropriate value for the CHECKPOINT keyword, and the data will be held in
 memory between different stages of the calculation.
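Similarly, the rounding of VMC_NCONFIG_WRITE to a multiple of the number of processors, together with the requirement that it not exceed VMC_NSTEP, can be sketched as follows. The helper configs_written is hypothetical, and checking the constraint on the values as given in input (before rounding) is an assumption about when CASINO applies it.

```python
def configs_written(vmc_nconfig_write, vmc_nstep, ncores):
    """Return (total configs written, configs per core), per the help text."""
    if vmc_nconfig_write > vmc_nstep:
        raise ValueError("vmc_nconfig_write must be <= vmc_nstep")

    def round_up(n):
        # Round n up to the nearest multiple of ncores.
        return -(-n // ncores) * ncores

    total = round_up(vmc_nconfig_write)
    return total, total // ncores


print(configs_written(20, 20, 12))  # (24, 2)
```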
Christopher_Eames

Re: VMC energies explode

Post by Christopher_Eames »

Thanks, Mike, for the replies. This really clears things up a lot. For some reason (stupidity) I didn't realise nconfig_write was the number of configurations - I thought it was related to checkpointing and later continuation into dmc. When I ran O2 and Li with nconfig_write : 50000, I thought they were being slowed down by writing to disk.

I get the error "Bad reblock convergence. Too few data points? Standard error in standard error..." after each reblocked energy is written and I thought this could be prevented by increasing nblock (even though the examples have nblock=1).

I was using casino2.6 because it's the most up-to-date version installed on HECTOR and since HECTOR is being switched off next week and since I'm only playing around I thought not to bother compiling my own. But I take your point completely about the large improvements made since 2.6 and will compile 2.12 tonight. Future production runs on Archer will be casino2.12+

Best Wishes,

Chris.
Mike Towler

Re: VMC energies explode

Post by Mike Towler »

Hi Chris,
> This really clears things up a lot.
Cool.
> I didn't realise nconfig_write was the number of configurations - I thought it was related to checkpointing and later continuation into dmc.
Better make sure your previous optimizations actually worked, then: run the 'envmc' utility in the calculation directory to see the results of the optimizations. But wait, you can't, because...
> I was using casino2.6 because it's the most up-to-date version installed on HECTOR and since HECTOR is being switched off next week and since I'm only playing around I thought not to bother compiling my own.
See, I didn't even know HECTOR had a centrally installed version of CASINO (and I use HECTOR myself). God knows where they got CASINO 2.6 from; I didn't realize the computer was that old! Googling reveals: http://www.hector.ac.uk/support/documen ... re/casino/ Whoa! Keep your teeth in, Grandpa...

This is an ideal example of why the installation instructions say the following (see CASINO/README_INSTALL, the http://vallico.net/casinoqmc/how-to-install/ page, and question A6 in the FAQ):

"Note for sysadmins: CASINO is not currently designed to be installed
system-wide by the root user; rather, a separate copy should be installed by
the user under his or her home directory. Amongst other reasons, this is
because the CASINO distribution contains a huge number of utilities (with
large numbers of executable files and scripts which most users of a multi-user
machine will not require) along with examples and documentation which the user
will wish to access."

The main point is that almost all sysadmins believe that 'installing a program' means compiling a single binary executable and sticking it in a directory somewhere. They ignore the fact that CASINO comes as a distribution with loads of tools and other bits and pieces that need to be provided as a whole. And they hardly ever get the idea of the runqmc script.

We've written things so that you can type essentially the same command (e.g. runqmc -p 120 -T3h -s) on any computer in the world, and it just works. Forget about batch scripts, loading and unloading modules, and qsubbing. The runqmc utility writes the batch script itself and submits it for you, as well as checking everything for errors, cleaning up, etc. Provided the machine has been set up properly, it knows the time limits on particular queues and stuff like that, and it handles the shared-memory/OpenMP settings required in the batch script that you might easily forget. You'll thank God for it on complex machines like Blue Gene/Qs.

But the sysadmins don't think you need it, so you just get the CASINO binary executable (which of course won't include Shm, OpenMP or OpenMPShm support, as these require different executables) and a stupid standard batch script that won't actually work in most cases. Sigh, I don't know why we bother.
> But I take your point completely about the large improvements made since 2.6 and will compile 2.12 tonight. Future production runs on Archer will be casino2.12+
You need to install the CASINO current_beta version (the only one that supports Archer); this will become the official CASINO 2.14 distribution sometime in the next couple of weeks. Don't bother with the supposedly official 2.12.1: it is verging on obsolescence already (things evolve very fast these days...).

To install on Archer:
1. Choose the 'Auto detect' option of the install script.
2. Accept the three suggested CASINO_ARCHs with an 'archer' suffix.
3. Sort them into order of preference using the [s] option (the GNU compiler is best).
4. Save your configuration using the [q] option.
5. Type 'source ~/.bashrc', and you're done.

To compile, remember that on a machine like this (and indeed on most modern multicore machines) you'll want to run the code in Shm shared-memory mode (see chapter 38 of the current_beta manual); that is, choose '1:Shm' in the [c] compile mode of the install script.

Run it with the runqmc script, having first typed 'runqmc --help' to see what options you have. These change depending on the machine and include machine-specific options; e.g. on Archer there is a special flag to use the large-memory nodes. Use the -s flag to run the shared-memory executable.

Any problems, let me know.
> I get the error "Bad reblock convergence. Too few data points? Standard error in standard error..." after each reblocked energy is written and I thought this could be prevented by increasing nblock
I'm aware that the on-the-fly reblocking output lacks clarity (amongst other problems). Funnily enough, what I'm doing right now is rewriting that bit of the code in preparation for the new release. If only people wouldn't keep asking silly questions on the forum I might actually have finished by now... (joke!) ;)
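For anyone wondering what the "standard error in standard error" message refers to, here is a minimal Python sketch of Flyvbjerg-Petersen reblocking. It is a generic textbook version, not CASINO's on-the-fly implementation, which may differ in detail. What it illustrates: each blocking step halves the number of data points n, and the relative uncertainty of the standard-error estimate is roughly 1/sqrt(2(n-1)), so with too few points no plateau in the standard error can be identified with confidence.

```python
import math


def reblock(data):
    """Flyvbjerg-Petersen reblocking of a serially correlated series.

    Returns a list of (block_length, n_blocks, std_error, error_in_std_error)
    tuples, one per blocking level.
    """
    results = []
    x = [float(v) for v in data]
    block_len = 1
    while len(x) >= 2:
        n = len(x)
        mean = sum(x) / n
        var = sum((v - mean) ** 2 for v in x) / (n - 1)  # unbiased sample variance
        std_err = math.sqrt(var / n)
        # Estimated uncertainty of the standard error itself.
        err_in_err = std_err / math.sqrt(2 * (n - 1))
        results.append((block_len, n, std_err, err_in_err))
        # Average adjacent pairs (a trailing odd point is dropped),
        # doubling the block length for the next level.
        x = [(x[2 * i] + x[2 * i + 1]) / 2 for i in range(n // 2)]
        block_len *= 2
    return results


# Eight data points give only three blocking levels; at the last level the
# standard error is uncertain by 1/sqrt(2), i.e. about 71%.
for block_len, n, se, dse in reblock(range(1, 9)):
    print(f"block length {block_len}: {n} blocks, "
          f"std err {se:.4f} +/- {dse:.4f}")
```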
Mike Towler

Re: VMC energies explode

Post by Mike Towler »

And Chris, I'm getting the impression you might benefit from attending the summer school next August?

http://vallico.net/casinoqmc/summer-schools/

We can teach you all this boring stuff, and as a bonus you get to drink beautiful black wine and eat yummy Tuscan food in gorgeous sunshine in one of the most beautiful places anywhere whilst meeting loads of lovely people from around the world and getting some much-needed exercise.

And while you're at it, see the second photograph on the page above? Click on it to make it larger. See those little dots on the top? That'll be you...

M.