QMC on large systems

Shiv Upadhyay
Posts: 5
Joined: Thu Dec 17, 2015 3:40 pm

QMC on large systems

Post by Shiv Upadhyay »

Hi all,

Let me preface this with a warning: I am a first year graduate student working with QMC so if I ask a stupid question that can be easily answered with some reading please direct me towards the resources!

First, I wanted to draw some attention to a python alternative to the graphdmc utility. Further information is available here: http://shivupa.github.io/blog/modifying ... -plotting/ I hope this may be useful.

Second, upon reading this http://www.psi-k.org/newsletters/News_1 ... ht_103.pdf, I had some questions on the implementation of something along the lines of http://qmcchem.ups-tlse.fr/index.php?title=QMCChem. In the US we have a resource, Open Science Grid, which is similar to European Grid Infrastructure used by the QMC=Chem program. I am wondering if this seems viable:
  1. Obtain a trial wave function.
  2. Run the VMC portion followed by DMC equilibration locally on a moderate-sized cluster.
  3. Submit multiple DMC stats-accumulating runs to the grid using the stop method = ‘small error’ keyword and setting the block time and stop time appropriately given the wall time.
  4. Average the results of each DMC stats job.
  5. Submit more jobs to the grid if the error bar is not reached.
Theoretically, I understand that many short jobs are equivalent to one long job, but in practice I'm worried it may not work this way. I'm particularly worried that starting from the same equilibrated distribution of walkers will cause the resulting runs not to be independent. My initial answer to this is that if a different random seed is used for each run, then the jobs are independent. It's possible that this may even be better than one long job, given the serial correlation of local energies?

In short my question is:
  • What do the more experienced members of the community think of this workflow?
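Steps 4 and 5 of the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `combine_runs` and `needs_more_jobs` are hypothetical, each job is assumed to report a mean energy, an error bar and a step count, the jobs are assumed to be statistically independent, and the numbers are made up.

```python
import math

def combine_runs(runs):
    """Combine independent runs given as (mean, error_bar, n_steps) tuples.

    Each run is weighted by its number of steps, and the error bars are
    propagated assuming the runs are statistically independent.
    """
    total_steps = sum(n for _, _, n in runs)
    mean = sum(e * n for e, _, n in runs) / total_steps
    err = math.sqrt(sum((s * n) ** 2 for _, s, n in runs)) / total_steps
    return mean, err

def needs_more_jobs(runs, target_error):
    """Return True if the combined error bar is still above the target."""
    _, err = combine_runs(runs)
    return err > target_error

# Hypothetical numbers, for illustration only.
runs = [(-17.30, 0.02, 1000), (-17.32, 0.02, 1000)]
mean, err = combine_runs(runs)
print(f"E = {mean:.4f} +/- {err:.4f}")
print("submit more jobs:", needs_more_jobs(runs, target_error=0.01))
```

Weighting by step count rather than by the estimated error bars is a design choice made here: for short runs the error bars themselves are noisy, so using them as weights could bias the average.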
Third, if the trial wave function is written to correlation.data then is it possible to read the output of a correlation.data file to use as the trial wave function of a different QMC calculation? This isn’t really related to my current work, but I was curious.

Lastly, in the manual (pg. 30 section 6.7 How to run coupled DFT-DMC molecular dynamics calculations: the runqmcmd script) it says:
There is also a big question over whether the configurations read in at the restart can be properly equilibrated in so few moves in the case when the DMC wave functions involve genuinely new physics. MDT has a discussion document in circulation which covers these issues in more detail.
I don’t do any DFT-DMC MD, but I’d be interested in reading this if anyone has this available.

I apologize for the wall of text and all the questions. Thanks for the help!

-Shiv
Neil Drummond
Posts: 113
Joined: Fri May 31, 2013 10:42 am
Location: Lancaster

Re: QMC on large systems

Post by Neil Drummond »

Dear Shiv,

Thanks very much for the new hist-plotting program!

Some suggestions / advice for your suggested strategy (which looks fine):

* For large systems, you may find that equilibration is more expensive than statistics accumulation. There are tricks for reducing this; e.g., if twist averaging you can do a long equilibration at one twist, copy the configurations to all your twists, and do a short re-equilibration at each twist.

* Be slightly careful about halting when the error bar falls below a target. The estimated error bar fluctuates randomly (on top of the 1/sqrt(N_steps) behaviour), and hence it will first fall below the target prematurely because of these fluctuations. You could instead halt when the DMC error bar remains below a target for many steps (about a correlation period).

* Note that if you are submitting multiple identical DMC jobs with a view to averaging them to reduce error bars, you must set random_seed to "timer". (Or manually give each calculation a different seed.)

* Averaging lots of small DMC simulations is perfectly OK, so long as population-control bias is negligible for each of your simulations. Performing one big calculation means that you have a larger target population.
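
The second point above (halt only once the error bar has stayed below the target for a while) could look roughly like this in Python. It is a sketch, not CASINO's stopping logic: the function name `first_stable_stop`, the `window` parameter and the error-bar history are all hypothetical.

```python
def first_stable_stop(error_bars, target, window):
    """Return the index at which to halt, or None if the criterion is never met.

    The run halts at the first index where the most recent `window`
    error-bar estimates are all below `target`.
    """
    consecutive = 0
    for i, err in enumerate(error_bars):
        consecutive = consecutive + 1 if err < target else 0
        if consecutive >= window:
            return i
    return None

# Hypothetical error-bar history: the dip below the target at index 2
# is a random fluctuation, not a converged estimate.
history = [0.030, 0.022, 0.009, 0.015, 0.0098, 0.0095, 0.0091, 0.0088]
print(first_stable_stop(history, target=0.010, window=1))  # naive criterion
print(first_stable_stop(history, target=0.010, window=3))  # sustained criterion
```

With `window=1` the run stops on the fluctuation; with a longer window (in practice something like a correlation period) it stops only once the estimate has settled.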

I don't understand your question about reading correlation.data. At the start of a calculation, CASINO will read the contents of correlation.data and (try to) use the corresponding wave function.

Mike Towler can probably help more with the MD question.

Best wishes,

Neil.
Mike Towler
Posts: 239
Joined: Thu May 30, 2013 11:03 pm
Location: Florence

Re: QMC on large systems

Post by Mike Towler »

Hi Shiv,

Thanks for this - interesting question.

I'm happy to incorporate your modified graphdmc and python alternative in the CASINO distribution if you want. Let me know.

I echo Neil's comments on your proposed workflow; I'll add only one annoying comment, which is that the target_error/small_error stuff -- whereby CASINO automatically decides when to finish the calculation based on an on-the-fly analysis of the reblocked error bar -- has actually not yet been incorporated into the current beta public distribution, even though the manual implies that it has. This is so even though I implemented it more than a year ago; as I recall I realized just before I submitted that there was an annoying niggle that needed fixing and I never got round to it. A change in personal circumstances has meant that I've had to basically take around 6 months off CASINO development, but I intend to be back with a vengeance shortly - possibly as early as next week - and I'll make the target_error stuff a priority.
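
For anyone unfamiliar with the reblocked error bar mentioned above, here is a minimal Python sketch of the standard reblocking idea for serially correlated data. It is not the CASINO implementation; the function name `reblock` and the test data are purely illustrative.

```python
import math

def reblock(data):
    """Return a list of (block_size, standard_error) for successive blockings.

    At each level, neighbouring values are averaged in pairs, which halves
    the number of blocks and doubles the block size.
    """
    results = []
    blocks = list(data)
    block_size = 1
    while len(blocks) >= 2:
        n = len(blocks)
        mean = sum(blocks) / n
        var = sum((x - mean) ** 2 for x in blocks) / (n - 1)
        results.append((block_size, math.sqrt(var / n)))
        # Pair up neighbours; a trailing unpaired value is dropped.
        blocks = [(blocks[i] + blocks[i + 1]) / 2 for i in range(0, n - 1, 2)]
        block_size *= 2
    return results

# Strongly correlated toy data: the naive (block size 1) error bar is too small.
for size, err in reblock([1, 2, 3, 4, 5, 6, 7, 8]):
    print(size, round(err, 4))
```

For correlated data the error bar grows with block size and then levels off once the blocks are longer than the correlation time; the plateau value is the one to trust.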

I'll send a copy of my DMC-MD discussion document to you by private email in a minute.

Cheers,
Mike
Mike Towler
Posts: 239
Joined: Thu May 30, 2013 11:03 pm
Location: Florence

Re: QMC on large systems

Post by Mike Towler »

Hi Shiv,

Actually here's a link to the DMC-MD discussion document:

http://www.tcm.phy.cam.ac.uk/~mdt26/gm_towler.pdf

I thought it would be a bit too undiplomatic to publish but I guess it's OK.

Cheers
Mike
Shiv Upadhyay
Posts: 5
Joined: Thu Dec 17, 2015 3:40 pm

Re: QMC on large systems

Post by Shiv Upadhyay »

Dr. Drummond and Dr. Towler,

Thanks for your quick replies. You've both given me a lot of insight on how to implement this, and I'll get started on it. I don't foresee any major issues (but who does?)

I'd be extremely happy if the graphing was implemented into CASINO, provided it's not a hassle. Let me know if you need anything from my end. The Python code is available on my GitHub: https://github.com/shivupa/CASINO_PLOTTING. The modified lines in the plotdmc utility are included below. It may make sense to neglect the first line.

Code:

set term svg font "/Users/shiv/Library/Fonts/cmunrm.ttf,24"
set term svg size 1024,768
set output "output.svg"
set multiplot layout 2,1
set title 'DMC Energy and population'
set size 1, 0.5
set format y '%12.6g'
set ylabel 'Population' font ',24'
set xrange [0:1000000]
plot '$dmchist' using 1:3 with lines title ''
set title 'Black: average local energy,\
Red: reference energy,\
Green: Best estimate of energy' font ',24'
set size 1, 0.5
set format y '%12.6g'
set xlabel 'Iteration Number' font ',24'
set ylabel 'Energy (a.u.)' font ',24'
set yrange [-17.6:-17]
set xrange [0:1000000]
plot '$dmchist' using $plotcols1a with lines title '' linecolor rgb 'black',\
'$dmchist' using $plotcols1b with lines title '' linecolor rgb 'red',\
'$dmchist' using $plotcols1c with lines title '' linecolor rgb 'green'
unset multiplot
Thanks,
Shiv
Vladimir_Konjkov
Posts: 165
Joined: Wed Apr 15, 2015 3:14 pm

Re: QMC on large systems

Post by Vladimir_Konjkov »

Hi Shiv

It's fine that you want to incorporate your Python code in the CASINO project, but it is advisable to follow the Python style guides, PEP 8 and PEP 257.

best regards

Vladimir.
In Soviet Russia Casino plays you.
Shiv Upadhyay
Posts: 5
Joined: Thu Dec 17, 2015 3:40 pm

Re: QMC on large systems

Post by Shiv Upadhyay »

Vladimir,

Thanks for the feedback. Using autopep8 https://pypi.python.org/pypi/autopep8 I formatted to the pep8 conventions. Using pep257 https://pypi.python.org/pypi/pep257 I formatted to the pep257 conventions. I believe it should be good now, but I would appreciate any recommendations.

Thank you,
Shiv
Kevin_Gasperich
Posts: 7
Joined: Wed Mar 18, 2015 7:46 am

Re: QMC on large systems

Post by Kevin_Gasperich »

Hi Neil,
Neil Drummond wrote: * Note that if you are submitting multiple identical DMC jobs with a view to averaging them to reduce error bars, note that you must set random_seed to "timer". (Or manually give each calculation a different seed.)
A minor note regarding random seeds:
If one wishes to start several identical DMC jobs from the same set of VMC configs, the only option is to use random_seed = timer.
If an integer seed is given manually, it will be ignored, and the random number sequence will continue from the saved state (see description below).
Keyword : random_seed

This keyword determines which random seed to use for the RANLUX random
number generator. The default value of RANDOM_SEED is 'default', which uses
the seed 314159265. If RANDOM_SEED is set to 'timer', the system timer will
be used as the seed. If the value of RANDOM_SEED is an integer, that integer
will be used as the random seed. The seed is printed to the output file so
that calculations using RANDOM_SEED='timer' can be reproduced afterwards.

Note that when restarting from a previous calculation the value of
RANDOM_SEED is ignored and the random number sequence will generally be
continued from the saved state of the random number generator stored in the
config file.
The user may in fact override this behaviour in DMC restarts by
setting RANDOM_SEED = 'timer', in which case the generator will be
re-initialized from the system clock after the config file is read. This
might be useful, for example, if a prior test has revealed that the standard
sequence will lead to a configuration giving rise to a population explosion.
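
When preparing many grid jobs that restart from the same saved configs, the seed setting could be applied to each job's input with a small helper like the one below. This is a sketch under assumptions: it assumes the input file holds one `keyword : value` pair per line, the function name `set_random_seed` is hypothetical, and the example input fragment is made up.

```python
import re

def set_random_seed(input_text, value="timer"):
    """Return input_text with the random_seed keyword set to `value`.

    Assumes one 'keyword : value' pair per line. An existing random_seed
    line is replaced; otherwise a new line is appended.
    """
    new_line = f"random_seed : {value}"
    pattern = re.compile(r"^[ \t]*random_seed[ \t]*:.*$",
                         re.IGNORECASE | re.MULTILINE)
    if pattern.search(input_text):
        return pattern.sub(new_line, input_text)
    return input_text.rstrip("\n") + "\n" + new_line + "\n"

# Made-up input fragment, for illustration only.
example = "runtype : dmc_stats\nrandom_seed : 12345\n"
print(set_random_seed(example))
```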
Kevin
Katharina Doblhoff
Posts: 84
Joined: Tue Jun 17, 2014 6:50 am

Re: QMC on large systems

Post by Katharina Doblhoff »

Sorry to take this thread up again so much later, but I only just read it...
I am confused about the following:
Shiv says:
I'm particularly worried that starting from the same equilibrated distribution of walkers will cause the resulting runs to not be independent.
Nobody seems to worry about this, suggesting that the runs would be independent, but the way I understand it, Shiv wants to equilibrate N walkers and then use each of these N equilibrated walkers in a short DMC run. If the error is not small enough, he suggests performing a second DMC calculation using the same set of N equilibrated walkers, right?

In my eyes this cannot lead to completely independent runs, especially if the runs are really short. Imagine the extreme case that only one step is performed for each walker. Then redoing a DMC calculation (even with a new random seed) will give basically the identical answer to the previous run, since the walkers did not have time to move significantly.

Shouldn't you rather generate N1 walkers, equilibrate those and then propagate them long enough to generate N2 independent walkers, where the decorrelation time between configs chosen as walkers in the N2 set should be sufficiently long to allow diffusion of the walkers through the system (just in the same way as one chooses independent walkers from VMC, even if those walkers still have to be equilibrated and thus still have some time to diffuse randomly)?! The only advantage of doing this would be that one can skip over the equilibration for the large N2 set, allowing one to use a large number of CPUs without increasing the ratio of equilibration time to statistics accumulation time.
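
To make the worry concrete, here is a toy Python model (not DMC, and not CASINO): an AR(1) chain stands in for a walker's local energy, two runs start from the same value but use different random numbers, and the correlation between the two run averages is measured. The parameter `a` (step-to-step correlation), the run lengths and all function names are assumptions chosen for illustration.

```python
import math
import random

def run_average(x0, n_steps, a, rng):
    """Average of an AR(1) chain x -> a*x + sqrt(1 - a^2)*noise started at x0."""
    x, total = x0, 0.0
    for _ in range(n_steps):
        x = a * x + math.sqrt(1.0 - a * a) * rng.gauss(0.0, 1.0)
        total += x
    return total / n_steps

def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / math.sqrt(sum((x - mx) ** 2 for x in xs)
                           * sum((y - my) ** 2 for y in ys))

def shared_start_correlation(n_steps, a=0.9, n_pairs=500, seed=0):
    """Correlation between two run averages that share a starting point."""
    rng = random.Random(seed)
    first, second = [], []
    for _ in range(n_pairs):
        x0 = rng.gauss(0.0, 1.0)  # shared "equilibrated" starting value
        first.append(run_average(x0, n_steps, a, rng))   # each run draws its
        second.append(run_average(x0, n_steps, a, rng))  # own random numbers
    return pearson(first, second)

print("1 step   :", shared_start_correlation(1))
print("200 steps:", shared_start_correlation(200))
```

In this model the one-step correlation is a^2 (0.81 for a = 0.9) up to sampling noise, while for runs much longer than the correlation time it drops towards zero. So in the toy picture, new seeds alone do not make very short runs independent, but runs that are long compared with the decorrelation time are nearly so.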