Long equilibration

varelse
Posts: 44
Joined: Mon Jun 10, 2013 10:17 pm

Re: Long equilibration

Post by varelse »

A little update. First - I can run envmc, reblock and so on from an interactive job, and it works then (it's obvious, because then it runs from computational nodes). And secondly - I have beta installed now, but still without solving this cross-compiler issue. But I think that if I can use the interactive jobs, it is no longer needed.
Mike Towler
Posts: 239
Joined: Thu May 30, 2013 11:03 pm
Location: Florence

Post by Mike Towler »

varelse wrote: Surely this growth is really small; what I wonder about is not its size but the fact that it is persistent. I mean, it looks like there is a kind of trend, as if I could fit a straight line to it (of nonzero slope, of course).
You can't fit a straight line with a non-zero gradient to it. About halfway across your graph, Ebest starts to oscillate around a constant value, as it should do. Before that it moves around well within error bars (mainly going up, but including some oscillations); this really isn't something you need to worry about. There are many more things in the world which are much more worth worrying about.

Once you get graphdmc working, it would be nice to run it to see the population fluctuations as well.
varelse wrote: This is a problem for me: deciding whether there is a clear plateau or not. I would be sure only when there is a really flat part. Judging "by eye" is not always clear-cut. (In one of your articles there was an objective criterion on the reblocked error, which you use in the automatic analysis in CASINO, so I should probably use this.)
If you always want to have a perfect plateau, you're going to be disappointed. It pretty much always looks like that unless you run for a 'really' long time. And as I keep saying, you really need to rerun this with dmc_ave_period=1 (though am I missing something? Your plots appear to have 15 million points in them, which if you are running with dmc_ave_period=100, as you say, means you did 1.5 billion actual moves, which seems excessive.)

The objective criterion used to automatically determine the best block size is based on U. Wolff, Comput. Phys. Commun. 156, 143-153 (2004) doi:10.1016/S0010-4655(03)00467-3, Eqn. (47).

! The reblocked stderr has a systematic and a statistical error.  !
! The optimal block size is                                       !
!   B = ncorr (2*nstep / ncorr)**(1/3)                            !
! where ncorr is the integrated serial correlation.               !
! For this block size, the systematic error in the error is       !
! half as large as the statistical error in the error             !
!                                                                 !
! Here, the optimal block size is determined self-consistently    !
! assuming that ncorr(B)=errfac**2 at this block size             !
! is a good measure for the true serial correlation.              !
!                                                                 !
! We choose the smallest B such that                              !
!   B**3 >= 2 * nstep * ncorr(B)**2                               !
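
For anyone who wants to see that criterion in action outside CASINO, here is a minimal Python sketch. It is not CASINO's implementation: the function names are made up for illustration, errfac is assumed to be the ratio of the reblocked standard error to the unreblocked one, and block sizes are assumed to be restricted to powers of two, so "smallest B" means the smallest power of two that satisfies the inequality.

```python
import math
import random


def std_err(values):
    """Standard error of the mean, treating the values as independent."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return math.sqrt(var / n)


def reblocked_std_err(data, block):
    """Standard error of the mean computed from averages over blocks."""
    nblocks = len(data) // block
    means = [sum(data[i * block:(i + 1) * block]) / block
             for i in range(nblocks)]
    return std_err(means)


def optimal_block(data):
    """Smallest power-of-two B with B**3 >= 2 * nstep * ncorr(B)**2.

    ncorr(B) is estimated as errfac**2, where errfac (an assumption here)
    is the reblocked error divided by the unreblocked error.
    Returns (B, ncorr), or (None, None) if no tried B qualifies.
    """
    nstep = len(data)
    raw = std_err(data)
    block = 1
    while nstep // block >= 2:  # need at least two blocks for an error bar
        errfac = reblocked_std_err(data, block) / raw
        ncorr = errfac ** 2
        if block ** 3 >= 2 * nstep * ncorr ** 2:
            return block, ncorr
        block *= 2
    return None, None


if __name__ == "__main__":
    # Synthetic serially correlated series (AR(1)), purely for illustration.
    rng = random.Random(0)
    x, series = 0.0, []
    for _ in range(8192):
        x = 0.9 * x + rng.gauss(0.0, 1.0)
        series.append(x)
    print(optimal_block(series))
```

The more strongly correlated the series, the larger ncorr(B) becomes and the larger the block size needed before the inequality holds, which is why a run that is too short may never satisfy it.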
varelse wrote: By the way, when I ran the auto-detect of arch files, it detected linuxpc-gcc-pbs-parallel as a valid arch. I compiled CASINO for it, but the same problem with envmc appears ("vmc helper won't talk to me"). Surprisingly, at first it didn't compile; then I tried to make my own arch, which also did not compile (not surprising, without access to the documentation). And then I tried compiling linuxpc-gcc-pbs-parallel again just to copy the final error message here, and it worked. Maybe because I didn't run the [r] option before compilation.
OK, when you say 'valid arch', what you mean is that it said 'generic arch'. I think I've realized that too many people don't know what 'generic' means - that in this sense it means 'this arch might or might not work but it has some similarity to your machine, so perhaps you can base your setup on it'. I've therefore just changed the install script in the current_beta version of CASINO as follows:

---[v2.13.284]---
* Clarified what is meant by 'explicit matches' and 'generic matches' when
  using the auto-detect option of the install script.
  -- Mike Towler, 2014-02-13

  I have witnessed various people getting extremely confused when CASINO_ARCHs
  flagged as 'generic matches' by the auto-detect procedure failed to compile
  CASINO on their machine. As the meaning of 'generic' may be unclear to people
  with English as their second language, I have therefore renamed them as
  'possible matches' and added the following text to the output of the install
  script:

  "NOTE: 'Explicit matches' were almost certainly designed for this machine,
  and will almost certainly work. 'Possible matches' represent closely related
  machines, which might or might not work. If they don't, then it is possible
  that they can be made to work by minor tweaking of the corresponding arch
  file in CASINO/arch/data. If not, try creating a new CASINO_ARCH for this
  machine interactively (option [n] in the initial install menu)."
Note also that a compilation can appear to work, but on machines with cross-compilers, the resulting binaries intended for use on the login nodes won't actually execute, because they've been compiled for the wrong architecture.
varelse wrote: A little update. First - I can run envmc, reblock and so on from an interactive job, and it works then (it's obvious, because then it runs from computational nodes). And secondly - I have beta installed now, but still without solving this cross-compiler issue. But I think that if I can use the interactive jobs, it is no longer needed.
OK. Look, the fundamental problem here is that:

(1) You don't know how your computer works.

(2) The system administrator of your computer probably knows how it works (though that isn't necessarily the case on complicated machines) but he either hasn't bothered to tell anyone else, or has put the information in a place where you haven't looked.

Because of Android and Apple phone apps, people these days expect software to just work when you download it. The thing is, because of the very careful way in which e.g. Google designed the Android platform, app developers can make very clear assumptions about what hardware and software environment will be used to run their code.

CASINO is different. It is supposed to run on hundreds of different kinds of hardware, from little tablets to the biggest supercomputers in the world, on one to a million processors, with multiple different operating systems, with about ten different Fortran, C, and C++ compilers along with Python, Perl, and bash interpreters, and is even expected to cope with different Makefile syntaxes. To be honest, it's amazing it works at all, and I think it does do a pretty good job. What it isn't, though, is psychic. If it requires a special incantation to compile utilities for use on the login nodes, you need to tell it what that is.

Now here, you know what that incantation is, because you made my 'Blazej is a Genius' program work on the login nodes. Reblock and the other utilities are no different to 'Blazej is a Genius', except they're probably a bit longer. So, take that incantation, and insert it into the arch file as I outlined in my previous post:

It's perfectly possible for the CASINO arch system to handle cross compilers - it just requires someone who knows what they're doing to set it up (the 'automatic arch file designer' inside the install script is aware of this and will ask you about cross compilers). As an example, take a look at:

CASINO/arch/data/linuxpc-gcc-pbs-parallel.titan.arch

All that stuff at the bottom defining environment variables with a '_NATIVE' suffix: that's the stuff that handles the Fortran utilities.

Thus:

FFLAGS_opt = -O3
FFLAGS_opt_NATIVE = $(FFLAGS_opt) -target=native

says that the Fortran compiler should use a flag '-O3' to turn on optimization for both the CASINO executable and the Fortran utilities, but when compiling utilities it should additionally use '-target=native' which means 'Please compile this program for use on the login nodes rather than the compute nodes'.


If you still can't figure it out, run the 'Create a new CASINO_ARCH for this machine interactively' option of the install script, then write down the questions you don't know the answer to, and send them to the system administrator, or alternatively post them here and we'll see if our collective skill in Polish is enough to figure out the answers from the website.

And don't let him tell you that Fortran programs must always be submitted to the batch queue system. Some of these utilities do stuff like extracting a single number from a file, which can be done using e.g. a bash or perl script or a Fortran program. Are we saying that just because it's written in Fortran it has to sit in a queue for hours? That's just silly. He'd better watch out or I'll rewrite CASINO in bash; then we'll see how fast it is(n't).

Cheers,
Mike

Post by varelse »

Mike Towler wrote: And don't let him tell you that Fortran programs must always be submitted to the batch queue system. Some of these utilities do stuff like extracting a single number from a file, which can be done using e.g. a bash or perl script or a Fortran program. Are we saying that just because it's written in Fortran it has to sit in a queue for hours? That's just silly. He'd better watch out or I'll rewrite CASINO in bash; then we'll see how fast it is(n't).
They responded with something like: "It is important that you do not run even simple programs on the login node. If you want to have control over them during execution, you can use interactive jobs. On the cluster there is no need for a cross-compiler, because the computational nodes all have the same architecture (x86_64). So if you want to compile something to use on the cluster (i.e. not on the login node), such a compiler will not be needed."

So they are a bit paranoid. Although this may be reasonable: my friend who also has an account there explained that they have only one login node, and if all the users ran Gnuplot at the same time it might get blocked, even though Gnuplot itself is a harmless little program. Maybe it makes sense.

Post by Mike Towler »

Many of CASINO's utilities are essentially the equivalent of simple bash scripts that happen to be written in Fortran, and take up hardly more system resources than listing your files. A blanket ban on running programs based on the language they are written in is just silly. If the advice amounted to 'Please do not run any programs which consume excessive system resources on the login node', then I suppose that would be perfectly sensible if the login node has poor quality hardware and too many users. The trouble is that they are making a blanket assumption that their users are stupid, and can't be trusted not to do this. Well, if they think you're stupid, they shouldn't let you use the machine in the first place! This really does sound like excessive paranoia.

So, if I understand you correctly, the same Fortran compiler is suitable for use on the login nodes and on the compute nodes? If that's the case, I'm not sure why we're having this discussion - just set your CASINO distribution up to use it. At the risk of bringing down the wrath of the Polish supercomputing industry on my head, you should then just go ahead and use utilities like 'envmc' or whatever that you know consume zero system resources. They would never know, unless they read this forum (Hi there!), and you're actually doing them a favour - if we rewrote it in Perl or something it would probably consume more resources! Stuff which needs graphics like graphdmc requiring gnuplot or whatever, well OK, if that's what they want I would just transfer the dmc.hist file back to your own workstation and analyze it there rather than sitting in a batch queue for hours just to plot a graph.

Mike

Post by varelse »

No, they are saying something like "You should not install anything on the login node, and then you do not need the cross-compiler because all the other nodes are the same. The login node is different, but we won't tell you how to install anything there, because you shouldn't do it".

Post by Mike Towler »

OK, so:

(1) you know how to compile Fortran programs for the login nodes, as you were able to compile the 'Blazej is a genius' program.

(2) As I keep saying, just do it! Run the install script setup, answer the bloody questions, including the word 'Yes' in answer to its question about whether you want support for a cross-compiler, and it will magically produce a working arch file with support for the utilities.

(3) In practice, only use the utilities that don't hog any system resources, so that the system administrators don't tan your buttocks with a birch twig in a hot sauna (or is that Finland?). If you want to run graphical utilities, pull the hist file or whatever back to your home workstation first.

This thread is getting far longer than it need be... :)

Post by varelse »

Ok, simply gfortran, without any additional flags. It was that simple. This thread really shouldn't be that long.