Error PPOTS

General discussion of the Cambridge quantum Monte Carlo code CASINO; how to install and setup; how to use it; what it does; applications.
Mike Towler
Posts: 239
Joined: Thu May 30, 2013 11:03 pm
Location: Florence

Re: Error PPOTS

Post by Mike Towler »

Also - it would help to have the exact error code so we know what the thing is whining about.

Find the following block of code in CASINO/src/ppots.f90

open(io_pp,file=psp_filename,status='old',iostat=ierr)
if(ierr/=0)call errstop('READ_PPOTS','Error opening '//trim(psp_filename)&
 &//' file.')
and insert the following line after the open statement.

if(ierr/=0)write(my_node+400,*)'ierr=',ierr ; call flush(my_node+400)
Recompile the code, then run it again in a case where you expect to see the error.
In fact, as you are up against a 48h time limit, you need to find a shorter run which exhibits the same error.

To do this set max_cpu_time = 10 min in the input file, then use 'runqmc --auto-continue' - this will force a restart after 10 minutes.

If you encounter the error, one or more files called e.g. fort.4xx should appear, containing something like 'ierr=123'. Could you let me know what the actual value of 123 is?
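
A minimal shell sketch for gathering the values once the files appear. It assumes the diagnostic output lands in files named fort.NNN in the run directory, as described above. With my_node+400 and 512 processes the unit numbers would run from 400 to 911, so the glob below covers fort.400 to fort.999. The function name collect_ierr is hypothetical.

```shell
# Hypothetical helper: print every 'ierr=' line together with its file name,
# so the failing node (unit number minus 400) can be identified.
collect_ierr() {
  dir="${1:-.}"
  for f in "$dir"/fort.[4-9][0-9][0-9]; do
    [ -e "$f" ] || continue              # no match: the glob stays literal
    # A file without an 'ierr=' line is not treated as an error here.
    grep -H 'ierr=' "$f" || true
  done
}
```

Usage would be something like `collect_ierr /path/to/run/directory`; with no argument it looks in the current directory.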

M.
elaheh
Posts: 10
Joined: Mon Jun 03, 2013 10:24 am

Re: Error PPOTS

Post by elaheh »

> Describe to me exactly the (presumably manual) process you use to restart the calculation.
To restart, for example, a DMC calculation, all the required files are present in my directory. I only change config.out to "config.in", newrun to "F" and runtype to "dmc_stats". Then I restart the run. Sometimes it runs without problems and occasionally it stops with the error I mentioned before.

It is not at all a big problem, but when I submit my job, wait in a queue and it suddenly stops with the error, it is annoying.
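
The manual restart steps described above could be scripted roughly as follows. This is only a sketch: it assumes the CASINO input file is named input and consists of "keyword : value" lines, and the function name prepare_restart is made up for illustration.

```shell
# Sketch of the manual restart steps. Assumptions (not from the thread):
# the input file is "$dir/input" and uses "keyword : value" lines.
prepare_restart() {
  dir="${1:-.}"
  # Reuse the configurations written by the previous run.
  # Note: this overwrites any existing config.in.
  mv "$dir/config.out" "$dir/config.in" || return 1
  # Rewrite only the values of newrun and runtype; trailing comments and
  # all other lines are copied unchanged.
  sed -e 's/^\([[:space:]]*newrun[[:space:]]*:[[:space:]]*\)[^[:space:]#]*/\1F/' \
      -e 's/^\([[:space:]]*runtype[[:space:]]*:[[:space:]]*\)[^[:space:]#]*/\1dmc_stats/' \
      "$dir/input" > "$dir/input.tmp" && mv "$dir/input.tmp" "$dir/input"
}
```

Calling `prepare_restart` in the run directory before resubmitting would then replace the three manual edits.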

Re: Error PPOTS

Post by Mike Towler »

> To restart, for example, a DMC calculation, all the required files are present in my directory. I only change config.out to "config.in", newrun to "F" and runtype to "dmc_stats". Then I restart the run. Sometimes it runs without problems and occasionally it stops with the error I mentioned before.

OK - that's fine.

> It is not at all a big problem

Yes it is, and we need to find out what's causing it. Let me have the error code when you find out what it is.

M.

Re: Error PPOTS

Post by elaheh »

I am running other jobs on the cluster. I was wondering whether changing the code and recompiling would affect those other jobs.

Elaheh.

Re: Error PPOTS

Post by Mike Towler »

No problem - just use a different binary name.

Make the change to ppots.f90 then:

cd CASINO/src
make EXECUTABLE=casino_test

then use

runqmc --binary=casino_test -p 512 -T 48h -s etc..

Re: Error PPOTS

Post by elaheh »

After making the following change in ppots.f90:
open(io_pp,file=psp_filename,status='old',iostat=ierr)
if(ierr/=0)write(my_node+400,*)'ierr=',ierr ; call flush(my_node+400)
if(ierr/=0)call errstop('READ_PPOTS','Error opening pseudopotential file ' &
&//trim(psp_filename)//'.')

and typing "make EXECUTABLE=casino_test", I get an error:
/home/polaris_lan1/lanem/CASINO/src/ppots.f90(172): error #6404: This name does not have a type, and must have an explicit type. [MY_NODE]
if(ierr/=0)write(my_node+400,*)'ierr=',ierr ; call flush(my_node+400)
------------------------^
compilation aborted for /home/polaris_lan1/lanem/CASINO/src/ppots.f90 (code 1)
make: *** [/home/polaris_lan1/lanem/CASINO/src/zlib/linuxpc-ifort-sge-parallel.polaris//opt/ppots.o] Error 1

Re: Error PPOTS

Post by Mike Towler »

Sorry.

At the top of the read_ppots routine in ppots.f90, change the line:

USE parallel, ONLY : am_master

to

USE parallel, ONLY : am_master,my_node
Neil Drummond
Posts: 113
Joined: Fri May 31, 2013 10:42 am
Location: Lancaster

Re: Error PPOTS

Post by Neil Drummond »

Dear Elaheh & Mike,

I have occasionally encountered this problem on the Polaris cluster in Leeds. The problem is not reproducible (a job that fails can simply be resubmitted). I don't think it is a problem with CASINO; it is more likely to be a problem with the filesystem on Polaris. We could perhaps reduce the chance of encountering the issue by reading the pseudopotentials on the master node and broadcasting.

Best wishes,

Neil.

Re: Error PPOTS

Post by Mike Towler »

> I have occasionally encountered this problem on the Polaris cluster in Leeds. The problem is not reproducible (a job that fails can simply be resubmitted). I don't think it is a problem with CASINO; it is more likely to be a problem with the filesystem on Polaris. We could perhaps reduce the chance of encountering the issue by reading the pseudopotentials on the master node and broadcasting.

Yes - the thought had occurred to me - but I was trying to be systematic about it (it took a while to establish exactly what the problem was and that nothing silly was going on).

However, given that Polaris has a Lustre file system and you're using the Intel Fortran compiler, I think the problem might be this:

http://www.nas.nasa.gov/hecc/support/kb ... n_276.html

In which case we ought to be able to cure it by putting "action=read" in the open statement (which we probably ought to be doing anyway).
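
In Fortran the specifier is written action='read' inside the open statement. For anyone wanting to check which other open statements in the source lack it, a rough shell sketch follows. The function name list_opens_without_action is hypothetical, and the search is line-based, so a continued statement (ending in &) may be reported even if action= appears on the following line.

```shell
# Hypothetical audit helper: list lines containing an open statement that
# carry no action= specifier on the same line. Comment lines that mention
# "open(" are reported too, so the output needs a manual look.
list_opens_without_action() {
  dir="${1:-.}"
  # /dev/null forces grep to print file names even for a single source file.
  grep -n -i -E '(^|[^[:alnum:]_])open[[:space:]]*\(' "$dir"/*.f90 /dev/null \
    | grep -v -i 'action[[:space:]]*=' || true
}
```

Run as `list_opens_without_action CASINO/src`, this would print file:line:text for each candidate.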

I'll submit a patch with that fix and let's see how we get on.

M.

Re: Error PPOTS

Post by Mike Towler »

OK - the current beta CASINO v2.13.95 (downloadable from the website) now contains this fix. If my suspicion is correct, your problem should go away. Let me know whether it does or not.

M.