Regarding walltime for long queue

General discussion of the Cambridge quantum Monte Carlo code CASINO; how to install and setup; how to use it; what it does; applications.
vinod_ashokan
Posts: 6
Joined: Wed May 25, 2016 5:09 pm

Regarding walltime for long queue

Post by vinod_ashokan »

Dear CASINO users,
I have installed CASINO on a cluster and am trying to run a larger system. In doing so, I have a question about the walltime for the long queue. The suggestion I received from the cluster support team is given below.

My question to the cluster support team concerned this command:

$ runqmc -n 8 --ppn=8 -T 160h

It failed with the error message:

qsub: Job exceeds queue resource limits MSG=cannot satisfy server max walltime requirement

Their answer:

"just add #PBS -q long at the top of your 'casino' script file and change #PBS -l walltime=26:00:00 to #PBS -l walltime=240:00:00"

Editing the 'casino' script file by hand does not solve the issue. I would like "#PBS -q long" to be added to the head of the job-submission script in one of the following ways:

1) always;
2) depending on the number of cores requested;
3) depending on the walltime requested.

I would like to choose among these according to the needs of the problem.

Attached: casino script & arch file

Thanks & Regards

Vinod
Attachments
Script and Arch file.tar
casino script & arch file
(6 KiB)
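The conditional behaviour asked for above (add "#PBS -q long" only when the request needs it) can be sketched in plain bash. This is not a runqmc or arch-file feature: `pbs_header` is a hypothetical helper, and the queue name `long`, the 26-hour default-queue limit (inferred from the `walltime=26:00:00` line mentioned by the support team), the 240-hour long-queue limit and the 8 cores per node are assumptions about this cluster that should be checked with its support team.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print #PBS directives for a job of NCORE cores and
# HOURS hours of walltime, adding "-q long" only when the request exceeds
# the (assumed) default-queue limit.

# Assumed site limits; confirm them with the cluster support team.
DEFAULT_QUEUE_MAX_HOURS=26
LONG_QUEUE_MAX_HOURS=240
CORES_PER_NODE=8

pbs_header() {
  local ncore=$1 hours=$2
  if (( hours > LONG_QUEUE_MAX_HOURS )); then
    echo "error: ${hours}h exceeds the ${LONG_QUEUE_MAX_HOURS}h long-queue limit" >&2
    return 1
  fi
  # Round the node count up so that every requested core gets a slot.
  local nnode=$(( (ncore + CORES_PER_NODE - 1) / CORES_PER_NODE ))
  local ppn=$(( ncore < CORES_PER_NODE ? ncore : CORES_PER_NODE ))
  echo "#PBS -N casino"
  if (( hours > DEFAULT_QUEUE_MAX_HOURS )); then
    echo "#PBS -q long"
  fi
  echo "#PBS -l nodes=${nnode}:ppn=${ppn}"
  # PBS walltime in H:MM:SS format, e.g. 160 h -> 160:00:00.
  printf '#PBS -l walltime=%d:00:00\n' "$hours"
}
```

For example, `pbs_header 64 160` prints the "-q long" line together with `nodes=8:ppn=8` and `walltime=160:00:00`, while `pbs_header 16 1` omits the queue line.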
vinod_ashokan
Posts: 6
Joined: Wed May 25, 2016 5:09 pm

Re: Regarding walltime for long queue

Post by vinod_ashokan »

Dear CASINO users,
I have been able to submit my CASINO job to the long queue (8 nodes x 8 cores), and the job was accepted. However, when I log in directly to the nodes and run the top command, only one node (fb11-nx-node18) is working and the others are idle. The job reserves all the nodes correctly but computes only on the first node (fb11-nx-node18), from which it was started. Could you please explain how I can make CASINO use all the allocated resources?

The output of qstat -n is given below:

nxfb1120@fb11-nx-main:~$ qstat -n

fb11-nx-main.fh-muenster.de:
                                                                         Req'd  Req'd   Elap
Job ID               Username Queue    Jobname          SessID NDS   TSK Memory Time  S Time
-------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - -----
4473.fb11-nx-mai     nxfb1120 long     casino            16182     8  64     -- 240:0 R 68:39
fb11-nx-node18/7+fb11-nx-node18/6+fb11-nx-node18/5+fb11-nx-node18/4
+fb11-nx-node18/3+fb11-nx-node18/2+fb11-nx-node18/1+fb11-nx-node18/0
+fb11-nx-node34/7+fb11-nx-node34/6+fb11-nx-node34/5+fb11-nx-node34/4
+fb11-nx-node34/3+fb11-nx-node34/2+fb11-nx-node34/1+fb11-nx-node34/0
+fb11-nx-node33/7+fb11-nx-node33/6+fb11-nx-node33/5+fb11-nx-node33/4
+fb11-nx-node33/3+fb11-nx-node33/2+fb11-nx-node33/1+fb11-nx-node33/0
+fb11-nx-node32/7+fb11-nx-node32/6+fb11-nx-node32/5+fb11-nx-node32/4
+fb11-nx-node32/3+fb11-nx-node32/2+fb11-nx-node32/1+fb11-nx-node32/0
+fb11-nx-node31/7+fb11-nx-node31/6+fb11-nx-node31/5+fb11-nx-node31/4
+fb11-nx-node31/3+fb11-nx-node31/2+fb11-nx-node31/1+fb11-nx-node31/0
+fb11-nx-node30/7+fb11-nx-node30/6+fb11-nx-node30/5+fb11-nx-node30/4
+fb11-nx-node30/3+fb11-nx-node30/2+fb11-nx-node30/1+fb11-nx-node30/0
+fb11-nx-node29/7+fb11-nx-node29/6+fb11-nx-node29/5+fb11-nx-node29/4
+fb11-nx-node29/3+fb11-nx-node29/2+fb11-nx-node29/1+fb11-nx-node29/0
+fb11-nx-node28/7+fb11-nx-node28/6+fb11-nx-node28/5+fb11-nx-node28/4
+fb11-nx-node28/3+fb11-nx-node28/2+fb11-nx-node28/1+fb11-nx-node28/0


Thanks & Regards
Vinod
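The qstat -n listing above shows that the reservation itself looks fine: 8 slots on each of 8 nodes. A small bash sketch for summarising such a host list as slots per node (`slots_per_node` is a hypothetical helper built only from tr, awk and sort) makes it easy to compare the reservation with what top shows on each node:

```shell
# Hypothetical helper: read the host list printed by "qstat -n"
# (entries such as "fb11-nx-node18/7" joined by "+", possibly wrapped
# over several indented lines) on stdin and print "<node> <slots>" per node.
slots_per_node() {
  tr -d ' \t\n' |   # join the wrapped lines into one string
    tr '+' '\n' |   # one "node/slot" entry per line
    awk -F/ 'NF { count[$1]++ } END { for (n in count) print n, count[n] }' |
    sort
}
```

Piping the two lines `fb11-nx-node18/1+fb11-nx-node18/0` and `+fb11-nx-node34/0` into it prints `fb11-nx-node18 2` followed by `fb11-nx-node34 1`.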
Sharma_Omprakash
Posts: 15
Joined: Fri Feb 27, 2015 11:20 am

Re: Regarding walltime for long queue

Post by Sharma_Omprakash »

Hi Vinod,
Add the following line to the SCRIPT_HEAD of your CASINO arch file:

-pe mpich &NPROC&

For example:
#-! SCRIPT_HEAD:
#-! #!/bin/bash
#-! #$ -N &SCRIPT&
#-! #$ -pe mpich &NPROC&
#-! #$ -cwd
#-! #$ -S /bin/bash
#-! #$ -e &OUT&
#-! #$ -o &OUT&
#-! export PATH="&ENV.PATH&"
#-! export LD_LIBRARY_PATH="&ENV.LD_LIBRARY_PATH&"
#-! SCRIPT_RUN:
#-! mpirun -np &NPROC& &BINARY&
#-! SUBMIT_SCRIPT: qsub -q long.q &SCRIPT&
Let me know if it works.

Best,
Rajesh
vinod_ashokan
Posts: 6
Joined: Wed May 25, 2016 5:09 pm

Re: Regarding walltime for long queue

Post by vinod_ashokan »

Dear CASINO users,
I have now been able to allocate the available cores of the cluster by submitting the attached 'casino' script with qsub. However, no output file is generated. Could you please list the possible causes?

Attached: casino script, CASINO_ARCH files


Thanks & Regards
Vinod
Attachments
linuxpc-gnu-pbs-parallel.fb11-nx-main_KNP.arch.gz
(1.38 KiB)
casino.tar
(3.5 KiB)
vinod_ashokan
Posts: 6
Joined: Wed May 25, 2016 5:09 pm

Re: Regarding walltime for long queue

Post by vinod_ashokan »

Dear CASINO users,
I have installed CASINO on a high-performance computing cluster without any errors. When I submit a job with runqmc --nnode=2 --ppn=8 -T 1h, the program runs only on the first node, even though the second node is also allocated to the job. I checked this by logging into each node and running top; the top output for node18 and node34 is attached. In short, CASINO is not using all the resources made available to it. Could you please list the possible causes?

- The submitted command and the output generated with --verbosity=5 are given below:

nxfb1120@fb11-nx-main:~/N99b0.5rs0.3/vmc2$ runqmc --nnode=2 --ppn=8 -T 1h --verbosity=5
Loading tags from linuxpc-mpif90-pbs-parallel.fb11-nx-main_anu.arch
TYPE of machine is 'cluster'
Have set SUBMIT_SCRIPT='qsub &SCRIPT&'
Dependency tree:
tags[2] = ALLOWED_NCORE ALLOWED_NNODE ALLOWED_WALLTIME CORES_PER_NODE
CORES_PER_NODE_CLUSTER MAX_CORETIME MAX_NCORE MAX_NNODE MAX_WALLTIME
MIN_CORETIME MIN_NCORE MIN_NNODE MIN_WALLTIME TIME_FORMAT WALLTIME_CODES
vars[1] = BINARY ENV.LD_LIBRARY_PATH ENV.PATH META.RUN_TOPOLOGY OUT SCRIPT
WALLTIME
tags[1] = SCRIPT_HEAD SCRIPT_RUN SUBMIT_SCRIPT
Evaluated ALLOWED_NCORE=''
Evaluated ALLOWED_NNODE=''
Evaluated ALLOWED_WALLTIME=''
Evaluated CORES_PER_NODE='8'
Evaluated CORES_PER_NODE_CLUSTER='8'
Evaluated MAX_CORETIME=''
Evaluated MAX_NCORE='136'
Evaluated MAX_NNODE=''
Evaluated MAX_WALLTIME='10d'
Evaluated MIN_CORETIME=''
Evaluated MIN_NCORE=''
Evaluated MIN_NNODE=''
Evaluated MIN_WALLTIME=''
Evaluated TIME_FORMAT='H:MM:SS'
Evaluated WALLTIME_CODES=''
Evaluated ENV.PATH='/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/opt/thinlinc/bin:/usr/local/maui/bin:/home/ms/n/nxfb1120/CASINO/bin_qmc'
CORES_PER_NODE_CLUSTER is defined: CORES_PER_NODE overridden
Evaluated CORES_PER_NODE=8
Evaluated MAX_NCORE=136
Made var_NNODE_TOTAL=2 as var_NJOB=1 and var_NNODE=2
Made var_NPROC=16 as var_PPN=8 and var_NNODE=2
Made var_NPROC_TOTAL=16 as var_NJOB=1 and var_NPROC=16
Made nthread=16 as var_TPP=1 and var_NPROC=16
Made nthread_total=16 as var_NJOB=1 and nthread=16
Made var_TPN=8 as var_TPP=1 and var_PPN=8
Evaluated NPROC=16
Evaluated TPP=1
Evaluated PPN=8
Evaluated NNODE=2
Evaluated MAX_WALLTIME=10d
Evaluated WALLTIME='1:00:00'
Evaluated SCRIPT_HEAD:
#!/bin/bash
#PBS -N casino
#PBS -l nodes=2:ppn=8
#PBS -l walltime=1:00:00
#PBS -j oe
#PBS -o /home/ms/n/nxfb1120/N99b0.5rs0.3/vmc2/.err
#PBS -r n
export PATH="/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/opt/thinlinc/bin:/usr/local/maui/bin:/home/ms/n/nxfb1120/CASINO/bin_qmc"
export LD_LIBRARY_PATH=""
Evaluated SCRIPT_RUN:
mpirun -np 16 /home/ms/n/nxfb1120/CASINO/bin_qmc/linuxpc-mpif90-pbs-parallel.fb11-nx-main_anu/opt/casino
Evaluated SUBMIT_SCRIPT='qsub casino'
4799.fb11-nx-main.fh-muenster.de


- The node allocation reported by qstat -n is given below:

nxfb1120@fb11-nx-main:~/N99b0.5rs0.3/vmc2$ qstat -n

fb11-nx-main.fh-muenster.de:
                                                                         Req'd  Req'd   Elap
Job ID               Username Queue    Jobname          SessID NDS   TSK Memory Time  S Time
-------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - -----
4799.fb11-nx-mai     nxfb1120 default  casino               --     2  16     -- 01:00 R    --
fb11-nx-node18/7+fb11-nx-node18/6+fb11-nx-node18/5+fb11-nx-node18/4
+fb11-nx-node18/3+fb11-nx-node18/2+fb11-nx-node18/1+fb11-nx-node18/0
+fb11-nx-node34/7+fb11-nx-node34/6+fb11-nx-node34/5+fb11-nx-node34/4
+fb11-nx-node34/3+fb11-nx-node34/2+fb11-nx-node34/1+fb11-nx-node34/0

- Screenshots of top on node18 and node34, and the CASINO arch file, are attached.


Hope to hear from you soon

Thanks & Regards
Vinod
Attachments
files.tar
(335.5 KiB)
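In the verbose output above, the evaluated SCRIPT_RUN is `mpirun -np 16 .../casino` with no host list, so whether the 16 ranks are spread over both nodes depends on whether that mpirun picks up the PBS allocation by itself. One generic check, sketched below (it is not a CASINO tool, and `check_rank_placement` is a hypothetical name), is to run `hostname` with the same mpirun line inside a short test job and compare the number of distinct hosts it reports with the number of distinct hosts in $PBS_NODEFILE:

```shell
# Hypothetical helper: compare the nodes reserved by PBS with the nodes on
# which MPI ranks actually started.
#   $1    : path to the PBS node file (one hostname per slot, e.g. $PBS_NODEFILE)
#   stdin : output of "mpirun -np N hostname" (one line per rank)
check_rank_placement() {
  local nodefile=$1 reserved used
  reserved=$(sort -u "$nodefile" | wc -l)
  used=$(sort -u | wc -l)
  echo "reserved nodes: $reserved, nodes running ranks: $used"
  # Non-zero exit status if some reserved nodes received no ranks.
  (( used == reserved ))
}
```

Inside a test job it would be used as `mpirun -np 16 hostname | check_rank_placement "$PBS_NODEFILE"`. Only the counts are compared, because the node file and `hostname` may report names in different forms (short name versus fully qualified).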
Neil Drummond
Posts: 113
Joined: Fri May 31, 2013 10:42 am
Location: Lancaster

Re: Regarding walltime for long queue

Post by Neil Drummond »

Dear Vinod,

Could you ask the computer centre to provide you with a sample job-submission script for submitting a job to execute the CASINO binary, which can be found in

CASINO/bin_qmc/$CASINO_ARCH/opt/casino

on a given number of cores?

Once you have a working job-submission script, it will be easier to set up the arch files to generate job-submission scripts automatically.

Best wishes,

Neil.
vinod_ashokan
Posts: 6
Joined: Wed May 25, 2016 5:09 pm

Parallelization of CASINO on the Muenster cluster

Post by vinod_ashokan »

Dear Dr Drummond,
Please find attached (pi_mpi.tar.gz) a job script that works on the Muenster cluster; it runs an MPI program written in C++ that calculates pi. The arch file of my CASINO installation, linuxpc-gnu-pbs-parallel.fb11-nx-main_29March2017.arch.tar, is also attached for your reference. The problem is that when I add -machinefile $PBS_NODEFILE to SCRIPT_RUN,

#-! SCRIPT_RUN:
#-! mpirun -machinefile $PBS_NODEFILE -np &NPROC& &BINARY&

then CASINO uses all the resources (all allocated nodes), but it hangs and generates no output files apart from the 'out' file, which stops at one stage.

When I remove -machinefile $PBS_NODEFILE from the mpirun line and submit with runqmc, CASINO works perfectly fine, but only on a single node; it does not run in parallel across nodes.

The CASINO binary from CASINO/bin_qmc/$CASINO_ARCH/opt/casino is also attached.

Note: for the installation, I first tried the automatic setup, selecting the suggested match (linuxpc-gcc-pbs-parallel.arch). As this did not resolve the issue, I then ran the installation interactively, edited the CASINO_ARCH file to add the line include $(INCBASE)/linuxpc-gcc-pbs-parallel.arch, and recompiled.

Could you please correct the CASINO_ARCH file, or point out any other mistake I am making in installing CASINO on the Muenster cluster?

Thanks & Regards
Vinod



Attachments
pi_mpi.tar.gz
Working script of cluster
(29.1 KiB)
linuxpc-gnu-pbs-parallel.fb11-nx-main_29March2017.arch.tar.gz
CASINO_ARCH
(1.24 KiB)
casino.tar.gz
CASINO binary file
(4.53 MiB)
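When -machinefile $PBS_NODEFILE is added to SCRIPT_RUN as described above, the -np value should match the number of slots listed in the node file (PBS/Torque writes one line per slot). A minimal bash sketch, assuming an mpirun that accepts `-machinefile FILE -np N` as in the post, derives -np from the node file and prints the command instead of running it; `mpirun_with_nodefile` is a hypothetical helper, and it only guards against a count mismatch, it does not diagnose the hang itself.

```shell
# Hypothetical helper: print an mpirun command that uses the PBS node file
# as the machine file, with -np equal to the number of slots it lists.
#   $1 : path to the node file (normally $PBS_NODEFILE)
#   $2 : path to the CASINO binary
mpirun_with_nodefile() {
  local nodefile=$1 binary=$2 nproc
  if [ ! -s "$nodefile" ]; then
    echo "error: node file '$nodefile' is missing or empty" >&2
    return 1
  fi
  # One line per reserved slot, e.g. 16 lines for nodes=2:ppn=8.
  nproc=$(wc -l < "$nodefile")
  nproc=$(( nproc ))   # strip any padding added by wc
  echo "mpirun -machinefile $nodefile -np $nproc $binary"
}
```

Printing the command rather than executing it keeps the sketch safe to try on the login node against a copy of a node file.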
Neil Drummond
Posts: 113
Joined: Fri May 31, 2013 10:42 am
Location: Lancaster

Re: Regarding walltime for long queue

Post by Neil Drummond »

Dear Vinod,

What happens if you take the example submit.cmd script and replace "pi_mpi" with "casino", then copy the CASINO binary from ~/CASINO/bin_qmc/$CASINO_ARCH/opt/casino into the working directory and submit the job?

I can't help much with making a working job-submission script as I don't have access to this cluster. You are probably better off asking the support people to help you get a working job-submission script.

Best wishes,

Neil.
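The steps suggested above can be scripted. This is a sketch: `prepare_casino_job` is a hypothetical helper, submit.cmd is the template from the attached pi example, and the output name submit_casino.cmd is an arbitrary choice. It copies the binary into the current directory as ./casino and writes a copy of the template with every occurrence of pi_mpi replaced by casino; submission with qsub is left as a separate step.

```shell
# Hypothetical helper: turn a working MPI job script into a CASINO one.
#   $1 : working template script (e.g. submit.cmd from the pi example)
#   $2 : CASINO binary (e.g. ~/CASINO/bin_qmc/$CASINO_ARCH/opt/casino)
# Writes ./casino and ./submit_casino.cmd in the current directory.
prepare_casino_job() {
  local template=$1 binary=$2
  if [ ! -f "$template" ] || [ ! -x "$binary" ]; then
    echo "error: need an existing template and an executable binary" >&2
    return 1
  fi
  cp "$binary" ./casino
  # Replace every occurrence of the example program name.
  sed 's/pi_mpi/casino/g' "$template" > submit_casino.cmd
  echo "now submit with: qsub submit_casino.cmd"
}
```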
vinod_ashokan
Posts: 6
Joined: Wed May 25, 2016 5:09 pm

Re: Regarding walltime for long queue

Post by vinod_ashokan »

Dear Dr. Drummond,
Thanks for your prompt reply. As per your suggestion I ran the program as follows:
1. Copied the CASINO binary file from ~/CASINO/bin_qmc/$CASINO_ARCH/opt/casino into the working directory
2. Copied the submit.cmd script into the working directory and replaced 'pi_mpi' with 'casino'
3. qsub submit.cmd
4. The generated output file, which hangs at one stage, is attached

I get similar output whether I run with the procedure you suggested or with runqmc --nnode=2 --ppn=4

Thanks & Regards
Vinod

Attachments
out.tar.gz
(1.47 KiB)
Neil Drummond
Posts: 113
Joined: Fri May 31, 2013 10:42 am
Location: Lancaster

Re: Regarding walltime for long queue

Post by Neil Drummond »

Sorry, I've no idea what the problem is.

Best wishes,

Neil.