parallel version breaks on multi determinant WFN

General discussion of the Cambridge quantum Monte Carlo code CASINO; how to install and set up; how to use it; what it does; applications.
Vladimir_Konjkov
Posts: 138
Joined: Wed Apr 15, 2015 3:14 pm

Post by Vladimir_Konjkov »

Hello CASINO developers. After upgrading from v2.13.639 to v2.13.673, I've found that the newest version fails when run in parallel mode with ANY multideterminant WFN, giving the following message:
--Job's stderr--

[vladimir-Kubuntu-16:17932] *** An error occurred in MPI_Bcast
[vladimir-Kubuntu-16:17932] *** reported by process [234618881,2]
[vladimir-Kubuntu-16:17932] *** on communicator MPI_COMM_WORLD
[vladimir-Kubuntu-16:17932] *** MPI_ERR_TRUNCATE: message truncated
[vladimir-Kubuntu-16:17932] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[vladimir-Kubuntu-16:17932] *** and potentially your MPI job)
[vladimir-Kubuntu-16:17924] 1 more process has sent help message help-mpi-errors.txt / mpi_errors_are_fatal
[vladimir-Kubuntu-16:17924] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
What could be the cause of the error?

Best, Vladimir.

Single-determinant WFNs work fine in both versions.
I compile CASINO using linuxpc-gcc-parallel.openblas.arch.
In Soviet Russia Casino plays you.
Neil Drummond
Posts: 95
Joined: Fri May 31, 2013 10:42 am
Location: Lancaster

Re: parallel version breaks on multi determinant WFN

Post by Neil Drummond »

Dear Vladimir,

Thanks for the report. Do you have an example that fails? (The MDET examples in CASINO/examples/TEST seem to work OK, at least with the gfortran and NAG compilers.)

Best wishes,

Neil.
Vladimir_Konjkov
Posts: 138
Joined: Wed Apr 15, 2015 3:14 pm

Re: parallel version breaks on multi determinant WFN

Post by Vladimir_Konjkov »

Hello Neil.

My example is attached. In the meantime I'm still using the old version, which works fine.

Vladimir.

Happy New Year!!!!
Attachments
MPI_issue.tgz
example
(22.55 KiB)
Neil Drummond
Posts: 95
Joined: Fri May 31, 2013 10:42 am
Location: Lancaster

Re: parallel version breaks on multi determinant WFN

Post by Neil Drummond »

Dear Vladimir,

Thanks very much for reporting the problem, and sorry for any inconvenience. The bug was introduced in v2.13.650. The issue is that mdet_max_mods needs to be broadcast to all MPI processes before it is used in READGW in gaussians.f90. I've attached the git patch that I've just sent to Mike.
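The failure mode described above (a size parameter used before it has been broadcast) can be sketched as a toy, single-process Python simulation. This is not MPI and not CASINO code: it assumes, hypothetically, that only the root rank reads mdet_max_mods and that every rank then uses its own copy to size the receive buffer for a later broadcast. If the size is not broadcast first, non-root ranks post buffers that are too short, which an MPI library such as Open MPI can report as MPI_ERR_TRUNCATE. The names `bcast`, `read_mdet_data` and `TruncateError` are hypothetical.

```python
"""Toy, single-process model of a broadcast whose receive-buffer size depends
on a parameter only the root rank knows. Hypothetical sketch: not CASINO code
and not a real MPI binding."""


class TruncateError(Exception):
    """Stands in for MPI_ERR_TRUNCATE: incoming message larger than buffer."""


def bcast(root_data, recv_sizes):
    """Copy root_data into one receive buffer per rank.

    recv_sizes[r] is the buffer length rank r posted; raise if the root's
    message does not fit, as a real MPI_Bcast with mismatched counts might.
    """
    out = []
    for rank, size in enumerate(recv_sizes):
        if len(root_data) > size:
            raise TruncateError(
                f"rank {rank}: message of {len(root_data)} items, buffer of {size}"
            )
        out.append(list(root_data))
    return out


def read_mdet_data(nproc, broadcast_size_first):
    """Hypothetical analogue of a READGW-like step for a multideterminant WFN."""
    sizes = [1] * nproc  # hypothetical initial value of the size on every rank
    sizes[0] = 4  # only the root rank reads mdet_max_mods from input
    if broadcast_size_first:
        sizes = [sizes[0]] * nproc  # the fix: broadcast the size before use
    root_data = list(range(sizes[0]))  # root sends mdet_max_mods items
    return bcast(root_data, sizes)


if __name__ == "__main__":
    print(len(read_mdet_data(3, broadcast_size_first=True)))  # prints 3
    try:
        read_mdet_data(3, broadcast_size_first=False)
    except TruncateError as err:
        print("MPI_ERR_TRUNCATE analogue:", err)
```

In this toy model the unbroadcast version only fails when the root's value differs from the other ranks' initial value, and it never fails on a single process, which is consistent with a bug of this kind showing up only in parallel runs.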

Happy New Year!

Best wishes,

Neil.
Attachments
0001-Fixed-bug-affecting-Gaussian-and-Slater-type-multide.patch.gz
(1.24 KiB)
Mike Towler
Posts: 237
Joined: Thu May 30, 2013 11:03 pm
Location: Florence

Re: parallel version breaks on multi determinant WFN

Post by Mike Towler »

Neil's fix is now in the public distribution.

Happy New Year to all!

M.