parallel version breaks on multi determinant WFN

Posted: Fri Dec 29, 2017 7:23 pm
by Vladimir_Konjkov
Hello CASINO developers. After upgrading from v2.13.639 to v2.13.673, I found that the newer version fails when run in parallel mode with ANY multi-determinant WFN, giving this message:
--Job's stderr--

[vladimir-Kubuntu-16:17932] *** An error occurred in MPI_Bcast
[vladimir-Kubuntu-16:17932] *** reported by process [234618881,2]
[vladimir-Kubuntu-16:17932] *** on communicator MPI_COMM_WORLD
[vladimir-Kubuntu-16:17932] *** MPI_ERR_TRUNCATE: message truncated
[vladimir-Kubuntu-16:17932] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[vladimir-Kubuntu-16:17932] *** and potentially your MPI job)
[vladimir-Kubuntu-16:17924] 1 more process has sent help message help-mpi-errors.txt / mpi_errors_are_fatal
[vladimir-Kubuntu-16:17924] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
What could be the cause of the error?

Best, Vladimir.

Single-determinant WFNs work fine in both versions.
I compile my CASINO with linuxpc-gcc-parallel.openblas.arch

Re: parallel version breaks on multi determinant WFN

Posted: Sat Dec 30, 2017 10:07 pm
by Neil Drummond
Dear Vladimir,

Thanks for the report. Do you have an example that fails? (The MDET examples in CASINO/examples/TEST seem to work OK, at least with the gfortran and NAG compilers.)

Best wishes,

Neil.

Re: parallel version breaks on multi determinant WFN

Posted: Sun Dec 31, 2017 1:33 am
by Vladimir_Konjkov
Hello Neil.

My example is in the attachment. I'm still using the old version, which works without any problems.

Vladimir.

Happy New Year!!!!

Re: parallel version breaks on multi determinant WFN

Posted: Sun Dec 31, 2017 10:55 pm
by Neil Drummond
Dear Vladimir,

Thanks very much for reporting the problem and sorry for any inconvenience. The bug was introduced in 2.13.650. The issue is that mdet_max_mods needs to be broadcast before it is used in READGW in gaussians.f90. I've attached the git patch that I've just sent to Mike.
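
For readers unfamiliar with this failure mode, here is a minimal sketch of why a size variable that is not broadcast can lead to MPI_ERR_TRUNCATE. It is a toy model in plain Python, not CASINO's code and not real MPI: toy_bcast, read_with_bug and read_with_fix are hypothetical names, and the stale value of 0 on the non-root ranks is an assumption made only for illustration. The idea is that every rank passes its own count to a broadcast, so if only the root knows the array size, the other ranks post buffers that are too small for the message.

```python
class TruncationError(Exception):
    """Stands in for MPI_ERR_TRUNCATE in this toy model."""


def toy_bcast(root_data, counts, root=0):
    """Simulate a broadcast in which every rank passes its own count.

    counts[r] is the number of items rank r believes the message holds.
    Returns the per-rank receive buffers, or raises TruncationError if a
    rank's count is smaller than the number of items the root sends.
    """
    n_sent = counts[root]
    message = root_data[:n_sent]
    buffers = []
    for rank, count in enumerate(counts):
        if count < n_sent:
            raise TruncationError(
                f"rank {rank}: expected {count} items, root sent {n_sent}"
            )
        buffers.append(list(message))
    return buffers


def read_with_bug(n_ranks, size_on_root, data):
    # Only the root knows the size; the other ranks keep a stale value
    # (assumed to be 0 here), so their counts disagree with the root's.
    counts = [size_on_root] + [0] * (n_ranks - 1)
    return toy_bcast(data, counts)


def read_with_fix(n_ranks, size_on_root, data):
    # Step 1: broadcast the size itself (one integer, count 1 on every rank).
    sizes = [buf[0] for buf in toy_bcast([size_on_root], [1] * n_ranks)]
    # Step 2: every rank now agrees on the count for the payload.
    return toy_bcast(data, sizes)
```

In this sketch, read_with_bug(3, 3, [1.0, 2.0, 3.0]) raises TruncationError, while read_with_fix with the same arguments returns one full copy of the data per rank.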

Happy New Year!

Best wishes,

Neil.

Re: parallel version breaks on multi determinant WFN

Posted: Mon Jan 01, 2018 11:50 am
by Mike Towler
Neil's fix is now in the public distribution.

Happy New Year to all!

M.