Hello CASINO developers. After upgrading from v2.13.639 to v2.13.673, I've found that the newest version fails when running with ANY multi-determinant WFN in parallel mode, with the message:
--Job's stderr--
[vladimir-Kubuntu-16:17932] *** An error occurred in MPI_Bcast
[vladimir-Kubuntu-16:17932] *** reported by process [234618881,2]
[vladimir-Kubuntu-16:17932] *** on communicator MPI_COMM_WORLD
[vladimir-Kubuntu-16:17932] *** MPI_ERR_TRUNCATE: message truncated
[vladimir-Kubuntu-16:17932] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[vladimir-Kubuntu-16:17932] *** and potentially your MPI job)
[vladimir-Kubuntu-16:17924] 1 more process has sent help message help-mpi-errors.txt / mpi_errors_are_fatal
[vladimir-Kubuntu-16:17924] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
What could be the cause of the error?
Best, Vladimir.
A single-determinant WFN works fine in both versions.
I compile CASINO with linuxpc-gcc-parallel.openblas.arch.
Thanks for the report. Do you have an example that fails? (The MDET examples in CASINO/examples/TEST seem to work OK, at least with the gfortran and NAG compilers.)
Best wishes,
Neil.
Hello Neil.
My example is attached. I'm still using the old version, which works fine.
Thanks very much for reporting the problem and sorry for any inconvenience. The bug was introduced in 2.13.650. The issue is that mdet_max_mods needs to be broadcast before it is used in READGW in gaussians.f90. I've attached the git patch that I've just sent to Mike.
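For anyone curious why a missing broadcast shows up as MPI_ERR_TRUNCATE, here is a toy Python model of the ordering problem. It is not CASINO or MPI code: `toy_bcast`, `run`, the default size of 0 on non-root ranks and the sample data are all hypothetical, and the real mismatch in READGW may look different. The assumption is only that the root process knows the size from the input while the other processes size their receive buffers from a value that has not been broadcast yet.

```python
class TruncateError(Exception):
    """Stands in for MPI_ERR_TRUNCATE in this toy model."""


def toy_bcast(root_data, recv_sizes):
    """Deliver root_data to every rank; fail if a receive buffer is too small."""
    for rank, size in enumerate(recv_sizes):
        if size < len(root_data):
            raise TruncateError(f"rank {rank}: message truncated")
    return [list(root_data) for _ in recv_sizes]


def run(nranks, broadcast_size_first):
    """Broadcast an array whose length only rank 0 knows at the start."""
    data = [1.0, 2.0, 3.0]  # hypothetical data read on rank 0 only

    # Per-rank copy of the size variable. Only rank 0 has read the input;
    # the value 0 on the other ranks is an assumed uninitialised default.
    max_mods = [len(data)] + [0] * (nranks - 1)

    if broadcast_size_first:
        # The fix: share the scalar size before anything depends on it.
        max_mods = [max_mods[0]] * nranks

    # Every rank sizes its receive buffer from its own copy of max_mods.
    return toy_bcast(data, max_mods)


if __name__ == "__main__":
    print(run(4, broadcast_size_first=True))
    try:
        run(4, broadcast_size_first=False)
    except TruncateError as err:
        print("failed:", err)
```

In this model a run on a single process never hits the problem, because the root always has the right size; the failure only appears once a second rank uses its stale copy.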