[mvapich-discuss] Fatal error in MPI_Comm_spawn

Jonathan Perkins perkinjo at cse.ohio-state.edu
Wed Dec 7 14:38:31 EST 2011


Hello,
Thanks for your message.  We're currently investigating the behavior
that you've found related to make installcheck and make testing.

Regarding the installcheck error(s): we believe that these tests
should fail, but perhaps in an unpredictable way unless
--enable-error-checking=all is used.  We'll update you on our findings
here.

Regarding the make testing error(s): the spawn tests require
Dynamic Process Management (DPM) support to be enabled.  In MPICH2 this
is enabled by default; in MVAPICH2, however, it requires setting the
MV2_SUPPORT_DPM environment variable to 1.
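
As a rough sketch of the DPM setting described above, the variable can
be exported before launching one of the spawn tests.  This assumes a
launcher that forwards the caller's environment to the MPI processes
(Hydra's mpiexec normally does); SPAWN_TEST is a placeholder for
wherever spawn1 was built, not a path taken from your report.

```shell
# Enable Dynamic Process Management support for MVAPICH2.
export MV2_SUPPORT_DPM=1

# Placeholder path (assumption): adjust to the location of the built test.
SPAWN_TEST=./spawn1

if command -v mpiexec >/dev/null 2>&1 && [ -x "$SPAWN_TEST" ]; then
    # Start a single parent process; the test spawns its own children.
    mpiexec -n 1 "$SPAWN_TEST"
else
    echo "mpiexec or $SPAWN_TEST not found; MV2_SUPPORT_DPM=$MV2_SUPPORT_DPM"
fi
```

If your launcher does not forward the environment, the variable has to
be passed through the launcher's own mechanism instead.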

You can also try running the osu benchmarks installed in
$PREFIX/libexec/osu-micro-benchmarks/ if you'd like to verify the
installation of MVAPICH2 ($PREFIX corresponds to the directory you
specified when configuring MVAPICH2).
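
A minimal sketch of such a check, assuming the install prefix that
appears in your output (/usr/local/mvapich2-1.7) and assuming the
osu_latency benchmark was installed; both are guesses on my part, so
adjust them to your setup.

```shell
# Install prefix (assumption): taken from the mpiexec path in the report.
PREFIX=${PREFIX:-/usr/local/mvapich2-1.7}
OMB_DIR="$PREFIX/libexec/osu-micro-benchmarks"

if [ -x "$OMB_DIR/osu_latency" ] && command -v mpiexec >/dev/null 2>&1; then
    # osu_latency is a point-to-point test and needs exactly two processes.
    mpiexec -n 2 "$OMB_DIR/osu_latency"
else
    echo "osu_latency or mpiexec not found (looked in $OMB_DIR)"
fi
```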

The first error you experienced seems to be related to your
installation of the InfiniBand drivers.  It seems that the rdmacm
drivers may not be installed or may not be running properly.
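
One way to look into this, sketched under the assumption of a Linux
system with sysfs mounted at /sys: librdmacm reads its ABI version from
/sys/class/misc/rdma_cm/abi_version, which is normally provided by the
rdma_ucm kernel module.  The module names in the grep pattern are the
usual ones and may differ on your distribution.

```shell
# File that librdmacm reads to determine the ABI version.
ABI_FILE=/sys/class/misc/rdma_cm/abi_version

if [ -r "$ABI_FILE" ]; then
    echo "rdma_cm ABI version: $(cat "$ABI_FILE")"
else
    echo "$ABI_FILE is not readable; the rdma_ucm module may not be loaded"
fi

# List loaded RDMA-related kernel modules, if any.
if command -v lsmod >/dev/null 2>&1; then
    lsmod | grep -E 'rdma_cm|rdma_ucm|ib_core' || echo "no RDMA modules loaded"
fi
```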

For more information about the OSU benchmarks and DPM, please see our user guide.

DPM:
http://mvapich.cse.ohio-state.edu/support/user_guide_mvapich2-1.8_alpha1_p1.html#x1-370005.2.5

OMB:
http://mvapich.cse.ohio-state.edu/support/user_guide_mvapich2-1.8_alpha1_p1.html#x1-740007

Please let me know if this info helps.

2011/12/7 ZhangXP <soaliap at 126.com>:
> Hi,
>
> I compiled mvapich2-1.7 with the default interface on a single computer
> with 4 cores, running Red Hat Enterprise Linux Server release 6.0.
> "./configure", "make", and "make install" all seemed to succeed. But when
> I executed "make installcheck" and "make testing", there were some
> errors.
>
>
> 1. make installcheck
>     (1). When I executed "make installcheck", there were some errors:
>
> ========================================================================================
>         CMA: unable to get RDMA device list
>         librdmacm: couldn't read ABI version.
>         librdmacm: assuming: 4
>     ========================================
> ================================================
>
>     (2). I executed "make uninstalled", "make clean" and "make distclean",
> and then executed "./configure --disable-rdma-cm", "make", "make install"
> and "make installcheck" again. But there still were some errors:
>
> ========================================================================================
>         Running installation runtest for C collchk program...
>         *** Test C program with the MPI collective/datatype checking library
> ..... No.
>         The failed command is :
>         /usr/local/mvapich2-1.7/bin/mpiexec -n 4 ./wrong_int_byte
>         Starting MPI Collective and Datatype Checking!
>         Backtrace of the callstack at rank 3:
>         At [0]: ./wrong_int_byte(CollChk_err_han+0x16f)[0x41e86f]
>
>         [cli_3]: aborting job:
>         Fatal error in MPI_Comm_call_errhandler:
>         Collective Checking: BCAST (Rank 3) --> Inconsistent datatype
> signatures detected between rank 3 and rank 0.
>
>         Running installation runtest for Fortran collchk program...
>         *** Test F77 program with the MPI collective/datatype checking
> library ... Yes.
>
> ========================================================================================
>     I saw some information like the errors above in the old document
> "mpich2-doc-user.pdf". It says "The error message here shows that the
> MPI_Bcast has been used with inconsistent datatype in the program
> wrong_reals.f". The code in wrong_int_byte.c:
>
>
>  ========================================================================================
>         if ( rank == size-1 )
>             /* Create pathological case */
>             MPI_Bcast( &ibuff, sizeof(int), MPI_BYTE, 0, MPI_COMM_WORLD );
>         else
>             MPI_Bcast( &ibuff, 1, MPI_INT, 0, MPI_COMM_WORLD );
>
>  ========================================================================================
>     What confused me:
>
>     a). Does the file name "wrong_int_byte" mean the test is expected to
> fail?
>     b). Why did "wrong_int_byte" fail while "wrong_reals" did not?
>
> 2. make testing
> (1). When I executed "make testing", the same errors appeared many times:
> ========================================================================================
> Unexpected output in spawn1: [cli_0]: aborting job:
> Unexpected output in spawn1: Fatal error in MPI_Comm_spawn:
> Unexpected output in spawn1: Other MPI error
> Unexpected output in spawn1:
> Program spawn1 exited without No Errors
> ========================================================================================
> I found that these errors always occurred when running the test programs in
> the "test/mpi/spawn" directory.
> (2). I found that if I configure with the TCP/IP-CH3 interface, "make
> testing" succeeds. But I saw
>    "the MVAPICH team strongly recommends the use of following interfaces for
> different adapters:
>     ...
>     5) Shared-Memory-CH3 for single node SMP system and laptop"
> from "mvapich2-1.7_user_guide.pdf", and I don't understand why a build
> configured for the default interface fails!
>
> Can anybody help me? Thanks!
>
>
>
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>



-- 
Jonathan Perkins
http://www.cse.ohio-state.edu/~perkinjo


