[mvapich-discuss] MVAPICH2 version 1.8 hangs on MPI_Finalize when using nemesis

Devendar Bureddy bureddy at cse.ohio-state.edu
Thu Aug 16 17:59:28 EDT 2012


Hi Carson,

MVAPICH2 internally uses its own memory allocator module, "ptmalloc". It
seems there is some weird interaction with shared libraries when Perl is
used with MVAPICH2. You should not see this behavior with a plain C
application.
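
To double-check that on your side, a plain C test along these lines
should run to completion. This is only a minimal sketch: the hostfile
name "hosts" and the process count are placeholders, and it assumes
mpicc/mpirun_rsh from your MVAPICH2 install are on your PATH.

```shell
# Minimal C reproducer: just MPI_Init + MPI_Finalize, no Perl involved.
cat > minimal.c <<'EOF'
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);   /* initializes MPI (and ptmalloc internally) */
    MPI_Finalize();           /* should return cleanly from a C program */
    return 0;
}
EOF
mpicc -o minimal minimal.c
mpirun_rsh -np 2 -hostfile hosts ./minimal
```

If this C version exits cleanly while the Perl version hangs, that
points at the shared-library interaction described above.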

You can work around the issues with Perl in the following ways.

1. With Nemesis:IB:

This hang surfaces because of a ptmalloc initialization failure. The
attached patch should fix it. Please follow the instructions below to
apply the patch:

$ cd mvapich2-1.8
$ patch -p1 < diff.nemesis
$ make && make install


2. With --with-device=ch3:mrail --with-rdma=gen2:

Adding "--disable-registration-cache" to the configure options should
solve the failures with more than 64 processes.
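
For example, a configure invocation along these lines should work. This
is a sketch: the other options shown are simply the ones from your
mpich2version output, so adjust the include/lib paths for your system.

```shell
# Reconfigure with the gen2 device and registration cache disabled.
$ ./configure --with-device=ch3:mrail --with-rdma=gen2 \
    --disable-registration-cache \
    --enable-romio --enable-shared --enable-sharedlibs=gcc \
    --with-ib-include=/usr/include --with-ib-libpath=/usr/lib64
$ make && make install
```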

Please note that, with both of the above fixes, MVAPICH2 will internally
run without "registration cache", a performance-optimization feature, so
you may see some performance degradation. We are looking into this issue
further and will get back to you.

Let us know whether you are able to run larger-scale jobs with the above
fixes for nemesis:ib and --with-device=ch3:mrail --with-rdma=gen2.

Thanks
Devendar

On Thu, Aug 16, 2012 at 11:39 AM, Carson Holt <Carson.Holt at oicr.on.ca> wrote:
> MVAPICH2 version 1.8 hangs on MPI_Finalize when using nemesis.  The hanging
> only happens when configured with device=ch3:nemesis:ib.  The changelog for
> version 1.8 says this bug is fixed, but it is still happening.
>
> I am using shared libraries to call MVAPICH2 from perl, but even if the only
> calls are MPI_init() and then immediately MPI_Finalize() it still hangs
> (i.e. no Send or Recv calls).  I have attached a minimal test script I use
> for testing that consistently causes the error (run it from perl with the
> Inline::C module installed).
>
> Here are the results of mpich2version script showing the configuration used
> when installing MVAPICH2 -->
> MVAPICH2 Version:     1.8
> MVAPICH2 Release date: Mon Apr 30 14:56:40 EDT 2012
> MVAPICH2 Device:       ch3:nemesis
> MVAPICH2 configure:   --enable-romio --with-file-system=nfs+ufs
> --enable-sharedlibs=gcc --with-ib-include=/usr/include
> --with-ib-libpath=/usr/lib64 --enable-shared --with-device=ch3:nemesis:ib
> --enable-cxx
> MVAPICH2 CC:   gcc -fPIC -D_GNU_SOURC -D_GNU_SOURCE   -DNDEBUG -DNVALGRIND
> -O2
> MVAPICH2 CXX: g++ -fPIC  -DNDEBUG -DNVALGRIND -O2
> MVAPICH2 F77: gfortran -fPIC  -O2
> MVAPICH2 FC:   gfortran -fPIC  -O2
>
>
> Also when MVAPICH2 is installed with --with-device=ch3:mrail
> --with-rdma=gen2 (the default Linux settings), everything works correctly
> for my test script; however, I can only launch on up to exactly 64
> processors (anything above that fails).  This is for both mpirun_rsh and
> mpiexec.  Is this a known limitation of the default interface?  Using the
> nemesis configuration on the other hand, I don't hit a processor limit, but
> my application freezes on the MPI_Finalize call.
>
> Thanks,
> Carson
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>



-- 
Devendar
-------------- next part --------------
A non-text attachment was scrubbed...
Name: diff.nemesis
Type: application/octet-stream
Size: 2176 bytes
Desc: not available
Url : http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20120816/b030e07c/diff.obj

