[mvapich-discuss] Using mvapich2 on a partitioned infiniband network

Devendar Bureddy bureddy at cse.ohio-state.edu
Mon Jan 14 10:07:14 EST 2013


Thanks, Jesper, for verifying the patch. I am cc'ing this note to
mvapich-discuss so that we can close this report. For everyone's
information, the attached patch fixes this issue. The patch will also
be included in the next MVAPICH2-1.9 release.

-Devendar

On Mon, Jan 14, 2013 at 6:13 AM, Jesper Larsen <jla at fcoo.dk> wrote:
> Hi Devendar
>
> Sorry again for the late reply. I tried the patch again and it seems to work now.
>
> Best regards
> Jesper
>
>> >>> >> > Thanks for your comment and sorry for my late reply. Here is the error
>> >>> >> > message:
>> >>> >> >
>> >>> >> >
>> >>> >> >
>> >>> >> > [jla at bifrost mpi_examples]$ mpirun_rsh -n 2 -hostfile ./hostfile.tmp MV2_DEFAULT_PKEY=0x0001 ./hello_mpi_intel
>> >>> >> >
>> >>> >> > [cli_1]: aborting job:
>> >>> >> >
>> >>> >> > Fatal error in MPI_Init:
>> >>> >> >
>> >>> >> > Internal MPI error!
>> >>> >> >
>> >>> >> >
>> >>> >> >
>> >>> >> > [dn003:mpispawn_1][readline] Unexpected End-Of-File on file descriptor 5.
>> >>> >> > MPI process died?
>> >>> >> >
>> >>> >> > [dn003:mpispawn_1][mtpmi_processops] Error while reading PMI socket. MPI
>> >>> >> > process died?
>> >>> >> >
>> >>> >> > [dn003:mpispawn_1][child_handler] MPI process (rank: 1, pid: 22922) exited
>> >>> >> > with status 1
>> >>> >> >
>> >>> >> > [jla at bifrost mpi_examples]$ [dn002:mpispawn_0][read_size]
>> >>> >> > Unexpected End-Of-File on file descriptor 7. MPI process died?
>> >>> >> >
>> >>> >> > [dn002:mpispawn_0][handle_mt_peer] Error while reading PMI socket. MPI
>> >>> >> > process died?
>> >>> >> >
>> >>> >> >
>> >>> >> >
>> >>> >> > So it finds the PKEY. When I try with a non-existent PKEY it gives me a
>> >>> >> > "Can't find PKEY INDEX..." error. For this test I am using mpirun_rsh.
>> >>> >> >
>> >>> >> > I should also remark that it works fine to use the MV2_DEFAULT_PKEY=0x0001
>> >>> >> > option when the default partition is not restricted:
>> >>> >> >
>> >>> >> >
>> >>> >> >
>> >>> >> > # With this setting for the default partition it does not work (the last
>> >>> >> > GUID is the switch)
>> >>> >> >
>> >>> >> > Default=0x7fff, ipoib, rate=7, defmember=limited: ALL, SELF=full,
>> >>> >> > 0x0008f1050010800c=full;
>> >>> >> >
>> >>> >> >
>> >>> >> >
>> >>> >> > # With this setting it does
>> >>> >> >
>> >>> >> > Default=0x7fff, ipoib, rate=7, defmember=full: ALL;
>> >>> >> >
>> >>> >> >
>> >>> >> >
>> >>> >> > Is it correct that MPI will _only_ communicate via MV2_DEFAULT_PKEY=0x0001
>> >>> >> > when I specify that? Or is there in all cases some communication on the
>> >>> >> > default partition?
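
For background on the two Default= lines quoted above: a P_Key is 16 bits
wide, where the top bit marks full membership and the low 15 bits identify
the partition. Under the InfiniBand partitioning rules, two ports can exchange
traffic on a partition only if their keys share the same low 15 bits and at
least one of them is a full member. With defmember=limited, the compute nodes
hold 0x7fff on the default partition and so cannot reach each other there,
only the full members (0xffff). A small Python sketch of that rule (the
function name is hypothetical and only for illustration):

```python
FULL_MEMBER_BIT = 0x8000  # top bit of a P_Key: 1 = full member, 0 = limited
BASE_MASK = 0x7FFF        # low 15 bits: which partition the key belongs to


def can_communicate(pkey_a, pkey_b):
    """Return True if two ports holding these P_Keys may exchange packets."""
    same_partition = (pkey_a & BASE_MASK) == (pkey_b & BASE_MASK)
    at_least_one_full = bool((pkey_a | pkey_b) & FULL_MEMBER_BIT)
    return same_partition and at_least_one_full


# Two limited members of the default partition cannot talk to each other,
# but each can talk to a full member of the same partition.
print(can_communicate(0x7FFF, 0x7FFF))
print(can_communicate(0x7FFF, 0xFFFF))
```
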
>> >>> >> >
>> >>> >> >
>> >>> >> >
>> >>> >> > Best regards,
>> >>> >> >
>> >>> >> > Jesper
>> >>> >> >
>> >>> >> >
>> >>> >> >
>> >>> >> > Ps. Thanks for your other comments. We are actually only using Fortran for
>> >>> >> > our MPI work. And we therefore only have a Fortran license for the Intel
>> >>> >> > compiler. Therefore the mix of GNU and Intel compilers. We use the OFED
>> >>> >> > package which provides mvapich2-1.7. But if there are compelling reasons to
>> >>> >> > upgrade we will of course consider that :)
>> >>> >> >
>> >>> >> >
>> >>> >> >
>> >>> >> >
>> >>> >> >
>> >>> >> > From: hari.subramoni at gmail.com [mailto:hari.subramoni at gmail.com] On Behalf
>> >>> >> > Of Hari Subramoni
>> >>> >> > Sent: Thursday, December 06, 2012 8:00 PM
>> >>> >> > To: Jesper Larsen
>> >>> >> > Cc: mvapich-discuss at cse.ohio-state.edu
>> >>> >> > Subject: Re: [mvapich-discuss] Using mvapich2 on a partitioned infiniband
>> >>> >> > network
>> >>> >> >
>> >>> >> >
>> >>> >> >
>> >>> >> > Hi Jesper,
>> >>> >> >
>> >>> >> > Sorry to hear that things are not working correctly for you. Your
>> >>> >> > understanding is correct. Setting the "MV2_DEFAULT_PKEY"
>> >>> >> > environment variable should make MVAPICH2 use the specified
>> >>> >> > PKEY for communication.
>> >>> >> >
>> >>> >> > I have a couple of questions:
>> >>> >> >
>> >>> >> > 1. Did the MVAPICH2 library print out any error messages when you tried to
>> >>> >> > run it? If the library was not able to use the PKey you specified, it should
>> >>> >> > have printed out a message like "Can't find PKEY INDEX according to given
>> >>> >> > PKEY". Did you observe something like this? This would mean that the HCA on
>> >>> >> > the node where you tried to run the job did not have the specified PKey in
>> >>> >> > its table.
>> >>> >> >
>> >>> >> > 2. How are you launching the jobs? Are you using mpirun_rsh or some other
>> >>> >> > job launcher? Can you send us the command you are using to launch the job?
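
The "Can't find PKEY INDEX" message mentioned in point 1 refers to looking up
the requested pkey in the P_Key table of the HCA port. The following Python
sketch shows what such a lookup amounts to. It is not MVAPICH2's actual code:
the function name is hypothetical, and it assumes the comparison ignores the
top (membership) bit, so that a request for 0x0001 would match a table entry
of 0x8001.

```python
PKEY_BASE_MASK = 0x7FFF  # low 15 bits identify the partition


def find_pkey_index(table, wanted):
    """Return the index of the first table entry in the wanted partition.

    `table` is a list of 16-bit P_Key values as held by one HCA port;
    an entry of 0 is treated as an unused slot. Returns None if no entry
    matches, which corresponds to the "can't find" case.
    """
    for index, entry in enumerate(table):
        if entry != 0 and (entry & PKEY_BASE_MASK) == (wanted & PKEY_BASE_MASK):
            return index
    return None


# Example table: limited member of the default partition at index 0,
# full member of partition 0x0001 at index 1.
example_table = [0x7FFF, 0x8001, 0x0000]
print(find_pkey_index(example_table, 0x0001))
print(find_pkey_index(example_table, 0x0002))
```
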
>> >>> >> >
>> >>> >> > On a separate thread, I would like to make a couple of observations
>> >>> >> >
>> >>> >> > 1. MVAPICH2-1.7 is a little old. In order to get the best performance and
>> >>> >> > the latest set of features, we recommend that you move to our newer releases
>> >>> >> > (MVAPICH2-1.8.1 or MVAPICH2-1.9a2). You can obtain the tarballs from the
>> >>> >> > following page
>> >>> >> > http://mvapich.cse.ohio-state.edu/download/mvapich2/download.php
>> >>> >> >
>> >>> >> > 2. From your configuration, it looks like the library is using a mix of GNU
>> >>> >> > and Intel compilers. It might be better if you used either one.
>> >>> >> >
>> >>> >> > Thanks,
>> >>> >> > Hari.
>> >>> >> >
>> >>> >> > On Thu, Dec 6, 2012 at 4:06 AM, Jesper Larsen <jla at fcoo.dk> wrote:
>> >>> >> >
>> >>> >> > Hi All
>> >>> >> >
>> >>> >> > We have a new InfiniBand (IB) network which is partitioned into a
>> >>> >> > development part and a production part - both on the same IB switch. There
>> >>> >> > are some nodes and a frontend in each part of the network. This allows us to
>> >>> >> > change stuff in the development partition without having to worry about
>> >>> >> > messing up the production partition (which would be really bad).
>> >>> >> >
>> >>> >> > The way the partitioning works is essentially that we configure the subnet
>> >>> >> > manager to disallow communication using the default pkey (except with the
>> >>> >> > switch which runs the subnet manager). The development partition then has
>> >>> >> > its own pkey with the development nodes and the development frontend as
>> >>> >> > members. The same goes for the production partition. IPoIB is enabled.
>> >>> >> >
>> >>> >> > As far as I understand the MPI communication is done directly using a pkey.
>> >>> >> > Normally the default pkey. I have to use another pkey and have tried to set
>> >>> >> > the variable: MV2_DEFAULT_PKEY to the pkey of the development partition. But
>> >>> >> > without luck. Any ideas what I am doing wrong? Or how to see which pkeys are
>> >>> >> > actually used in the communication?
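
One way to see which pkeys a port has been assigned is to read its P_Key
table, which Linux exposes under
/sys/class/infiniband/<device>/ports/<port>/pkeys/, one file per table index.
This does not show which pkey a running MPI job actually uses, only what the
subnet manager has programmed into the port. A minimal Python sketch (the
function name and the mlx4_0 device name in the comment are illustrative
assumptions, not taken from this thread):

```python
import os


def read_pkey_table(device, port, sysfs_root="/sys/class/infiniband"):
    """Return {index: pkey} for the non-empty P_Key table slots of one port."""
    pkey_dir = os.path.join(sysfs_root, device, "ports", str(port), "pkeys")
    table = {}
    for name in os.listdir(pkey_dir):
        # Each file is named after its table index and holds a hex value
        # such as 0x8001.
        with open(os.path.join(pkey_dir, name)) as f:
            value = int(f.read().strip(), 16)
        if value != 0:  # 0x0000 marks an unused slot
            table[int(name)] = value
    return table


# Typical use on a node (device name is an assumption; adjust to your HCA):
#   for index, pkey in sorted(read_pkey_table("mlx4_0", 1).items()):
#       print(index, hex(pkey))
```
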
>> >>> >> >
>> >>> >> > My system is:
>> >>> >> >
>> >>> >> > $ mpiname -a
>> >>> >> > MVAPICH2 1.7 Thu Oct 13 17:31:44 EDT 2011 ch3:mrail
>> >>> >> >
>> >>> >> > Compilation
>> >>> >> > CC: gcc    -DNDEBUG -DNVALGRIND -O2
>> >>> >> > CXX: c++   -DNDEBUG -DNVALGRIND -O2
>> >>> >> > F77: ifort -i-dynamic   -O2
>> >>> >> > FC: ifort -i-dynamic   -O2
>> >>> >> >
>> >>> >> > Configuration
>> >>> >> > --prefix=/usr/mpi/intel/mvapich2-1.7 --enable-shared F77=ifort -i-dynamic
>> >>> >> > FC=ifort -i-dynamic
>> >>> >> >
>> >>> >> >
>> >>> >> > Best regards,
>> >>> >> > Jesper
>> >>> >> >
>> >>> >> >
>> >>> >> > _______________________________________________
>> >>> >> > mvapich-discuss mailing list
>> >>> >> > mvapich-discuss at cse.ohio-state.edu
>> >>> >> > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>> >>> >> >



-- 
Devendar
-------------- next part --------------
A non-text attachment was scrubbed...
Name: diff.patch
Type: application/octet-stream
Size: 1384 bytes
Desc: not available
Url : http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20130114/1882e0ee/diff-0001.obj

