[mvapich-discuss] Using mvapich2 on a partitioned infiniband network

Jesper Larsen jla at fcoo.dk
Thu Dec 13 06:49:07 EST 2012


Hi Hari

Thanks for your comment and sorry for my late reply. Here is the error message:

[jla@bifrost mpi_examples]$ mpirun_rsh -n 2 -hostfile ./hostfile.tmp MV2_DEFAULT_PKEY=0x0001 ./hello_mpi_intel
[cli_1]: aborting job:
Fatal error in MPI_Init:
Internal MPI error!

[dn003:mpispawn_1][readline] Unexpected End-Of-File on file descriptor 5. MPI process died?
[dn003:mpispawn_1][mtpmi_processops] Error while reading PMI socket. MPI process died?
[dn003:mpispawn_1][child_handler] MPI process (rank: 1, pid: 22922) exited with status 1
[jla@bifrost mpi_examples]$ [dn002:mpispawn_0][read_size] Unexpected End-Of-File on file descriptor 7. MPI process died?
[dn002:mpispawn_0][handle_mt_peer] Error while reading PMI socket. MPI process died?

So it does find the PKEY: when I try with a non-existent PKEY, it gives me a "Can't find PKEY INDEX..." error instead. For this test I am using mpirun_rsh.

I should also remark that it works fine to use the MV2_DEFAULT_PKEY=0x0001 option when the default partition is not restricted:

# With this setting for the default partition it does not work (the last GUID is the switch)
Default=0x7fff, ipoib, rate=7, defmember=limited: ALL, SELF=full, 0x0008f1050010800c=full;

# With this setting it does
Default=0x7fff, ipoib, rate=7, defmember=full: ALL;
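
To make the difference between the two settings concrete: in InfiniBand the top bit of the 16-bit P_Key marks full (1) versus limited (0) membership, and two ports can exchange packets on a partition only if the low 15 bits match and at least one side is a full member. The following is a minimal Python sketch of that rule. The function names are hypothetical, and it only illustrates the P_Key encoding; it does not show what MVAPICH2 does internally or claim to explain the MPI_Init failure above.

```python
FULL_MEMBER_BIT = 0x8000  # top bit of the 16-bit P_Key: 1 = full, 0 = limited
BASE_MASK = 0x7FFF        # low 15 bits identify the partition


def pkey_base(pkey):
    """Return the 15-bit partition number of a P_Key value."""
    return pkey & BASE_MASK


def is_full_member(pkey):
    """Return True if the P_Key value carries full membership."""
    return bool(pkey & FULL_MEMBER_BIT)


def can_communicate(pkey_a, pkey_b):
    """Same partition, and at least one side must be a full member."""
    same_partition = pkey_base(pkey_a) == pkey_base(pkey_b)
    return same_partition and (is_full_member(pkey_a) or is_full_member(pkey_b))


if __name__ == "__main__":
    limited_default = 0x7FFF  # node with defmember=limited on the default partition
    full_default = 0xFFFF     # e.g. the port running the subnet manager (SELF=full)
    print(can_communicate(limited_default, limited_default))
    print(can_communicate(limited_default, full_default))
```

Under the first setting, two compute nodes both hold the limited form of the default P_Key, so by this rule they cannot reach each other on the default partition, only the full members (the subnet manager and the switch).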

Is it correct that MPI will _only_ communicate via MV2_DEFAULT_PKEY=0x0001 when I specify that? Or is there in all cases some communication on the default partition?

Best regards,
Jesper

PS. Thanks for your other comments. We only use Fortran for our MPI work and therefore only have a Fortran license for the Intel compiler, hence the mix of GNU and Intel compilers. We use the OFED package, which provides mvapich2-1.7, but if there are compelling reasons to upgrade we will of course consider that :)


From: hari.subramoni at gmail.com [mailto:hari.subramoni at gmail.com] On Behalf Of Hari Subramoni
Sent: Thursday, December 06, 2012 8:00 PM
To: Jesper Larsen
Cc: mvapich-discuss at cse.ohio-state.edu
Subject: Re: [mvapich-discuss] Using mvapich2 on a partitioned infiniband network

Hi Jesper,

Sorry to hear that things are not working correctly for you. Your understanding is correct. Setting the "MV2_DEFAULT_PKEY" environment variable should make MVAPICH2 use the specified PKEY for communication.

I have a couple of questions:

1. Did the MVAPICH2 library print out any error messages when you tried to run it? If the library was not able to use the PKey you specified, it should have printed out a message like "Can't find PKEY INDEX according to given PKEY". Did you observe something like this? This would mean that the HCA on the node where you tried to run the job did not have the specified PKey in its table.

2. How are you launching the jobs? Are you using mpirun_rsh or some other job launcher? Can you send us the command you are using to launch the job?

On a separate note, I would like to make a couple of observations:

1. MVAPICH2-1.7 is a little old. In order to get the best performance and the latest set of features, we recommend that you move to our newer releases (MVAPICH2-1.8.1 or MVAPICH2-1.9a2). You can obtain the tarballs from the following page http://mvapich.cse.ohio-state.edu/download/mvapich2/download.php

2. From your configuration, it looks like the library is using a mix of GNU and Intel compilers. It might be better to use one or the other throughout.

Thanks,
Hari.
On Thu, Dec 6, 2012 at 4:06 AM, Jesper Larsen <jla at fcoo.dk> wrote:
Hi All

We have a new InfiniBand (IB) network which is partitioned into a development part and a production part - both on the same IB switch. There are some nodes and a frontend in each part of the network. This allows us to change stuff in the development partition without having to worry about messing up the production partition (which would be really bad).

The way the partitioning works is essentially that we configure the subnet manager to disallow communication using the default pkey (except with the switch which runs the subnet manager). The development partition then has its own pkey with the development nodes and the development frontend as members. The same goes for the production partition. IPoIB is enabled.

As far as I understand, MPI communication is done directly using a pkey, normally the default one. I have to use another pkey and have tried setting the variable MV2_DEFAULT_PKEY to the pkey of the development partition, but without luck. Any ideas what I am doing wrong? Or how to see which pkeys are actually used in the communication?
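
On the question of seeing the pkeys: on Linux, each port's P_Key table is exposed in sysfs as one file per table index under /sys/class/infiniband/<device>/ports/<port>/pkeys/. The following is a minimal Python sketch, assuming that layout, which lists the non-zero entries for every device and port on a node. The function name and the `sysfs_root` parameter are hypothetical. Note that this shows which pkeys a node's HCA holds, not which one a running MPI job actually uses.

```python
import glob
import os


def read_pkey_table(sysfs_root="/sys/class/infiniband"):
    """Return {(device, port): [(index, pkey), ...]} for non-zero table entries."""
    table = {}
    pattern = os.path.join(sysfs_root, "*", "ports", "*", "pkeys", "*")
    for path in glob.glob(pattern):
        # Path layout: <root>/<device>/ports/<port>/pkeys/<index>
        parts = path.split(os.sep)
        device, port, index = parts[-5], parts[-3], int(parts[-1])
        with open(path) as f:
            pkey = int(f.read().strip(), 16)  # file holds e.g. "0xffff"
        if pkey != 0:  # zero entries are unused table slots
            table.setdefault((device, port), []).append((index, pkey))
    for entries in table.values():
        entries.sort()
    return table


if __name__ == "__main__":
    for (device, port), entries in sorted(read_pkey_table().items()):
        for index, pkey in entries:
            membership = "full" if pkey & 0x8000 else "limited"
            print(f"{device} port {port} index {index}: {pkey:#06x} ({membership})")
```

Running this on each node in the hostfile would show whether the development pkey is present on every HCA and whether it appears with full or limited membership.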

My system is:

$ mpiname -a
MVAPICH2 1.7 Thu Oct 13 17:31:44 EDT 2011 ch3:mrail

Compilation
CC: gcc    -DNDEBUG -DNVALGRIND -O2
CXX: c++   -DNDEBUG -DNVALGRIND -O2
F77: ifort -i-dynamic   -O2
FC: ifort -i-dynamic   -O2

Configuration
--prefix=/usr/mpi/intel/mvapich2-1.7 --enable-shared F77=ifort -i-dynamic FC=ifort -i-dynamic


Best regards,
Jesper

