[mvapich-discuss] Using mvapich2 on a partitioned infiniband network

Hari Subramoni subramon at cse.ohio-state.edu
Thu Dec 6 13:59:59 EST 2012


Hi Jesper,

Sorry to hear that things are not working correctly for you. Your
understanding is correct. Setting the "MV2_DEFAULT_PKEY" environment
variable should make MVAPICH2 use the specified PKey for communication.
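
For reference, here is a minimal sketch of how the variable is usually
passed when launching with mpirun_rsh, where environment variables go on
the command line as NAME=VALUE just before the executable. The PKey value
(0x8001), the hostfile path, the process count and the application name
are all placeholders I made up for illustration; substitute your own. The
script only prints the command so that it can be inspected before it is
run.

```shell
#!/bin/sh
# Hypothetical values: replace with your partition's PKey, hostfile and binary.
PKEY=0x8001
HOSTFILE=./hosts
APP=./my_mpi_app

# With mpirun_rsh, environment variables are given as NAME=VALUE
# immediately before the executable.
LAUNCH="mpirun_rsh -np 4 -hostfile $HOSTFILE MV2_DEFAULT_PKEY=$PKEY $APP"

# Print the command for inspection; run it with: eval "$LAUNCH"
echo "$LAUNCH"
```

If the variable is only exported in the shell that starts the job, it may
not reach the remote processes, which is one reason the exact launch
command matters.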

I have a couple of questions:

1. Did the MVAPICH2 library print out any error messages when you tried to
run it? If the library was not able to use the PKey you specified, it
should have printed out a message like "Can't find PKEY INDEX according to
given PKEY". Did you observe something like this? This would mean that the
HCA on the node where you tried to run the job did not have the specified
PKey in its table.

2. How are you launching the jobs? Are you using mpirun_rsh or some other
job launcher? Can you send us the command you are using to launch the job?
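
Regarding the first question, one way to check which PKeys an HCA port
actually has in its table is to read the per-port pkeys entries that the
Linux InfiniBand stack exposes under sysfs. The following is a rough
sketch, not an official tool: the function name list_pkeys and its
optional base-directory argument are my own, and it assumes the usual
/sys/class/infiniband/<device>/ports/<port>/pkeys/<index> layout.

```shell
#!/bin/sh
# List the non-empty PKey table entries of every HCA port on this node.
# The optional argument overrides the sysfs base directory (useful for
# trying the function on a copy of the tree); it is not needed on a real node.
list_pkeys() {
    base="${1:-/sys/class/infiniband}"
    for f in "$base"/*/ports/*/pkeys/*; do
        [ -r "$f" ] || continue              # skip if nothing matched
        val=$(cat "$f")
        [ "$val" = "0x0000" ] && continue    # unused table slot
        rel=${f#"$base"/}                    # <dev>/ports/<port>/pkeys/<idx>
        dev=${rel%%/*}
        rest=${rel#*/ports/}
        port=${rest%%/*}
        echo "$dev port $port index ${f##*/}: $val"
    done
}

# Usage on a compute node:
#   list_pkeys
```

Running this on each node of the development partition should show
whether the PKey you pass in MV2_DEFAULT_PKEY is present in the table.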

On a separate note, I would like to make a couple of observations:

1. MVAPICH2-1.7 is a little old. To get the best performance and the
latest set of features, we recommend that you move to one of our newer
releases (MVAPICH2-1.8.1 or MVAPICH2-1.9a2). You can obtain the tarballs
from the following page:
http://mvapich.cse.ohio-state.edu/download/mvapich2/download.php

2. From your configuration, it looks like the library was built with a mix
of GNU (gcc, c++) and Intel (ifort) compilers. It might be better to use
one compiler suite or the other for all languages.

Thanks,
Hari.

On Thu, Dec 6, 2012 at 4:06 AM, Jesper Larsen <jla at fcoo.dk> wrote:

> Hi All
>
> We have a new InfiniBand (IB) network which is partitioned into a
> development part and a production part - both on the same IB switch. There
> are some nodes and a frontend in each part of the network. This allows us
> to change stuff in the development partition without having to worry about
> messing up the production partition (which would be really bad).
>
> The way the partitioning works is essentially that we configure the subnet
> manager to disallow communication using the default pkey (except with the
> switch which runs the subnet manager). The development partition then has
> its own pkey with the development nodes and the development frontend as
> members. The same goes for the production partition. IPoIB is enabled.
>
> As far as I understand the MPI communication is done directly using a
> pkey. Normally the default pkey. I have to use another pkey and have tried
> to set the variable: MV2_DEFAULT_PKEY to the pkey of the development
> partition. But without luck. Any ideas what I am doing wrong? Or how to see
> which pkeys are actually used in the communication?
>
> My system is:
>
> $ mpiname -a
> MVAPICH2 1.7 Thu Oct 13 17:31:44 EDT 2011 ch3:mrail
>
> Compilation
> CC: gcc    -DNDEBUG -DNVALGRIND -O2
> CXX: c++   -DNDEBUG -DNVALGRIND -O2
> F77: ifort -i-dynamic   -O2
> FC: ifort -i-dynamic   -O2
>
> Configuration
> --prefix=/usr/mpi/intel/mvapich2-1.7 --enable-shared F77=ifort -i-dynamic
> FC=ifort -i-dynamic
>
>
> Best regards,
> Jesper
>