[mvapich-discuss] "PMI Lookup name failed" when RDMA CM is used

Marcus R. Epperson mrepper at sandia.gov
Tue Mar 24 19:30:33 EDT 2009


We have an InfiniBand cluster that will require the use of RDMA CM and
that uses the Slurm resource manager for job launch.  I'm trying to
verify that mvapich2-1.2p1 works with this combination, but I'm not
having much luck so far.

I can run successfully when mvapich2's RDMA CM option is not enabled
(this won't be an option long-term, though):

$ srun --mpi=none -w 'c1,c3' ./mpi_hello
   Hello, I am node c1 with rank 0
   Hello, I am node c3 with rank 1

But when I enable it I get this:

$ export MV2_USE_RDMA_CM=1
$ srun --mpi=none -w 'c1,c3' ./mpi_hello
   [1] Abort: PMI Lookup name failed
    at line 810 in file rdma_cm.c
   [0] Abort: PMI Lookup name failed
    at line 810 in file rdma_cm.c
   srun: error: c1: task 0: Exited with exit code 253
   srun: error: c3: task 1: Exited with exit code 253

I believe these nodes are configured correctly according to section 6.4
of the mvapich2 1.2 user guide:

http://mvapich.cse.ohio-state.edu/support/user_guide_mvapich2-1.2.html#x1-300006.4

i.e. IPoIB is set up:

# pdsh -w 'c[1,3]' "ifconfig ib0 | grep inet.addr"
c1:   inet addr:192.168.2.1  Bcast:192.168.2.255  Mask:255.255.255.0
c3:   inet addr:192.168.2.3  Bcast:192.168.2.255  Mask:255.255.255.0

and mv2.conf is present on each node:

# pdsh -w 'c[1,3]' "cat /etc/mv2.conf"
c1: 192.168.2.1
c3: 192.168.2.3
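
The two outputs above can be cross-checked mechanically. This is only a rough sketch: the helper names extract_inet_addr and mv2_conf_matches are made up for illustration and are not part of mvapich2 or Slurm, and the parsing assumes the older net-tools "inet addr:" ifconfig format shown above.

```shell
#!/bin/sh
# Hypothetical helper (not part of mvapich2): pull the IPv4 address out of
# one line of net-tools style ifconfig output, e.g.
#   "inet addr:192.168.2.1  Bcast:192.168.2.255  Mask:255.255.255.0"
extract_inet_addr() {
    printf '%s\n' "$1" | sed -n 's/.*inet addr:\([0-9.]*\).*/\1/p'
}

# Hypothetical helper: succeed only if the address listed in mv2.conf ($1)
# equals the address parsed from the ifconfig line ($2).
mv2_conf_matches() {
    [ "$1" = "$(extract_inet_addr "$2")" ]
}

# Example using the values reported for c1 above.
if mv2_conf_matches "192.168.2.1" \
    "   inet addr:192.168.2.1  Bcast:192.168.2.255  Mask:255.255.255.0"; then
    echo "c1: mv2.conf matches ib0"
else
    echo "c1: mv2.conf does NOT match ib0"
fi
```

For the addresses shown, both nodes pass this check, so a mismatch between mv2.conf and ib0 does not appear to be the cause.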

Have I missed something, or is this a bug?  If it's a bug, is it with
mvapich2 or should I be looking elsewhere?

Thanks for any help,
-Marcus Epperson


