[mvapich-discuss] hang while trying to run MPI in a heterogeneous network (one node with 2 HCAs, the other node with a single HCA)

Rajeev.c.p rajeevcp at yahoo.com
Thu Feb 27 15:27:19 EST 2014


Hi,

We are facing issues while trying to run MPI in a heterogeneous IB network. The configuration we have is given below.
We have two Linux nodes:
Node 1 with 2 FDR HCAs = 192.168.1.4
Node 2 with 1 FDR HCA = 192.168.1.3
All the HCA cards are from Mellanox and they are connected to an FDR IB switch.

We are trying to run a bandwidth program based on MPI_Send and MPI_Recv (a sketch of the test loop is included after the trace below) with the following command line:

/home/klac/mvapich2-2.2.0/bin/mpiexec -np 2 -hosts 192.168.1.4,192.168.1.3 -env MV2_NUM_HCAS=2,MV2_IB_HCA=mlx4_0:mlx4_1 ./RunMPIBWTest 1024 1024

This hangs indefinitely and never returns. Below are the last traces we see with verbose output enabled:

[mpiexec at IMCRecipe00] [pgid: 0] got PMI command: cmd=barrier_in
[proxy:0:1 at IMCNode001] got pmi command (from 4): barrier_in

[proxy:0:1 at IMCNode001] forwarding command (cmd=barrier_in) upstream
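
For reference, RunMPIBWTest is essentially a plain MPI_Send/MPI_Recv bandwidth loop between the two ranks. A minimal sketch of that pattern is below (hypothetical; the actual program differs in details such as argument handling and timing, and the names here are only for illustration):

/* Minimal 2-rank MPI_Send/MPI_Recv bandwidth sketch (illustrative only;
 * the real RunMPIBWTest differs in details). Run with -np 2. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
    int rank, iters = 1024;
    size_t msg_kb = 1024;                     /* message size in KB */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (argc > 2) {                           /* ./RunMPIBWTest <KB> <iters> */
        msg_kb = (size_t)atol(argv[1]);
        iters  = atoi(argv[2]);
    }
    size_t bytes = msg_kb * 1024;
    char *buf = malloc(bytes);
    memset(buf, rank, bytes);

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < iters; i++) {
        /* Unidirectional transfer: rank 0 sends, rank 1 receives. */
        if (rank == 0)
            MPI_Send(buf, (int)bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
        else if (rank == 1)
            MPI_Recv(buf, (int)bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
    }
    double t1 = MPI_Wtime();

    if (rank == 0)
        printf("Bandwidth: %.2f MB/s\n",
               (double)bytes * iters / (t1 - t0) / 1e6);

    free(buf);
    MPI_Finalize();
    return 0;
}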

I have 3 questions:
(1) Is the above command line correct for using both HCAs to send data to the client node so that we get increased bandwidth?
(2) There were earlier posts discussing hangs in heterogeneous networks; is that resolved in MVAPICH2 2.2?
(3) Are the MV2_NUM_HCAS and MV2_IB_HCA environment variables applicable to MPI_Send and MPI_Recv, or do they work only with RDMA operations?

Thanks and Regards
Rajeev