[mvapich-discuss] RDMA connection establishment

Hari Subramoni subramon at cse.ohio-state.edu
Thu Jun 24 10:49:17 EDT 2010


Hi,

There is nothing in MVAPICH2 that should be inoperable on 32-bit machines.
Most likely we widened those values from 32 bits to 64 bits to avoid
counter overflows.
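
For what it's worth, the warnings you list are the usual symptom of moving
a pointer into a uint64_t field on a 32-bit build; they are noisy but not
necessarily harmful. A minimal illustration (not the actual rdma_cm.c code)
of the warning, and of the usual way to silence it by going through
uintptr_t:

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        int local;
        void *ptr = &local;

        /* On a 32-bit build this direct cast draws
         * "cast from pointer to integer of different size". */
        uint64_t noisy = (uint64_t) ptr;

        /* Casting through uintptr_t first widens the pointer into a
         * 64-bit field without a warning ... */
        uint64_t clean = (uint64_t) (uintptr_t) ptr;

        /* ... and narrows it back to a pointer the same way. */
        void *back = (void *) (uintptr_t) clean;

        printf("%p %llu %llu %p\n", ptr,
               (unsigned long long) noisy,
               (unsigned long long) clean, back);
        return 0;
    }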

We have tried running the IMB collective tests at 128 processes multiple
times without seeing any such connection issues; we use the Chelsio T3
iWARP RNIC for this. If you could give us your application, we can try
running it on our cluster to see whether the problem is reproducible.
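
On your point (2): debug prints of that kind are normally compiled in only
when the right symbol is defined at build time, roughly along the lines of
the sketch below (this is the common pattern, not necessarily the exact
macro in rdma.c). If the guard symbol in the source differs from DEBUG,
adding -DDEBUG to CFLAGS will have no effect:

    #include <stdio.h>

    /* Typical guarded debug-print pattern; the guard symbol and the
     * exact definition used in rdma.c may differ from this sketch. */
    #ifdef DEBUG
    #define DEBUG_PRINT(fmt, ...) \
        fprintf(stderr, "[%s:%d] " fmt, __FILE__, __LINE__, ##__VA_ARGS__)
    #else
    #define DEBUG_PRINT(fmt, ...) do { } while (0)
    #endif

    int main(void)
    {
        /* Appears on stderr only if this file was built with -DDEBUG. */
        DEBUG_PRINT("debug output is enabled\n");
        return 0;
    }

Also remember that a CFLAGS change only takes effect for objects that are
actually recompiled, so it is worth double-checking that the file was
rebuilt after the flags changed.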

Thx,
Hari.

On Thu, 24 Jun 2010, TJC Ward wrote:

> I'm trying to debug my RDMA connection problem under mvapich2-1.5rc2, and
> I have the following observations:
> 1) If I force 'rdma.c' to rebuild, I get messages
>   CC              rdma_cm.c
> rdma_cm.c: In function 'ib_cma_event_handler':
> rdma_cm.c:171: warning: cast from pointer to integer of different size
> rdma_cm.c:198: warning: cast to pointer from integer of different size
> rdma_cm.c:220: warning: cast from pointer to integer of different size
> rdma_cm.c:267: warning: cast from pointer to integer of different size
> These are generally related to 'uint64_t'. I'm running on a 32-bit PowerPC
> cluster, and I'm wondering if there's anything in mvapich which is only
> operable on 64-bit machines.
>
> 2) 'stderr' seems to be suppressed, in that I can't make the DEBUG_PRINT
> lines in rdma.c display their output anywhere (plain 'fprintf(stderr, ...)'
> lines do not come out either). ibv_va_error_abort messages do come out on
> the 'stderr' of each rank, though. I've tried adding '-DDEBUG' to the
> CFLAGS when configuring, and have also tried editing 'rdma.c' to define
> DEBUG_PRINT unconditionally.
>
> 3) Setting MV2_ON_DEMAND_THRESHOLD higher than the number of nodes in the
> job looks as if it ought to do what I want, i.e. establish all the RDMA
> connections up front without any need to handle 'crossover'; however I get
> similar connection failures at 128 nodes, only at startup rather than when
> the test case attempts an all-to-all.
>
> 4) I can't see where the logging that would be enabled by '--enable-g=log'
> should come out. I have previously used mpich/smpd, where this logging is
> controlled by environment variables for smpd, but now I'm using
> mvapich/hydra/slurm.
>
> I'm suspicious that mvapich2 1.5rc2 might not be handling 'crossover'
> correctly, i.e. the case where 2 nodes decide to connect to each other
> nearly simultaneously and mvapich wants to end up with 1 queue pair
> rather than 2. Depending on the details of nodes and network, I think
> there is a case where a node can attempt an 'rdma_connect' and receive a
> 'reject' response before it gets any other communication from its proposed
> peer, and I'm not sure what the node then does with the data that needs to
> be sent to the peer.
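>
> To make that concrete, in plain librdmacm terms (this is only a sketch of
> what I have in mind, not MVAPICH2 code) the active side would do something
> like the following after its 'rdma_connect':
>
>     #include <rdma/rdma_cma.h>
>
>     /* Returns 1 if the connection was established, 0 if the peer rejected
>      * it (the crossover case), -1 on any other event or error. */
>     static int wait_for_connect_outcome(struct rdma_cm_id *id)
>     {
>         struct rdma_cm_event *event;
>         int outcome;
>
>         if (rdma_get_cm_event(id->channel, &event))
>             return -1;
>
>         if (event->event == RDMA_CM_EVENT_ESTABLISHED)
>             outcome = 1;
>         else if (event->event == RDMA_CM_EVENT_REJECTED)
>             /* Crossover: the peer is connecting to us at the same time
>              * and refused this attempt; any queued sends have to move
>              * onto whichever queue pair survives. */
>             outcome = 0;
>         else
>             outcome = -1;
>
>         rdma_ack_cm_event(event);
>         return outcome;
>     }
>
> It is what happens after the RDMA_CM_EVENT_REJECTED case that I'm unsure
> about.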
>
> I suppose what would help me most (short of a magic solution to my
> problem) would be a guide as to how to turn on the various diagnostic
> traces/logs within mvapich, and where the trace output could be found.
>
> T J (Chris) Ward, IBM Research
> Scalable Data-Centric Computing - Active Storage Fabrics - IBM System
> BlueGene
> IBM United Kingdom Ltd., Hursley Park, Winchester, Hants, SO21 2JN
> 011-44-1962-818679
> IBM Intranet http://hurgsa.ibm.com/~tjcw/