[mvapich-discuss] non-uniform IB connectivity

Dhabaleswar Panda panda at cse.ohio-state.edu
Fri Nov 6 08:46:36 EST 2009


Hi Bron,

Thanks for your note.

> I'm new to this list, so sorry if this has been discussed before.
> My site has two different clusters, each of which has dual
> fabrics of connectivity, i.e. the IB cards have 2 ports, and
> each cluster has 2 separate fabrics: ib0 (connected to the first
> port), and ib1 (connected to the second port), connecting the
> nodes within each cluster.
>
> We are now connecting the two clusters together, but for various
> reasons, the inter-cluster connections are only available over the
> ib1 fabric from each of the sub-clusters.
>
> I have been able to get mvapich2 1.4 to run across the two
> clusters, but only by disabling the first port on the IB cards,
> so that all the processes uniformly use only the ib1 fabrics.

Good. MVAPICH2 supports multi-rail, so in general you can use both ports.
However, as you describe it, the inter-cluster links exist only on the
ib1 fabric, so the two ports do not give uniform connectivity across
both clusters. In this case, you cannot use the multi-rail feature.

> Is there some way to launch a single job across both clusters,
> and have mvapich use both fabrics for communications within
> a sub-cluster, but only a single fabric for communications
> across the sub-clusters?  It looks to me as if when the
> ibv_query_port function returns IBV_PORT_ACTIVE that
> the mvapich software assumes it can use that port to reach
> the target process.  That is not true in my case, or rather it
> is only selectively true, depending on which target process
> we are talking about, on a case by case basis.

It is not as easy as you are indicating. All underlying MPI protocols
(pt-to-pt, collectives, etc.) would need to be modified to take care of
such asymmetric connections. I think what you are doing (using the ib1
port only) sounds good to me. If you can get your inter-cluster network
to work on both ports, you can use the multi-rail features of MVAPICH2.
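
To illustrate the asymmetry, here is a minimal Python sketch. It is not
MVAPICH2 code, and every name in it (the rail sets, usable_rails,
common_rails, the cluster labels "A" and "B") is hypothetical. It only
models the site described above, where both fabrics exist inside a
cluster but only ib1 is bridged between clusters. It shows that the set
of usable rails depends on the pair of processes, and that the rails
common to every pair in a job spanning both clusters reduce to ib1.

```python
from itertools import combinations

# Hypothetical model of the site described in this thread.
INTRA_CLUSTER_RAILS = {"ib0", "ib1"}  # both fabrics exist inside each cluster
INTER_CLUSTER_RAILS = {"ib1"}         # only ib1 is bridged between clusters


def usable_rails(cluster_a, cluster_b):
    """Return the rails over which two processes could reach each other."""
    if cluster_a == cluster_b:
        return set(INTRA_CLUSTER_RAILS)
    return set(INTER_CLUSTER_RAILS)


def common_rails(process_clusters):
    """Return the rails that work for every pair of processes in a job.

    process_clusters lists the cluster label of each process.
    """
    rails = set(INTRA_CLUSTER_RAILS)
    for a, b in combinations(process_clusters, 2):
        rails &= usable_rails(a, b)
    return rails


if __name__ == "__main__":
    # A job confined to one cluster versus a job spanning both.
    print(sorted(common_rails(["A", "A", "A"])))
    print(sorted(common_rails(["A", "A", "B"])))
```

A port that is locally active says nothing about which peers it can
reach, which is why a per-pair choice like usable_rails would have to be
made inside every protocol, while a uniform choice like common_rails
matches the ib1-only setup.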

Thanks,

DK

> Any thoughts/advice welcome .. advTHANKSance !
>
> --
> Bron Campbell Nelson    bron.c.nelson at nasa.gov
>
> With the first link, the chain is forged. The first speech censored,
> the first thought forbidden, the first freedom denied, chains us all
> irrevocably... The first time any man's freedom is trodden on,
> we're all damaged.
>      Captain Jean-Luc Picard,  Star Trek TNG, "The Drumhead"


