[mvapich-discuss] non-uniform IB connectivity

Nelson, Bron C. (ARC-TNE)[SILICON GRAPHICS] bron.c.nelson at nasa.gov
Wed Nov 4 17:07:36 EST 2009


I'm new to this list, so sorry if this has been discussed before.
My site has two different clusters, each with dual IB fabrics:
the IB cards have 2 ports, and within each cluster the first
port connects to the ib0 fabric and the second port to the ib1
fabric. Each fabric connects only the nodes within its own
cluster.

We are now connecting the two clusters together, but for various
reasons, the inter-cluster connections are only available over the
ib1 fabric from each of the sub-clusters.

I have been able to get mvapich2 1.4 to run across the two
clusters, but only by disabling the first port on the IB cards,
so that all the processes uniformly use only the ib1 fabrics.

Is there some way to launch a single job across both clusters,
and have mvapich use both fabrics for communications within
a sub-cluster, but only a single fabric for communications
across the sub-clusters?  It looks to me as if, when
ibv_query_port reports a port state of IBV_PORT_ACTIVE,
the mvapich software assumes it can use that port to reach
the target process.  That is not true in my case, or rather it
is only selectively true, depending on which target process
we are talking about, on a case-by-case basis.
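
To make the question concrete, here is a minimal sketch of the per-destination
selection being asked about. It is a hypothetical model only, not MVAPICH2 code:
the function name, the cluster labels "A" and "B", and the constants are all
made up for illustration. It only shows that the usable fabrics depend on the
target process as well as on the local port state.

```python
# Hypothetical model, not MVAPICH2 code: pick fabrics per destination
# instead of assuming a locally active port reaches every peer.

FABRICS = ("ib0", "ib1")          # both fabrics exist inside each cluster
INTER_CLUSTER_FABRICS = ("ib1",)  # only ib1 links the two clusters


def usable_fabrics(src_cluster, dst_cluster, active_ports=FABRICS):
    """Return the fabrics over which a process in src_cluster can reach
    a process in dst_cluster, given the locally active ports."""
    if src_cluster == dst_cluster:
        candidates = FABRICS
    else:
        candidates = INTER_CLUSTER_FABRICS
    # A fabric is usable only if the matching local port is also active.
    return tuple(f for f in candidates if f in active_ports)


if __name__ == "__main__":
    print(usable_fabrics("A", "A"))  # same cluster: ('ib0', 'ib1')
    print(usable_fabrics("A", "B"))  # across clusters: ('ib1',)
```

With ib0 disabled on every node (the workaround described above), the model
returns only ib1 for every pair, which matches the uniform single-fabric
behaviour I get today.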

Any thoughts/advice welcome .. advTHANKSance !

--
Bron Campbell Nelson    bron.c.nelson at nasa.gov

With the first link, the chain is forged. The first speech censored,
the first thought forbidden, the first freedom denied, chains us all
irrevocably... The first time any man's freedom is trodden on,
we're all damaged.
     Captain Jean-Luc Picard,  Star Trek TNG, "The Drumhead"


