[mvapich-discuss] Dual port HCA back-to-back woes

Abhinav Vishnu vishnu at cse.ohio-state.edu
Wed Apr 18 14:32:47 EDT 2007


Dr. Nolde,

> Hi,
> 
> Thanks for a very nice program.

Many thanks for your appreciation.

> 
> We have used mvapich version 0.9.5 on a 24-node cluster (2 single-core
> Opterons + 1 Mellanox InfiniBand adapter per node, and 1 Mellanox switch)
> for 2 years without any problems.
> 
> But now I'm trying to build a new cluster based on nodes with two
> dual-core Opterons. I tried to build the cluster without an IB switch,
> using a ring topology for the IB network (port-to-port connections
> between HCA adapters). I understand that in this case I can use only 2
> neighboring nodes to run one program. mvapich version 0.9.9-beta2 is the
> only program that can realize this topology (thanks to the authors for
> letting users explicitly specify the adapter and port). Unfortunately,
> for our program (molecular dynamics simulation) 1 port is sufficient for
> 2 processes per node (our old single-core Opteron cluster) but not
> enough for 4 processes per node (the new cluster). In this case we would
> need to buy 2 IB switches to use both ports simultaneously.
> 
> Is it possible to connect 2 nodes twice, port to port? I read the
> discussion on this list and did not understand it. I do not see any
> reason why this configuration would not work, but I tried both the
> single-rail gen2 configuration and the multi-rail gen2 configuration
> without success. Maybe I need to specify the LID routing table manually.
> Does anybody have any ideas about this?
> 

Thanks for the detailed description of the configuration and the problem.
For two nodes, the configuration mentioned above should work. However, I
have a question about how the nodes are connected. Are you connecting
the first port of the first node to the second port of the second node,
and the first port of the second node to the second port of the first
node (a crossover configuration)? In this case, MVAPICH may not work.

The other way to connect the nodes is a straight connection (first port
of the first node to the first port of the second node, and so on). This
should definitely work, and we have tested this configuration in our
lab.
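
To make the two cabling schemes concrete, here is a small Python sketch.
It is only an illustration and is not part of MVAPICH: the function name
classify_cabling, the node names, and the (node, port) tuples are all
made up for this example. It simply checks whether each cable joins
ports with the same number (straight) or different numbers (crossover).

```python
# Hypothetical model of back-to-back cabling between two dual-port HCAs.
# Each cable is a pair of (node, port) endpoints. All names are illustrative
# assumptions; nothing here queries real hardware or calls MVAPICH.

def classify_cabling(cables):
    """Classify a list of cables by how their port numbers line up.

    Returns 'straight' if every cable joins equal port numbers,
    'crossover' if every cable joins different port numbers,
    and 'mixed' otherwise.
    """
    if not cables:
        raise ValueError("no cables given")
    same = [a_port == b_port for (_, a_port), (_, b_port) in cables]
    if all(same):
        return "straight"
    if not any(same):
        return "crossover"
    return "mixed"


# Straight: port 1 <-> port 1 and port 2 <-> port 2
# (the layout described above as tested and working).
straight = [(("node1", 1), ("node2", 1)), (("node1", 2), ("node2", 2))]

# Crossover: port 1 <-> port 2 in both directions
# (the layout described above as one that may not work).
crossover = [(("node1", 1), ("node2", 2)), (("node2", 1), ("node1", 2))]

print(classify_cabling(straight))
print(classify_cabling(crossover))
```

If the check reports "crossover" for your cabling, swapping the cables
on one node so that equal port numbers face each other gives the
straight layout.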

We look forward to your response and to helping with the problem.

Thanks,

:- Abhinav

> Sincerely
> Dmitry E. Nolde
> Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry,
> Moscow Russia
> 
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss

