[mvapich-discuss] Master-slave configuration with MVAPICH2 - idb

Tony Ladd tladd at che.ufl.edu
Fri Jul 22 23:00:46 EDT 2011


Hi Jonathan

Thanks for the info. However, I think I am using gigabit and IPoIB 
together - otherwise I could not get the 1.8GB/s bandwidth. It may be 
that the bandwidth of IPoIB is sufficient for my application - I will 
need to think about that.

It seems to me that what I am trying to do is not so unreasonable. The 
master holds a fair bit of data (on the order of 10-20GB in my case), 
which it occasionally distributes to the slaves (which have 12GB of RAM 
each). The slaves talk to each other on a much more frequent basis, and 
now and then return results to the master. It is possible to program 
this entirely in data-parallel mode, but that is a less attractive option.
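
To make the pattern concrete, the kind of program I have in mind looks
roughly like the sketch below (the buffer size, the ring exchange among
the slaves, and the slave-only communicator are illustrative placeholders,
not my actual code):

#include <mpi.h>
#include <stdlib.h>

/* Sketch of the master-slave pattern: rank 0 (the master) occasionally
 * distributes a large dataset to the slaves; the slaves exchange data
 * with each other much more frequently over a slave-only communicator
 * and now and then send results back to the master. */
int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Slave-only communicator for the frequent exchanges; the master
     * passes MPI_UNDEFINED and gets MPI_COMM_NULL. */
    MPI_Comm slaves;
    MPI_Comm_split(MPI_COMM_WORLD, rank == 0 ? MPI_UNDEFINED : 0, rank, &slaves);

    const int n = 1 << 20;                    /* placeholder buffer size */
    double *data = malloc(n * sizeof(double));

    /* Occasional distribution from the master (GigE or IPoIB path). */
    MPI_Bcast(data, n, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    if (rank != 0) {
        /* Frequent slave-to-slave traffic (ideally over native IB):
         * a simple ring exchange within the slave communicator. */
        int srank, ssize;
        MPI_Comm_rank(slaves, &srank);
        MPI_Comm_size(slaves, &ssize);
        MPI_Sendrecv_replace(data, n, MPI_DOUBLE,
                             (srank + 1) % ssize, 0,
                             (srank + ssize - 1) % ssize, 0,
                             slaves, MPI_STATUS_IGNORE);

        /* Now and then, return results to the master. */
        MPI_Send(data, n, MPI_DOUBLE, 0, 1, MPI_COMM_WORLD);
        MPI_Comm_free(&slaves);
    } else {
        for (int i = 1; i < size; i++)
            MPI_Recv(data, n, MPI_DOUBLE, i, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    free(data);
    MPI_Finalize();
    return 0;
}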

Is there a technical reason why MVAPICH2 cannot support multiple 
interface types? Is it that only one communication device (such as 
ch3:sock or ch3:mrail) can be used within a single MPI job?

But thanks for the reply - it was very informative.

Tony

On 07/22/2011 10:31 PM, Jonathan Perkins wrote:
> Hello Tony.  Unfortunately, you will not be able to use mvapich2 in
> the way that you intend.  If you want to include your login node in
> the mpi communication with the compute nodes you will only be able to
> use the gigabit network.  If you do not include the login node you may
> choose to use one of the following: gigabit, ipoib, or native
> infiniband depending on how you configure mvapich2 and which hostfiles
> you provide.
>
> In order to use the gigabit network, you will configure using the
> --with-device=ch3:sock option and specify the gigabit interface in
> your hostfiles.
>
> In order to use ipoib, you will again configure using the
> --with-device=ch3:sock option but specify the ipoib interface in your
> hostfiles.
>
> In order to use infiniband, you will configure using
> --with-device=ch3:mrail (the default).  In this case you do not need
> to specify a particular interface in the hostfile.
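>
> For example, the three builds might look something like this (just a
> sketch: the install prefixes are placeholders, and the host names are
> the ones from your mail, where f1/f2 resolve to eth0 and f1g/f2g to
> ib0):
>
>   # gigabit: ch3:sock device, with the eth0 names (f1, f2) in the hostfile
>   ./configure --prefix=/opt/mvapich2-sock --with-device=ch3:sock
>
>   # ipoib: the same ch3:sock build, but with the ib0 names (f1g, f2g)
>   # in the hostfile
>
>   # native infiniband: ch3:mrail device (the default)
>   ./configure --prefix=/opt/mvapich2-mrail --with-device=ch3:mrail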
>
> I would also like to mention that the --with-rdma option is only valid
> if you use the ch3:mrail device.  Here this option is used to select
> between the gen2 and udapl APIs for communicating over infiniband.  The
> --with-rdma option is ignored when using the ch3:sock device.
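>
> For instance, a build that uses this option might be configured as
> follows (a sketch; swap gen2 for udapl to select the uDAPL API):
>
>   ./configure --with-device=ch3:mrail --with-rdma=gen2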
>
> Please let me know if this answers your questions about the issue
> you're facing, and whether there is anything more that you'd like to
> know.
>
> On Fri, Jul 22, 2011 at 7:00 PM, Tony Ladd<tladd at che.ufl.edu>  wrote:
>    
>> I have an application with a master-slave setup. The master node is the
>> login server and has Gigabit ethernet connections to the slaves. The slaves
>> also have an infiniband fabric connecting them to each other but not to the
>> master.
>>
>> I have tested the infiniband interconnect between 2 slaves using the default
>> configuration of MVAPICH2 1.6 and the performance is as expected - a
>> bandwidth of 5.5GB/s on a SendRecv (it is a 4x QDR connection). The problems
>> arise after configuring MVAPICH2 for the gigabit interconnect as well. I built
>> a new version of MVAPICH2 with gen2 and ch3:sock - the output from mpiname -a
>> for this version is:
>>
>> ---------------------------------------------------------------------
>> MVAPICH2 1.6rc1 2010-11-12 ch3:sock
>>
>> Compilation
>> CC: gcc  -DNDEBUG -O2
>> CXX: g++  -DNDEBUG -O2
>> F77: g77  -DNDEBUG -O2
>> F90: gfortran  -DNDEBUG -O2
>>
>> Configuration
>> --prefix=/global/lib/checs/mvapich2-gnu --with-rdma=gen2
>> --with-device=ch3:sock
>> ----------------------------------------------------------------------
>>
>> When I run the communication test I get only about 1.8GB/s. I suspect it is
>> using the IPoIB protocol. If so, how can I force it to use the RDMA protocol?
>> The run setup is as follows:
>>
>> mpdboot --totalnum=2 --file=hosts
>> mpiexec -machinefile procs -n 3 prog2
>>
>> hosts
>> f1
>> f2
>>
>> procs
>> f1:2 ifhn=f1g
>> f2:1 ifhn=f2g
>>
>> I used mpiexec, since with mpirun_rsh I was not able to get it to use the IB
>> connection at all - it always used ethernet even if the hostnames pointed to
>> the ib0 interface: for example "mpirun_rsh -np 3 f1g f1g f2g prog2" would
>> get about 0.2GB/s.
>>
>> In this case I am using one of the IB nodes as the master but in general I
>> want to use a login node that only has GigE. Here f1g and f2g point to ib0
>> while f1 and f2 point to eth0.
>>
>> Can someone help me out here? Does it look like a run-time problem (i.e. a
>> switch to tell it to use ibverbs), or is it in the build of MVAPICH2?
>>
>>
>> Thanks
>>
>> Tony
>>
>> --
>> Tony Ladd
>>
>> Chemical Engineering Department
>> University of Florida
>> Gainesville, Florida 32611-6005
>> USA
>>
>> Email: tladd-"(AT)"-che.ufl.edu
>> Web:   http://ladd.che.ufl.edu
>>
>> Tel:   (352)-392-6509
>> FAX:   (352)-392-9514
>>
>> _______________________________________________
>> mvapich-discuss mailing list
>> mvapich-discuss at cse.ohio-state.edu
>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss


More information about the mvapich-discuss mailing list