[mvapich-discuss] MPI_Bcast

Mayhall, Anthony J. (MSFC-ES53)[TBE-1] anthony.j.mayhall at nasa.gov
Wed Nov 18 13:47:19 EST 2009


We have only one executable on each node that is executed by  
mpirun_rsh.  The root broadcasts to only one executable on each of the  
other nodes.  We have other executables that communicate with those  
via shared memory, but are not run using mpirun_rsh.  The numbers we  
are seeing are not in comparison to other mvapich builds.  We need a  
faster method of doing the broadcast.  It looks like the mcast option  
may work better for our app.  We will try it and see.
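For context, the timing comparison discussed in this thread can be reproduced with a generic MPI_Bcast micro-benchmark. This is only a sketch (it is not the application code or the multicast_test.c program mentioned below): it broadcasts a 40 KB buffer repeatedly and reports the average per-call latency, so the 2-node vs. 10-node numbers can be compared directly.

```c
/* Minimal MPI_Bcast timing sketch (not the application code discussed
 * in this thread): every rank participates in a broadcast of a 40 KB
 * buffer, and rank 0 reports the average time per call. Compile with
 * mpicc and launch with mpirun_rsh across the node counts of interest. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int msg_size = 40 * 1024;   /* 40 KB, as in the numbers quoted */
    const int iters = 1000;           /* average over many calls */
    char *buf = malloc(msg_size);
    if (rank == 0)
        for (int i = 0; i < msg_size; i++)
            buf[i] = (char)i;         /* root fills the payload */

    MPI_Barrier(MPI_COMM_WORLD);      /* align ranks before timing */
    double t0 = MPI_Wtime();
    for (int i = 0; i < iters; i++)
        MPI_Bcast(buf, msg_size, MPI_BYTE, 0, MPI_COMM_WORLD);
    double t1 = MPI_Wtime();

    if (rank == 0)
        printf("avg MPI_Bcast time: %.2f us\n",
               (t1 - t0) / iters * 1e6);

    free(buf);
    MPI_Finalize();
    return 0;
}
```

Running the same binary at 2, 4, and 10 nodes shows how the broadcast time scales with node count, which is the comparison at issue here.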

Thanks,


Anthony Mayhall
256-684-1094

On Nov 18, 2009, at 12:35 PM, "Krishna Chaitanya Kandalla" <kandalla at cse.ohio-state.edu> wrote:

> Anthony,
>        Previously, we had explored the IB multicast option to  
> implement a few of the collectives in MVAPICH. However, we found  
> that it was not a scalable option. In MVAPICH2, we implement all  
> collective operations using either point-to-point or shared-memory  
> based algorithms.
>        Getting back to your question, for the message sizes that you  
> have mentioned (40K), we are currently using a shared-memory based  
> algorithm to implement the MPI_Bcast operation. Which earlier  
> version of MVAPICH/MVAPICH2 are you comparing these results with and  
> what is the performance difference that you are observing? Also, you  
> mentioned having to broadcast to one executable on each node, does  
> your job involve running different executables on each node? Or do  
> you mean you are having just one process running on each node?
>
> Thanks,
> Krishna
>
>
>
>
> Mayhall, Anthony J. (MSFC-ES53)[TBE-1] wrote:
>> How is MPI_Bcast implemented in mvapich 1.4?  Can it use IB
>> multicast?  If so, how do you turn that on?  Our apps currently
>> take much longer to broadcast with MPI_Bcast to 10 nodes (175us
>> for 40K) than to 2 nodes (46us for 40K).  We are only
>> broadcasting to one executable on each node.  A multicast should
>> take the same amount of time regardless of the number of nodes,
>> wouldn't it?  I used the multicast_test.c code to test IB
>> multicast, and the timing seems to matter little as the number of
>> nodes grows.  Can the Bcast buffer be broken into multiple
>> 2048-byte transfers inside MPI_Bcast so that it can use IB
>> multicast?  I thought I saw in a white paper that these methods
>> were being taken advantage of in mvapich.
>>
>> Or are we just doing something wrong?
>>
>> Thanks,
>>
>> Anthony Mayhall
>> Davidson Technologies, Inc.
>> (256)544-7620
>>
>>
>>
>> _______________________________________________
>> mvapich-discuss mailing list
>> mvapich-discuss at cse.ohio-state.edu
>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss


