[mvapich-discuss] MPI_Bcast

Dhabaleswar Panda panda at cse.ohio-state.edu
Wed Nov 18 15:49:08 EST 2009


Anthony - Since you are using additional executables that also
communicate via shared memory, I am wondering whether the performance is
being degraded by concurrent communication over shared memory. As
Krishna indicated, the latest version uses a shared-memory broadcast.
How many cores per node does your system have? Do you see this behavior
with fewer than 10 nodes?
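
To check this, you could time MPI_Bcast directly with a small loop along
the following lines (a minimal sketch; the 40K message size matches your
numbers, while the warm-up and iteration counts are only illustrative):

    /* Minimal MPI_Bcast timing sketch. */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        const int count = 40 * 1024;     /* 40K message, as in your runs */
        const int warmup = 100, iters = 1000;
        int rank, i;
        char *buf;
        double t0, t1;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        buf = (char *) malloc(count);

        /* Warm up connections and shared-memory channels first. */
        for (i = 0; i < warmup; i++)
            MPI_Bcast(buf, count, MPI_CHAR, 0, MPI_COMM_WORLD);

        MPI_Barrier(MPI_COMM_WORLD);
        t0 = MPI_Wtime();
        for (i = 0; i < iters; i++)
            MPI_Bcast(buf, count, MPI_CHAR, 0, MPI_COMM_WORLD);
        t1 = MPI_Wtime();

        if (rank == 0)
            printf("average MPI_Bcast time: %.2f us\n",
                   (t1 - t0) * 1e6 / iters);

        free(buf);
        MPI_Finalize();
        return 0;
    }

Running this with one process per node at several node counts (2, 4, 10)
would show how the broadcast latency scales on your system.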

Please note that in MVAPICH2 1.4, we have introduced multiple runtime
parameters to select the broadcast scheme for a given environment - a
pure point-to-point scheme vs. a shared-memory-based scheme. Details are
available at the following location in the MVAPICH2 user guide. You can
try some of these options and let us know whether the problem gets
resolved on your setup.

http://mvapich.cse.ohio-state.edu/support/user_guide_mvapich2-1.4.html#x1-8600010.8
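
For example, you could compare the two schemes by toggling the
shared-memory broadcast parameter (MV2_USE_SHMEM_BCAST is one of the
parameters covered in that section; please check the guide for the exact
names in your build - the hostfile and application names below are just
placeholders):

    # point-to-point broadcast scheme
    mpirun_rsh -np 10 -hostfile hosts MV2_USE_SHMEM_BCAST=0 ./your_app

    # shared-memory broadcast scheme (the default)
    mpirun_rsh -np 10 -hostfile hosts MV2_USE_SHMEM_BCAST=1 ./your_app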

Thanks,

DK



On Wed, 18 Nov 2009, Mayhall, Anthony J. (MSFC-ES53)[TBE-1] wrote:

> We have only one executable on each node that is executed by
> mpirun_rsh.  The root broadcasts to only one executable on each of the
> other nodes.  We have other executables that communicate with those
> via shared memory, but they are not run using mpirun_rsh.  The numbers
> we are seeing are not a comparison against other mvapich builds.  We
> need a faster broadcast method.  It looks like the mcast option may
> work better for our app.  We will try it and see.
>
> Thanks,
>
>
> Anthony Mayhall
> 256-684-1094
>
> On Nov 18, 2009, at 12:35 PM, "Krishna Chaitanya Kandalla" <kandalla at cse.ohio-state.edu> wrote:
>
> > Anthony,
> >        Previously, we had explored the IB multicast option to
> > implement a few of the collectives in MVAPICH. However, we found
> > that it was not a scalable option. In MVAPICH2, we implement all
> > collective operations using either point-to-point or shared-memory
> > based algorithms.
> >        Getting back to your question, for the message sizes that you
> > have mentioned (40K), we are currently using a shared-memory based
> > algorithm to implement the MPI_Bcast operation. Which earlier
> > version of MVAPICH/MVAPICH2 are you comparing these results with,
> > and what performance difference are you observing? Also, you
> > mentioned having to broadcast to one executable on each node; does
> > your job involve running different executables on each node, or do
> > you mean you have just one process running on each node?
> >
> > Thanks,
> > Krishna
> >
> >
> >
> >
> > Mayhall, Anthony J. (MSFC-ES53)[TBE-1] wrote:
> >> How is MPI_Bcast implemented in mvapich 1.4?  Can it use IB
> >> multicast?  If so, how do you turn that on?  MPI_Bcast is currently
> >> taking our apps much longer to broadcast to 10 nodes (175us for
> >> 40K) than to 2 nodes (46us for 40K).  We are only broadcasting to
> >> one executable on each node.  A multicast should take the same
> >> amount of time regardless of the number of nodes, wouldn't it?  I
> >> used the multicast_test.c code to test IB multicast, and the timing
> >> seems largely independent of how many nodes are used.  Can the
> >> Bcast buffer be broken into multiple 2048-byte transfers in
> >> MPI_Bcast to use IB multicast?  I thought I saw in a white paper
> >> that these methods were being taken advantage of in mvapich.
> >>
> >> Or are we just doing something wrong?
> >>
> >> Thanks,
> >>
> >> Anthony Mayhall
> >> Davidson Technologies, Inc.
> >> (256)544-7620
> >>
> >>
> >>


