[mvapich-discuss] MVAPICH1 Latency Tuning

Dhabaleswar Panda panda at cse.ohio-state.edu
Thu Mar 18 01:08:43 EDT 2010


Hi Hideyuki,

Thanks for your reply. Based on the detailed information you have
provided, it might be best for you to contact Voltaire to get the right
set of tuning parameters for your cluster and applications. Different MPI
stacks have different techniques for obtaining the best performance and
scalability. Since you are not using MVAPICH, it is hard for us to provide
any help here.

As Sayantan indicated earlier, some of the configuration and setup parameters
do not look good. You might be running your IB cluster with an older stack,
including the MPI library. This might be leading to some of the poor
performance and memory scalability you are observing.

If you plan to upgrade your cluster with the latest OpenFabrics stack and
with the latest version of MVAPICH/MVAPICH2, we will be happy to extend
help.

Best Regards,

DK

On Mon, 15 Mar 2010, Hideyuki Jitsumoto wrote:

> Hi Sayantan,
>
> First, I have to apologize: we use the Voltaire MPI, which is extended
> from MVICH, not MVAPICH1. I was under the impression that MVAPICH1 and
> MVICH use the same technique for IB management.
> If that is a misunderstanding, I'm very sorry about it.
>
> Next, I have not contacted the application user yet,
> but I was able to interview some of his co-workers.
>
> 1) Do you know how many concurrent outstanding sends your application
> has during its iteration steps?
> The answer is 1.
> The application alternates GPU kernel functions and MPI communication as follows:
>
> GPU_kernel_func1();
> MPI_Isend(); MPI_Irecv();
> MPI_Wait();
>
> GPU_kernel_func2();
> MPI_Isend(); MPI_Irecv();
> MPI_Wait();
>
> .....
>
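> For clarity, here is a minimal sketch of one such iteration step. The
> function and buffer names are placeholders I made up, not the real
> application code:
>
> #include <mpi.h>
>
> extern void GPU_kernel_func1(void);  /* placeholders for the real CUDA kernel launches */
> extern void GPU_kernel_func2(void);
>
> /* One iteration step: a GPU kernel, then a single nonblocking exchange with
>    one neighbor, then the next kernel, which depends on the received data. */
> void iteration_step(double *sendbuf, double *recvbuf, int count, int peer)
> {
>     MPI_Request req[2];
>
>     GPU_kernel_func1();
>
>     MPI_Isend(sendbuf, count, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD, &req[0]);
>     MPI_Irecv(recvbuf, count, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD, &req[1]);
>     MPI_Waitall(2, req, MPI_STATUSES_IGNORE);  /* only one exchange in flight at a time */
>
>     GPU_kernel_func2();  /* cannot start before the exchange completes */
> }
>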
> 2) Can you share a code snippet of the inner loop of your application?
> We are very sorry, but this application is difficult to share because of its license.
> In addition, I heard that, incredibly, the inner loop has 10,000 lines...!
>
> We profiled the time of each GPU kernel function and each MPI communication.
> As a result, we confirmed that some GPU kernel functions are faster than the
> MPI communication, and that the MPI message size is <128KB.
> That led us to consider reducing the latency of the MPI communications.
> (That is why I changed the rendezvous threshold so that the rendezvous
> protocol is not used for these messages.)
>
> Moreover, to our regret, there is a dependency between the MPI
> communications and the GPU kernel functions that follow them, so the
> communication cannot simply be overlapped with the computation.
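>
> As a side note, a small stand-alone ping-pong test like the one below could
> be used to check whether the threshold change actually reduces latency at
> these message sizes. This is not the application; the 128KB message size and
> the iteration count are just example values:
>
> #include <mpi.h>
> #include <stdio.h>
> #include <stdlib.h>
>
> /* Round-trip latency test: run with 2 ranks, e.g. one per node. */
> int main(int argc, char **argv)
> {
>     const int size  = 128 * 1024;  /* upper bound of the application's message sizes */
>     const int iters = 1000;
>     char *buf = malloc(size);
>     int rank;
>
>     MPI_Init(&argc, &argv);
>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>
>     double t0 = MPI_Wtime();
>     for (int i = 0; i < iters; i++) {
>         if (rank == 0) {
>             MPI_Send(buf, size, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
>             MPI_Recv(buf, size, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
>         } else if (rank == 1) {
>             MPI_Recv(buf, size, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
>             MPI_Send(buf, size, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
>         }
>     }
>     double t1 = MPI_Wtime();
>
>     if (rank == 0)
>         printf("avg one-way latency: %.2f us\n", (t1 - t0) * 1e6 / (2.0 * iters));
>
>     free(buf);
>     MPI_Finalize();
>     return 0;
> }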
>
> 3) Which InfiniBand adapters do you have on your cluster?
> Voltaire HCA 410Ex (SDR)
>
> 4) What is the node type, number of cores?
> The node is Sun Fire X4500 Server (http://www.sun.com/servers/x64/x4500/).
> It has 16 CPU cores, but we use only 2 processes for GPU management;
> each node has 2 GPU boards.
>
> On Fri, Mar 12, 2010 at 12:58 AM, Sayantan Sur <surs at cse.ohio-state.edu> wrote:
> > Hi Hideyuki,
> >
> > On Thu, Mar 11, 2010 at 10:25 AM, Hideyuki Jitsumoto
> > <jitumoto at gsic.titech.ac.jp> wrote:
> >> Sayantan,
> >>
> >> Thank you for your reply.
> >> I'm sorry about the unclear information about the application;
> >> it belongs to one of our supercomputer's users.
> >>
> >> So I need to ask the user for the details of this application.
> >> I will probably send you more accurate information on Monday.
> >>
> >
> > Looking forward to hearing from you after you get more information
> > from the user.
> >
> >> I will be in flight 10 hours from now, so I'm sorry that I cannot send
> >> any information before then.
> >>
> >> # I wonder if I saw you in a taxi on the way to the MPI Forum?
> >
> > Yes! I thought the name sounded familiar :-) It was good meeting you
> > at the forum. Hope to meet you at future forums or on the sidelines of
> > ICS in Japan later this year.
> >
> > Thanks.
> >
> >>
> >> Thank you,
> >> Hideyuki
> >>
> >> On Wed, Mar 10, 2010 at 8:18 PM, Sayantan Sur <surs at cse.ohio-state.edu> wrote:
> >>> Hi Hideyuki,
> >>>
> >>> On Wed, Mar 10, 2010 at 2:34 PM, Hideyuki Jitsumoto
> >>> <jitumoto at gsic.titech.ac.jp> wrote:
> >>>> Hello,
> >>>>
> >>>> I want to reduce the latency and memory use with MVAPICH1.
> >>>> My MPI application has:
> >>>> - < 128KB message size
> >>>> - 4 peer connections per process
> >>>>
> >>>> I have made a temporary param file (attached).
> >>>> Please tell me about other tuning points or wrong parameters in this file.
> >>>
> >>> Thanks for your question; it is an important one. We would be
> >>> happy to assist you with tuning MVAPICH for your application.
> >>>
> >>> I took a quick look at the param file. Based on what I see, I think
> >>> some parameters relating to the rendezvous protocol are not optimal at
> >>> all. We need to understand your application a little bit more to be
> >>> able to help you better. So, a few questions:
> >>>
> >>> 1) Do you know how many concurrent outstanding sends your application
> >>> has during its iteration steps?
> >>> 2) Can you share a code snippet of the inner loop of your application?
> >>> 3) Which InfiniBand adapters do you have on your cluster?
> >>> 4) What is the node type, number of cores?
> >>>
> >>> This will help us send you a better set of parameters in the next few days.
> >>>
> >>>>
> >>>> Thank you,
> >>>> Hideyuki
> >>>> --
> >>>> Sincerely Yours,
> >>>> Hideyuki Jitsumoto (jitumoto at gsic.titech.ac.jp)
> >>>> Tokyo Institute of Technology
> >>>> Global Scientific Information and Computing center (Matsuoka Lab.)
> >>>>
> >>>>
> >>>>
> >>>
> >>>
> >>>
> >>> --
> >>> Sayantan Sur
> >>>
> >>> Research Scientist
> >>> Department of Computer Science
> >>> The Ohio State University.
> >>>
> >>
> >>
> >>
> >> --
> >> Sincerely Yours,
> >> Hideyuki Jitsumoto (jitumoto at gsic.titech.ac.jp)
> >> Tokyo Institute of Technology
> >> Global Scientific Information and Computing center (Matsuoka Lab.)
> >>
> >>
> >
> >
> >
> > --
> > Sayantan Sur
> >
> > Research Scientist
> > Department of Computer Science
> > The Ohio State University.
> >
>
>
>
> --
> Sincerely Yours,
> Hideyuki Jitsumoto (jitumoto at gsic.titech.ac.jp)
> Tokyo Institute of Technology
> Global Scientific Information and Computing center (Matsuoka Lab.)
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>


