[mvapich-discuss] Re: mvapich job startup unreliable with slurm and --cpu_bind

Dhabaleswar Panda panda at cse.ohio-state.edu
Tue Aug 1 12:29:46 EDT 2006


Hi Mike, 

No problem. Thanks for the patch. We will take a look at it and apply 
it to the trunk. 

Best Regards, 

DK

> Sorry for the mixup, we were not trying to announce anything.
> Before we made this change we would get the weird MPI_Init behavior,
> which looks like a deadlock of some sort. There are no troubles with
> the right/left exchange, and it puts much less pressure on rank 0 than
> having everyone slam it at once.
> --Mike
> 
> 
> On Sun, 2006-07-30 at 23:38 -0400, Dhabaleswar Panda wrote:
> > Hi Greg and Mike, 
> > 
> > Many thanks for sending us the patch related to Slurm and --cpu_bind
> > on July 26th.
> > 
> > You had sent this note to mvapich at cse. Since `mvapich at cse' is an
> > announcement list only, it got blocked and I just noticed your posting
> > now.
> > 
> > I am forwarding this note to mvapich-discuss at cse.ohio-state.edu.
> > 
> > As you might have noticed, we just made the release of mvapich 0.9.8.
> > We will review your patch and incorporate it into the trunk and the
> > 0.9.8-branch soon.
> > 
> > May I request that you post your future patches to
> > mvapich-discuss at cse.ohio-state.edu.
> > 
> > Best Regards,
> > 
> > DK
> > 
> > ----------------------------------------------------------------
> > 
> > The following patch seems to fix a problem starting mvapich jobs with
> > slurm and the --cpu_bind option.  Under these conditions, some of the
> > MPI processes do not make it out of MPI_Init() and the job hangs on
> > launch.  We think that this is because with slurm and --cpu_bind the
> > startup is more synchronized.
> > 
> > Thanks,
> > 
> > Greg Johnson & Mike Lang
> > 
> > diff -ur mvapich-0.9.8-rc0.orig/src/context/comm_rdma_init.c mvapich-0.9.8-rc0/src/context/comm_rdma_init.c
> > --- mvapich-0.9.8-rc0.orig/src/context/comm_rdma_init.c 2006-07-11 16:49:44.000000000 -0600
> > +++ mvapich-0.9.8-rc0/src/context/comm_rdma_init.c      2006-07-11 15:35:46.000000000 -0600
> > @@ -162,6 +162,7 @@
> >  {
> >  #ifndef CH_GEN2_MRAIL
> >      int i = 0;
> > +    int right, left;
> >      struct Coll_Addr_Exch send_pkt;
> >      struct Coll_Addr_Exch *recv_pkt;
> > 
> > @@ -188,19 +189,17 @@
> >  #else
> >      send_pkt.buf_hndl = comm->collbuf->l_coll->buf_hndl;
> >  #endif
> > -
> > -    for(i = 0; i < comm->np; i++) {
> > -        /* Don't send to myself */
> > -        if(i == comm->local_rank) continue;
> > -
> > +    right=(comm->local_rank + 1)%comm->np;
> > +    left=(comm->local_rank + comm->np - 1)%comm->np;
> > +    for(i=0; i < comm->np-1; i++) {
> >          MPI_Sendrecv((void*)&send_pkt, sizeof(struct Coll_Addr_Exch),
> > -                MPI_BYTE, comm->lrank_to_grank[i], ADDR_EXCHANGE_TAG,
> > -                (void*)&(recv_pkt[i]),sizeof(struct Coll_Addr_Exch),
> > -                MPI_BYTE, comm->lrank_to_grank[i], ADDR_EXCHANGE_TAG,
> > +                MPI_BYTE, comm->lrank_to_grank[right], ADDR_EXCHANGE_TAG,
> > +                (void*)&(recv_pkt[left]),sizeof(struct Coll_Addr_Exch),
> > +                MPI_BYTE, comm->lrank_to_grank[left], ADDR_EXCHANGE_TAG,
> >                  MPI_COMM_WORLD, &(statarray[i]));
> > -        if (statarray[i].MPI_ERROR != MPI_SUCCESS) {
> > -                fprintf(stderr, "blah! %d %d\n", comm->local_rank, statarray[i].MPI_ERROR);
> > -        }
> > +
> > +       right = (right+1)%comm->np;
> > +       left = (left + comm->np - 1)%comm->np;
> >      }
> > 
> >      for(i = 0; i < comm->np; i++) {
> 
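For reference, the exchange pattern the patch switches to can be illustrated
with a small standalone MPI program. This is only a sketch, not MVAPICH code:
struct addr_rec and the variable names are placeholders standing in for
Coll_Addr_Exch and the real buffers. In step k, each rank sends its record to
its k-th neighbour to the right and receives from its k-th neighbour to the
left, so no single rank (rank 0 in the original loop) has to service all of
its peers at once.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

/* Placeholder for the per-rank record being exchanged
 * (Coll_Addr_Exch in the real code). */
struct addr_rec { int rank; };

int main(int argc, char **argv)
{
    int np, me, k;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &np);
    MPI_Comm_rank(MPI_COMM_WORLD, &me);

    struct addr_rec send_pkt = { me };
    struct addr_rec *recv_pkt = malloc(np * sizeof *recv_pkt);
    recv_pkt[me] = send_pkt;                 /* my own slot */

    /* Walk the ring: step k pairs this rank with (me+k+1) on the right
     * and (me-k-1) on the left, spreading the load evenly instead of
     * having every rank hit rank 0 in the first iteration. */
    int right = (me + 1) % np;
    int left  = (me + np - 1) % np;
    for (k = 0; k < np - 1; k++) {
        MPI_Sendrecv(&send_pkt, sizeof send_pkt, MPI_BYTE,
                     right, 0,
                     &recv_pkt[left], sizeof send_pkt, MPI_BYTE,
                     left, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        right = (right + 1) % np;
        left  = (left + np - 1) % np;
    }

    printf("rank %d collected records from all %d ranks\n", me, np);

    free(recv_pkt);
    MPI_Finalize();
    return 0;
}

Because MPI_Sendrecv pairs each send with its matching receive, every step
completes regardless of how tightly the ranks are synchronized, which fits
the authors' observation that the hang appears when slurm's --cpu_bind makes
all processes enter MPI_Init at essentially the same time.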


