[mvapich-discuss] Re: mvapich job startup unreliable with slurm and
--cpu_bind
Dhabaleswar Panda
panda at cse.ohio-state.edu
Tue Aug 1 12:29:46 EDT 2006
Hi Mike,
No problem. Thanks for the patch. We will take a look at it and apply
the patch to the trunk.
Best Regards,
DK
> Sorry for the mixup, we were not trying to announce anything.
> Before we made this change we would see the weird MPI_Init behavior,
> which looks like a deadlock of some sort. With the right/left ring
> exchange there are no troubles, and there is less pressure than
> everyone slamming on rank 0.
> --Mike
>
>
> On Sun, 2006-07-30 at 23:38 -0400, Dhabaleswar Panda wrote:
> > Hi Greg and Mike,
> >
> > Many thanks for sending us the patch related to Slurm and --cpu_bind
> > on July 26th.
> >
> > You had sent this note to mvapich at cse. Since `mvapich at cse' is an
> > announcement list only, it got blocked and I just noticed your posting
> > now.
> >
> > I am forwarding this note to mvapich-discuss at cse.ohio-state.edu.
> >
> > As you might have noticed, we just made the release of mvapich 0.9.8.
> > We will review your patch and incorporate it to the trunk and
> > 0.9.8-branch soon.
> >
> > May I request that you post your future patches to
> > mvapich-discuss at cse.ohio-state.edu.
> >
> > Best Regards,
> >
> > DK
> >
> > ----------------------------------------------------------------
> >
> > The following patch seems to fix a problem starting mvapich jobs with
> > slurm and the --cpu_bind option. Under these conditions, some of the
> > MPI processes do not make it out of MPI_Init() and the job hangs on
> > launch. We think that this is because with slurm and --cpu_bind the
> > startup is more synchronized.
> >
> > Thanks,
> >
> > Greg Johnson & Mike Lang
> >
> > diff -ur mvapich-0.9.8-rc0.orig/src/context/comm_rdma_init.c mvapich-0.9.8-rc0/src/context/comm_rdma_init.c
> > --- mvapich-0.9.8-rc0.orig/src/context/comm_rdma_init.c 2006-07-11 16:49:44.000000000 -0600
> > +++ mvapich-0.9.8-rc0/src/context/comm_rdma_init.c 2006-07-11 15:35:46.000000000 -0600
> > @@ -162,6 +162,7 @@
> > {
> > #ifndef CH_GEN2_MRAIL
> > int i = 0;
> > + int right, left;
> > struct Coll_Addr_Exch send_pkt;
> > struct Coll_Addr_Exch *recv_pkt;
> >
> > @@ -188,19 +189,17 @@
> > #else
> > send_pkt.buf_hndl = comm->collbuf->l_coll->buf_hndl;
> > #endif
> > -
> > - for(i = 0; i < comm->np; i++) {
> > - /* Don't send to myself */
> > - if(i == comm->local_rank) continue;
> > -
> > + right=(comm->local_rank + 1)%comm->np;
> > + left=(comm->local_rank + comm->np - 1)%comm->np;
> > + for(i=0; i < comm->np-1; i++) {
> > MPI_Sendrecv((void*)&send_pkt, sizeof(struct Coll_Addr_Exch),
> > - MPI_BYTE, comm->lrank_to_grank[i], ADDR_EXCHANGE_TAG,
> > - (void*)&(recv_pkt[i]),sizeof(struct Coll_Addr_Exch),
> > - MPI_BYTE, comm->lrank_to_grank[i], ADDR_EXCHANGE_TAG,
> > + MPI_BYTE, comm->lrank_to_grank[right], ADDR_EXCHANGE_TAG,
> > + (void*)&(recv_pkt[left]),sizeof(struct Coll_Addr_Exch),
> > + MPI_BYTE, comm->lrank_to_grank[left], ADDR_EXCHANGE_TAG,
> > MPI_COMM_WORLD, &(statarray[i]));
> > - if (statarray[i].MPI_ERROR != MPI_SUCCESS) {
> > - fprintf(stderr, "blah! %d %d\n", comm->local_rank, statarray[i].MPI_ERROR);
> > - }
> > +
> > + right = (right+1)%comm->np;
> > + left = (left + comm->np - 1)%comm->np;
> > }
> >
> > for(i = 0; i < comm->np; i++) {
>