[mvapich-discuss] mvapich job startup unreliable with slurm and --cpu_bind (patch)
Dhabaleswar Panda
panda at cse.ohio-state.edu
Sun Jul 30 23:38:21 EDT 2006
Hi Greg and Mike,
Many thanks for sending us the patch related to Slurm and --cpu_bind
on July 26th.
You sent this note to mvapich at cse. Since `mvapich at cse' is an
announcement-only list, your message was blocked and I only noticed it
now.
I am forwarding this note to mvapich-discuss at cse.ohio-state.edu.
As you might have noticed, we have just released mvapich 0.9.8.
We will review your patch and incorporate it into the trunk and the
0.9.8 branch soon.
May I request that you post future patches to
mvapich-discuss at cse.ohio-state.edu.
Best Regards,
DK
----------------------------------------------------------------
The following patch seems to fix a problem starting mvapich jobs with
slurm and the --cpu_bind option. Under these conditions, some of the
MPI processes never make it out of MPI_Init() and the job hangs at
launch. We think this is because, with slurm and --cpu_bind, process
startup is more tightly synchronized, so in the original exchange loop
every rank targets the same peers in the same order at the same time.
The patch replaces that loop with a ring-style exchange in which each
rank sends to a rotating right neighbor and receives from a rotating
left neighbor, pairing each send with a matching receive at every step.
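
For reference, here is a minimal standalone sketch of the ring exchange
pattern the patch introduces. It is an illustration only, not the
mvapich internals: payload_t stands in for struct Coll_Addr_Exch, the
tag value is made up, and plain MPI_COMM_WORLD replaces the
communicator's rank-translation bookkeeping.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define ADDR_EXCHANGE_TAG 42             /* made-up tag, illustrative only */

typedef struct { int rank; } payload_t;  /* stand-in for Coll_Addr_Exch */

int main(int argc, char **argv)
{
    int np, me, i, right, left;
    payload_t send_pkt;
    payload_t *recv_pkt;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &np);
    MPI_Comm_rank(MPI_COMM_WORLD, &me);

    send_pkt.rank = me;
    recv_pkt = (payload_t *) malloc(np * sizeof(payload_t));

    /* Rotating ring: at step i, send to (me + 1 + i) % np and receive
     * from (me - 1 - i) mod np, so every step pairs exactly one send
     * with one matching receive instead of all ranks targeting the
     * same peer at the same time. */
    right = (me + 1) % np;
    left  = (me + np - 1) % np;
    for (i = 0; i < np - 1; i++) {
        MPI_Sendrecv(&send_pkt, sizeof(payload_t), MPI_BYTE,
                     right, ADDR_EXCHANGE_TAG,
                     &recv_pkt[left], sizeof(payload_t), MPI_BYTE,
                     left, ADDR_EXCHANGE_TAG,
                     MPI_COMM_WORLD, &status);
        right = (right + 1) % np;
        left  = (left + np - 1) % np;
    }

    /* After np-1 steps every slot except our own holds that peer's data. */
    for (i = 0; i < np; i++)
        if (i != me && recv_pkt[i].rank != i)
            fprintf(stderr, "rank %d: slot %d not filled correctly\n", me, i);

    free(recv_pkt);
    MPI_Finalize();
    return 0;
}

Built with mpicc and run on a few ranks, each slot of recv_pkt ends up
filled by the matching peer. The point of the rotation is that no rank
is ever flooded with simultaneous unexpected sends during startup.
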
Thanks,
Greg Johnson & Mike Lang
diff -ur mvapich-0.9.8-rc0.orig/src/context/comm_rdma_init.c mvapich-0.9.8-rc0/src/context/comm_rdma_init.c
--- mvapich-0.9.8-rc0.orig/src/context/comm_rdma_init.c 2006-07-11 16:49:44.000000000 -0600
+++ mvapich-0.9.8-rc0/src/context/comm_rdma_init.c 2006-07-11 15:35:46.000000000 -0600
@@ -162,6 +162,7 @@
{
#ifndef CH_GEN2_MRAIL
int i = 0;
+ int right, left;
struct Coll_Addr_Exch send_pkt;
struct Coll_Addr_Exch *recv_pkt;
@@ -188,19 +189,17 @@
#else
send_pkt.buf_hndl = comm->collbuf->l_coll->buf_hndl;
#endif
-
- for(i = 0; i < comm->np; i++) {
- /* Don't send to myself */
- if(i == comm->local_rank) continue;
-
+ right=(comm->local_rank + 1)%comm->np;
+ left=(comm->local_rank + comm->np - 1)%comm->np;
+ for(i=0; i < comm->np-1; i++) {
MPI_Sendrecv((void*)&send_pkt, sizeof(struct Coll_Addr_Exch),
- MPI_BYTE, comm->lrank_to_grank[i], ADDR_EXCHANGE_TAG,
- (void*)&(recv_pkt[i]),sizeof(struct Coll_Addr_Exch),
- MPI_BYTE, comm->lrank_to_grank[i], ADDR_EXCHANGE_TAG,
+ MPI_BYTE, comm->lrank_to_grank[right], ADDR_EXCHANGE_TAG,
+ (void*)&(recv_pkt[left]),sizeof(struct Coll_Addr_Exch),
+ MPI_BYTE, comm->lrank_to_grank[left], ADDR_EXCHANGE_TAG,
MPI_COMM_WORLD, &(statarray[i]));
- if (statarray[i].MPI_ERROR != MPI_SUCCESS) {
- fprintf(stderr, "blah! %d %d\n", comm->local_rank, statarray[i].MPI_ERROR);
- }
+
+ right = (right+1)%comm->np;
+ left = (left + comm->np - 1)%comm->np;
}
for(i = 0; i < comm->np; i++) {