[mvapich-discuss] On "Got Completion" and IBV_EVENT Errors

Joshua Bernstein jbernstein at penguincomputing.com
Thu Jan 31 18:55:12 EST 2008


Thank you for your response, Matthew.

Matthew Koop wrote:
> Joshua,
> 
> So are you able to run `ibv_rc_pingpong' with a variety of message sizes?
> You may want to double-check that the cables between machines are well
> connected as well.

ibv_rc_pingpong seems to work correctly:

[root at flatline ~]# ibv_rc_pingpong -i 2
   local address:  LID 0x0006, QPN 0x050016, PSN 0x55eeb7
   remote address: LID 0x0004, QPN 0x100406, PSN 0x07ccc8
8192000 bytes in 0.04 seconds = 1669.28 Mbit/sec
1000 iters in 0.04 seconds = 39.26 usec/iter

As a side note, it would be nice if there were some description of 
what all the ibv_* commands do. For example, there are also 
ibv_srq_pingpong and ibv_uc_pingpong. If there is documentation 
about this somewhere that I missed, I apologize.
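From poking around the examples, my rough reading (an assumption on my 
part, not something from any docs) is that the suffix names the transport 
each test exercises:

```shell
# My rough understanding of the pingpong suffixes (assumption, not
# taken from official documentation):
#   rc  = Reliable Connected queue pairs
#   uc  = Unreliable Connected queue pairs
#   ud  = Unreliable Datagram queue pairs
#   srq = Reliable Connected using a Shared Receive Queue
for t in rc uc ud srq; do
  echo "ibv_${t}_pingpong"
done
```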

> With the earlier request you cited, the issue didn't occur for simple
> microbenchmarks, only with an application. We have previously seen issues
> when fork or system calls are used in applications (due to
> incompatibilities with the underlying OpenFabrics drivers).

I'm not quite sure I understand the implications of this. Can you 
elaborate? I see the same behavior with the supplied osu_* codes as well.

I should have mentioned this earlier, but we are attempting to port 
a pmgr_client plugin from the vapi transport to the ch_gen2 transport 
that uses bproc (Scyld) for job startup instead of RSH. In this code we 
do a fork. So I'd be interested to read your elaboration on this.
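If fork() in our plugin really is what upsets the OpenFabrics layer, one 
workaround I've read about (an assumption on my part, not something we've 
verified here) is libibverbs' fork support: calling ibv_fork_init() before 
creating any verbs resources, or setting RDMAV_FORK_SAFE=1 in the 
environment so the library does it for us:

```shell
# RDMAV_FORK_SAFE asks libibverbs to make registered memory regions
# fork-safe, equivalent to the application calling ibv_fork_init();
# it reportedly costs some memory-registration performance.
export RDMAV_FORK_SAFE=1
echo "RDMAV_FORK_SAFE=$RDMAV_FORK_SAFE"
# (then launch the MPI job as usual)
```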

Eventually, we (Penguin Computing) hope to be able to contribute this 
enhancement upstream.

> It seems that your issue is more likely to be a setup issue. What does
> ulimit -l report on your compute nodes? 

It is set to half the available memory on the system, as stated in the 
MVAPICH docs.
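Concretely, this is roughly how I check it on a node (the limits.conf 
lines are how we set it; the exact value is site-specific and shown here 
only as a placeholder):

```shell
# Print the max locked-memory (memlock) limit for this shell, in KB;
# verbs memory registration fails if this is set too low.
val=$(ulimit -l)
echo "memlock limit: $val"
# We raised it via /etc/security/limits.conf with lines like
# (value is a placeholder -- we use roughly half of physical RAM,
# per the MVAPICH docs):
#   *  soft  memlock  <half-of-RAM-in-KB>
#   *  hard  memlock  <half-of-RAM-in-KB>
[ -n "$val" ] && echo ok
```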

> Also, it is unlikely that VIADEV_USE_SHMEM_COLL is causing any issue -- turning off this option
> means there is less communication in the init phase (which allows you to
> get to the stdout statements).

No, no, I agree. In fact, my point was that by using that environment 
variable I was able to get the application to run a bit further.

After a bit of playing around, I've gotten the code to run a bit farther. 
Now, when the cpi program does an MPI_Bcast, I get a hang and my 
old friend: Got completion with error IBV_WC_RETRY_EXC_ERR.

*Both* processes call MPI_Bcast, but only *one* of them sees a
return from MPI_Bcast (n==100) and subsequently calls MPI_Reduce.

-Joshua Bernstein
Penguin Computing
Software Engineer
