[mvapich-discuss] poll_or_block_event

Jonathan Perkins perkinjo at cse.ohio-state.edu
Mon Mar 14 12:34:11 EDT 2011


Jacob:
Thanks for your note.  Let's try to narrow down this issue a bit further.

What version of mvapich or mvapich2 are you using?  Does this problem
only happen with this application?  Are you able to get a back trace
from the process when this happens?

You may also want to consider upgrading to mvapich2-1.6 if you're
not already using it.  With this version, you're able to use the
mpiexec (hydra) that ships with mvapich2 itself.  This can help
isolate the issue as well.  For more information on this please see
http://mvapich.cse.ohio-state.edu/support/user_guide_mvapich2-1.6.html#x1-250005.2.2
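
As a rough way to see whether the warnings point at one node or at
many, a sketch like the following could tally them from the standard
error file.  It assumes every warning follows the format of the single
line quoted below; the name tally_warnings is made up for illustration
and is not part of mpiexec or MVAPICH.

```python
import re
from collections import Counter

# Pattern derived from the one warning quoted in this thread.
# Assumption: other OSC mpiexec warnings share this layout.
WARNING_RE = re.compile(
    r"poll_or_block_event: evt (\d+) task (\d+) on ([^\s:]+):"
)


def tally_warnings(text):
    """Return a Counter mapping node name -> number of warnings seen."""
    per_node = Counter()
    for match in WARNING_RE.finditer(text):
        # Groups are (event number, task number, node name);
        # only the node name is used for the tally.
        _evt, _task, node = match.groups()
        per_node[node] += 1
    return per_node
```

Reading the job's standard error file into a string and passing it to
tally_warnings() gives a per-node count; if one node dominates, that
node is a reasonable place to start looking.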

On Mon, Mar 14, 2011 at 11:18 AM, Jacob Harvey <jaharvey at chem.umass.edu> wrote:
> MVAPICH users,
>
> I'm running into a problem on our cluster that I don't really know
> much about. Basically, when you submit a calculation, the job runs
> for some time and then, seemingly at random, appears to stop running
> (i.e. no more output is sent back from the executable). At that
> point, if you ssh to the node that was running the calculation, you
> will find that the executable is no longer running (not
> surprisingly). Upon killing the job I get a whole bunch of the
> following errors in the standard error file:
>
> mpiexec: Warning: poll_or_block_event: evt 58 task 24 on node001:
> remote system error.
>
> We are using the OSC mpiexec to launch the jobs from within PBS. I've
> looked around but haven't been able to find much related to this
> error. If anyone could provide any assistance it would be very much
> appreciated. I thank you in advance.
>
> Jacob

-- 
Jonathan Perkins
http://www.cse.ohio-state.edu/~perkinjo

