[mvapich-discuss] mvapich2 jobs stall after successful completion

Jonathan Perkins perkinjo at cse.ohio-state.edu
Mon Jan 10 16:30:16 EST 2011


Hi Vlad,
Can you provide us with the configuration options that you used to
build mvapich2?  Also, does this happen with mvapich2-1.6rc2?  If it is
reproducible with mvapich2-1.6rc2, would it be possible for you to post
a backtrace of the mpispawn process?  Thanks in advance.
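
One way to capture such a backtrace is sketched below in Python. This is
only a minimal sketch, assuming that gdb and pgrep are installed on the
node where the job hangs and that the leftover process is named
mpispawn. The helper names (find_pids, gdb_backtrace_command,
dump_backtraces) are hypothetical and are not part of MVAPICH2.

```python
import subprocess


def find_pids(process_name="mpispawn"):
    """Return the PIDs of processes whose name matches exactly (pgrep -x)."""
    try:
        result = subprocess.run(
            ["pgrep", "-x", process_name],
            capture_output=True, text=True, check=False,
        )
    except FileNotFoundError:
        return []  # pgrep is not installed on this machine
    # pgrep prints one PID per line and prints nothing when there is no match.
    return [int(token) for token in result.stdout.split()]


def gdb_backtrace_command(pid):
    """Build a gdb command that attaches to pid, prints the backtraces of
    all threads, and exits without entering interactive mode."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


def dump_backtraces(process_name="mpispawn"):
    """Return a dict mapping each matching PID to the text gdb produced."""
    traces = {}
    for pid in find_pids(process_name):
        result = subprocess.run(
            gdb_backtrace_command(pid),
            capture_output=True, text=True, check=False,
        )
        # Keep stderr as well, since gdb reports attach failures there.
        traces[pid] = result.stdout + result.stderr
    return traces
```

Calling dump_backtraces() while the job is hung, as the user who owns
the process or as root, should return text that can be pasted into a
reply. Attaching may fail if ptrace is restricted on the system.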

On Mon, Jan 10, 2011 at 3:49 PM, Vlad Cojocaru
<vlad.cojocaru at mpi-muenster.mpg.de> wrote:
> Dear MVAPICH2 users,
>
> I am running molecular dynamics programs (AMBER and NAMD) using
> MVAPICH2, version 1.6rc1.
> My jobs stall after successful completion. Everything looks fine and
> the job finishes with all of its output complete, but then the job does
> not exit; it hangs (appears as a ghost job). If I kill the leftover
> "mpispawn" process, everything is fine, but if the parallel run is a
> step in a workflow, I always need to kill this leftover process
> manually so that the subsequent jobs can run.
>
> Has anybody noticed such behavior? I should add that this is not
> reproducible: it happens at random times, and submitting the same job
> over and over again does not produce the same outcome.
> It also happens with the simple test provided with MVAPICH2. On
> the same cluster, Open MPI 1.4.3 runs correctly. MVAPICH2 appears to
> scale better, which is why I would like to use it.
>
> Here are details on my architecture:
> cpu: AMD Opteron Istanbul
> arch: Linux x86_64, CentOS 5.5
> mpi: MVAPICH2 1.6 rc1 (the problem appeared also with version 1.5)
> compiler: INTEL 11.0.073 or GCC 4.5.1 (problem is seen with both
> compilations)
> interconnect: Mellanox InfiniBand
> scheduler: Oracle Grid Engine is used for controlling the jobs (however,
> the problem also appears when jobs are run without the grid engine)
>
> If anybody has seen such behavior before and knows an elegant fix, I
> would appreciate any advice.
>
> Thank you
>
> Best wishes
> Vlad
>
>
> --
> Dr. Vlad Cojocaru
> Max Planck Institute for Molecular Biomedicine
> Department of Cellular and Developmental Biology
> Roentgenstrasse 20
> 48149 Muenster, Germany
> tel: +49-251-70365-324
> fax: +49-251-70365-399
> email: vlad.cojocaru[at]mpi-muenster.mpg.de
>
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>
>



-- 
Jonathan Perkins
http://www.cse.ohio-state.edu/~perkinjo

