[mvapich-discuss] mvapich2 jobs stall after successful completion
Vlad Cojocaru
vlad.cojocaru at mpi-muenster.mpg.de
Mon Jan 10 18:02:10 EST 2011
Dear Jonathan,
The build script I use is below ...
I will test the rc2 version when I have time and let you know.
Best wishes
Vlad
----------build_intel11.sh -------------------
#!/bin/bash
export COMP="intel-11.1.073"
export INTEL_HOME=/usr/global/compiler/intel/11.1.073
export MKL_HOME=/usr/global/compiler/intel/11.1.073/mkl
source $INTEL_HOME/bin/iccvars.sh intel64
source $INTEL_HOME/bin/ifortvars.sh intel64
export INCLUDE_PATH=$INTEL_HOME/include:\
$INTEL_HOME/include/intel64:\
$MKL_HOME/include:\
$MKL_HOME/include/em64t
export CC="icc"
export CXX="icpc"
export F77="ifort"
export F90="ifort"
export FC="ifort"
export LDFLAGS="-L$LD_LIBRARY_PATH"
export CPP="icc -E"
export CXXCPP="icpc -E"
export CPPFLAGS="-I$INCLUDE_PATH"
export CFLAGS="-O3 -msse3"
export CXXFLAGS="-O3 -msse3"
export FPP="ifort -E"
export FFLAGS="-O3 -msse3"
export FCLAGS="-O3 -msse3"
export F90FLAGS="-O3 -msse3"
export F77FLAGS="-O3 -msse3"
echo "LD_LIBRARY_PATH: $LD_LIBRARY_PATH"
echo "PATH: $PATH"
ICC=`which icc`
echo "ICC=$ICC"
ICPC=`which icpc`
echo "ICPC=$ICPC"
IFORT=`which ifort`
echo "IFORT=$IFORT"
make clean
make distclean
./configure --prefix=/usr/global/mpi/mvapich2/1.6rc1-$COMP \
--enable-f77 \
--enable-f90 \
--enable-cxx \
--enable-romio \
--enable-threads=default \
--enable-sharedlibs=gcc \
--with-thread-package=pthreads
make 2>&1 | tee make_$COMP.log
make install 2>&1 | tee make_install_$COMP.log
exit
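
One detail of the script above: `LD_LIBRARY_PATH` and `INCLUDE_PATH` are colon-separated lists, while `-L` and `-I` each take a single directory, so `-L$LD_LIBRARY_PATH` and `-I$INCLUDE_PATH` hand the compiler one long name containing colons. Whether this has any bearing on the hang is unknown. A minimal sketch of one way to expand such lists into one flag per directory follows; the helper name `path_to_flags` and the example directories are hypothetical and not part of the original script.

```shell
#!/bin/bash
# Hypothetical helper (not in the original build script): turn a
# colon-separated path list into one compiler flag per directory.
# Usage: path_to_flags <flag-prefix> <colon-separated-list>
# The body runs in a subshell, so the IFS and globbing changes stay local.
path_to_flags() (
    prefix=$1
    out=""
    IFS=:
    set -f   # disable globbing so entries are taken literally
    for dir in $2; do
        # Skip empty entries left by leading, trailing, or doubled colons
        [ -n "$dir" ] || continue
        out="${out:+$out }${prefix}${dir}"
    done
    printf '%s\n' "$out"
)

# Example with made-up directories
path_to_flags -I "/opt/a/include:/opt/b/include"
# prints: -I/opt/a/include -I/opt/b/include
```

In the build script this could replace the two assignments, for example `export CPPFLAGS="$(path_to_flags -I "$INCLUDE_PATH")"` and `export LDFLAGS="$(path_to_flags -L "$LD_LIBRARY_PATH")"`, assuming none of the directories contain spaces.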
On 01/10/2011 10:30 PM, Jonathan Perkins wrote:
> Hi Vlad,
> Can you provide us the configuration options that you've used to build
> mvapich2? Also, does this happen for mvapich2-1.6rc2? If this is
> reproducible with mvapich2-1.6rc2 will it be possible for you to post
> a back trace of the mpispawn process? Thanks in advance.
>
> On Mon, Jan 10, 2011 at 3:49 PM, Vlad Cojocaru
> <vlad.cojocaru at mpi-muenster.mpg.de> wrote:
>
>> Dear MVAPICH2 users,
>>
>> I am running molecular dynamics programs (AMBER and NAMD) using
>> MVAPICH2, version 1.6rc1.
>> My jobs stall after successful completion. Everything looks fine and
>> the job finishes with all the output complete, but then the job does
>> not exit; it hangs (appears as a ghost job). If I kill the leftover
>> "mpispawn" process, everything is fine, but if the parallel run is a
>> step in a workflow, I always need to kill this leftover process
>> manually so that the subsequent jobs can run.
>>
>> Has anybody noticed such behavior? I should add that this is not
>> reproducible: it happens at random times, and submitting the same job
>> over and over again does not produce the same outcome.
>> It also happens with the simple test provided with MVAPICH2. On
>> the same cluster, OPENMPI 1.4.3 runs correctly. MVAPICH2 appears to
>> scale better, which is why I would like to use it.
>>
>> Here are details on my architecture:
>> cpu: AMD Opteron Istanbul
>> arch: Linux x86_64, CentOS 5.5
>> mpi: MVAPICH2 1.6 rc1 (the problem appeared also with version 1.5)
>> compiler: INTEL 11.1.073 or GCC 4.5.1 (problem is seen with both
>> compilations)
>> interconnection: Mellanox infiniband
>> Oracle Grid Engine used for controlling the jobs (however the problem
>> appears also when jobs are run without the grid engine)
>>
>> If anybody has seen such behavior before and knows an elegant fix, I
>> would appreciate any advice.
>>
>> Thank you
>>
>> Best wishes
>> Vlad
>>
>>
>> --
>> Dr. Vlad Cojocaru
>> Max Planck Institute for Molecular Biomedicine
>> Department of Cellular and Developmental Biology
>> Roentgenstrasse 20
>> 48149 Muenster, Germany
>> tel: +49-251-70365-324
>> fax: +49-251-70365-399
>> email: vlad.cojocaru[at]mpi-muenster.mpg.de
--
Dr. Vlad Cojocaru
Max Planck Institute for Molecular Biomedicine
Department of Cellular and Developmental Biology
Roentgenstrasse 20
48149 Muenster, Germany
tel: +49-251-70365-324
fax: +49-251-70365-399
email: vlad.cojocaru[at]mpi-muenster.mpg.de