[mvapich-discuss] MVAPICH2 jobs hang up probably due to changes between 2009-10-08 and 2009-10-28?

Sayantan Sur surs at cse.ohio-state.edu
Fri May 21 12:18:31 EDT 2010


Hi Manhui,

Thanks for your message. We would like to see what may be going on.
Here are some questions that we have:

1) What exactly is the error message? Does the hang occur during startup
or some time after the computation has started?
2) On how many nodes does this show up?
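
In case it helps with the first question, here is a minimal sketch for
capturing stack traces from hung ranks. It assumes gdb and pgrep are
available on the compute node and that you have permission to attach to
the processes. The function names and the bt.<pid>.txt output files are
only illustrative, and the process name you pass in is whatever your
executable is actually called.

```shell
# Sketch only: bt_command and dump_backtraces are hypothetical helper names.

# Print the gdb invocation for one PID so it can be inspected before running.
bt_command() {
    printf 'gdb -p %s -batch -ex "thread apply all bt"' "$1"
}

# Attach to every process whose name exactly matches $1 and write the
# backtraces of all its threads to bt.<pid>.txt in the current directory.
dump_backtraces() {
    for pid in $(pgrep -x "$1"); do
        sh -c "$(bt_command "$pid")" > "bt.$pid.txt" 2>&1
    done
}
```

Running dump_backtraces with the name of the hung executable on one of the
nodes should show whether the ranks are stuck inside MPI startup or inside
a later communication call.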

I signed up for a temporary license for molpro on your website. I
downloaded the binary release
"molpro-mpp-2009.1-20_trial.Linux_x86_64.sh.gz". Could you provide us
with detailed instructions on how to reproduce this behavior with
molpro? (You can send these off-list if you prefer.) Once we can
reproduce the hang, it will be much easier to debug.

Thanks.

On Fri, May 21, 2010 at 1:41 PM, Manhui Wang <wangm9 at cardiff.ac.uk> wrote:
> Hello,
>     I built MVAPICH2 library using mvapich2-trunk-2009-10-08.tar.gz
> with Intel v10 compilers last year. The following options are used:
> nice -n +18 ./configure  --with-rdma=gen2
> --with-ib-include=/usr/include/infiniband --with-ib-libpath=/usr/lib64
> --prefix=/home/sacmw4/soft/mpich2-trunk-2009-10-08-install FC=ifort
> --enable-f90 F90=ifort F77=ifort --enable-cc CC=icc --enable-cxx
> CXX=icc 2>&1 | tee configure.log
> nice -n +18 make 2>&1 | tee make.log
> nice -n +18 make install 2>&1 | tee install.log
>
> When I linked our Quantum Chemistry Program MOLPRO with this library,
> all our testjobs worked fine and still work fine now.
>
> I tried mvapich2-1.4 version later with the same compilers and options,
> and linked it with our program as before. Some testjobs hang up. I have
> built different versions of mvapich2 (mvapich2-1.2-2009-10-28.tar.gz
> mvapich2-1.4.1.tgz  mvapich2-1.4-2010-04-07.tar.gz  mvapich2-1.4.tgz
>  mvapich2-trunk-2009-10-08.tar.gz mvapich2-1.4-2010-01-02.tar.gz
> mvapich2-1.5rc1.tgz) with the same compilers and options, and linked
> them with the same program. The builds linked with
> mvapich2-1.2-2009-10-28 and versions after 2009-10-28 have the same
> problem (some jobs hang up). I am wondering whether there is something
> wrong with the code changed between 2009-10-08 and 2009-10-28. Or have I
> missed something? I have attached the compilation log files.
>
> Thank you in advance.
> Manhui

-- 
Sayantan Sur

Research Scientist
Department of Computer Science
The Ohio State University.


