[mvapich-discuss] problems in executing higher number process job
Sangamesh B
forum.san at gmail.com
Wed Aug 20 03:24:18 EDT 2008
Dear Matthew,
While installing charm++, I referred to your mvapich-charm-namd post.
The charm++ configuration files look as follows:
#ifndef _CONV_MACH_H
#define _CONV_MACH_H
#define CMK_AMD64 1
#define CMK_CONVERSE_MPI 1
#define CMK_DEFAULT_MAIN_USES_COMMON_CODE 1
#define CMK_GETPAGESIZE_AVAILABLE 1
#define CMK_IS_HETERO 0
#define CMK_MALLOC_USE_GNU_MALLOC 0
#define CMK_MALLOC_USE_OS_BUILTIN 1
#define CMK_MEMORY_PAGESIZE 8192
#define CMK_MEMORY_PROTECTABLE 1
#define CMK_NODE_QUEUE_AVAILABLE 0
#define CMK_SHARED_VARS_EXEMPLAR 0
#define CMK_SHARED_VARS_UNAVAILABLE 1
#define CMK_SHARED_VARS_UNIPROCESSOR 0
#define CMK_SIGNAL_NOT_NEEDED 0
#define CMK_SIGNAL_USE_SIGACTION 0
#define CMK_SIGNAL_USE_SIGACTION_WITH_RESTART 1
#define CMK_THREADS_REQUIRE_NO_CPV 0
#define CMK_TIMER_USE_GETRUSAGE 0
#define CMK_TIMER_USE_SPECIAL 1
#define CMK_TIMER_USE_TIMES 0
#define CMK_TIMER_USE_RDTSC 0
#define CMK_THREADS_USE_CONTEXT 1
#define CMK_THREADS_USE_PTHREADS 0
#define CMK_TYPEDEF_INT2 short
#define CMK_TYPEDEF_INT4 int
#define CMK_TYPEDEF_INT8 long long
#define CMK_TYPEDEF_UINT2 unsigned short
#define CMK_TYPEDEF_UINT4 unsigned int
#define CMK_TYPEDEF_UINT8 unsigned long long
#define CMK_TYPEDEF_FLOAT4 float
#define CMK_TYPEDEF_FLOAT8 double
#define CMK_64BIT 1
#define CMK_WHEN_PROCESSOR_IDLE_BUSYWAIT 1
#define CMK_WHEN_PROCESSOR_IDLE_USLEEP 0
#define CMK_WEB_MODE 1
#define CMK_DEBUG_MODE 0
#define CMK_LBDB_ON 1
#endif
# cat src/arch/mpi-linux-x86_64/conv-mach.sh
# user environ vars: MPICXX and MPICC
# or, use the definition in file $CHARMINC/MPIOPTS
if test -x "$CHARMINC/MPIOPTS"
then
. $CHARMINC/MPIOPTS
else
MPICXX_DEF=/data/mvapich2_intel/bin/mpicxx
MPICC_DEF=/data/mvapich2_intel/bin/mpicc
fi
test -z "$MPICXX" && MPICXX=$MPICXX_DEF
test -z "$MPICC" && MPICC=$MPICC_DEF
test "$MPICXX" != "$MPICXX_DEF" && /bin/rm -f $CHARMINC/MPIOPTS
if test ! -f "$CHARMINC/MPIOPTS"
then
echo MPICXX_DEF=$MPICXX > $CHARMINC/MPIOPTS
echo MPICC_DEF=$MPICC >> $CHARMINC/MPIOPTS
chmod +x $CHARMINC/MPIOPTS
fi
CMK_REAL_COMPILER=`$MPICXX -show 2>/dev/null | cut -d' ' -f1 `
case "$CMK_REAL_COMPILER" in
g++) CMK_AMD64="-m64 -fPIC" ;;
esac
CMK_CPP_CHARM="/lib/cpp -P"
CMK_CPP_C="$MPICC -E"
CMK_CC="$MPICC $CMK_AMD64 "
CMK_CXX="$MPICXX $CMK_AMD64 "
CMK_CXXPP="$MPICXX -E $CMK_AMD64 "
#CMK_SYSLIBS="-lmpich"
CMK_LIBS="-lckqt $CMK_SYSLIBS "
CMK_LD_LIBRARY_PATH="-Wl,-rpath,$CHARMLIBSO/"
CMK_NATIVE_CC="icc $CMK_AMD64 "
CMK_NATIVE_LD="icc $CMK_AMD64 "
CMK_NATIVE_CXX="icpc $CMK_AMD64 "
CMK_NATIVE_LDXX="icpc $CMK_AMD64 "
CMK_NATIVE_LIBS=""
# fortran compiler
CMK_CF77="f77"
CMK_CF90="f90"
CMK_F90LIBS="-L/usr/absoft/lib -L/opt/absoft/lib -lf90math -lfio -lU77 -lf77math "
CMK_F77LIBS="-lg2c "
CMK_F90_USE_MODDIR=1
CMK_F90_MODINC="-p"
CMK_QT='generic64'
CMK_RANLIB="ranlib"
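As a side note on the CMK_REAL_COMPILER lines above: `mpicxx -show` prints the underlying compile command, and `cut` keeps only its first word, so with the Intel wrappers the detection resolves to icpc and the g++ case never fires, leaving CMK_AMD64 empty. A minimal sketch of that logic (the wrapper is faked here so the example is self-contained and runs anywhere):

```shell
# Sketch of the CMK_REAL_COMPILER detection in conv-mach.sh.
# fake_mpicxx stands in for /data/mvapich2_intel/bin/mpicxx for illustration.
fake_mpicxx() {
    echo "icpc -I/data/mvapich2_intel/include -L/data/mvapich2_intel/lib -lmpich"
}
# Take the first word of the wrapper's underlying command line.
CMK_REAL_COMPILER=$(fake_mpicxx | cut -d' ' -f1)
echo "real compiler: $CMK_REAL_COMPILER"
case "$CMK_REAL_COMPILER" in
    g++) CMK_AMD64="-m64 -fPIC" ;;   # only fires for the GNU wrapper
esac
echo "CMK_AMD64: '$CMK_AMD64'"
```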
All the configuration settings look fine, so I don't think there is a problem
with the installation. Do you have any similar hints for GROMACS?
Thank you,
Sangamesh
Consultant, HPC
On Wed, Aug 20, 2008 at 12:26 AM, Matthew Koop <koop at cse.ohio-state.edu> wrote:
> Sangamesh,
>
> I'm not sure what your issue is here; however, we have run each of these
> sets of software together in the past without any problems. I just re-verified
> that NAMD works fine with that version of MVAPICH2 and those compilers at 128
> processes and above.
>
> Can you give the parameters you used for building Charm++?
> (conv-mach.sh)
>
> I've posted this in the past as a guide for MVAPICH:
> cd charm-5.9
> cd ./src/arch
>
> cp -r mpi-linux-amd64 mpi-linux-amd64-mvapich
> cd mpi-linux-amd64-mvapich
>
> * edit conv-mach.h and change:
>
> #define CMK_MALLOC_USE_GNU_MALLOC 1
> #define CMK_MALLOC_USE_OS_BUILTIN 0
>
> to
>
> #define CMK_MALLOC_USE_GNU_MALLOC 0
> #define CMK_MALLOC_USE_OS_BUILTIN 1
>
> * make sure the MVAPICH mpicc and mpiCC are first in your path. Otherwise,
> add the full paths to the mpicc and mpiCC commands in conv-mach.sh.
>
> cd ../../..
>
> ./build charm++ mpi-linux-amd64-mvapich --no-build-shared
>
> You may need to change mpiCC to mpicxx in the conv-mach.sh in
> charm-5.9/src/arch/mpi-linux-amd64-mvapich.
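The two conv-mach.h changes described above can also be scripted instead of edited by hand; a sketch assuming GNU sed's -i flag (a minimal stand-in conv-mach.h is created here so the example is self-contained; in a real tree, run only the sed lines inside src/arch/mpi-linux-amd64-mvapich):

```shell
# Stand-in file with the two original defines, for illustration only.
printf '%s\n' '#define CMK_MALLOC_USE_GNU_MALLOC 1' \
              '#define CMK_MALLOC_USE_OS_BUILTIN 0' > conv-mach.h
# Flip the malloc defines as the instructions describe (GNU sed -i assumed).
sed -i 's/CMK_MALLOC_USE_GNU_MALLOC 1/CMK_MALLOC_USE_GNU_MALLOC 0/' conv-mach.h
sed -i 's/CMK_MALLOC_USE_OS_BUILTIN 0/CMK_MALLOC_USE_OS_BUILTIN 1/' conv-mach.h
cat conv-mach.h
```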
>
> Matt
>
> On Tue, 19 Aug 2008, Sangamesh B wrote:
>
> > Hi DK Sir,
> >
> > I'm using OpenIB. MVAPICH2 is built with OFED-1.3 and the Intel compilers.
> >
> > This is a new cluster we built recently. The environment is different from
> > the earlier one, but on the earlier cluster, too, we built MVAPICH2 for the
> > OFA interface only.
> >
> > We used make.mvapich2.ofa for the installation. That will not install the
> > uDAPL stack, right?
> >
> > Thank you,
> > Sangamesh
> >
> > On Tue, Aug 19, 2008 at 5:51 AM, Joshua Bernstein
> > <jbernstein at penguincomputing.com> wrote:
> >
> > > Agreed,
> > >
> > > Generally, the "OpenIB" transport provides faster startup and greater
> > > reliability over large numbers of cores, so if you are using uDAPL, I
> > > would suggest giving OpenIB a shot.
> > >
> > > -Joshua Bernstein
> > > Software Engineer
> > > Penguin Computing
> > >
> > > Dhabaleswar Panda wrote:
> > >
> > >> Sangamesh,
> > >>
> > >> Some of your earlier queries were about the uDAPL interface of MVAPICH2
> > >> running on your customized adapter. Do these problems occur in the same
> > >> environment/interface? Since MVAPICH2 supports multiple interfaces, it
> > >> would be good if you could indicate which interface of MVAPICH2 you are
> > >> using here.
> > >>
> > >> DK
> > >>
> > >> On Mon, 18 Aug 2008, Sangamesh B wrote:
> > >>
> > >> Dear all,
> > >>>
> > >>> Problem No 1:
> > >>>
> > >>> Application: GROMACS 3.3.3
> > >>>
> > >>> Parallel Library: MVAPICH2-1.0.3
> > >>>
> > >>> Compilers: Intel C++ and Fortran 10
> > >>>
> > >>> A parallel GROMACS 3.3.3 (C application) 32-core job runs successfully
> > >>> on a Rocks 4.3, 33-node cluster (dual-processor, quad-core Intel Xeon:
> > >>> 264 cores in total).
> > >>>
> > >>> But if I submit the same job with 64 or more processes, it exits without
> > >>> doing anything.
> > >>>
> > >>> This is my command line:
> > >>>
> > >>> grompp_mpi -np 64 -f run.mdp -p topol.top -c pr.gro -o run.tpr
> > >>> mpirun -machinefile ./machfile1 -np 64 mdrun_mpi -v -deffnm run
> > >>>
> > >>>
> > >>>
> > >>> Problem No 2:
> > >>>
> > >>> Application: NAMD 2.6
> > >>>
> > >>> Parallel Library: MVAPICH2-1.0.3
> > >>>
> > >>> Compilers: Intel C++ and Fortran 10
> > >>>
> > >>> I built successfully charm++ with mvapich2 and intel compilers, and
> then
> > >>> compiled NAMD2.
> > >>>
> > >>> The test examples given in the NAMD distribution works fine.
> > >>>
> > >>> The following input file is the one used on the NAMD website for
> > >>> benchmarking; as mentioned there, it runs/scales up to 252 processes.
> > >>> But in my case it runs only for 8, 16, 32, and 64 processes.
> > >>>
> > >>> When a 128-process job is submitted, it doesn't run at all. The
> > >>> following are the command and the error.
> > >>>
> > >>> #mpirun -machinefile ./machfile -np 128
> > >>> /data/apps/namd26_mvapich2/Linux-mvapich2/namd2 ./apoa1.namd | tee
> > >>> namd_128cores
> > >>> Charm++> Running on MPI version: 2.0 multi-thread support: 0/0
> > >>> rank 65 in job 4 master_host_name_50238 caused collective abort of all ranks
> > >>> exit status of rank 65: killed by signal 9
> > >>>
> > >>>
> > >>> I then built charm++ with the network version of the library instead,
> > >>> without using MVAPICH2. That build works for any number of processes.
> > >>>
> > >>> So, for the above two problems, I suspect there is some problem with
> > >>> MVAPICH2 itself. Is there a solution for it?
> > >>>
> > >>>
> > >>> Regards,
> > >>> Sangamesh
> > >>>
> > >>>
> > >> _______________________________________________
> > >> mvapich-discuss mailing list
> > >> mvapich-discuss at cse.ohio-state.edu
> > >> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
> > >>
> > >
> >
>
>