[mvapich-discuss] problems in executing higher number process job

Sangamesh B forum.san at gmail.com
Wed Aug 20 03:24:18 EDT 2008


Dear Matthew,

     While installing charm++, I referred to your mvapich-charm-namd post.

The charm++ configuration files look as follows:

# cat src/arch/mpi-linux-x86_64/conv-mach.h

#ifndef _CONV_MACH_H
#define _CONV_MACH_H

#define CMK_AMD64                                          1
#define CMK_CONVERSE_MPI                                   1

#define CMK_DEFAULT_MAIN_USES_COMMON_CODE                  1

#define CMK_GETPAGESIZE_AVAILABLE                          1

#define CMK_IS_HETERO                                      0

#define CMK_MALLOC_USE_GNU_MALLOC                          0
#define CMK_MALLOC_USE_OS_BUILTIN                          1

#define CMK_MEMORY_PAGESIZE                                8192
#define CMK_MEMORY_PROTECTABLE                             1

#define CMK_NODE_QUEUE_AVAILABLE                           0

#define CMK_SHARED_VARS_EXEMPLAR                           0
#define CMK_SHARED_VARS_UNAVAILABLE                        1
#define CMK_SHARED_VARS_UNIPROCESSOR                       0

#define CMK_SIGNAL_NOT_NEEDED                              0
#define CMK_SIGNAL_USE_SIGACTION                           0
#define CMK_SIGNAL_USE_SIGACTION_WITH_RESTART              1

#define CMK_THREADS_REQUIRE_NO_CPV                         0

#define CMK_TIMER_USE_GETRUSAGE                            0
#define CMK_TIMER_USE_SPECIAL                              1
#define CMK_TIMER_USE_TIMES                                0
#define CMK_TIMER_USE_RDTSC                                0

#define CMK_THREADS_USE_CONTEXT                            1
#define CMK_THREADS_USE_PTHREADS                           0

#define CMK_TYPEDEF_INT2 short
#define CMK_TYPEDEF_INT4 int
#define CMK_TYPEDEF_INT8 long long
#define CMK_TYPEDEF_UINT2 unsigned short
#define CMK_TYPEDEF_UINT4 unsigned int
#define CMK_TYPEDEF_UINT8 unsigned long long
#define CMK_TYPEDEF_FLOAT4 float
#define CMK_TYPEDEF_FLOAT8 double

#define CMK_64BIT    1

#define CMK_WHEN_PROCESSOR_IDLE_BUSYWAIT                   1
#define CMK_WHEN_PROCESSOR_IDLE_USLEEP                     0


#define CMK_WEB_MODE                                       1
#define CMK_DEBUG_MODE                                     0

#define CMK_LBDB_ON                                        1

#endif

# cat src/arch/mpi-linux-x86_64/conv-mach.sh

# user environment vars: MPICXX and MPICC
# or, use the definition in file $CHARMINC/MPIOPTS
if test -x "$CHARMINC/MPIOPTS"
then
  . $CHARMINC/MPIOPTS
else
  MPICXX_DEF=/data/mvapich2_intel/bin/mpicxx
  MPICC_DEF=/data/mvapich2_intel/bin/mpicc
fi

test -z "$MPICXX" && MPICXX=$MPICXX_DEF
test -z "$MPICC" && MPICC=$MPICC_DEF
test "$MPICXX" != "$MPICXX_DEF" && /bin/rm -f $CHARMINC/MPIOPTS
if test ! -f "$CHARMINC/MPIOPTS"
then
  echo MPICXX_DEF=$MPICXX > $CHARMINC/MPIOPTS
  echo MPICC_DEF=$MPICC >> $CHARMINC/MPIOPTS
  chmod +x $CHARMINC/MPIOPTS
fi

CMK_REAL_COMPILER=`$MPICXX -show 2>/dev/null | cut -d' ' -f1 `
case "$CMK_REAL_COMPILER" in
g++) CMK_AMD64="-m64 -fPIC" ;;
esac

CMK_CPP_CHARM="/lib/cpp -P"
CMK_CPP_C="$MPICC -E"
CMK_CC="$MPICC $CMK_AMD64 "
CMK_CXX="$MPICXX $CMK_AMD64 "
CMK_CXXPP="$MPICXX -E $CMK_AMD64 "

#CMK_SYSLIBS="-lmpich"

CMK_LIBS="-lckqt $CMK_SYSLIBS "
CMK_LD_LIBRARY_PATH="-Wl,-rpath,$CHARMLIBSO/"

CMK_NATIVE_CC="icc $CMK_AMD64 "
CMK_NATIVE_LD="icc $CMK_AMD64 "
CMK_NATIVE_CXX="icpc $CMK_AMD64 "
CMK_NATIVE_LDXX="icpc $CMK_AMD64 "
CMK_NATIVE_LIBS=""

# fortran compiler
CMK_CF77="f77"
CMK_CF90="f90"
CMK_F90LIBS="-L/usr/absoft/lib -L/opt/absoft/lib -lf90math -lfio -lU77 -lf77math "
CMK_F77LIBS="-lg2c "
CMK_F90_USE_MODDIR=1
CMK_F90_MODINC="-p"

CMK_QT='generic64'
CMK_RANLIB="ranlib"
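
As a quick check of the CMK_REAL_COMPILER logic above (a sketch only, reusing the wrapper paths already shown in this file), the MPI wrappers can be asked which underlying compiler they invoke; with this Intel build they should report icpc/icc rather than g++/gcc:

# Print the real compiler command behind the MVAPICH2 wrappers
/data/mvapich2_intel/bin/mpicxx -show
/data/mvapich2_intel/bin/mpicc -show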


The configuration looks fine to me, so I don't think the problem is with the installation.
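
As an additional sanity check (a sketch only, reusing the mpicc and machinefile paths from this thread), I can run a plain MPI hello-world at 128 ranks to rule out the MPI layer itself before blaming charm++ or NAMD:

# Minimal MPI test: every rank reports in, then all ranks synchronize and exit
cat > mpi_hello.c <<'EOF'
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("rank %d of %d\n", rank, size);
    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Finalize();
    return 0;
}
EOF
/data/mvapich2_intel/bin/mpicc -o mpi_hello mpi_hello.c
mpirun -machinefile ./machfile -np 128 ./mpi_hello

If this also fails at 128 ranks, the problem is in the MVAPICH2 setup rather than in the applications.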

Do you have any similar hints for GROMACS?

Thank you,
Sangamesh
Consultant, HPC

On Wed, Aug 20, 2008 at 12:26 AM, Matthew Koop <koop at cse.ohio-state.edu> wrote:

> Sangamesh,
>
> I'm not sure what your issue is here; however, we have run each of these
> sets of software in the past without any problem. I just re-verified that
> NAMD works fine with that version of MVAPICH2 and compilers at 128
> processes and above.
>
> Can you give the parameters that you used for building Charm++
> (conv-mach.sh)?
>
> I've posted this in the past as a guide for MVAPICH:
> cd charm-5.9
> cd ./src/arch
>
> cp -r mpi-linux-amd64 mpi-linux-amd64-mvapich
> cd mpi-linux-amd64-mvapich
>
> * edit conv-mach.h and change:
>
> #define CMK_MALLOC_USE_GNU_MALLOC                          1
> #define CMK_MALLOC_USE_OS_BUILTIN                          0
>
> to
>
> #define CMK_MALLOC_USE_GNU_MALLOC                          0
> #define CMK_MALLOC_USE_OS_BUILTIN                          1
>
> * make sure the MVAPICH mpicc and mpiCC are first in your path. Otherwise,
> add the full path to the mpicc and mpiCC commands in conv-mach.sh
>
> cd ../../..
>
> ./build charm++ mpi-linux-amd64-mvapich --no-build-shared
>
> You may need to change mpiCC to mpicxx in the conv-mach.sh in
> charm-5.9/src/arch/mpi-linux-amd64-mvapich
>
> Matt
>
> On Tue, 19 Aug 2008, Sangamesh B wrote:
>
> > Hi DK Sir,
> >
> >      I'm using OpenIB. MVAPICH2 is built with OFED-1.3 and Intel
> compilers.
> >
> > This is a new cluster that we built recently. The environment is
> > different from the earlier one, but earlier we also built MVAPICH2 for
> > the OFA interface only.
> >
> > We used make.mvapich2.ofa for the installation. This will not install
> > the uDAPL stack, right?
> >
> > Thank you,
> > Sangamesh
> >
> > On Tue, Aug 19, 2008 at 5:51 AM, Joshua Bernstein <jbernstein at penguincomputing.com> wrote:
> >
> > > Agreed,
> > >
> > >        Generally the "OpenIB" transport provides better startup and
> > > reliability over a large number of cores, so if you are using uDAPL, I
> > > would suggest giving OpenIB a shot.
> > >
> > > -Joshua Bernstein
> > > Software Engineer
> > > Penguin Computing
> > >
> > > Dhabaleswar Panda wrote:
> > >
> > >> Sangamesh,
> > >>
> > >> Some of your earlier queries were for the uDAPL interface of MVAPICH2
> > >> running on your customized adapter. Do these problems occur in the
> > >> same environment/interface? Since MVAPICH2 supports multiple
> > >> interfaces, it would be good if you could indicate which interface of
> > >> MVAPICH2 you are using here.
> > >>
> > >> DK
> > >>
> > >> On Mon, 18 Aug 2008, Sangamesh B wrote:
> > >>
> > >>   Dear all,
> > >>>
> > >>> Problem No 1:
> > >>>
> > >>> Application: GROMACS 3.3.3
> > >>>
> > >>> Parallel Library: MVAPICH2-1.0.3
> > >>>
> > >>> Compilers: Intel C++ and Fortran 10
> > >>>
> > >>> A parallel GROMACS 3.3.3 (C application) 32-core job runs
> > >>> successfully on a Rocks 4.3, 33-node cluster (dual-processor,
> > >>> quad-core Intel Xeon: 264 cores in total).
> > >>>
> > >>> But if I submit the same job with 64 or more processes, it returns
> > >>> without doing anything.
> > >>>
> > >>> This is my command line:
> > >>>
> > >>> grompp_mpi -np 64 -f run.mdp -p topol.top -c pr.gro -o run.tpr
> > >>> mpirun -machinefile ./machfile1 -np 64 mdrun_mpi -v -deffnm run
> > >>>
> > >>>
> > >>>
> > >>> Problem No 2:
> > >>>
> > >>> Application: NAMD 2.6
> > >>>
> > >>> Parallel Library: MVAPICH2-1.0.3
> > >>>
> > >>> Compilers: Intel C++ and Fortran 10
> > >>>
> > >>> I built successfully charm++ with mvapich2 and intel compilers, and
> then
> > >>> compiled NAMD2.
> > >>>
> > >>> The test examples given in the NAMD distribution works fine.
> > >>>
> > >>> With the following input file (this is the ApoA1 input used for
> > >>> benchmarking on the NAMD website; it runs/scales up to 252 processes
> > >>> as mentioned there), it runs only for 8, 16, 32, or 64 processes in
> > >>> my case.
> > >>>
> > >>> But when a 128-core job is submitted, it doesn't run at all. The
> > >>> following is the command and the error:
> > >>>
> > >>> #mpirun -machinefile ./machfile -np 128
> > >>> /data/apps/namd26_mvapich2/Linux-mvapich2/namd2 ./apoa1.namd | tee
> > >>> namd_128cores
> > >>> Charm++> Running on MPI version: 2.0 multi-thread support: 0/0
> > >>> rank 65 in job 4  master_host_name_50238   caused collective abort
> > >>> of all ranks
> > >>>  exit status of rank 65: killed by signal 9
> > >>>
> > >>>
> > >>> So I then built the network version of the charm++ library without
> > >>> using MVAPICH2, and it now works for any number of processes.
> > >>>
> > >>> So, for the above two problems, I guess there is some problem with
> > >>> MVAPICH2 itself. Is there a solution for it?
> > >>>
> > >>>
> > >>> Regards,
> > >>> Sangamesh
> > >>>
> > >>>
> > >> _______________________________________________
> > >> mvapich-discuss mailing list
> > >> mvapich-discuss at cse.ohio-state.edu
> > >> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
> > >>
> > >
> >
>
>

