[mvapich-discuss] mvapich2 1.6 cannot run job on many nodes

worldeb at ukr.net
Fri Jul 15 10:37:33 EDT 2011


Sorry, maybe I confused things with my earlier info, but the problem with MVAPICH2 1.6 is not solved.
With version 1.6 the osu_benchmarks work fine on the "problem node", but any other code
(compiled with MVAPICH2 1.6) has the problems I mentioned in previous e-mails as soon as
I include this "problem node" in the host list.
Both MVAPICH2 1.6 and 1.0.3 work fine on the cluster when the "problem node" is excluded.
OpenMPI works on all nodes, including the "problem node".
And right now I don't see any reason to update OFED or the cluster in general.
My conclusion is that there is a problem with MVAPICH2 on that node, and that it is related to
IB there. I'm not sure, but maybe the problem is in the Nemesis interface of MVAPICH2 (then again,
what about MVAPICH2 1.0.3? Why does it still have the problem there?). OpenMPI is compiled with
RDMA support. OK, I will compile MVAPICH2 with CH3-Gen2 to rule this out. But the question
still stands. Where is the problem? Is it hardware (although the IB tests work fine), or is it
software (and in that case which part? The OFED libraries or still MVAPICH2)?

Concerning Nemesis & CH3-Gen2: from the manual and this list I understood that the
Nemesis interface is the preferred one. But here you mention that CH3-Gen2 still has
better performance, features and scalability. So what is the current state of the two
interfaces, and in which cases should we use one or the other?

Thanx,
Egor.

> Thanks for your note. Good to know that MVAPICH2 1.6 is working fine.
> MVAPICH2 1.0.3 is quite old and is not supported any more. The error code
> seems to point to an IB-related error. You seem to be running an older OFED
> also. Please update the OFED to the latest stable version and use it with
> the latest MVAPICH2 (1.6) version. In the earlier e-mail, you had also
> indicated that you are using the Nemesis interface of MVAPICH2. You can
> use the CH3-Gen2 interface to get the best performance, features and
> scalability.
> 
> Below is a link to the section on building the CH3-Gen2 interface from
> the MVAPICH2 1.6 user guide.
> 
> http://mvapich.cse.ohio-state.edu/support/user_guide_mvapich2-1.6.html#x1-100004.4
> 
> 2011/7/15 <worldeb at ukr.net>:
> >
> >  Hi again,
> >
> > it seems I have found the problem, or at least localized it.
> > There is one node which produces these errors when I submit a job to it.
> > With the cpi code it happens when this node is in a host list of more than 8 nodes.
> > For other codes it happens already with just two nodes (with both mvapich2 1.6 and 1.0.3).
> > Any code works fine when the job runs only on this problem node.
> >
> > I tried to exercise this node and the network with the OSU benchmarks, for example osu_bibw:
> >
> > mvapich2-1.6 passed without any problems
> > mvapich2-1.0.3 with osu mpiexec & torque shows errors:
> > send desc error
> > [0] Abort: [] Got completion with error 9, vendor code=8a, dest rank=1
> > at line 519 in file ibv_channel_manager.c
> > [1] Abort: Got FATAL event 3
> > at line 796 in file ibv_channel_manager.c
> >
> > mvapich1-1.0.1 passed without any errors
> > openmpi-1.4.3 with -mca btl self,openib passed without errors.
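For reference, a two-node run of this kind with mvapich2's mpirun_rsh looks roughly like the
following ("problemnode" and "node02" are just placeholder host names here):

mpirun_rsh -np 2 problemnode node02 ./osu_bibw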
> >
> > So, is it an IB problem? In that case, why does it happen only with mvapich2?
> >
> > I tested the IB card on this node with the standard ib_rdma/read/send_bw and lat tools, and it seems to work fine.
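These raw IB tests follow the usual server/client pattern of the perftest tools, roughly like
this ("problemnode" is a placeholder host name; the same applies to ib_read_bw, ib_write_bw and
the *_lat variants):

on problemnode:     ib_send_bw
on another node:    ib_send_bw problemnode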
> >
> > Thanx,
> > Egor.
> >
> >> I have 8 cores per node. Half of the nodes have 16GB RAM, the other half have 32GB.
> >> The CPUs are:
> >> Intel(R) Xeon(R) E5410 @ 2.33GHz
> >> Intel(R) Xeon(R) E5472 @ 3.00GHz
> >> Intel(R) Xeon(R) E5620 @ 2.40GHz
> >> OFED 1.3.1-rc2 and CentOS 5 with kernel 2.6.18-53.1.21.el5.
> >>
> >> ulimit -a on all nodes:
> >> core file size (blocks, -c) 0
> >> data seg size (kbytes, -d) unlimited
> >> max nice (-e) 0
> >> file size (blocks, -f) unlimited
> >> pending signals (-i) 139264
> >> max locked memory (kbytes, -l) unlimited
> >> max memory size (kbytes, -m) unlimited
> >> open files (-n) 1024
> >> pipe size (512 bytes, -p) 8
> >> POSIX message queues (bytes, -q) 819200
> >> max rt priority (-r) 0
> >> stack size (kbytes, -s) 10240
> >> cpu time (seconds, -t) unlimited
> >> max user processes (-u) 139264
> >> virtual memory (kbytes, -v) unlimited
> >> file locks (-x) unlimited
> >>
> >> I have the same problem with both the gcc and the intel 10.1 compilers.
> >>
> >> Thanx,
> >> Egor.
> >>
> >> > I've used the same configuration options but I have not been
> >> > able to reproduce this problem. I've used varying numbers of cores
> >> > (focusing on 321 and 512), while running cpi and osu_mbw_mr with
> >> > mpirun_rsh and hydra (mpiexec). Perhaps there is some missing
> >> > information I need to reproduce this. How many cores per machine are
> >> > you using? Perhaps a certain machine triggers the problem. Can you
> >> > tell us what cpu and how much memory each machine has? Thanks in
> >> > advance.
> >> >
> >> > 2011/7/14 <worldeb at ukr.net>:
> >> > >
> >> > > Hi folks,
> >> > >
> >> > > mvapich2-1.6-r4751
> >> > > gcc (GCC) 4.1.2 20070626 (Red Hat 4.1.2-14)
> >> > > InfiniBand: Mellanox Technologies MT25204
> >> > > torque 2.1.8
> >> > >
> >> > > ./configure --prefix=/usr/mpi/gcc/mvapich2-1.6.0 --enable-f77 --enable-f90 --enable-cxx --enable-debuginfo --enable-smpcoll --enable-async-progress --enable-threads=default --with-hwloc --with-device=ch3:nemesis:ib --enable-sharedlibs=gcc --enable-romio
> >> > >
> >> > > I cannot run jobs on many nodes (for example >320 cores), neither through the batch system with the OSU mpiexec or the native mpiexec, nor when submitting them directly with mpiexec.hydra or mpirun_rsh.
> >> > > Actually this number of 320 cores is not fixed. It changes from time to time, but mpirun_rsh does reliably submit jobs successfully on fewer nodes.
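For the direct submission case, the mpirun_rsh command I use is roughly of this form (I believe
mpirun_rsh also accepts a -hostfile option, but please correct me if the syntax differs in 1.6):

mpirun_rsh -np 321 -hostfile HOSTFILE ./test_mvapich2_gcc-1.6.0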
> >> > >
> >> > > I am only playing with simple codes like a "hello world" on each cpu, or even with cpi from the examples or the osu_benchmarks.
> >> > >
> >> > > Errors are like:
> >> > >
> >> > > mpiexec.hydra -n 321 -f HOSTFILE ./test_mvapich2_gcc-1.6.0
> >> > >
> >> > > Fatal error in MPI_Init: Internal MPI error!, error stack:
> >> > > MPIR_Init_thread(413): Initialization failed
> >> > > (unknown)(): Internal MPI error!
> >> > > =====================================================================================
> >> > > = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> >> > > = EXIT CODE: 256
> >> > > = CLEANING UP REMAINING PROCESSES
> >> > > = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
> >> > > =====================================================================================
> >> > > [proxy:0:0 at node01] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed
> >> > > [proxy:0:0 at node01] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
> >> > > [proxy:0:0 at node01].ac.at] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event
> >> > > [mpiexec at head] HYDT_bscu_wait_for_completion (./tools/bootstrap/utils/bscu_wait.c:70): one of the processes terminated badly; aborting
> >> > > [mpiexec at head] HYDT_bsci_wait_for_completion (./tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for completion
> >> > > [mpiexec at head] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:199): launcher returned error waiting for completion
> >> > > [mpiexec at head] main (./ui/mpich/mpiexec.c:385): process manager error waiting for completion
> >> > >
> >> > >
> >> > > I have no problem with the same codes when they are compiled with the latest openmpi with IB and run on all nodes.
> >> > >
> >> > > Any suggestions as to what the problem is and how to solve it?

