[mvapich-discuss] Problem with MVAPICH2 (1.9, r6338) on InfiniBand

Devendar Bureddy bureddy at cse.ohio-state.edu
Mon Jul 8 17:50:45 EDT 2013


Hi Steven

Good to know that it worked fine with MV2_NUM_HCAS=2. The initial issue
could be related to your IB setup (number of HCAs per node, type of HCA,
etc.). By default MVAPICH2 tries to use all the configured HCAs, and that
might be causing the issue in your setup. Setting MV2_NUM_HCAS=2 explicitly
requests that only 2 HCAs per node be initialized and used.
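Steven asked how one could find the correct number. One heuristic, which is an assumption and not how MVAPICH2 itself selects adapters, is to count the HCAs on a node that have at least one port in the ACTIVE state (`ibv_devinfo` shows the same thing as `state: PORT_ACTIVE`). On Linux the kernel exposes each port's state in `/sys/class/infiniband/<device>/ports/<port>/state`, e.g. `4: ACTIVE`. The sketch below uses two hypothetical helpers: `count_active_hcas` reads that sysfs tree, and `osu_command` builds a Hydra `mpiexec` command line intended to run one rank per host with MV2_NUM_HCAS passed to every rank via `-genv`. Host names and the `osu_latency` path are placeholders.

```python
from pathlib import Path


def count_active_hcas(sysfs_root="/sys/class/infiniband"):
    """Count HCAs with at least one port whose sysfs state is ACTIVE.

    Hypothetical helper: a heuristic starting point for MV2_NUM_HCAS,
    not the logic MVAPICH2 uses internally.
    """
    root = Path(sysfs_root)
    if not root.is_dir():
        return 0  # no InfiniBand devices visible on this node
    active = 0
    for dev in sorted(root.iterdir()):
        ports = dev / "ports"
        if not ports.is_dir():
            continue
        for port in sorted(ports.iterdir()):
            try:
                # sysfs reports e.g. "4: ACTIVE" or "1: DOWN"; keep the name part
                state = (port / "state").read_text().split(":", 1)[-1].strip()
            except OSError:
                continue
            if state == "ACTIVE":
                active += 1
                break  # one active port is enough to count this HCA
    return active


def osu_command(num_hcas, hosts, binary="./osu_latency"):
    """Build a Hydra mpiexec argv with MV2_NUM_HCAS exported to all ranks.

    Hypothetical helper; host names and the benchmark path are placeholders.
    """
    return ["mpiexec", "-n", str(len(hosts)), "-hosts", ",".join(hosts),
            "-genv", "MV2_NUM_HCAS", str(num_hcas), binary]


if __name__ == "__main__":
    n = count_active_hcas()
    print("HCAs with an active port:", n)
    print(" ".join(osu_command(max(n, 1), ["node1", "node2"])))
```

The count is only a hint: as noted above, the HCA type and how each adapter is cabled also matter, so the chosen value should still be confirmed with a benchmark run.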

-Devendar


On Mon, Jul 8, 2013 at 5:29 PM, Steven Vancoillie <
steven.vancoillie at gmail.com> wrote:

> Hi Devendar,
>
> I installed and ran the OSU micro-benchmarks (thanks for pointing me
> there), and they showed the same problem.
> However, I think you just solved it: setting MV2_NUM_HCAS=2 works
> fine. Even though I didn't know what I was doing, I thought I'd just
> try a number different from the default. Since one has to set this
> parameter manually, I guess it's not trivial to detect? If so,
> how could one find out what the correct number is?
>
> Anyway, thanks a lot for your help!
>
> greetings,
> Steven
>
> On Mon, Jul 8, 2013 at 9:37 PM, Devendar Bureddy
> <bureddy at cse.ohio-state.edu> wrote:
> > Hi Steven
> >
> > ch3:mrail is the default configuration. You can find more details
> > here:
> > http://mvapich.cse.ohio-state.edu/support/user_guide_mvapich2-1.9.html#x1-110004.4
> >
> > Please do not confuse the "mrail" configuration with multi-rail network
> > support. The use of multiple IB network lanes (HCAs) is controlled by the
> > MV2_NUM_HCAS run-time parameter.
> >
> > Incorrect use of HCAs can produce similar errors. Are the basic OSU
> > benchmarks running correctly in your setup?
> >
> > -Devendar
> >
> >
> > On Mon, Jul 8, 2013 at 12:21 PM, Steven Vancoillie
> > <steven.vancoillie at gmail.com> wrote:
> >>
> >> Hi,
> >>
> >> when running an application (Global Arrays test suite) on top of
> >> MVAPICH2 (via ARMCI-MPI), I get the following error:
> >>
> >> [0->3] send desc error, wc_opcode=0
> >> [0->3] wc.status=12, wc.wr_id=0x6cf5c8, wc.opcode=0, vbuf->phead->type=54 = MPIDI_CH3_PKT_CLOSE
> >> [r5i1n6:mpi_rank_0][MPIDI_CH3I_MRAILI_Cq_poll] src/mpid/ch3/channels/mrail/src/gen2/ibv_channel_manager.c:586: [] Got completion with error 12, vendor code=0x81, dest rank=3
> >>
> >> I would be grateful if someone can help me resolve this issue.
> >>
> >> This is the output of mpiexec --version:
> >>
> >> HYDRA build details:
> >>     Version:                                 3.0.3
> >>     Release Date:                            unreleased development copy
> >>     CC:                              icc
> >>     CXX:                             icpc
> >>     F77:                             ifort
> >>     F90:                             ifort
> >>     Configure options:
> >> '--disable-option-checking' '--prefix=/apps/leuven/mvapich2/1.9_intel'
> >> 'CC=icc' 'CXX=icpc' 'F77=ifort' '--enable-f77' '--enable-fc'
> >> '--enable-cxx' '--enable-romio' '--enable-debuginfo' '--enable-mpe'
> >> '--enable-shared' '--without-ftb' '--with-mpe' '--disable-ckpt'
> >> '--disable-mcast' '--disable-checkerrors' '--enable-embedded-mode'
> >> '--cache-file=/dev/null' '--srcdir=.' 'CFLAGS= -DNDEBUG -DNVALGRIND
> >> -O2' 'LDFLAGS=-L/lib -L/lib -Wl,-rpath,/lib -L/lib' 'LIBS=-libumad
> >> -libverbs -lrt -lnuma -lpthread ' 'CPPFLAGS=
> >> -I/data/leuven/source/mvapich2/1.9_intel/mvapich2-1.9-r6338/src/mpl/include
> >> -I/data/leuven/source/mvapich2/1.9_intel/mvapich2-1.9-r6338/src/mpl/include
> >> -I/data/leuven/source/mvapich2/1.9_intel/mvapich2-1.9-r6338/src/openpa/src
> >> -I/data/leuven/source/mvapich2/1.9_intel/mvapich2-1.9-r6338/src/openpa/src
> >> -I/data/leuven/source/mvapich2/1.9_intel/mvapich2-1.9-r6338/src/mpi/romio/include
> >> -I/include -I/include'
> >>     Process Manager:                         pmi
> >>     Launchers available:                     ssh rsh fork slurm ll lsf sge manual persist
> >>     Topology libraries available:            hwloc
> >>     Resource management kernels available:   user slurm ll lsf sge pbs cobalt
> >>     Checkpointing libraries available:
> >>     Demux engines available:                 poll select
> >>
> >> Furthermore, someone suggested that I build MVAPICH2 without mrail
> >> support: "you should probably use the default build of mvapich instead
> >> of the mrail build, unless you want to use multiple network lanes
> >> simultaneously". Unfortunately, I can't really understand from the
> >> user guide how I should do this, or even what multi-rail means. Is
> >> there online documentation that explains this, or that describes how
> >> I should build it?
> >>
> >> kind regards,
> >> Steven
> >> _______________________________________________
> >> mvapich-discuss mailing list
> >> mvapich-discuss at cse.ohio-state.edu
> >> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
> >
> >
> >
> >
> > --
> > Devendar
>
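The completion error in Steven's original report carries raw libibverbs values: `wc.status` is an `enum ibv_wc_status` and `wc.opcode` an `enum ibv_wc_opcode` (in C, `ibv_wc_status_str()` returns the readable status name). The Python sketch below, built around a hypothetical `decode_wc_line` helper, maps the numbers by hand. Status 12 is `IBV_WC_RETRY_EXC_ERR` (the transport retry counter was exceeded because the remote side sent no ACK/NAK) and opcode 0 is `IBV_WC_SEND`. Reading this as traffic sent over an HCA or port with no working path to rank 3 is consistent with the MV2_NUM_HCAS fix, but that is an interpretation, not something confirmed in the thread. The `vendor code` (0x81) is adapter-specific and is not decoded here.

```python
import re

# enum ibv_wc_status values from libibverbs <infiniband/verbs.h> (first 14 only).
IBV_WC_STATUS = {
    0: "IBV_WC_SUCCESS", 1: "IBV_WC_LOC_LEN_ERR", 2: "IBV_WC_LOC_QP_OP_ERR",
    3: "IBV_WC_LOC_EEC_OP_ERR", 4: "IBV_WC_LOC_PROT_ERR", 5: "IBV_WC_WR_FLUSH_ERR",
    6: "IBV_WC_MW_BIND_ERR", 7: "IBV_WC_BAD_RESP_ERR", 8: "IBV_WC_LOC_ACCESS_ERR",
    9: "IBV_WC_REM_INV_REQ_ERR", 10: "IBV_WC_REM_ACCESS_ERR", 11: "IBV_WC_REM_OP_ERR",
    12: "IBV_WC_RETRY_EXC_ERR", 13: "IBV_WC_RNR_RETRY_EXC_ERR",
}

# enum ibv_wc_opcode values (subset); receive completions start at 1 << 7.
IBV_WC_OPCODE = {
    0: "IBV_WC_SEND", 1: "IBV_WC_RDMA_WRITE", 2: "IBV_WC_RDMA_READ",
    3: "IBV_WC_COMP_SWAP", 4: "IBV_WC_FETCH_ADD", 5: "IBV_WC_BIND_MW",
    128: "IBV_WC_RECV", 129: "IBV_WC_RECV_RDMA_WITH_IMM",
}


def decode_wc_line(line):
    """Return (status_name, opcode_name) from a 'wc.status=... wc.opcode=...' line, or None."""
    m = re.search(r"wc\.status=(\d+).*?wc\.opcode=(\d+)", line)
    if m is None:
        return None  # line does not carry both wc.status and wc.opcode
    status, opcode = int(m.group(1)), int(m.group(2))
    return (IBV_WC_STATUS.get(status, f"unknown status {status}"),
            IBV_WC_OPCODE.get(opcode, f"unknown opcode {opcode}"))


if __name__ == "__main__":
    log = "[0->3] wc.status=12, wc.wr_id=0x6cf5c8, wc.opcode=0, vbuf->phead->type=54"
    print(decode_wc_line(log))  # prints ('IBV_WC_RETRY_EXC_ERR', 'IBV_WC_SEND')
```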



-- 
Devendar

