[mvapich-discuss] Problem with MVAPICH2 (1.9, r6338) on InfiniBand
Devendar Bureddy
bureddy at cse.ohio-state.edu
Mon Jul 8 17:50:45 EDT 2013
Hi Steven
Good to know that it worked fine with MV2_NUM_HCAS=2. The initial issue
could be related to your IB setup (number of HCAs per node, HCA type, etc.).
By default, MVAPICH2 tries to use all configured HCAs, and that might be
causing an issue in your setup. Setting MV2_NUM_HCAS=2 explicitly requests
that only 2 HCAs per node be initialized and used.
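One possible way to check how many HCAs are actually usable on a node is to look at their port states. Below is a minimal Python sketch. It assumes the standard Linux sysfs layout /sys/class/infiniband/<device>/ports/<port>/state and treats an HCA with at least one ACTIVE port as usable. The function name count_active_hcas and the printed mpiexec line (process count, ./a.out) are hypothetical. The ibstat and ibv_devinfo tools report the same port states by hand.

```python
import os


def count_active_hcas(sysfs_root="/sys/class/infiniband"):
    """Return the number of IB devices with at least one ACTIVE port.

    Assumes the Linux sysfs layout <root>/<device>/ports/<port>/state,
    where each state file holds text such as "4: ACTIVE".
    """
    if not os.path.isdir(sysfs_root):
        return 0  # no InfiniBand devices visible on this node
    active = 0
    for dev in sorted(os.listdir(sysfs_root)):
        ports_dir = os.path.join(sysfs_root, dev, "ports")
        if not os.path.isdir(ports_dir):
            continue
        for port in sorted(os.listdir(ports_dir)):
            try:
                with open(os.path.join(ports_dir, port, "state")) as f:
                    # "4: ACTIVE" -> "ACTIVE"
                    state = f.read().split(":")[-1].strip()
            except OSError:
                continue  # unreadable or missing state file
            if state == "ACTIVE":
                active += 1
                break  # one ACTIVE port is enough to count this HCA
    return active


if __name__ == "__main__":
    n = count_active_hcas()
    if n == 0:
        print("No HCA with an ACTIVE port found on this node.")
    else:
        # Hypothetical launch line: adjust process count and binary as needed.
        print(f"mpiexec -genv MV2_NUM_HCAS {n} -n 4 ./a.out")
```

This only reflects port state. It will not catch other setup problems, such as mixed HCA types across nodes.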
-Devendar
On Mon, Jul 8, 2013 at 5:29 PM, Steven Vancoillie
<steven.vancoillie at gmail.com> wrote:
> Hi Devendar,
>
> I installed and ran the OSU micro-benchmarks (thanks for pointing me
> there), and they showed the same problem.
> However, I think you just solved it: setting MV2_NUM_HCAS=2 works
> fine. Even though I didn't know what I was doing, I thought I'd just
> try a number different from the default. Since this parameter has to
> be set manually, I guess it's not trivial to detect automatically? If so,
> how could one find out the correct number?
>
> Anyway, thanks a lot for your help!
>
> greetings,
> Steven
>
> On Mon, Jul 8, 2013 at 9:37 PM, Devendar Bureddy
> <bureddy at cse.ohio-state.edu> wrote:
> > Hi Steven
> >
> > ch3:mrail is the default configuration. You can find more details
> > here:
> > http://mvapich.cse.ohio-state.edu/support/user_guide_mvapich2-1.9.html#x1-110004.4
> >
> > Please do not confuse the "mrail" configuration with multi-rail network
> > support. The use of multiple IB network lanes (HCAs) is controlled by
> > the MV2_NUM_HCAS run-time parameter.
> >
> > Incorrect use of HCAs could produce similar errors. Are the basic OSU
> > benchmarks running correctly in your setup?
> >
> > -Devendar
> >
> >
> > On Mon, Jul 8, 2013 at 12:21 PM, Steven Vancoillie
> > <steven.vancoillie at gmail.com> wrote:
> >>
> >> Hi,
> >>
> >> when running an application (Global Arrays test suite) on top of
> >> MVAPICH2 (via ARMCI-MPI), I get the following error:
> >>
> >> [0->3] send desc error, wc_opcode=0
> >> [0->3] wc.status=12, wc.wr_id=0x6cf5c8, wc.opcode=0,
> >> vbuf->phead->type=54 = MPIDI_CH3_PKT_CLOSE
> >> [r5i1n6:mpi_rank_0][MPIDI_CH3I_MRAILI_Cq_poll]
> >> src/mpid/ch3/channels/mrail/src/gen2/ibv_channel_manager.c:586: [] Got
> >> completion with error 12, vendor code=0x81, dest rank=3
> >>
> >> I would be grateful if someone can help me resolve this issue.
> >>
> >> This is the output of mpiexec --version:
> >>
> >> HYDRA build details:
> >> Version: 3.0.3
> >> Release Date: unreleased development copy
> >> CC: icc
> >> CXX: icpc
> >> F77: ifort
> >> F90: ifort
> >> Configure options:
> >> '--disable-option-checking' '--prefix=/apps/leuven/mvapich2/1.9_intel'
> >> 'CC=icc' 'CXX=icpc' 'F77=ifort' '--enable-f77' '--enable-fc'
> >> '--enable-cxx' '--enable-romio' '--enable-debuginfo' '--enable-mpe'
> >> '--enable-shared' '--without-ftb' '--with-mpe' '--disable-ckpt'
> >> '--disable-mcast' '--disable-checkerrors' '--enable-embedded-mode'
> >> '--cache-file=/dev/null' '--srcdir=.' 'CFLAGS= -DNDEBUG -DNVALGRIND
> >> -O2' 'LDFLAGS=-L/lib -L/lib -Wl,-rpath,/lib -L/lib' 'LIBS=-libumad
> >> -libverbs -lrt -lnuma -lpthread ' 'CPPFLAGS=
> >> -I/data/leuven/source/mvapich2/1.9_intel/mvapich2-1.9-r6338/src/mpl/include
> >> -I/data/leuven/source/mvapich2/1.9_intel/mvapich2-1.9-r6338/src/mpl/include
> >> -I/data/leuven/source/mvapich2/1.9_intel/mvapich2-1.9-r6338/src/openpa/src
> >> -I/data/leuven/source/mvapich2/1.9_intel/mvapich2-1.9-r6338/src/openpa/src
> >> -I/data/leuven/source/mvapich2/1.9_intel/mvapich2-1.9-r6338/src/mpi/romio/include
> >> -I/include -I/include'
> >> Process Manager: pmi
> >> Launchers available: ssh rsh fork slurm ll lsf sge manual persist
> >> Topology libraries available: hwloc
> >> Resource management kernels available: user slurm ll lsf sge pbs cobalt
> >> Checkpointing libraries available:
> >> Demux engines available: poll select
> >>
> >> Furthermore, someone suggested that I build MVAPICH2 without mrail
> >> support: "you should probably use the default build of mvapich instead
> >> of the mrail build, unless you want to use multiple network lanes
> >> simultaneously". Unfortunately, I can't really understand from the
> >> user guide how I should do this, or even what multi-rail means. Is
> >> there some online documentation that addresses this, or explains how
> >> I should build it?
> >>
> >> kind regards,
> >> Steven
> >> _______________________________________________
> >> mvapich-discuss mailing list
> >> mvapich-discuss at cse.ohio-state.edu
> >> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
> >
> >
> >
> >
> > --
> > Devendar
>
--
Devendar