[mvapich-discuss] Install error for mvapich2-0.9.5

Abhinav Vishnu vishnu at cse.ohio-state.edu
Mon Sep 18 10:35:02 EDT 2006


Hi Changqing,

Thanks for trying mvapich-0.9.8 multirail and reporting the problem
to us.

>
> Hi, here is another question about mvapich-0.9.8 multi-rail on OpenIB.
>
> I have a system with two PCI-Express SDR cards, firmware 4.7.0. On the
> first card, port 1 is active and port 2 is down; on the second card,
> port 1 is down and port 2 is active.

Currently, we do not handle this case in the multirail device. The device
assumes that the ports are connected in a homogeneous fashion across all
participating nodes. We are working on a solution for this problem.

>
> I installed OpenFabric 1.0, and installed mvapich-0.9.8 with gen2
> multi-rail support.
>
> $ cat hostfile
> xcg13
> xcg14
> $
> $ export MPI_ROOT=/scratch/ctang/mvapich-0.9.8
> $ $MPI_ROOT/bin/mpicc -g -o pp.x ping_pong.c
> $ $MPI_ROOT/bin/mpirun_rsh \
> -np 2 -hostfile hostfile \
> NUM_HCAS=2 \
> NUM_PORTS=1 \
> NUM_QP_PER_PORT=1 \
> LD_LIBRARY_PATH=/scratch/ctang/mvapich-0.9.8/lib/shared \
> ./pp.x 1048576
> ping-pong 1048576 bytes ...
> [0] Abort: [xcg13:0] Got completion with error code 12
>  at line 1277 in file viacheck.c
> Timeout alarm signaled
> Cleaning up all processes ...done.
> $
>
> What is going wrong here, and can you tell me how to solve this problem?
> Thanks.

The problem is that on the 2nd card, the first port is not in the ACTIVE
state. The error should go away once the first port of the 2nd card is
connected to the switch.
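
As a rough illustration of the reasoning above, here is a minimal Python
sketch that models which ports a given NUM_HCAS / NUM_PORTS setting would
need. It is not MVAPICH code. The HCA names (hca0, hca1) and the function
name are made up, and the rule "NUM_PORTS=n uses ports 1..n on each card"
is an assumption inferred from the diagnosis above, not something taken
from the MVAPICH sources.

```python
# Illustrative model only: not MVAPICH code; HCA names are hypothetical.
# Assumption: with NUM_PORTS=n, the device uses ports 1..n on each of
# the first NUM_HCAS cards.

def inactive_required_ports(port_states, num_hcas, num_ports):
    """Return (hca, port) pairs the configuration needs that are not ACTIVE."""
    missing = []
    for hca in sorted(port_states)[:num_hcas]:
        for port in range(1, num_ports + 1):
            if port_states[hca].get(port) != "ACTIVE":
                missing.append((hca, port))
    return missing


# Port layout described in the original report.
reported = {
    "hca0": {1: "ACTIVE", 2: "DOWN"},
    "hca1": {1: "DOWN", 2: "ACTIVE"},
}

# Layout after connecting port 1 of the second card, as suggested.
recabled = {
    "hca0": {1: "ACTIVE", 2: "DOWN"},
    "hca1": {1: "ACTIVE", 2: "DOWN"},
}

print(inactive_required_ports(reported, num_hcas=2, num_ports=1))
print(inactive_required_ports(recabled, num_hcas=2, num_ports=1))
```

Under this model, the reported layout leaves port 1 of the second card
unavailable, while the recabled layout has nothing missing. On a real
system, the actual state of each port can be read with ibv_devinfo.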

Please let us know if the problem persists.

Thanks,

-- Abhinav

>
> --CQ Tang
>
>
>
> >-----Original Message-----
> >From: Sayantan Sur [mailto:surs at cse.ohio-state.edu]
> >Sent: Saturday, September 09, 2006 6:13 PM
> >To: Tang, Changqing
> >Cc: mvapich-discuss at cse.ohio-state.edu
> >Subject: Re: [mvapich-discuss] Install error for mvapich2-0.9.5
> >
> >Hello,
> >
> >* On Sep,1 Tang, Changqing<changquing.tang at hp.com> wrote :
> >>
> >> Hi, I am an HP engineer trying to use your mvapich-0.9.5. I compiled
> >> the code on an Opteron system with two PCI-X IB cards, but got the
> >> following error:
> >>
> >> 29:2: #error SRQ is not suppported for Mellanox PCI-X cards
> >
> >Thanks for trying out our latest MVAPICH2-0.9.5 release.
> >
> >Based on earlier feedback from Christian about the scripts
> >having this error on PCI-X platforms, we have updated our
> >build scripts. You may update (or obtain) an SVN copy either
> >from the MVAPICH2 trunk (main development) or the 0.9.5 branch
> >(all major bug fixes to the 0.9.5 release).
> >You may visit the MVAPICH2 download page for more information
> >on the trunk and branch policies.
> >
> >>
> >> Based on what information do you make this statement? We can install
> >> both VAPI and OpenIB with SRQ support on the system.
> >
> >We made this statement based on the degraded performance of
> >the SRQ feature on the PCI-X HCAs. You can follow an existing
> >thread discussing the performance issues.
> >
> >http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/2006-September/000339.html
> >
> >For the time being Send/Receive mode offers better performance
> >on PCI-X HCAs. Note that for PCI-Express HCAs, by default the
> >SRQ support is used.
> >
> >Thanks,
> >Sayantan.
> >
> >--
> >http://www.cse.ohio-state.edu/~surs
> >
>