[mvapich-discuss] how mvapich2 works on mips

Jonathan Perkins perkinjo at cse.ohio-state.edu
Fri Dec 23 08:49:17 EST 2011


Thanks for your report.  We will take a look at your issue and get
back to you shortly.

In the meantime for question 1, can you try modifying the hostfile for
mpirun_rsh?  With the current hostfile, it looks like you're actually
attempting to run internode and not SMP.  When you change the hostfile
to only contain inode1, how does it behave?
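
For example, a single-node test could look roughly like the sketch below.
The file name host.smp is my own placeholder, and listing inode1 once per
rank is a conservative assumption about how the hostfile is read. The launch
is skipped if mpirun_rsh or ./cpi is not present.

```shell
#!/bin/sh
# Write a hostfile that names only the local node, one line per rank.
# (host.smp is a hypothetical file name; any path works.)
printf 'inode1\ninode1\n' > ./host.smp
cat ./host.smp

# Launch two ranks on that single node, but only if the launcher and the
# test binary are actually available on this machine.
if command -v mpirun_rsh >/dev/null 2>&1 && [ -x ./cpi ]; then
    mpirun_rsh -np 2 -hostfile ./host.smp ./cpi
else
    echo "mpirun_rsh or ./cpi not found; skipping launch"
fi
```

If this run completes, the hang is more likely tied to the internode path
than to shared-memory startup.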

Have you also verified that the IB-level tests (such as ib_send_lat and
ib_send_bw from OFED) are working properly?
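
A rough sketch of such a check follows. It assumes the perftest tools that
ship with OFED are on the PATH and reuses the host names from your hostfile.
The two-sided runs are left as comments because the server and client sides
have to be started on different nodes.

```shell
#!/bin/sh
# Check that the perftest tools named above are installed on this node.
missing=0
for tool in ib_send_lat ib_send_bw; do
    if command -v "$tool" >/dev/null 2>&1; then
        echo "$tool: found"
    else
        echo "$tool: not found"
        missing=$((missing + 1))
    fi
done
echo "missing tools: $missing"

# Latency test. Start the server side first, then the client:
#   on inode1:  ib_send_lat
#   on inode2:  ib_send_lat inode1
#
# Bandwidth test, same pattern:
#   on inode1:  ib_send_bw
#   on inode2:  ib_send_bw inode1
```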

On Fri, Dec 23, 2011 at 2:56 AM, Wang Xiyue <zerocain at gmail.com> wrote:
> Hi, all:
>   I am currently running into some problems using MVAPICH2 on a MIPS
> machine; any suggestions are welcome.
>
>    The CPU is based on the MIPS architecture. Each board has 2 CPUs, and
> each CPU has 4 cores.
>    Each CPU uses 2 memory sticks (DDR2, 2 GB each), so each node has 4 GB
> of memory and the board has 8 GB in total.
>    We use a kernel with some modifications of our own (version 2.6.36),
> and the HCA cards we use are mlx4.
>    We use OFED-1.5.3-rc2 for userspace, but since our OS is Debian, we
> can't just install OFED with its installation script. libibverbs, librdmacm,
> libibumad, libmlx4 and libibcm are taken from OFED-1.5.3-rc2; libibcommon
> is version 1.1.2, and I got its source code with apt-get source.
>    Oh, I almost forgot: we use mvapich2-1.6.
>
>    Now here are the questions:
>    1. When we use SMP, MPI works fine with mpiexec:
>        mpiexec -n 16 ./cpi
>        but with mpirun_rsh, cpi hangs in MPI_Init:
>        mpirun_rsh -np 16 -hostfile ./host ./cpi
>
>        cat ./host:
>        inode1
>        inode2
>
>        Even when I reduce the number of processes to 2, mpirun_rsh still hangs.
>
>    2. When we use NUMA, MPI sometimes works, but mostly it doesn't;
> roughly one run in thirty works.
>        There are 2 failure modes:
>        (1) it hangs in MPI_Init with mpiexec;
>        (2) it reports that polling the CQ failed:
>                rdma_ring_based_allgather: Poll CQ failed
>
>        I compiled MPI with:
>        ./configure --libdir=/usr -with-rdma=gen2 \
>            -with-ib-libpath=/usr --enable-g=dbg --enable-fast=none \
>            --disable-cxx CFLAGS="-Wall -g"
>
>
>        Any suggestions?
>        Thanks
>
>                                                            Celia.Wang



-- 
Jonathan Perkins
http://www.cse.ohio-state.edu/~perkinjo


