[mvapich-discuss] MVAPICH2 "cannot create cq" error

Matthew Koop koop at cse.ohio-state.edu
Tue Oct 30 08:10:02 EDT 2007


Marc,

Did you perhaps update the lockable memory settings after starting the MPD
ring? If so, try exiting the ring using mpdallexit and then booting it
again with mpdboot so that mpd gets the new ulimit settings.
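
One way to check this is to compare the limit in your login shell with the
limit seen by processes launched through the ring. The following is only a
minimal sketch: it assumes the MPD tools are on your PATH, that the two-node
ring from your message is running, and that mpiexec places one process on
each node. The helper name show_memlock is made up for illustration.

```shell
#!/bin/sh
# show_memlock is a hypothetical helper name, not part of MVAPICH2 or MPD.
# It prints the host name and the locked-memory limit of the calling shell.
show_memlock() {
    printf '%s: max locked memory = %s\n' "$(uname -n)" "$(ulimit -l)"
}

# 1. Limit in the login shell (the value "ulimit -l" reports interactively).
show_memlock

# 2. Limit inherited by processes started through the MPD ring.
#    Skipped unless the MPD tools are installed and a ring is running.
if command -v mpiexec >/dev/null 2>&1 && mpdtrace >/dev/null 2>&1; then
    mpiexec -n 2 sh -c 'printf "%s: max locked memory = %s\n" "$(uname -n)" "$(ulimit -l)"'
fi
```

If the first step prints "unlimited" but the processes started by mpiexec
report a finite value, the ring was most likely booted under the old limits,
and mpdallexit followed by mpdboot from a shell with the new limit should
bring the two in line.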

Also, have you tried the ibv_rc_pingpong test that comes with the OFED
distribution? It will allow you to verify that your IB installation is
correct.
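
The test needs two sides: one node runs ibv_rc_pingpong with no host
argument and waits, and the other node runs it with the first node's name or
address. A small sketch of that, assuming ibv_rc_pingpong is on the PATH of
both nodes and the default device and port are the ones you want; the
wrapper name run_pingpong is made up for illustration.

```shell
#!/bin/sh
# run_pingpong is a hypothetical wrapper name, not part of OFED.
# Usage: run_pingpong server
#        run_pingpong client <server-host>
run_pingpong() {
    role=$1
    server=$2
    case "$role" in
        server) ;;
        client)
            if [ -z "$server" ]; then
                echo "client role needs the server host name or address" >&2
                return 2
            fi ;;
        *)
            echo "usage: run_pingpong server | client <server-host>" >&2
            return 2 ;;
    esac
    if ! command -v ibv_rc_pingpong >/dev/null 2>&1; then
        echo "ibv_rc_pingpong not found on PATH; check the OFED install" >&2
        return 127
    fi
    if [ "$role" = server ]; then
        ibv_rc_pingpong            # waits for a client to connect
    else
        ibv_rc_pingpong "$server"  # connects to the waiting server
    fi
}

# Run only when a role is given on the command line.
if [ "$#" -gt 0 ]; then
    run_pingpong "$@"
fi
```

For example, start "run_pingpong server" on 10.10.1.170 and then
"run_pingpong client 10.10.1.170" on the other node. If the verbs layer is
working, both sides should finish with a short throughput and latency
summary; a failure here points at the IB installation rather than at
MVAPICH2.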

Let us know if restarting the ring helps at all.

Matt


On Tue, 30 Oct 2007, Marc Noguera wrote:

> Dear list,
> I am trying to use mvapich2 on our cluster. I am running some tests on
> two dual-Opteron nodes running Fedora Core 6, using mvapich2 and the
> Portland compilers.
> I have successfully compiled mvapich2 with these compilers, or at least I
> think so. I used the make.mvapich.ofa script, as I have the OFED 1.2.5
> software stack installed on InfiniBand hardware.
> The environment at mvapich2 compile time was:
> CC=pgcc
> CXX=pgCC
> F77=pgf77
> F90=pgf90
> OPEN_IB_HOME=/usr/local/ofed
> PREFIX=~/mvapich2
> RDMA_CM_SUPPORT="no"
>
> After that, I compiled the pi3f90.f test program (mpif90 pi3f90)
> and I am trying to run the a.out binary using mpdboot and mpiexec.
>
> I followed the user guide and have the .mpd.conf file (with 600
> permissions) in $HOME. I have also created an mpd.hosts file in my
> working directory containing these two lines:
>
> 10.10.1.170 ifhn=10.10.1.170
> 10.10.1.171 ifhn=10.10.1.171
>
> Moreover, I have modified /etc/security/limits.conf and /etc/init.d/sshd
> to ensure unlimited mem_lock values, as also described in the user guide.
> That is, the "ulimit -l" command prints "unlimited" on both test
> machines.
> Finally, when trying to run the a.out test application, I get:
>
> borg70.uab.es:/users/sysuser/test/T3>~/mvapich2/bin/mpdboot -n 2 --ifhn=10.10.1.170
> borg70.uab.es:/users/sysuser/test/T3>~/mvapich2/bin/mpdtrace -l
> borg70.uab.es_43715 (10.10.1.170)
> borg71.uab.es_37091 (10.10.1.171)
> borg70.uab.es:/users/sysuser/test/T3>~/mvapich2/bin/mpiexec -n 2 ./a.out
> Fatal error in MPI_Init:
> Other MPI error, error stack:
> MPIR_Init_thread(259)....: Initialization failed
> MPID_Init(102)...........: channel initialization failed
> MPIDI_CH3_Init(178)......:
> MPIDI_CH3I_RMDA_init(203): Failed to Initialize HCA type
> rdma_iba_hca_init(639)...: cannot create cq
> rank 1 in job 1  borg70.uab.es_43715   caused collective abort of all ranks
>   exit status of rank 1: killed by signal 9
> borg70.uab.es:/users/sysuser/test/T3>
>
>
>
> In the troubleshooting section of the user guide I find that "cannot
> create cq" errors are possibly due to mem_lock limits, but I believe I
> have already fixed these.
> I am really stuck at this point.
> Can you give me any hint about what I am doing wrong?
>
> Thanks in advance
> Marc
>
>
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>


