[mvapich-discuss] MVAPICH2 "cannot create cq" error

Marc Noguera marc at klingon.uab.es
Tue Oct 30 08:04:23 EDT 2007


Dear list,
I am trying to use mvapich2 on our cluster. I am running some tests on 
two dual-Opteron nodes running Fedora Core 6, using mvapich2 and the 
Portland compilers.
I believe I have compiled mvapich2 successfully with these compilers. 
I used the make.mvapich.ofa script, as I have the OFED 1.2.5 software 
stack installed on InfiniBand hardware.
The environment at mvapich2 compile time was:
CC=pgcc
CXX=pgCC
F77=pgf77
F90=pgf90
OPEN_IB_HOME=/usr/local/ofed
PREFIX=~/mvapich2
RDMA_CM_SUPPORT="no"

After that, I compiled the pi3f90.f test program (mpif90 pi3f90) 
and I am now trying to run the a.out binary using mpdboot and mpiexec.

I have followed the user guide and have the .mpd.conf file (with 600 
permissions) in $HOME. I have also created an mpd.hosts file in my 
working directory, containing these two lines:

10.10.1.170 ifhn=10.10.1.170
10.10.1.171 ifhn=10.10.1.171

Moreover, I have modified /etc/security/limits.conf and /etc/init.d/sshd 
to ensure unlimited mem_lock values, as also described in the user guide. 
That is, the "ulimit -l" command reports "unlimited" on both test 
machines.
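
For reference, the same check can also be made from a non-interactive 
session, since the limit a login shell reports may differ from the one 
inherited by processes started through ssh. This is only a minimal sketch, 
assuming bash is available on both nodes; the script name memlock.sh and 
the function name memlock_report are hypothetical, not something taken 
from the user guide:

```shell
#!/bin/bash
# memlock.sh (hypothetical name): print the locked-memory limit that
# this process actually inherited from whatever started it.

memlock_report() {
    # "ulimit -l" is a bash builtin; it prints the soft limit in
    # kilobytes, or the word "unlimited".
    echo "$(uname -n): max locked memory = $(ulimit -l)"
}

memlock_report
```

It could be run as "ssh 10.10.1.171 bash memlock.sh" (with a path that is 
valid on the remote node), or presumably as "mpiexec -n 2 bash memlock.sh" 
once the mpd ring is up, to see which limit the launched processes really 
get on each node.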
Finally, when I try to run the a.out test application, I get:

borg70.uab.es:/users/sysuser/test/T3>~/mvapich2/bin/mpdboot -n 2 --ifhn=10.10.1.170
borg70.uab.es:/users/sysuser/test/T3>~/mvapich2/bin/mpdtrace -l
borg70.uab.es_43715 (10.10.1.170)
borg71.uab.es_37091 (10.10.1.171)
borg70.uab.es:/users/sysuser/test/T3>~/mvapich2/bin/mpiexec -n 2 ./a.out
Fatal error in MPI_Init:
Other MPI error, error stack:
MPIR_Init_thread(259)....: Initialization failed
MPID_Init(102)...........: channel initialization failed
MPIDI_CH3_Init(178)......:
MPIDI_CH3I_RMDA_init(203): Failed to Initialize HCA type
rdma_iba_hca_init(639)...: cannot create cq
rank 1 in job 1  borg70.uab.es_43715   caused collective abort of all ranks
  exit status of rank 1: killed by signal 9
borg70.uab.es:/users/sysuser/test/T3>



In the troubleshooting section of the user guide I find that "cannot 
create cq" errors are possibly due to mem_lock limits, but I believe I 
have already fixed these.
I am really stuck at this point.
Can you give me any hint as to what I am doing wrong?

Thanks in advance
Marc




