[mvapich-discuss] MVAPICH2 "cannot create cq" error
Marc Noguera
marc at klingon.uab.es
Tue Oct 30 08:04:23 EDT 2007
Dear list,
I am trying to use MVAPICH2 on our cluster. I am running some tests on
two dual-Opteron nodes with Fedora Core 6, using MVAPICH2 and the
Portland compilers.
I have compiled MVAPICH2 with these compilers, successfully as far as I
can tell. I used the make.mvapich.ofa script, since I have the OFED 1.2.5
software stack installed on our InfiniBand hardware.
The environment at MVAPICH2 compile time was:
CC=pgcc
CXX=pgCC
F77=pgf77
F90=pgf90
OPEN_IB_HOME=/usr/local/ofed
PREFIX=~/mvapich2
RDMA_CM_SUPPORT="no"
After that, I compiled the pi3f90.f test program (mpif90 pi3f90) and am
trying to run the resulting a.out binary using mpdboot and mpiexec.
As described in the user guide, I have a .mpd.conf file (with 600
permissions) in $HOME. I have also created an mpd.hosts file in my
working directory containing these two lines:
10.10.1.170 ifhn=10.10.1.170
10.10.1.171 ifhn=10.10.1.171
In addition, I have modified /etc/security/limits.conf and
/etc/init.d/sshd to ensure unlimited mem_lock values, again as described
in the user guide. The "ulimit -l" command now reports "unlimited" on
both test machines.
Finally, when I try to run the a.out test application, I get:
borg70.uab.es:/users/sysuser/test/T3>~/mvapich2/bin/mpdboot -n 2 --ifhn=10.10.1.170
borg70.uab.es:/users/sysuser/test/T3>~/mvapich2/bin/mpdtrace -l
borg70.uab.es_43715 (10.10.1.170)
borg71.uab.es_37091 (10.10.1.171)
borg70.uab.es:/users/sysuser/test/T3>~/mvapich2/bin/mpiexec -n 2 ./a.out
Fatal error in MPI_Init:
Other MPI error, error stack:
MPIR_Init_thread(259)....: Initialization failed
MPID_Init(102)...........: channel initialization failed
MPIDI_CH3_Init(178)......:
MPIDI_CH3I_RMDA_init(203): Failed to Initialize HCA type
rdma_iba_hca_init(639)...: cannot create cq
rank 1 in job 1 borg70.uab.es_43715 caused collective abort of all ranks
exit status of rank 1: killed by signal 9
borg70.uab.es:/users/sysuser/test/T3>
The troubleshooting section of the user guide says that "cannot create
cq" errors are possibly due to mem_lock limits, but I believe I have
already fixed those.
I am really stuck at this point.
Can you give me any hints on what I am doing wrong?
Thanks in advance
Marc