[mvapich-discuss] MVAPICH2 "cannot create cq" error

Marc Noguera marc at klingon.uab.es
Tue Oct 30 09:26:50 EDT 2007


Hi,
Thanks, Matt, for the suggestions.

I have restarted the mpds after modifying the ulimits and have checked
ibv_rc_pingpong, with this output, which I think is correct:
[root at borg71 ~]# ibv_rc_pingpong 10.10.1.170
  local address:  LID 0x0001, QPN 0x180407, PSN 0x853d83
  remote address: LID 0x0005, QPN 0x130407, PSN 0x7f5527
8192000 bytes in 0.02 seconds = 3195.63 Mbit/sec
1000 iters in 0.02 seconds = 20.51 usec/iter
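[Editor's note: not part of the original thread, but a check that often helps with this class of problem. The limit an interactive login shell reports can differ from the limit a freshly spawned, non-interactive child inherits, and mpd (started by mpdboot over ssh) gets the latter. A minimal local sketch; the `bash -c` form is an assumption about how the child environment is set up on these nodes:]

```shell
# Compare the locked-memory limit of the current interactive shell with
# the limit a non-interactive child process actually inherits.
ulimit -l                 # limit of the current interactive shell
bash -c 'ulimit -l'       # limit a freshly spawned child sees
```

[On a remote node the equivalent would be e.g. `ssh 10.10.1.171 'ulimit -l'`, which is closer to the environment mpdboot actually gives the mpds.]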


Any other ideas?
Thank you again,
Marc
Matthew Koop wrote:
> Marc,
>
> Did you perhaps update the lockable memory settings after starting the MPD
> ring? If so, try exiting the ring using mpdallexit and then booting it
> again with mpdboot so that mpd gets the new ulimit settings.
>
> Also, have you tried the ibv_rc_pingpong test that comes with the OFED
> distribution? It will allow you to verify that your IB installation is
> correct.
>
> Let us know if restarting the ring helps at all.
>
> Matt
>
>
> On Tue, 30 Oct 2007, Marc Noguera wrote:
>
>   
>> Dear list,
>> I am trying to use mvapich2 on our cluster. I am running some tests on
>> two dual-Opteron nodes running Fedora Core 6, using mvapich2 and the
>> Portland compilers.
>> I have compiled mvapich2 with these compilers successfully, or at least
>> I think so. I used the make.mvapich.ofa script, as I have the OFED 1.2.5
>> software stack installed on our InfiniBand hardware.
>> Environment at mvapich2 compile time was:
>> CC=pgcc
>> CXX=pgCC
>> F77=pgf77
>> F90=pgf90
>> OPEN_IB_HOME=/usr/local/ofed
>> PREFIX=~/mvapich2
>> RDMA_CM_SUPPORT="no"
>>
>> After that, I have compiled the pi3f90.f test program (mpif90 pi3f90)
>> and I am trying to execute the a.out binary using mpdboot and mpiexec.
>>
>> I have done as the userguide says, and have the .mpd.conf file (with 600
>> permissions) in $HOME. I have also created an mpd.hosts in my workdir,
>> containing these two lines:
>>
>> 10.10.1.170 ifhn=10.10.1.170
>> 10.10.1.171 ifhn=10.10.1.171
>>
>> Moreover, I have modified /etc/security/limits.conf and /etc/init.d/sshd
>> to ensure unlimited mem_lock values, as also mentioned in the userguide.
>> That is, the "ulimit -l" command gives "unlimited" on both test
>> machines.
>> Finally when trying to run the a.out test application, I obtain:
>>
>> borg70.uab.es:/users/sysuser/test/T3>~/mvapich2/bin/mpdboot -n 2
>> --ifhn=10.10.1.170
>> borg70.uab.es:/users/sysuser/test/T3>~/mvapich2/bin/mpdtrace -l
>> borg70.uab.es_43715 (10.10.1.170)
>> borg71.uab.es_37091 (10.10.1.171)
>> borg70.uab.es:/users/sysuser/test/T3>~/mvapich2/bin/mpiexec -n 2 ./a.out
>> Fatal error in MPI_Init:
>> Other MPI error, error stack:
>> MPIR_Init_thread(259)....: Initialization failed
>> MPID_Init(102)...........: channel initialization failed
>> MPIDI_CH3_Init(178)......:
>> MPIDI_CH3I_RMDA_init(203): Failed to Initialize HCA type
>> rdma_iba_hca_init(639)...: cannot create cq
>> rank 1 in job 1  borg70.uab.es_43715   caused collective abort of all ranks
>>   exit status of rank 1: killed by signal 9
>> borg70.uab.es:/users/sysuser/test/T3>
>>
>>
>>
>> In the troubleshooting section of the userguide I find that "cannot
>> create cq" errors are possibly due to mem_lock limits, but I believe I
>> have fixed those.
>> I am really stuck at this point.
>> Can you give me any hint about what I am doing wrong?
>>
>> Thanks in advance
>> Marc
>>
>>
>>
>> _______________________________________________
>> mvapich-discuss mailing list
>> mvapich-discuss at cse.ohio-state.edu
>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>>

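[Editor's note: for anyone finding this thread later, the /etc/security/limits.conf entries described in the quoted message typically look like the lines below. This is an illustrative form only; the exact fields are distribution-dependent, see limits.conf(5):]

```
*    soft    memlock    unlimited
*    hard    memlock    unlimited
```

[Note that sshd must actually apply these limits (e.g. via pam_limits) to processes started over ssh, which is why the thread also mentions editing /etc/init.d/sshd.]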


More information about the mvapich-discuss mailing list