[mvapich-discuss] Failed to Initialize HCA type for mvapich2-0.9.8

Korambath, Prakashan ppk at ats.ucla.edu
Fri Mar 2 17:09:04 EST 2007


Hi Wei,

  Over ssh, ulimit -l was returning the default value of 32.  Now that I have added 'ulimit -l unlimited' to /etc/init.d/sshd itself, it works.  Thanks a lot for the help.
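
The fix works because a process inherits resource limits from its parent, so commands started through ssh get whatever limit the sshd daemon was started with. A minimal sketch of that inheritance, runnable locally; the helper name inherited_memlock is hypothetical, and the placement shown in the comments is an assumption, since only the 'ulimit -l unlimited' line itself comes from this thread:

```shell
# Sketch: a child process inherits its parent's locked-memory limit.
# inherited_memlock is a hypothetical helper name.
inherited_memlock() {
    # Subshell, so the calling shell's own limit is left untouched.
    ( ulimit -S -l "$1" && sh -c 'ulimit -S -l' )
}

# Assumed placement of the fix (only the ulimit line comes from this thread):
#   near the top of /etc/init.d/sshd, before sshd is started
#       ulimit -l unlimited
#   then restart sshd so that new ssh sessions inherit the raised limit.
```

Calling inherited_memlock with a small value shows the child shell reporting that same value, which is the behavior that left the ssh sessions at the default.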

Prakashan


-----Original Message-----
From: wei huang [mailto:huanwei at cse.ohio-state.edu]
Sent: Fri 3/2/2007 1:54 PM
To: Korambath, Prakashan
Cc: mvapich-discuss at cse.ohio-state.edu
Subject: Re: [mvapich-discuss] Failed to Initialize HCA type for mvapich2-0.9.8
 
Hi Prakashan,

Thanks for using mvapich2.

This is pretty weird, because the ulimit is typically the cause when you
see a create cq failure. May I ask you to make sure that ulimit -l is
unlimited on both nodes? Also, it would be good to verify this with the
following commands (so that ulimit is actually unlimited when you run the
program):

ssh n11 ulimit -l
ssh grid4 ulimit -l
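
The two commands above can be wrapped in a small loop that flags any node whose limit is not unlimited. This is only a sketch: the function name check_memlock and the script name check.sh are hypothetical, and it assumes passwordless ssh to each host.

```shell
# check_memlock HOST LIMIT: report whether a 'ulimit -l' value is acceptable.
# (Hypothetical helper, not part of mvapich2 or OFED.)
check_memlock() {
    if [ "$2" = "unlimited" ]; then
        echo "$1: ok (unlimited)"
    else
        echo "$1: locked-memory limit is $2, expected unlimited"
        return 1
    fi
}

# Query each host named on the command line, e.g.:  sh check.sh n11 grid4
# With no arguments the loop does nothing.
for host in "$@"; do
    # Non-interactive ssh, the same way the MPI processes get started.
    limit=$(ssh "$host" ulimit -l) || limit="ssh failed"
    check_memlock "$host" "$limit"
done
```

Checking through ssh rather than in a login shell matters here, because an interactive login may report a different limit than a command run remotely.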

Also, would you please verify on both machines that the port is active?

Finally, if all of them are fine, would you please make sure the IB-level
micro-benchmarks run successfully?
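
A rough sketch of both checks follows. The port check reads ibstat output and looks for the "State: Active" line; the function name port_is_active is hypothetical. Which micro-benchmarks are meant is not stated here, so the ibv_rc_pingpong example from libibverbs is shown in the comments only as one possibility, assuming it is installed on both nodes.

```shell
# port_is_active: read ibstat output on stdin and succeed if a port
# reports "State: Active". (Hypothetical helper name.)
port_is_active() {
    # Anchored at line start, so "Physical state: LinkUp" does not match.
    grep -q '^[[:space:]]*State:[[:space:]]*Active'
}

# Usage, with the host names from this thread:
#   ssh n11 ibstat   | port_is_active && echo "n11: port active"
#   ssh grid4 ibstat | port_is_active && echo "grid4: port active"
#
# One possible IB-level test (assumes the libibverbs examples are installed):
#   on grid4:  ibv_rc_pingpong          # waits for a client
#   on n11:    ibv_rc_pingpong grid4    # connects to the server on grid4
```

If the pingpong run fails as well, the problem is below the MPI layer and mvapich2 itself is probably not at fault.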

Thanks.

Regards,
Wei Huang

774 Dreese Lab, 2015 Neil Ave,
Dept. of Computer Science and Engineering
Ohio State University
OH 43210
Tel: (614)292-8501


On Fri, 2 Mar 2007, Korambath, Prakashan wrote:

> Hi,
>  I just set up two nodes connected through an IB cable, running Fedora
> Core 6 (kernel 2.6.19-1.2911.fc6) and OFED-1.1.  The ibstat and ibnodes
> outputs are below.  I ran the make.mvapich2.gen2 file in order to create
> the MPI-related files.  I am getting the following error when I run
> mpiexec.  Could you please tell me what I am doing wrong?  The
> configure step uses --with-device=osu_ch3:mrail inside
> make.mvapich2.gen2.  I don't know whether I have the wrong device or
> something else. Also, ulimit -l shows unlimited.  Thanks for your help.
>
>
> Prakashan Korambath
> UCLA
>
> ------------------------------------------
>
>
>
> -bash-3.1$ mpd &
> [1] 13652
> -bash-3.1$ !mpdboot
> mpdboot -n 2 -f hostfile
> [1]+  Done                    mpd
> -bash-3.1$ mpicc -o bones bones.c
> -bash-3.1$ which mpicc
> ~/mvapich2/bin/mpicc
> -bash-3.1$ mpiexec -n 2 ./bones
> cannot create cq
> Failed to Initialize HCA type
> Fatal error in MPI_Init: Other MPI error, error stack:
> MPIR_Init_thread(230): Initialization failed
> MPID_Init(81)........: channel initialization failed
> (unknown)(): Other MPI errorrank 1 in job 1  grid4.ats.ucla.edu_33136   caused collective abort of all ranks
>   exit status of rank 1: killed by signal 9
> -bash-3.1$
> -bash-3.1$ mpdtrace
> grid4
> n11
>
>
>
> -----------------------
> [root@grid4 ~]# ibstat
> CA 'mthca0'
>         CA type: MT25204
>         Number of ports: 1
>         Firmware version: 1.0.800
>         Hardware version: a0
>         Node GUID: 0x00066a0098007a39
>         System image GUID: 0x00066a0098007a39
>         Port 1:
>                 State: Active
>                 Physical state: LinkUp
>                 Rate: 20
>                 Base lid: 1
>                 LMC: 0
>                 SM lid: 2
>                 Capability mask: 0x02510a6a
>                 Port GUID: 0x00066a00a0007a39
> [root@grid4 ~]# ibnodes
> Ca      : 0x00066a0098007a25 ports 1 "n11 HCA-1"
> Ca      : 0x00066a0098007a39 ports 1 "grid4 HCA-1"
>

