[mvapich-discuss] MPI INIT error

Hoot Thompson hoot at ptpnow.com
Sat Apr 6 11:18:24 EDT 2013


This help?

[jhthomps at rh64-1-ib ~]$ /usr/local/other/utilities/mvapich2/bin/mpirun 
-n 2 -hosts rh64-1-ib,rh64-3-ib 
/usr/local/other/utilities/mvapich2/libexec/osu-micro-benchmarks/osu_bw
[cli_0]: aborting job:
Fatal error in MPI_Init:
Other MPI error, error stack:
MPIR_Init_thread(408).......:
MPID_Init(308)..............: channel initialization failed
MPIDI_CH3_Init(283).........:
MPIDI_CH3I_RDMA_init(171)...:
rdma_setup_startup_ring(389): cannot open hca device


=====================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   EXIT CODE: 256
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
=====================================================================================
[cli_1]: aborting job:
Fatal error in MPI_Init:
Other MPI error, error stack:
MPIR_Init_thread(408).......:
MPID_Init(308)..............: channel initialization failed
MPIDI_CH3_Init(283).........:
MPIDI_CH3I_RDMA_init(171)...:
rdma_setup_startup_ring(389): cannot open hca device


=====================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   EXIT CODE: 256
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
=====================================================================================
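
The "cannot open hca device" failure comes out of the verbs layer before MVAPICH2 exchanges any MPI traffic, so a useful first step is to confirm that the adapter is actually visible to user space on both nodes. A minimal check, assuming the RHEL 6.4 distro packages libibverbs-utils and infiniband-diags are installed (the package and module names below are assumptions for a stock Mellanox setup, adjust as needed):

# run on each node as the job user
lsmod | grep -E 'ib_uverbs|ib_core|mlx4_ib'   # verbs and HCA kernel modules loaded?
ls -l /dev/infiniband/                        # uverbs device nodes present and readable?
ibv_devinfo                                   # HCA listed, port state PORT_ACTIVE?
ibstat                                        # same check via infiniband-diags

If ibv_devinfo reports no devices, or the /dev/infiniband nodes are missing or unreadable by the job user, the problem is below MVAPICH2 (driver, udev permissions, or the subnet manager) rather than in the MPI build itself.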




On 04/06/2013 10:18 AM, Devendar Bureddy wrote:
> Hi Hoot
>
> Can you configure MVAPICH2 with the additional flags
> "--enable-fast=none --enable-g=dbg" to see if it shows better
> error info than "Other MPI error"? (A sketch of the full rebuild
> appears after this quoted reply.)
>
> Can you also give it a try with mpirun_rsh?
>
> syntax:    ./mpirun_rsh -n 2  rh64-1-ib rh64-3-ib ./osu_bw
>
> -Devendar
>
>
> On Sat, Apr 6, 2013 at 10:00 AM, Hoot Thompson <hoot at ptpnow.com> wrote:
>
>     I've been down this path before and I believe I've taken care of
>     my usual oversights. Here's the background, it's a RHEL6.4 setup
>     using the distro IB modules (not an OFED download). I'm trying to
>     run the micro benchmarks and I'm getting (debug output attached) ....
>
>     =====================================================================================
>     =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>     =   EXIT CODE: 256
>     =   CLEANING UP REMAINING PROCESSES
>     =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>     =====================================================================================
>     [proxy:0:1 at rh64-3-ib] got pmi command (from 4): init
>     pmi_version=1 pmi_subversion=1
>     [proxy:0:1 at rh64-3-ib] PMI response: cmd=response_to_init
>     pmi_version=1 pmi_subversion=1 rc=0
>     [proxy:0:1 at rh64-3-ib] got pmi command (from 4): get_maxes
>
>     [proxy:0:1 at rh64-3-ib] PMI response: cmd=maxes kvsname_max=256
>     keylen_max=64 vallen_max=1024
>     [proxy:0:1 at rh64-3-ib] got pmi command (from 4): get_appnum
>
>     [proxy:0:1 at rh64-3-ib] PMI response: cmd=appnum appnum=0
>     [proxy:0:1 at rh64-3-ib] got pmi command (from 4): get_my_kvsname
>
>     [proxy:0:1 at rh64-3-ib] PMI response: cmd=my_kvsname kvsname=kvs_4129_0
>     [proxy:0:1 at rh64-3-ib] got pmi command (from 4): get_my_kvsname
>
>     [proxy:0:1 at rh64-3-ib] PMI response: cmd=my_kvsname kvsname=kvs_4129_0
>     [proxy:0:1 at rh64-3-ib] got pmi command (from 4): get
>     kvsname=kvs_4129_0 key=PMI_process_mapping
>     [proxy:0:1 at rh64-3-ib] PMI response: cmd=get_result rc=0
>     msg=success value=(vector,(0,2,1))
>     [cli_1]: aborting job:
>     Fatal error in MPI_Init:
>     Other MPI error
>
>
>
>
>     =====================================================================================
>     =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>     =   EXIT CODE: 256
>     =   CLEANING UP REMAINING PROCESSES
>     =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>     =====================================================================================
>
>
>     Here's the output of ulimit on both ends (configured in limits.conf)
>     [jhthomps at rh64-1-ib ~]$  ulimit -l
>     unlimited
>     [root at rh64-3-ib jhthomps]# ulimit -l
>     unlimited
>
>     Firewalls are down and I think the /etc/hosts files are right.
>
>     Suggestions?
>
>     Thanks,
>
>     Hoot
>
>
>
>
>
>
>
>
>
> -- 
> Devendar
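
For reference, the rebuild Devendar suggests would look roughly like the sketch below. The install prefix and benchmark path are taken from earlier in this thread, --enable-g=dbg is the standard MPICH/MVAPICH2 spelling of the debug flag, and the source directory and make parallelism are assumptions, so adjust them to the local setup:

# from the unpacked MVAPICH2 source directory
./configure --prefix=/usr/local/other/utilities/mvapich2 \
    --enable-fast=none --enable-g=dbg
make -j4 && make install

# then retry the benchmark with mpirun_rsh from the new install
/usr/local/other/utilities/mvapich2/bin/mpirun_rsh -n 2 rh64-1-ib rh64-3-ib \
    /usr/local/other/utilities/mvapich2/libexec/osu-micro-benchmarks/osu_bw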
