[mvapich-discuss] MPI INIT error

Devendar Bureddy bureddy at cse.ohio-state.edu
Sat Apr 6 10:18:03 EDT 2013


Hi Hoot

Can you configure MVAPICH2 with the additional flags "--enable-fast=none
--enable-g=dbg" to see if it shows better error info than "Other MPI
error"?
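
A minimal rebuild sketch with those flags (the install prefix and job count
here are placeholders; adjust them to your setup):

```shell
# From the top of the MVAPICH2 source tree.
# --enable-fast=none turns off optimizations that strip error detail;
# --enable-g=dbg compiles with debugging information.
./configure --prefix=/opt/mvapich2-dbg --enable-fast=none --enable-g=dbg
make -j4
make install
```

Then point your PATH at the new prefix and rerun the benchmark to capture
the more detailed error output.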

Can you also give it a try with mpirun_rsh?

syntax:    ./mpirun_rsh -n 2  rh64-1-ib rh64-3-ib ./osu_bw
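
If listing hosts on the command line is inconvenient, mpirun_rsh also
accepts a hostfile (the file name "hosts" here is just a placeholder):

```shell
# hosts contains one hostname per line, e.g.:
#   rh64-1-ib
#   rh64-3-ib
./mpirun_rsh -np 2 -hostfile hosts ./osu_bw
```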

-Devendar


On Sat, Apr 6, 2013 at 10:00 AM, Hoot Thompson <hoot at ptpnow.com> wrote:

> I've been down this path before and I believe I've taken care of my usual
> oversights. Here's the background: it's a RHEL6.4 setup using the distro IB
> modules (not an OFED download). I'm trying to run the micro benchmarks and
> I'm getting (debug output attached):
>
> =====================================================================================
> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> =   EXIT CODE: 256
> =   CLEANING UP REMAINING PROCESSES
> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
> =====================================================================================
> [proxy:0:1 at rh64-3-ib] got pmi command (from 4): init
> pmi_version=1 pmi_subversion=1
> [proxy:0:1 at rh64-3-ib] PMI response: cmd=response_to_init pmi_version=1
> pmi_subversion=1 rc=0
> [proxy:0:1 at rh64-3-ib] got pmi command (from 4): get_maxes
>
> [proxy:0:1 at rh64-3-ib] PMI response: cmd=maxes kvsname_max=256
> keylen_max=64 vallen_max=1024
> [proxy:0:1 at rh64-3-ib] got pmi command (from 4): get_appnum
>
> [proxy:0:1 at rh64-3-ib] PMI response: cmd=appnum appnum=0
> [proxy:0:1 at rh64-3-ib] got pmi command (from 4): get_my_kvsname
>
> [proxy:0:1 at rh64-3-ib] PMI response: cmd=my_kvsname kvsname=kvs_4129_0
> [proxy:0:1 at rh64-3-ib] got pmi command (from 4): get_my_kvsname
>
> [proxy:0:1 at rh64-3-ib] PMI response: cmd=my_kvsname kvsname=kvs_4129_0
> [proxy:0:1 at rh64-3-ib] got pmi command (from 4): get
> kvsname=kvs_4129_0 key=PMI_process_mapping
> [proxy:0:1 at rh64-3-ib] PMI response: cmd=get_result rc=0 msg=success
> value=(vector,(0,2,1))
> [cli_1]: aborting job:
> Fatal error in MPI_Init:
> Other MPI error
>
>
>
>
> =====================================================================================
> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> =   EXIT CODE: 256
> =   CLEANING UP REMAINING PROCESSES
> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
> =====================================================================================
>
>
> Here's the output of ulimit on both ends (configured in limits.conf):
> [jhthomps at rh64-1-ib ~]$  ulimit -l
> unlimited
> [root at rh64-3-ib jhthomps]# ulimit -l
> unlimited
>
> Firewalls are down and I think the /etc/hosts files are right.
>
> Suggestions?
>
> Thanks,
>
> Hoot
>
>
>
>
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>
>


-- 
Devendar
