[mvapich-discuss] MPI INIT error
Hoot Thompson
hoot at ptpnow.com
Sat Apr 6 11:06:54 EDT 2013
Here's the mpirun_rsh...
[jhthomps at rh64-1-ib ~]$ /usr/local/other/utilities/mvapich2/bin/mpirun_rsh -n 2 \
    rh64-1-ib rh64-3-ib \
    /usr/local/other/utilities/mvapich2/libexec/osu-micro-benchmarks/osu_bw
[cli_0]: aborting job:
Fatal error in MPI_Init:
Other MPI error
[rh64-1-ib:mpispawn_0][child_handler] MPI process (rank: 0, pid: 7781)
exited with status 1
[rh64-1-ib:mpispawn_0][readline] Unexpected End-Of-File on file
descriptor 8. MPI process died?
[rh64-1-ib:mpispawn_0][mtpmi_processops] Error while reading PMI socket.
MPI process died?
[cli_1]: aborting job:
Fatal error in MPI_Init:
Other MPI error
[rh64-3-ib:mpispawn_1][readline] Unexpected End-Of-File on file
descriptor 7. MPI process died?
[rh64-3-ib:mpispawn_1][mtpmi_processops] Error while reading PMI socket.
MPI process died?
[rh64-3-ib:mpispawn_1][child_handler] MPI process (rank: 0, pid: 7410)
exited with status 1
On 04/06/2013 10:18 AM, Devendar Bureddy wrote:
> Hi Hoot
>
> Can you configure MVAPICH2 with the additional flags
> "--enable-fast=none --enable-g=dbg" to see if it shows better
> error info than "Other MPI error"?
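>
> For reference, a full rebuild with those options would look roughly like
> the following (the prefix below is assumed from the mpirun_rsh path shown
> earlier in this thread; adjust it to the actual install location):
>
>     # reconfigure and rebuild MVAPICH2 with debug-friendly error reporting
>     ./configure --prefix=/usr/local/other/utilities/mvapich2 \
>                 --enable-g=dbg --enable-fast=none
>     make && make install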
>
> Can you also give it a try with mpirun_rsh?
>
> syntax: ./mpirun_rsh -n 2 rh64-1-ib rh64-3-ib ./osu_bw
>
> -Devendar
>
>
> On Sat, Apr 6, 2013 at 10:00 AM, Hoot Thompson <hoot at ptpnow.com> wrote:
>
>     I've been down this path before and I believe I've taken care of
>     my usual oversights. Here's the background: it's a RHEL6.4 setup
>     using the distro IB modules (not an OFED download). I'm trying to
>     run the micro-benchmarks and I'm getting (debug output attached) ...
>
> =====================================================================================
> = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> = EXIT CODE: 256
> = CLEANING UP REMAINING PROCESSES
> = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
> =====================================================================================
> [proxy:0:1 at rh64-3-ib] got pmi command (from 4): init
> pmi_version=1 pmi_subversion=1
> [proxy:0:1 at rh64-3-ib] PMI response: cmd=response_to_init
> pmi_version=1 pmi_subversion=1 rc=0
> [proxy:0:1 at rh64-3-ib] got pmi command (from 4): get_maxes
>
> [proxy:0:1 at rh64-3-ib] PMI response: cmd=maxes kvsname_max=256
> keylen_max=64 vallen_max=1024
> [proxy:0:1 at rh64-3-ib] got pmi command (from 4): get_appnum
>
> [proxy:0:1 at rh64-3-ib] PMI response: cmd=appnum appnum=0
> [proxy:0:1 at rh64-3-ib] got pmi command (from 4): get_my_kvsname
>
> [proxy:0:1 at rh64-3-ib] PMI response: cmd=my_kvsname kvsname=kvs_4129_0
> [proxy:0:1 at rh64-3-ib] got pmi command (from 4): get_my_kvsname
>
> [proxy:0:1 at rh64-3-ib] PMI response: cmd=my_kvsname kvsname=kvs_4129_0
> [proxy:0:1 at rh64-3-ib] got pmi command (from 4): get
> kvsname=kvs_4129_0 key=PMI_process_mapping
> [proxy:0:1 at rh64-3-ib] PMI response: cmd=get_result rc=0
> msg=success value=(vector,(0,2,1))
> [cli_1]: aborting job:
> Fatal error in MPI_Init:
> Other MPI error
>
>
>
>
> =====================================================================================
> = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> = EXIT CODE: 256
> = CLEANING UP REMAINING PROCESSES
> = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
> =====================================================================================
>
>
>     Here's the output of ulimit on both ends (configured in limits.conf):
> [jhthomps at rh64-1-ib ~]$ ulimit -l
> unlimited
> [root at rh64-3-ib jhthomps]# ulimit -l
> unlimited
>
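>     For reference, the limits.conf entries that typically produce that
>     result look something like this (the exact lines in your files may
>     differ):
>
>         # /etc/security/limits.conf -- allow unlimited locked memory for all users
>         *    soft    memlock    unlimited
>         *    hard    memlock    unlimited
>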
> Firewalls are down and I think the /etc/hosts files are right.
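>
>     For a quick sanity check, /etc/hosts entries of roughly this form on
>     both nodes would be consistent with that (the addresses below are
>     placeholders, not taken from this setup):
>
>         # map the IB hostnames to the IPoIB address of each node
>         10.0.0.1    rh64-1-ib
>         10.0.0.3    rh64-3-ib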
>
> Suggestions?
>
> Thanks,
>
> Hoot
>
> --
> Devendar