[mvapich-discuss] Error in mpi across multiple nodes

lung Fermin ferminlung at gmail.com
Wed May 6 23:55:01 EDT 2015


Hi,

I am running a commercial program with MPI parallelization on a cluster.
Everything works when the program runs on a single node. However, when I
distribute the MPI job across multiple nodes, the following error occurs:

[z1-17:mpispawn_0][child_handler] MPI process (rank: 0, pid: 28291)
terminated with signal 9 -> abort job
[z1-17:mpispawn_0][readline] Unexpected End-Of-File on file descriptor 9.
MPI process died?
[z1-17:mpispawn_0][mtpmi_processops] Error while reading PMI socket. MPI
process died?
[z1-17:mpirun_rsh][process_mpispawn_connection] mpispawn_0 from node z1-17
aborted: Error while reading a PMI socket (4)
[z1-18:mpispawn_1][read_size] Unexpected End-Of-File on file descriptor 21.
MPI process died?
[z1-18:mpispawn_1][read_size] Unexpected End-Of-File on file descriptor 21.
MPI process died?
[z1-18:mpispawn_1][handle_mt_peer] Error while reading PMI socket. MPI
process died?
cp: cannot stat `.in.tmp': No such file or directory
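
A note on reading the log above: signal 9 is SIGKILL on Linux, which a
process cannot catch, so rank 0 was most likely killed from outside rather
than crashing on its own. The log does not say what sent the signal. Below is
a minimal sketch in Python that pulls the rank, pid, and signal out of such
mpispawn lines. The regular expression and the function name
find_killed_ranks are illustrative assumptions based only on the message
format shown above; they are not part of MVAPICH.

```python
import re
import signal

# Pattern for the mpispawn "child_handler" message as it appears in the log
# above (assumption: other versions may word this message differently).
# Whitespace is matched loosely because the message can wrap across lines.
PATTERN = re.compile(
    r"MPI\s+process\s+\(rank:\s*(\d+),\s*pid:\s*(\d+)\)\s+"
    r"terminated\s+with\s+signal\s+(\d+)"
)


def find_killed_ranks(log_text):
    """Return a list of (rank, pid, signal_name) tuples found in log_text."""
    results = []
    for rank, pid, signum in PATTERN.findall(log_text):
        try:
            # Translate the number into a name such as SIGKILL or SIGSEGV.
            name = signal.Signals(int(signum)).name
        except ValueError:
            # Number not known on this platform: keep it as plain text.
            name = "signal %s" % signum
        results.append((int(rank), int(pid), name))
    return results


if __name__ == "__main__":
    sample = (
        "[z1-17:mpispawn_0][child_handler] MPI process (rank: 0, pid: 28291)\n"
        "terminated with signal 9 -> abort job\n"
    )
    for rank, pid, name in find_killed_ranks(sample):
        print("rank %d (pid %d) died from %s" % (rank, pid, name))
```

Run against the full job output, this only shows which rank died first and
from which signal; finding out what sent the signal still requires checking
the node itself (for example its system log).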

Some details:
> Compiler: mpif90 built with ifort, MVAPICH 2.0a
> BLACS and ScaLAPACK libraries are used (-lmkl_blacs_intelmpi_lp64,
-lmkl_scalapack_lp64), MKL version 11.1
> Command for execution: mpirun -np _NP_ -hostfile _HOSTS_ _EXEC_

What causes these errors, and how can I deal with them? Any suggestions or
help would be appreciated.

Thanks in advance,

Fermin