[mvapich-discuss] error closing socket at end of mpirun_rsh (originally posted Oct 11)

Scott Shaw sshaw at sgi.com
Wed Jan 9 14:40:21 EST 2008


Hi,
On several clusters we are experiencing the same issue originally
posted on Oct 11, 2007 regarding "error closing socket at end of
mpirun_rsh". Running the MPI test with one core works and no error is
generated, but with more than one core the error appears.

Is there a patch available which addresses the "Termination socket read
failed" error message?  I have tested three different clusters and each
cluster exhibits the same error.  I also checked the "mvapich-discuss"
archives and did not find a resolution.

I am currently running MVAPICH v0.9.9, which is bundled with OFED v1.2.
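
For reference, the hostfile simply lists the node hostnames, one per
line, and mpi_test is a trivial MPI program along the following lines
(a minimal sketch that matches the output below, not the exact source):

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank = 0;

    /* Each rank announces itself, calls MPI_Finalize, then exits. */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    printf("Rank=%d present and calling MPI_Finalize\n", rank);
    MPI_Finalize();
    printf("Rank=%d bailing, nicely\n", rank);

    return 0;
}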

r1i0n0 /store/sshaw> mpirun_rsh -np 1 -hostfile ./hfile ./mpi_test
Rank=0 present and calling MPI_Finalize
Rank=0 bailing, nicely

r1i0n0 /store/sshaw> mpirun_rsh -np 2 -hostfile ./hfile ./mpi_test
Rank=1 present and calling MPI_Finalize
Rank=0 present and calling MPI_Finalize
Rank=0 bailing, nicely
Termination socket read failed: Bad file descriptor
Rank=1 bailing, nicely

r1i0n0 /store/sshaw> mpirun_rsh -np 4 -hostfile ./hfile ./mpi_test
Rank=1 present and calling MPI_Finalize
Rank=3 present and calling MPI_Finalize
Rank=0 present and calling MPI_Finalize
Rank=2 present and calling MPI_Finalize
Rank=0 bailing, nicely
Termination socket read failed: Bad file descriptor
Rank=3 bailing, nicely
Rank=1 bailing, nicely
Rank=2 bailing, nicely

Thanks,
Scott
