[mvapich-discuss] error closing socket at end of mpirun_rsh original posted Oct 11
Scott Shaw
sshaw at sgi.com
Wed Jan 9 14:40:21 EST 2008
Hi,
On several clusters we are seeing the same issue that was originally
posted on Oct 11, 2007 under "error closing socket at end of
mpirun_rsh". Running the MPI test on one core works and produces no
error, but running it on two or more cores produces the error.
Is there a patch available that addresses the "Termination socket read
failed" error message? I have tested three different clusters and each
one exhibits the same error. I also checked the "mvapich-discuss"
archives and did not find a resolution.
I am currently running mvapich v0.9.9, which is bundled with ofed v1.2.
r1i0n0 /store/sshaw> mpirun_rsh -np 1 -hostfile ./hfile ./mpi_test
Rank=0 present and calling MPI_Finalize
Rank=0 bailing, nicely
r1i0n0 /store/sshaw> mpirun_rsh -np 2 -hostfile ./hfile ./mpi_test
Rank=1 present and calling MPI_Finalize
Rank=0 present and calling MPI_Finalize
Rank=0 bailing, nicely
Termination socket read failed: Bad file descriptor
Rank=1 bailing, nicely
r1i0n0 /store/sshaw> mpirun_rsh -np 4 -hostfile ./hfile ./mpi_test
Rank=1 present and calling MPI_Finalize
Rank=3 present and calling MPI_Finalize
Rank=0 present and calling MPI_Finalize
Rank=2 present and calling MPI_Finalize
Rank=0 bailing, nicely
Termination socket read failed: Bad file descriptor
Rank=3 bailing, nicely
Rank=1 bailing, nicely
Rank=2 bailing, nicely
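For reference, "Bad file descriptor" is the system error text for EBADF, which a read returns when the descriptor it is given is no longer open. The sketch below only reproduces that error text in isolation; it is an assumption-laden illustration, not the mpirun_rsh code, and it makes no claim about where or why mpirun_rsh ends up with a stale descriptor. The function name read_after_close is hypothetical.

```python
import errno
import os
import socket


def read_after_close():
    """Close one end of a socket pair, then read from its stale descriptor.

    Hypothetical helper for illustration only; returns (errno, message).
    """
    parent, child = socket.socketpair()
    fd = parent.fileno()      # remember the raw descriptor number
    parent.close()            # the descriptor is now invalid
    try:
        os.read(fd, 1)        # reading a closed descriptor fails
    except OSError as exc:
        return exc.errno, os.strerror(exc.errno)
    finally:
        child.close()         # release the other end of the pair
    return None, None


if __name__ == "__main__":
    code, text = read_after_close()
    print("read failed:", text)
```

If the same thing happens inside mpirun_rsh, it would suggest the termination socket is being closed before the final read on it, but that is a guess that would need to be checked against the mpirun_rsh source.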
Thanks,
Scott