[mvapich-discuss] error closing socket at end of mpirun_rsh
Mark Potts
potts at hpcapplications.com
Wed Oct 10 16:42:30 EDT 2007
Hi,
Can you explain how the wait_for_errors() function in
.../mpid/ch_gen2/processes/mpirun_rsh.c works in MVAPICH 0.9.9,
and what might cause even small (2-process) jobs to frequently
fail with the message
"Termination socket read failed: Bad file descriptor"?
I'm not clear on what the socket s/s1 does, and therefore on how
we could be getting the above error message when reading either
"flag" or "local_id" in this code.
This error occurs frequently, but not for every job, and is
emitted after full, proper termination of the processes on
the client nodes. We are using MVAPICH 0.9.9 ch_gen2.
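
For reference, here is a minimal sketch of the only way I know to get
this errno from a read. It is not taken from mpirun_rsh.c; the two
function names are hypothetical and a socketpair stands in for the
termination socket. As far as I understand, read() sets EBADF ("Bad
file descriptor") when the descriptor passed to it is not a valid open
descriptor in the calling process, whereas a peer that has simply
closed its end produces a normal end-of-file (a return value of 0).

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper, not MVAPICH code: read an int from a socket
 * descriptor that this process has already closed, and return the
 * errno that read() reports (or -1 if the socketpair cannot be made). */
int read_after_close_errno(void)
{
    int sv[2];
    int flag = 0;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    close(sv[0]);                 /* sv[0] is no longer a valid descriptor */

    errno = 0;
    ssize_t n = read(sv[0], &flag, sizeof(flag));
    int saved = (n < 0) ? errno : 0;

    close(sv[1]);
    return saved;                 /* expected to be EBADF */
}

/* Hypothetical helper for contrast: the peer closes its end while our
 * descriptor stays open. Returns the byte count read() reports. */
int read_after_peer_close(void)
{
    int sv[2];
    int flag = 0;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    close(sv[1]);                 /* peer goes away */

    ssize_t n = read(sv[0], &flag, sizeof(flag));

    close(sv[0]);
    return (int)n;                /* end-of-file, not an error */
}
```

If that reading is right, the message would point at the descriptor
being closed (or never valid) on the mpirun_rsh side at the time of
the read, rather than at the remote processes exiting, but I may be
missing something about how s/s1 is set up.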
Thanks.
regards,
--
***********************************
>> Mark J. Potts, PhD
>>
>> HPC Applications Inc.
>> phone: 410-992-8360 Bus
>> 410-313-9318 Home
>> 443-418-4375 Cell
>> email: potts at hpcapplications.com
>> potts at excray.com
***********************************