[mvapich-discuss] error closing socket at end of mpirun_rsh
Mark Potts
potts at hpcapplications.com
Thu Oct 11 20:39:05 EDT 2007
Jonathan,
Thanks for the quick response.
The error message is from an MVAPICH build with the patches.
Without extensive testing, it appears that the nature of the
application is not an issue. The error message can appear with
even very simple "hello world" type programs that exchange no
messages.
Tests at the time each patch was added did not show this error.
It is possible that it was infrequent enough that
we did not catch it after a patch addition. At about the
same time we also did OS upgrades, which now cloud the question
of what induced the error. I'll be trying to find a
system with the previous OS to determine whether the problem
emerged before or after the OS upgrade.
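For reference, here is a minimal sketch of one way the "Bad file descriptor" text can arise. It is not taken from mpirun_rsh.c and may not be what is happening there. The helper name is hypothetical, and a socketpair() stands in for the termination socket. It reads an int "flag" from a descriptor that has already been closed, which fails with errno set to EBADF:

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper, NOT from mpirun_rsh.c: read an int "flag" from a
 * socket that has already been closed and return the resulting errno.
 * Returns 0 if the read unexpectedly succeeds, -1 if setup fails. */
int errno_from_read_on_closed_socket(void)
{
    int sv[2];
    int flag = 0;

    /* Assumption: a local socket pair stands in for the real socket. */
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    assert(sv[0] >= 0 && sv[1] >= 0);

    close(sv[1]);
    close(sv[0]);               /* sv[0] is no longer a valid descriptor */

    errno = 0;
    if (read(sv[0], &flag, sizeof flag) < 0) {
        int saved = errno;      /* save before any other library call */
        fprintf(stderr, "Termination socket read failed: %s\n",
                strerror(saved));
        return saved;
    }
    return 0;
}
```

Calling errno_from_read_on_closed_socket() from a single-threaded test program should return EBADF, which would be consistent with the read happening after the descriptor was closed rather than with a failure on the network itself.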
regards,
Jonathan L. Perkins wrote:
> Mark Potts wrote:
>> Hi,
>> Can you explain the functioning of the wait_for_errors() function
>> in .../mpid/ch_gen2/processes/mpirun_rsh.c in MVAPICH 0.9.9 and
>> what might be happening to cause even small (2 process jobs) to
>> frequently fail with the message
>> "Termination socket read failed: Bad file descriptor" .
>> I'm not clear what the socket s/s1 does and therefore how we
>> could be getting the above error message upon reading either
>> "flag" or "local_id" in this code.
>>
>> This error occurs frequently but not for every job and is
>> emitted following full, proper termination of the processes on
>> the client nodes. We are using MVAPICH 0.9.9 ch_gen2.
>>
>> Thanks.
>> regards,
>
> This function provides information about which host an abort originated
> from. You shouldn't get this error unless one of the clients (MPI
> processes) tried to open up a connection to tell mpirun_rsh about an abort.
>
> We haven't seen this issue during internal testing. Is there a
> particular base case program that you could send us that should
> reproduce the problem?
>
> Also, when did you first start experiencing this problem? Was it after
> applying one of the mpirun_rsh patches that we sent you?
>
--
***********************************
Mark J. Potts, PhD

HPC Applications Inc.
phone: 410-992-8360 Bus
       410-313-9318 Home
       443-418-4375 Cell
email: potts at hpcapplications.com
       potts at excray.com
***********************************