[mvapich-discuss] job aborted after a few days run
    Vishwas 
    vvasisht at locuz.com
       
    Thu Nov 30 00:45:30 EST 2006
    
    
  
Hello,
 
I was running a farming job on my cluster. After a few days of running, the
job aborted abruptly. The following errors were written to the log file.
 
[138] Abort: [] Got completion with error, code=VAPI_RETRY_EXC_ERR, vendor
code=81, dest rank=21
 at line 410 in file vapi_channel_manager.c
send desc error
[131] Abort: [] Got completion with error, code=VAPI_RETRY_EXC_ERR, vendor
code=81, dest rank=23
 at line 410 in file vapi_channel_manager.c
rank 138 in job 9  gulabjamun.ncbs.res.in_34137   caused collective abort of
all ranks
  exit status of rank 138: killed by signal 9 
rank 131 in job 9  gulabjamun.ncbs.res.in_34137   caused collective abort of
all ranks
  exit status of rank 131: killed by signal 9 
rank 86 in job 9  gulabjamun.ncbs.res.in_34137   caused collective abort of
all ranks
  exit status of rank 86: killed by signal 9 
~/ROBUST/nov15_2006_3x7 
~/ROBUST/nov15_2006_3x7 
send desc error
[76] Abort: [] Got completion with error, code=VAPI_RETRY_EXC_ERR, vendor
code=81, dest rank=20
 at line 410 in file vapi_channel_manager.c
send desc error
[61] Abort: [] Got completion with error, code=VAPI_RETRY_EXC_ERR, vendor
code=81, dest rank=21
 at line 410 in file vapi_channel_manager.c
rank 76 in job 9  gulabjamun.ncbs.res.in_34137   caused collective abort of
all ranks
  exit status of rank 76: killed by signal 9 
rank 61 in job 9  gulabjamun.ncbs.res.in_34137   caused collective abort of
all ranks
  exit status of rank 61: killed by signal 9 
~/ROBUST/nov15_2006_3x7 
send desc error
[52] Abort: [] Got completion with error, code=VAPI_RETRY_EXC_ERR, vendor
code=81, dest rank=23
 at line 410 in file vapi_channel_manager.c
send desc error
[39] Abort: [] Got completion with error, code=VAPI_RETRY_EXC_ERR, vendor
code=81, dest rank=23
 at line 410 in file vapi_channel_manager.c
send desc error
[27] Abort: [] Got completion with error, code=VAPI_RETRY_EXC_ERR, vendor
code=81, dest rank=21
 at line 410 in file vapi_channel_manager.c
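
The abort records above name destination ranks 20, 21 and 23. To make it easier to see which ranks are involved in a longer log, the records could be tallied with a short script. This is only a minimal sketch, assuming the log keeps the exact format shown above (including the line wrap between "vendor" and "code="); the function names tally_dest_ranks and failing_pairs are hypothetical helpers, not part of MVAPICH.

```python
import re
from collections import Counter

# Matches one abort record as it appears in the log above.
# \s+ tolerates the line wrap between "vendor" and "code=".
ABORT_RE = re.compile(
    r"\[(\d+)\] Abort: .*?code=(\w+), vendor\s+code=(\d+), dest rank=(\d+)"
)


def tally_dest_ranks(log_text):
    """Return a Counter mapping destination rank -> number of abort records."""
    return Counter(int(m.group(4)) for m in ABORT_RE.finditer(log_text))


def failing_pairs(log_text):
    """Return (source rank, destination rank) pairs in order of appearance."""
    return [
        (int(m.group(1)), int(m.group(4)))
        for m in ABORT_RE.finditer(log_text)
    ]


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python tally_aborts.py job.log
    with open(sys.argv[1]) as fh:
        text = fh.read()
    for dest, count in sorted(tally_dest_ranks(text).items()):
        print(f"dest rank {dest}: {count} abort record(s)")
```

A tally like this only summarizes what the log already says; it does not by itself identify the cause of the VAPI_RETRY_EXC_ERR completions.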
 
Regards
Vishwas
 
    
    