[mvapich-discuss] error when extending jobs on 37 nodes.

teng ma xiaok1981 at gmail.com
Fri Nov 4 17:34:36 EDT 2011


I ran out of machine time. Maybe it's related to memory usage. I
will retry when I get a fresh machine.
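
For the retry, it may help to summarize the quoted log below per node, to
see which hosts raised the bus error (signal 7) rather than being
interrupted (signal 2) during the abort. The following is only a rough
sketch in Python: the function names are made up for illustration, and the
rank-to-node mapping assumes ranks fill nodes in consecutive blocks of 24,
which the "bunch" launch on 24-core nodes suggests but which is not
confirmed in this thread.

```python
import re
from collections import defaultdict

CORES_PER_NODE = 24  # from the report: 24 cores per node

# One unwrapped log entry, e.g.
# "[host:mpi_rank_388][error_sighandler] Caught error: Bus error (signal 7)"
BUS_ERROR = re.compile(
    r"\[([^:\]]+):mpi_rank_(\d+)\]\[error_sighandler\] "
    r"Caught error: Bus error \(signal 7\)"
)


def unwrap(log_text):
    """Strip mail quoting ('>> ') and rejoin entries wrapped over two lines."""
    text = re.sub(r"^>+ ?", "", log_text, flags=re.M)
    # A line that does not start with '[' continues the previous entry.
    return re.sub(r"\n(?!\[)", " ", text)


def bus_error_ranks(log_text):
    """Return {host: sorted ranks} for ranks that caught a bus error."""
    ranks = defaultdict(set)
    for host, rank in BUS_ERROR.findall(unwrap(log_text)):
        ranks[host].add(int(rank))
    return {host: sorted(r) for host, r in ranks.items()}


def node_index(rank, cores_per_node=CORES_PER_NODE):
    """Node slot of a rank, ASSUMING consecutive block placement."""
    return rank // cores_per_node


# A few entries copied from the quoted log.
sample = """\
>> [parapluie-19.rennes.grid5000.fr:mpi_rank_388][error_sighandler] Caught
>> error: Bus error (signal 7)
>> [parapluie-19.rennes.grid5000.fr:mpi_rank_392][error_sighandler] Caught
>> error: Bus error (signal 7)
>> [parapluie-34.rennes.grid5000.fr:mpi_rank_723][error_sighandler] Caught
>> error: Bus error (signal 7)
"""

summary = bus_error_ranks(sample)
for host, ranks in summary.items():
    slots = sorted({node_index(r) for r in ranks})
    print(host, "ranks:", ranks, "node slots:", slots)
```

On these sample entries the computed slots (16 and 30) line up with the
mpispawn_16 and mpispawn_30 entries that the log shows for the same hosts,
and 37 nodes x 24 cores gives the 888 processes of the failing run.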

Thanks
Teng
On Fri, Nov 4, 2011 at 2:58 PM, Krishna Kandalla <
kandalla at cse.ohio-state.edu> wrote:

> Hi Teng,
>            Thanks for reporting this issue. We used the exact same
> config and run-time options and tried running IMB on our systems (almost 1K
> cores). Things appear to run smoothly across several runs. Could you please
> make sure that your nodes are healthy and that there are no other
> hardware-related issues? Could you also make sure that you have the
> corresponding LIMIC2 modules installed correctly on all nodes?
>
> Thanks,
> Krishna
>
>
>
> On Wed, Nov 2, 2011 at 10:23 PM, teng ma <xiaok1981 at gmail.com> wrote:
>
>> I used MVAPICH2 1.7,
>> configured as
>>
>> $ ./configure --prefix /home/tma/opt/mvapich217-limic2 --with-limic2
>> LDFLAGS=-Wl,-rpath=/usr/local/lib
>>
>> LiMIC2 0.5.5
>>
>> 20 Gb/s InfiniBand, 24 cores per node, running the IMB test.
>>
>> I bound the processes to cores with this command:
>>
>> mpirun_rsh -np 888 -hostfile ~/rankfile MV2_CPU_BINDING_POLICY=bunch
>> ./IMB-MPI1 Bcast -npmin 888
>>
>> It reports the errors shown below.
>>
>> There are no problems when the tests are spawned on fewer than 35 nodes.
>> With more than 35 nodes it sometimes works and sometimes reports the
>> errors below. It always fails once the job reaches 37 nodes (888 procs).
>>
>> Thanks for your help,
>> Teng
>>
>> [parapluie-19.rennes.grid5000.fr:mpi_rank_388][error_sighandler] Caught
>> error: Bus error (signal 7)
>> [parapluie-19.rennes.grid5000.fr:mpi_rank_392][error_sighandler] Caught
>> error: Bus error (signal 7)
>> [parapluie-19.rennes.grid5000.fr:mpi_rank_403][error_sighandler] Caught
>> error: Bus error (signal 7)
>> [parapluie-19.rennes.grid5000.fr:mpi_rank_386][error_sighandler] Caught
>> error: Bus error (signal 7)
>> [parapluie-19.rennes.grid5000.fr:mpi_rank_385][error_sighandler] Caught
>> error: Bus error (signal 7)
>> [parapluie-19.rennes.grid5000.fr:mpi_rank_397][error_sighandler] Caught
>> error: Bus error (signal 7)
>> [parapluie-19.rennes.grid5000.fr:mpi_rank_396][error_sighandler] Caught
>> error: Bus error (signal 7)
>> [parapluie-19.rennes.grid5000.fr:mpi_rank_407][error_sighandler] Caught
>> error: Bus error (signal 7)
>> [parapluie-19.rennes.grid5000.fr:mpi_rank_404][error_sighandler] Caught
>> error: Bus error (signal 7)
>> [parapluie-19.rennes.grid5000.fr:mpi_rank_393][error_sighandler] Caught
>> error: Bus error (signal 7)
>> [parapluie-19.rennes.grid5000.fr:mpi_rank_395][error_sighandler] Caught
>> error: Bus error (signal 7)
>> [parapluie-19.rennes.grid5000.fr:mpi_rank_391][error_sighandler] Caught
>> error: Bus error (signal 7)
>> [parapluie-19.rennes.grid5000.fr:mpi_rank_394][error_sighandler] Caught
>> error: Bus error (signal 7)
>> [parapluie-19.rennes.grid5000.fr:mpi_rank_401][error_sighandler] Caught
>> error: Bus error (signal 7)
>> [parapluie-19.rennes.grid5000.fr:mpi_rank_399][error_sighandler] Caught
>> error: Bus error (signal 7)
>> [parapluie-19.rennes.grid5000.fr:mpi_rank_390][error_sighandler] Caught
>> error: Bus error (signal 7)
>> [parapluie-19.rennes.grid5000.fr:mpi_rank_389][error_sighandler] Caught
>> error: Bus error (signal 7)
>> [parapluie-19.rennes.grid5000.fr:mpispawn_16][readline] Unexpected
>> End-Of-File on file descriptor 8. MPI process died?
>> [parapluie-19.rennes.grid5000.fr:mpispawn_16][mtpmi_processops] Error
>> while reading PMI socket. MPI process died?
>> [parapluie-19.rennes.grid5000.fr:mpispawn_16][child_handler] MPI process
>> (rank: 392, pid: 16153) terminated with signal 7 -> abort job
>> [parapluie-2.rennes.grid5000.fr:mpirun_rsh][process_mpispawn_connection]
>> mpispawn_16 from node parapluie-26.rennes.grid5000.fr aborted: MPI
>> process error (1)
>> [parapluie-34.rennes.grid5000.fr:mpi_rank_723][error_sighandler] Caught
>> error: Bus error (signal 7)
>> [parapluie-34.rennes.grid5000.fr:mpi_rank_725][error_sighandler] Caught
>> error: Bus error (signal 7)
>> [parapluie-34.rennes.grid5000.fr:mpispawn_30][child_handler] MPI process
>> (rank: 725, pid: 15984) terminated with signal 7 -> abort job
>> [parapluie-34.rennes.grid5000.fr:mpispawn_30][readline] Unexpected
>> End-Of-File on file descriptor 7. MPI process died?
>> [parapluie-34.rennes.grid5000.fr:mpispawn_30][mtpmi_processops] Error
>> while reading PMI socket. MPI process died?
>> [parapluie-9.rennes.grid5000.fr:mpispawn_7][read_size] Unexpected
>> End-Of-File on file descriptor 31. MPI process died?
>> [parapluie-9.rennes.grid5000.fr:mpispawn_7][handle_mt_peer] Error while
>> reading PMI socket. MPI process died?
>> [parapluie-9.rennes.grid5000.fr:mpispawn_7][child_handler] MPI process
>> (rank: 182, pid: 15057) terminated with signal 2 -> abort job
>> [parapluie-5.rennes.grid5000.fr:mpispawn_3][read_size] Unexpected
>> End-Of-File on file descriptor 33. MPI process died?
>> [parapluie-5.rennes.grid5000.fr:mpispawn_3][handle_mt_peer] Error while
>> reading PMI socket. MPI process died?
>> [parapluie-33.rennes.grid5000.fr:mpispawn_29][read_size] Unexpected
>> End-Of-File on file descriptor 29. MPI process died?
>> [parapluie-33.rennes.grid5000.fr:mpispawn_29][handle_mt_peer] Error
>> while reading PMI socket. MPI process died?
>> [parapluie-2.rennes.grid5000.fr:mpispawn_1][read_size] Unexpected
>> End-Of-File on file descriptor 34. MPI process died?
>> [parapluie-2.rennes.grid5000.fr:mpispawn_1][handle_mt_peer] Error while
>> reading PMI socket. MPI process died?
>> [parapluie-35.rennes.grid5000.fr:mpispawn_31][read_size] Unexpected
>> End-Of-File on file descriptor 29. MPI process died?
>> [parapluie-35.rennes.grid5000.fr:mpispawn_31][handle_mt_peer] Error
>> while reading PMI socket. MPI process died?
>> [parapluie-36.rennes.grid5000.fr:mpispawn_32][read_size] Unexpected
>> End-Of-File on file descriptor 29. MPI process died?
>> [parapluie-36.rennes.grid5000.fr:mpispawn_32][handle_mt_peer] Error
>> while reading PMI socket. MPI process died?
>> [parapluie-14.rennes.grid5000.fr:mpi_rank_265][error_sighandler] Caught
>> error: Bus error (signal 7)
>> [parapluie-14.rennes.grid5000.fr:mpi_rank_285][error_sighandler] Caught
>> error: Bus error (signal 7)
>> [parapluie-14.rennes.grid5000.fr:mpi_rank_282][error_sighandler] Caught
>> error: Bus error (signal 7)
>> [parapluie-14.rennes.grid5000.fr:mpi_rank_268][error_sighandler] Caught
>> error: Bus error (signal 7)
>> [parapluie-14.rennes.grid5000.fr:mpi_rank_281][error_sighandler] Caught
>> error: Bus error (signal 7)
>> [parapluie-14.rennes.grid5000.fr:mpispawn_11][readline] Unexpected
>> End-Of-File on file descriptor 21. MPI process died?
>> [parapluie-14.rennes.grid5000.fr:mpispawn_11][mtpmi_processops] Error
>> while reading PMI socket. MPI process died?
>> [parapluie-14.rennes.grid5000.fr:mpispawn_11][child_handler] MPI process
>> (rank: 282, pid: 16217) terminated with signal 7 -> abort job
>> [parapluie-15.rennes.grid5000.fr:mpispawn_12][read_size] Unexpected
>> End-Of-File on file descriptor 29. MPI process died?
>> [parapluie-15.rennes.grid5000.fr:mpispawn_12][handle_mt_peer] Error
>> while reading PMI socket. MPI process died?
>> [parapluie-13.rennes.grid5000.fr:mpispawn_10][read_size] Unexpected
>> End-Of-File on file descriptor 29. MPI process died?
>> [parapluie-13.rennes.grid5000.fr:mpispawn_10][handle_mt_peer] Error
>> while reading PMI socket. MPI process died?
>> [parapluie-15.rennes.grid5000.fr:mpispawn_12][child_handler] MPI process
>> (rank: 306, pid: 16126) terminated with signal 2 -> abort job
>> [parapluie-13.rennes.grid5000.fr:mpispawn_10][child_handler] MPI process
>> (rank: 258, pid: 16181) terminated with signal 2 -> abort job
>> [parapluie-12.rennes.grid5000.fr:mpispawn_9][read_size] Unexpected
>> End-Of-File on file descriptor 29. MPI process died?
>> [parapluie-12.rennes.grid5000.fr:mpispawn_9][handle_mt_peer] Error while
>> reading PMI socket. MPI process died?
>> [parapluie-2.rennes.grid5000.fr:mpispawn_1][child_handler] MPI process
>> (rank: 42, pid: 20462) terminated with signal 2 -> abort job
>> [parapluie-36.rennes.grid5000.fr:mpispawn_32][child_handler] MPI process
>> (rank: 776, pid: 14656) terminated with signal 2 -> abort job
>> [parapluie-6.rennes.grid5000.fr:mpispawn_4][read_size] Unexpected
>> End-Of-File on file descriptor 32. MPI process died?
>> [parapluie-6.rennes.grid5000.fr:mpispawn_4][handle_mt_peer] Error while
>> reading PMI socket. MPI process died?
>> [parapluie-22.rennes.grid5000.fr:mpispawn_19][readline] Unexpected
>> End-Of-File on file descriptor 19. MPI process died?
>> [parapluie-22.rennes.grid5000.fr:mpispawn_19][mtpmi_processops] Error
>> while reading PMI socket. MPI process died?
>> [parapluie-8.rennes.grid5000.fr:mpispawn_6][read_size] Unexpected
>> End-Of-File on file descriptor 29. MPI process died?
>> [parapluie-8.rennes.grid5000.fr:mpispawn_6][handle_mt_peer] Error while
>> reading PMI socket. MPI process died?
>> [parapluie-10.rennes.grid5000.fr:mpispawn_8][read_size] Unexpected
>> End-Of-File on file descriptor 29. MPI process died?
>> [parapluie-10.rennes.grid5000.fr:mpispawn_8][handle_mt_peer] Error while
>> reading PMI socket. MPI process died?
>> [parapluie-1.rennes.grid5000.fr:mpispawn_0][read_size] Unexpected
>> End-Of-File on file descriptor 31. MPI process died?
>> [parapluie-1.rennes.grid5000.fr:mpispawn_0][handle_mt_peer] Error while
>> reading PMI socket. MPI process died?
>> [parapluie-4.rennes.grid5000.fr:mpispawn_2][read_size] Unexpected
>> End-Of-File on file descriptor 32. MPI process died?
>> [parapluie-4.rennes.grid5000.fr:mpispawn_2][handle_mt_peer] Error while
>> reading PMI socket. MPI process died?
>> [parapluie-4.rennes.grid5000.fr:mpispawn_2][child_handler] MPI process
>> (rank: 52, pid: 14639) terminated with signal 2 -> abort job
>> [parapluie-1.rennes.grid5000.fr:mpispawn_0][child_handler] MPI process
>> (rank: 19, pid: 15701) terminated with signal 2 -> abort job
>> [parapluie-8.rennes.grid5000.fr:mpispawn_6][child_handler] MPI process
>> (rank: 154, pid: 14891) terminated with signal 2 -> abort job
>> [parapluie-35.rennes.grid5000.fr:mpispawn_31][child_handler] MPI process
>> (rank: 747, pid: 15915) terminated with signal 2 -> abort job
>> [parapluie-16.rennes.grid5000.fr:mpispawn_13][read_size] Unexpected
>> End-Of-File on file descriptor 29. MPI process died?
>> [parapluie-16.rennes.grid5000.fr:mpispawn_13][handle_mt_peer] Error
>> while reading PMI socket. MPI process died?
>> [parapluie-6.rennes.grid5000.fr:mpispawn_4][child_handler] MPI process
>> (rank: 104, pid: 14631) terminated with signal 2 -> abort job
>> [parapluie-16.rennes.grid5000.fr:mpispawn_13][child_handler] MPI process
>> (rank: 327, pid: 16129) terminated with signal 2 -> abort job
>> [parapluie-17.rennes.grid5000.fr:mpispawn_14][read_size] Unexpected
>> End-Of-File on file descriptor 29. MPI process died?
>> [parapluie-17.rennes.grid5000.fr:mpispawn_14][handle_mt_peer] Error
>> while reading PMI socket. MPI process died?
>> [parapluie-23.rennes.grid5000.fr:mpispawn_20][readline] Unexpected
>> End-Of-File on file descriptor 6. MPI process died?
>> [parapluie-23.rennes.grid5000.fr:mpispawn_20][mtpmi_processops] Error
>> while reading PMI socket. MPI process died?
>> [parapluie-27.rennes.grid5000.fr:mpispawn_23][read_size] Unexpected
>> End-Of-File on file descriptor 29. MPI process died?
>> [parapluie-27.rennes.grid5000.fr:mpispawn_23][handle_mt_peer] Error
>> while reading PMI socket. MPI process died?
>> [parapluie-25.rennes.grid5000.fr:mpispawn_21][read_size] Unexpected
>> End-Of-File on file descriptor 29. MPI process died?
>> [parapluie-25.rennes.grid5000.fr:mpispawn_21][handle_mt_peer] Error
>> while reading PMI socket. MPI process died?
>> [parapluie-20.rennes.grid5000.fr:mpispawn_17][read_size] Unexpected
>> End-Of-File on file descriptor 29. MPI process died?
>> [parapluie-20.rennes.grid5000.fr:mpispawn_17][handle_mt_peer] Error
>> while reading PMI socket. MPI process died?
>> [parapluie-7.rennes.grid5000.fr:mpispawn_5][read_size] Unexpected
>> End-Of-File on file descriptor 29. MPI process died?
>> [parapluie-7.rennes.grid5000.fr:mpispawn_5][handle_mt_peer] Error while
>> reading PMI socket. MPI process died?
>> [parapluie-7.rennes.grid5000.fr:mpispawn_5][child_handler] MPI process
>> (rank: 125, pid: 14833) terminated with signal 2 -> abort job
>> [parapluie-21.rennes.grid5000.fr:mpispawn_18][readline] Unexpected
>> End-Of-File on file descriptor 5. MPI process died?
>> [parapluie-21.rennes.grid5000.fr:mpispawn_18][mtpmi_processops] Error
>> while reading PMI socket. MPI process died?
>> [parapluie-26.rennes.grid5000.fr:mpispawn_22][read_size] Unexpected
>> End-Of-File on file descriptor 29. MPI process died?
>> [parapluie-26.rennes.grid5000.fr:mpispawn_22][handle_mt_peer] Error
>> while reading PMI socket. MPI process died?
>> [parapluie-18.rennes.grid5000.fr:mpispawn_15][read_size] Unexpected
>> End-Of-File on file descriptor 29. MPI process died?
>> [parapluie-18.rennes.grid5000.fr:mpispawn_15][handle_mt_peer] Error
>> while reading PMI socket. MPI process died?
>> [parapluie-27.rennes.grid5000.fr:mpispawn_23][child_handler] MPI process
>> (rank: 556, pid: 16057) terminated with signal 2 -> abort job
>> [parapluie-29.rennes.grid5000.fr:mpispawn_25][read_size] Unexpected
>> End-Of-File on file descriptor 29. MPI process died?
>> [parapluie-29.rennes.grid5000.fr:mpispawn_25][handle_mt_peer] Error
>> while reading PMI socket. MPI process died?
>> [parapluie-10.rennes.grid5000.fr:mpispawn_8][child_handler] MPI process
>> (rank: 203, pid: 15479) terminated with signal 2 -> abort job
>> [parapluie-32.rennes.grid5000.fr:mpispawn_28][read_size] Unexpected
>> End-Of-File on file descriptor 29. MPI process died?
>> [parapluie-32.rennes.grid5000.fr:mpispawn_28][handle_mt_peer] Error
>> while reading PMI socket. MPI process died?
>> [parapluie-40.rennes.grid5000.fr:mpispawn_36][read_size] Unexpected
>> End-Of-File on file descriptor 29. MPI process died?
>> [parapluie-40.rennes.grid5000.fr:mpispawn_36][handle_mt_peer] Error
>> while reading PMI socket. MPI process died?
>> [parapluie-26.rennes.grid5000.fr:mpispawn_22][child_handler] MPI process
>> (rank: 547, pid: 16123) terminated with signal 2 -> abort job
>> [parapluie-33.rennes.grid5000.fr:mpispawn_29][child_handler] MPI process
>> (rank: 707, pid: 16064) terminated with signal 2 -> abort job
>> [parapluie-32.rennes.grid5000.fr:mpispawn_28][child_handler] MPI process
>> (rank: 679, pid: 15969) terminated with signal 2 -> abort job
>> [parapluie-30.rennes.grid5000.fr:mpispawn_26][read_size] Unexpected
>> End-Of-File on file descriptor 29. MPI process died?
>> [parapluie-30.rennes.grid5000.fr:mpispawn_26][handle_mt_peer] Error
>> while reading PMI socket. MPI process died?
>> [parapluie-29.rennes.grid5000.fr:mpispawn_25][child_handler] MPI process
>> (rank: 602, pid: 16120) terminated with signal 2 -> abort job
>> [parapluie-39.rennes.grid5000.fr:mpispawn_35][read_size] Unexpected
>> End-Of-File on file descriptor 29. MPI process died?
>> [parapluie-39.rennes.grid5000.fr:mpispawn_35][handle_mt_peer] Error
>> while reading PMI socket. MPI process died?
>> [parapluie-38.rennes.grid5000.fr:mpispawn_34][read_size] Unexpected
>> End-Of-File on file descriptor 29. MPI process died?
>> [parapluie-38.rennes.grid5000.fr:mpispawn_34][handle_mt_peer] Error
>> while reading PMI socket. MPI process died?
>> [parapluie-31.rennes.grid5000.fr:mpispawn_27][read_size] Unexpected
>> End-Of-File on file descriptor 29. MPI process died?
>> [parapluie-31.rennes.grid5000.fr:mpispawn_27][handle_mt_peer] Error
>> while reading PMI socket. MPI process died?
>> [parapluie-30.rennes.grid5000.fr:mpispawn_26][child_handler] MPI process
>> (rank: 638, pid: 16126) terminated with signal 2 -> abort job
>> [parapluie-38.rennes.grid5000.fr:mpispawn_34][child_handler] MPI process
>> (rank: 836, pid: 14263) terminated with signal 2 -> abort job
>> [parapluie-5.rennes.grid5000.fr:mpispawn_3][child_handler] MPI process
>> (rank: 77, pid: 14608) terminated with signal 2 -> abort job
>> [parapluie-28.rennes.grid5000.fr:mpispawn_24][read_size] Unexpected
>> End-Of-File on file descriptor 29. MPI process died?
>> [parapluie-28.rennes.grid5000.fr:mpispawn_24][handle_mt_peer] Error
>> while reading PMI socket. MPI process died?
>> [parapluie-28.rennes.grid5000.fr:mpispawn_24][child_handler] MPI process
>> (rank: 591, pid: 16064) terminated with signal 2 -> abort job
>> [parapluie-20.rennes.grid5000.fr:mpispawn_17][child_handler] MPI process
>> (rank: 410, pid: 16147) terminated with signal 2 -> abort job
>> [parapluie-31.rennes.grid5000.fr:mpispawn_27][child_handler] MPI process
>> (rank: 651, pid: 16140) terminated with signal 2 -> abort job
>> [parapluie-39.rennes.grid5000.fr:mpispawn_35][child_handler] MPI process
>> (rank: 861, pid: 13947) terminated with signal 2 -> abort job
>> [parapluie-25.rennes.grid5000.fr:mpispawn_21][child_handler] MPI process
>> (rank: 524, pid: 16096) terminated with signal 2 -> abort job
>> [parapluie-40.rennes.grid5000.fr:mpispawn_36][child_handler] MPI process
>> (rank: 872, pid: 13689) terminated with signal 2 -> abort job
>> [parapluie-18.rennes.grid5000.fr:mpispawn_15][child_handler] MPI process
>> (rank: 380, pid: 16144) terminated with signal 2 -> abort job
>> [parapluie-37.rennes.grid5000.fr:mpispawn_33][read_size] Unexpected
>> End-Of-File on file descriptor 29. MPI process died?
>> [parapluie-37.rennes.grid5000.fr:mpispawn_33][handle_mt_peer] Error
>> while reading PMI socket. MPI process died?
>> [parapluie-15.rennes.grid5000.fr:mpi_rank_308][error_sighandler] Caught
>> error: Bus error (signal 7)
>> [parapluie-15.rennes.grid5000.fr:mpi_rank_289][error_sighandler] Caught
>> error: Bus error (signal 7)
>> [parapluie-15.rennes.grid5000.fr:mpi_rank_305][error_sighandler] Caught
>> error: Bus error (signal 7)
>> [parapluie-15.rennes.grid5000.fr:mpi_rank_294][error_sighandler] Caught
>> error: Bus error (signal 7)
>> [parapluie-15.rennes.grid5000.fr:mpi_rank_307][error_sighandler] Caught
>> error: Bus error (signal 7)
>> [parapluie-15.rennes.grid5000.fr:mpi_rank_311][error_sighandler] Caught
>> error: Bus error (signal 7)
>> [parapluie-15.rennes.grid5000.fr:mpi_rank_300][error_sighandler] Caught
>> error: Bus error (signal 7)
>> [parapluie-15.rennes.grid5000.fr:mpi_rank_301][error_sighandler] Caught
>> error: Bus error (signal 7)
>> [parapluie-15.rennes.grid5000.fr:mpi_rank_292][error_sighandler] Caught
>> error: Bus error (signal 7)
>> [parapluie-15.rennes.grid5000.fr:mpi_rank_303][error_sighandler] Caught
>> error: Bus error (signal 7)
>> [parapluie-15.rennes.grid5000.fr:mpi_rank_309][error_sighandler] Caught
>> error: Bus error (signal 7)
>> [parapluie-15.rennes.grid5000.fr:mpi_rank_295][error_sighandler] Caught
>> error: Bus error (signal 7)
>> [parapluie-15.rennes.grid5000.fr:mpispawn_12][readline] Unexpected
>> End-Of-File on file descriptor 20. MPI process died?
>> [parapluie-15.rennes.grid5000.fr:mpispawn_12][mtpmi_processops] Error
>> while reading PMI socket. MPI process died?
>> [parapluie-33.rennes.grid5000.fr:mpi_rank_697][error_sighandler] Caught
>> error: Bus error (signal 7)
>> [parapluie-33.rennes.grid5000.fr:mpi_rank_698][error_sighandler] Caught
>> error: Bus error (signal 7)
>> [parapluie-33.rennes.grid5000.fr:mpi_rank_702][error_sighandler] Caught
>> error: Bus error (signal 7)
>> [parapluie-33.rennes.grid5000.fr:mpi_rank_717][error_sighandler] Caught
>> error: Bus error (signal 7)
>> [parapluie-33.rennes.grid5000.fr:mpi_rank_715][error_sighandler] Caught
>> error: Bus error (signal 7)
>> [parapluie-33.rennes.grid5000.fr:mpi_rank_707][error_sighandler] Caught
>> error: Bus error (signal 7)
>> [parapluie-33.rennes.grid5000.fr:mpi_rank_713][error_sighandler] Caught
>> error: Bus error (signal 7)
>> [parapluie-33.rennes.grid5000.fr:mpi_rank_700][error_sighandler] Caught
>> error: Bus error (signal 7)
>> [parapluie-33.rennes.grid5000.fr:mpispawn_29][readline] Unexpected
>> End-Of-File on file descriptor 8. MPI process died?
>> [parapluie-33.rennes.grid5000.fr:mpispawn_29][mtpmi_processops] Error
>> while reading PMI socket. MPI process died?
>> [parapluie-15.rennes.grid5000.fr:mpispawn_12][child_handler] MPI process
>> (rank: 303, pid: 16225) terminated with signal 7 -> abort job
>> [parapluie-2.rennes.grid5000.fr:mpirun_rsh][process_mpispawn_connection]
>> mpispawn_12 from node parapluie-21.rennes.grid5000.fr aborted: MPI
>> process error (1)
>> [parapluie-4.rennes.grid5000.fr:mpispawn_2][read_size] Unexpected
>> End-Of-File on file descriptor 33. MPI process died?
>> [parapluie-4.rennes.grid5000.fr:mpispawn_2][handle_mt_peer] Error while
>> reading PMI socket. MPI process died?
>> [parapluie-4.rennes.grid5000.fr:mpispawn_2][child_handler] MPI process
>> (rank: 61, pid: 14747) terminated with signal 2 -> abort job
>> [parapluie-14.rennes.grid5000.fr:mpispawn_11][read_size] Unexpected
>> End-Of-File on file descriptor 29. MPI process died?
>> [parapluie-14.rennes.grid5000.fr:mpispawn_11][handle_mt_peer] Error
>> while reading PMI socket. MPI process died?
>> [parapluie-12.rennes.grid5000.fr:mpispawn_9][read_size] Unexpected
>> End-Of-File on file descriptor 29. MPI process died?
>> [parapluie-12.rennes.grid5000.fr:mpispawn_9][handle_mt_peer] Error while
>> reading PMI socket. MPI process died?
>> [parapluie-1.rennes.grid5000.fr:mpispawn_0][read_size] Unexpected
>> End-Of-File on file descriptor 31. MPI process died?
>> [parapluie-1.rennes.grid5000.fr:mpispawn_0][handle_mt_peer] Error while
>> reading PMI socket. MPI process died?
>> [parapluie-13.rennes.grid5000.fr:mpispawn_10][read_size] Unexpected
>> End-Of-File on file descriptor 29. MPI process died?
>> [parapluie-13.rennes.grid5000.fr:mpispawn_10][handle_mt_peer] Error
>> while reading PMI socket. MPI process died?
>> [parapluie-18.rennes.grid5000.fr:mpi_rank_367][error_sighandler] Caught
>> error: Bus error (signal 7)
>> [parapluie-18.rennes.grid5000.fr:mpi_rank_364][error_sighandler] Caught
>> error: Bus error (signal 7)
>> [parapluie-18.rennes.grid5000.fr:mpi_rank_366][error_sighandler] Caught
>> error: Bus error (signal 7)
>> [parapluie-18.rennes.grid5000.fr:mpi_rank_374][error_sighandler] Caught
>> error: Bus error (signal 7)
>> [parapluie-18.rennes.grid5000.fr:mpi_rank_371][error_sighandler] Caught
>> error: Bus error (signal 7)
>> [parapluie-18.rennes.grid5000.fr:mpi_rank_378][error_sighandler] Caught
>> error: Bus error (signal 7)
>> [parapluie-18.rennes.grid5000.fr:mpi_rank_377][error_sighandler] Caught
>> error: Bus error (signal 7)
>> [parapluie-18.rennes.grid5000.fr:mpi_rank_381][error_sighandler] Caught
>> error: Bus error (signal 7)
>> [parapluie-18.rennes.grid5000.fr:mpi_rank_379][error_sighandler] Caught
>> error: Bus error (signal 7)
>> [parapluie-18.rennes.grid5000.fr:mpi_rank_365][error_sighandler] Caught
>> error: Bus error (signal 7)
>> [parapluie-18.rennes.grid5000.fr:mpi_rank_375][error_sighandler] Caught
>> error: Bus error (signal 7)
>> [parapluie-18.rennes.grid5000.fr:mpi_rank_372][error_sighandler] Caught
>> error: Bus error (signal 7)
>> [parapluie-18.rennes.grid5000.fr:mpi_rank_368][error_sighandler] Caught
>> error: Bus error (signal 7)
>> [parapluie-18.rennes.grid5000.fr:mpi_rank_380][error_sighandler] Caught
>> error: Bus error (signal 7)
>> [parapluie-18.rennes.grid5000.fr:mpi_rank_382][error_sighandler] Caught
>> error: Bus error (signal 7)
>> [parapluie-18.rennes.grid5000.fr:mpi_rank_361][error_sighandler] Caught
>> error: Bus error (signal 7)
>> [parapluie-18.rennes.grid5000.fr:mpispawn_15][readline] Unexpected
>> End-Of-File on file descriptor 6. MPI process died?
>> [parapluie-18.rennes.grid5000.fr:mpispawn_15][mtpmi_processops] Error
>> while reading PMI socket. MPI process died?
>> [parapluie-14.rennes.grid5000.fr:mpispawn_11][child_handler] MPI process
>> (rank: 281, pid: 16276) terminated with signal 2 -> abort job
>> [parapluie-13.rennes.grid5000.fr:mpispawn_10][child_handler] MPI process
>> (rank: 253, pid: 16278) terminated with signal 2 -> abort job
>> [parapluie-12.rennes.grid5000.fr:mpispawn_9][child_handler] MPI process
>> (rank: 218, pid: 15584) terminated with signal 2 -> abort job
>> [parapluie-1.rennes.grid5000.fr:mpispawn_0][child_handler] MPI process
>> (rank: 4, pid: 15788) terminated with signal 2 -> abort job
>> [parapluie-5.rennes.grid5000.fr:mpispawn_3][read_size] Unexpected
>> End-Of-File on file descriptor 29. MPI process died?
>> [parapluie-5.rennes.grid5000.fr:mpispawn_3][handle_mt_peer] Error while
>> reading PMI socket. MPI process died?
>> [parapluie-6.rennes.grid5000.fr:mpispawn_4][read_size] Unexpected
>> End-Of-File on file descriptor 29. MPI process died?
>> [parapluie-6.rennes.grid5000.fr:mpispawn_4][handle_mt_peer] Error while
>> reading PMI socket. MPI process died?
>> [parapluie-18.rennes.grid5000.fr:mpispawn_15][child_handler] MPI process
>> (rank: 378, pid: 16225) terminated with signal 7 -> abort job
>> [parapluie-33.rennes.grid5000.fr:mpispawn_29][child_handler] MPI process
>> (rank: 707, pid: 16163) terminated with signal 7 -> abort job
>> [parapluie-2.rennes.grid5000.fr:mpispawn_1][read_size] Unexpected
>> End-Of-File on file descriptor 31. MPI process died?
>> [parapluie-2.rennes.grid5000.fr:mpispawn_1][handle_mt_peer] Error while
>> reading PMI socket. MPI process died?
>> [parapluie-5.rennes.grid5000.fr:mpispawn_3][child_handler] MPI process
>> (rank: 73, pid: 14706) terminated with signal 2 -> abort job
>> [parapluie-17.rennes.grid5000.fr:mpispawn_14][read_size] Unexpected
>> End-Of-File on file descriptor 29. MPI process died?
>> [parapluie-17.rennes.grid5000.fr:mpispawn_14][handle_mt_peer] Error
>> while reading PMI socket. MPI process died?
>> [parapluie-19.rennes.grid5000.fr:mpispawn_16][read_size] Unexpected
>> End-Of-File on file descriptor 29. MPI process died?
>> [parapluie-19.rennes.grid5000.fr:mpispawn_16][handle_mt_peer] Error
>> while reading PMI socket. MPI process died?
>> [parapluie-16.rennes.grid5000.fr:mpispawn_13][read_size] Unexpected
>> End-Of-File on file descriptor 29. MPI process died?
>> [parapluie-16.rennes.grid5000.fr:mpispawn_13][handle_mt_peer] Error
>> while reading PMI socket. MPI process died?
>> [parapluie-2.rennes.grid5000.fr:mpispawn_1][child_handler] MPI process
>> (rank: 27, pid: 20617) terminated with signal 2 -> abort job
>> [parapluie-20.rennes.grid5000.fr:mpispawn_17][read_size] Unexpected
>> End-Of-File on file descriptor 29. MPI process died?
>> [parapluie-20.rennes.grid5000.fr:mpispawn_17][handle_mt_peer] Error
>> while reading PMI socket. MPI process died?
>> [parapluie-34.rennes.grid5000.fr:mpispawn_30][read_size] Unexpected
>> End-Of-File on file descriptor 29. MPI process died?
>> [parapluie-34.rennes.grid5000.fr:mpispawn_30][handle_mt_peer] Error
>> while reading PMI socket. MPI process died?
>> [parapluie-39.rennes.grid5000.fr:mpispawn_35][read_size] Unexpected
>> End-Of-File on file descriptor 29. MPI process died?
>> [parapluie-39.rennes.grid5000.fr:mpispawn_35][handle_mt_peer] Error
>> while reading PMI socket. MPI process died?
>> [parapluie-37.rennes.grid5000.fr:mpispawn_33][read_size] Unexpected
>> End-Of-File on file descriptor 29. MPI process died?
>> [parapluie-37.rennes.grid5000.fr:mpispawn_33][handle_mt_peer] Error
>> while reading PMI socket. MPI process died?
>> [parapluie-10.rennes.grid5000.fr:mpispawn_8][read_size] Unexpected
>> End-Of-File on file descriptor 29. MPI process died?
>> [parapluie-10.rennes.grid5000.fr:mpispawn_8][handle_mt_peer] Error while
>> reading PMI socket. MPI process died?
>> [parapluie-10.rennes.grid5000.fr:mpispawn_8][child_handler] MPI process
>> (rank: 205, pid: 15577) terminated with signal 2 -> abort job
>> [parapluie-23.rennes.grid5000.fr:mpispawn_20][read_size] Unexpected
>> End-Of-File on file descriptor 29. MPI process died?
>> [parapluie-23.rennes.grid5000.fr:mpispawn_20][handle_mt_peer] Error
>> while reading PMI socket. MPI process died?
>> [parapluie-38.rennes.grid5000.fr:mpispawn_34][read_size] Unexpected
>> End-Of-File on file descriptor 29. MPI process died?
>> [parapluie-38.rennes.grid5000.fr:mpispawn_34][handle_mt_peer] Error
>> while reading PMI socket. MPI process died?
>> [parapluie-26.rennes.grid5000.fr:mpispawn_22][read_size] Unexpected
>> End-Of-File on file descriptor 29. MPI process died?
>> [parapluie-26.rennes.grid5000.fr:mpispawn_22][handle_mt_peer] Error
>> while reading PMI socket. MPI process died?
>> [parapluie-6.rennes.grid5000.fr:mpispawn_4][child_handler] MPI process
>> (rank: 100, pid: 14729) terminated with signal 2 -> abort job
>> [parapluie-22.rennes.grid5000.fr:mpispawn_19][read_size] Unexpected
>> End-Of-File on file descriptor 29. MPI process died?
>> [parapluie-22.rennes.grid5000.fr:mpispawn_19][handle_mt_peer] Error
>> while reading PMI socket. MPI process died?
>> [parapluie-22.rennes.grid5000.fr:mpispawn_19][child_handler] MPI process
>> (rank: 456, pid: 16225) terminated with signal 2 -> abort job
>> [parapluie-17.rennes.grid5000.fr:mpispawn_14][child_handler] MPI process
>> (rank: 342, pid: 16240) terminated with signal 2 -> abort job
>> [parapluie-40.rennes.grid5000.fr:mpispawn_36][read_size] Unexpected
>> End-Of-File on file descriptor 29. MPI process died?
>> [parapluie-40.rennes.grid5000.fr:mpispawn_36][handle_mt_peer] Error
>> while reading PMI socket. MPI process died?
>> [parapluie-40.rennes.grid5000.fr:mpispawn_36][child_handler] MPI process
>> (rank: 873, pid: 13792) terminated with signal 2 -> abort job
>> [parapluie-21.rennes.grid5000.fr:mpispawn_18][read_size] Unexpected
>> End-Of-File on file descriptor 29. MPI process died?
>> [parapluie-21.rennes.grid5000.fr:mpi_rank_435][error_sighandler] Caught
>> error: Bus error (signal 7)
>> [parapluie-21.rennes.grid5000.fr:mpi_rank_433][error_sighandler] Caught
>> error: Bus error (signal 7)
>> [parapluie-21.rennes.grid5000.fr:mpi_rank_451][error_sighandler] Caught
>> error: Bus error (signal 7)
>> [parapluie-21.rennes.grid5000.fr:mpi_rank_434][error_sighandler] Caught
>> error: Bus error (signal 7)
>> [parapluie-21.rennes.grid5000.fr:mpi_rank_445][error_sighandler] Caught
>> error: Bus error (signal 7)
>> [parapluie-21.rennes.grid5000.fr:mpi_rank_436][error_sighandler] Caught
>> error: Bus error (signal 7)
>> [parapluie-21.rennes.grid5000.fr:mpi_rank_443][error_sighandler] Caught
>> error: Bus error (signal 7)
>> [parapluie-21.rennes.grid5000.fr:mpi_rank_446][error_sighandler] Caught
>> error: Bus error (signal 7)
>> [parapluie-21.rennes.grid5000.fr:mpi_rank_440][error_sighandler] Caught
>> error: Bus error (signal 7)
>> [parapluie-21.rennes.grid5000.fr:mpi_rank_454][error_sighandler] Caught
>> error: Bus error (signal 7)
>> [parapluie-21.rennes.grid5000.fr:mpi_rank_441][error_sighandler] Caught
>> error: Bus error (signal 7)
>> [parapluie-21.rennes.grid5000.fr:mpi_rank_437][error_sighandler] Caught
>> error: Bus error (signal 7)
>> [parapluie-21.rennes.grid5000.fr:mpispawn_18][handle_mt_peer] Error
>> while reading PMI socket. MPI process died?
>> [parapluie-21.rennes.grid5000.fr:mpispawn_18][child_handler] MPI process
>> (rank: 441, pid: 16143) terminated with signal 7 -> abort job
>> [parapluie-7.rennes.grid5000.fr:mpispawn_5][read_size] Unexpected
>> End-Of-File on file descriptor 29. MPI process died?
>> [parapluie-7.rennes.grid5000.fr:mpispawn_5][handle_mt_peer] Error while
>> reading PMI socket. MPI process died?
>> [parapluie-7.rennes.grid5000.fr:mpispawn_5][child_handler] MPI process
>> (rank: 129, pid: 14939) terminated with signal 2 -> abort job
>> [parapluie-39.rennes.grid5000.fr:mpispawn_35][child_handler] MPI process
>> (rank: 853, pid: 14041) terminated with signal 2 -> abort job
>> [parapluie-34.rennes.grid5000.fr:mpispawn_30][child_handler] MPI process
>> (rank: 722, pid: 16032) terminated with signal 2 -> abort job
>> [parapluie-30.rennes.grid5000.fr:mpi_rank_639][error_sighandler] Caught
>> error: Bus error (signal 7)
>> [parapluie-30.rennes.grid5000.fr:mpi_rank_635][error_sighandler] Caught
>> error: Bus error (signal 7)
>> [parapluie-30.rennes.grid5000.fr:mpi_rank_633][error_sighandler] Caught
>> error: Bus error (signal 7)
>> [parapluie-30.rennes.grid5000.fr:mpi_rank_644][error_sighandler] Caught
>> error: Bus error (signal 7)
>> [parapluie-30.rennes.grid5000.fr:mpi_rank_628][error_sighandler] Caught
>> error: Bus error (signal 7)
>> [parapluie-30.rennes.grid5000.fr:mpi_rank_647][error_sighandler] Caught
>> error: Bus error (signal 7)
>> [parapluie-30.rennes.grid5000.fr:mpispawn_26][readline] Unexpected
>> End-Of-File on file descriptor 7. MPI process died?
>> [parapluie-30.rennes.grid5000.fr:mpispawn_26][mtpmi_processops] Error
>> while reading PMI socket. MPI process died?
>> [parapluie-27.rennes.grid5000.fr:mpispawn_23][read_size] Unexpected
>> End-Of-File on file descriptor 29. MPI process died?
>> [parapluie-27.rennes.grid5000.fr:mpispawn_23][handle_mt_peer] Error
>> while reading PMI socket. MPI process died?
>> [parapluie-9.rennes.grid5000.fr:mpispawn_7][read_size] Unexpected
>> End-Of-File on file descriptor 30. MPI process died?
>> [parapluie-9.rennes.grid5000.fr:mpispawn_7][handle_mt_peer] Error while
>> reading PMI socket. MPI process died?
>> [parapluie-9.rennes.grid5000.fr:mpispawn_7][child_handler] MPI process
>> (rank: 177, pid: 15154) terminated with signal 2 -> abort job
>> [parapluie-8.rennes.grid5000.fr:mpispawn_6][read_size] Unexpected
>> End-Of-File on file descriptor 29. MPI process died?
>> [parapluie-8.rennes.grid5000.fr:mpispawn_6][handle_mt_peer] Error while
>> reading PMI socket. MPI process died?
>> [parapluie-28.rennes.grid5000.fr:mpispawn_24][read_size] Unexpected
>> End-Of-File on file descriptor 29. MPI process died?
>> [parapluie-28.rennes.grid5000.fr:mpispawn_24][handle_mt_peer] Error
>> while reading PMI socket. MPI process died?
>> [parapluie-8.rennes.grid5000.fr:mpispawn_6][child_handler] MPI process
>> (rank: 157, pid: 14996) terminated with signal 2 -> abort job
>> [parapluie-35.rennes.grid5000.fr:mpispawn_31][read_size] Unexpected
>> End-Of-File on file descriptor 29. MPI process died?
>> [parapluie-35.rennes.grid5000.fr:mpispawn_31][handle_mt_peer] Error
>> while reading PMI socket. MPI process died?
>> [parapluie-35.rennes.grid5000.fr:mpispawn_31][child_handler] MPI process
>> (rank: 751, pid: 16020) terminated with signal 2 -> abort job
>> [parapluie-25.rennes.grid5000.fr:mpispawn_21][read_size] Unexpected
>> End-Of-File on file descriptor 29. MPI process died?
>> [parapluie-25.rennes.grid5000.fr:mpispawn_21][handle_mt_peer] Error
>> while reading PMI socket. MPI process died?
>> [parapluie-19.rennes.grid5000.fr:mpispawn_16][child_handler] MPI process
>> (rank: 388, pid: 16245) terminated with signal 2 -> abort job
>> [parapluie-37.rennes.grid5000.fr:mpispawn_33][child_handler] MPI process
>> (rank: 806, pid: 14630) terminated with signal 2 -> abort job
>> [parapluie-30.rennes.grid5000.fr:mpispawn_26][child_handler] MPI process
>> (rank: 635, pid: 16225) terminated with signal 7 -> abort job
>> [parapluie-16.rennes.grid5000.fr:mpispawn_13][child_handler] MPI process
>> (rank: 316, pid: 16220) terminated with signal 2 -> abort job
>> [parapluie-36.rennes.grid5000.fr:mpispawn_32][read_size] Unexpected
>> End-Of-File on file descriptor 29. MPI process died?
>> [parapluie-36.rennes.grid5000.fr:mpispawn_32][handle_mt_peer] Error
>> while reading PMI socket. MPI process died?
>> [parapluie-36.rennes.grid5000.fr:mpispawn_32][child_handler] MPI process
>> (rank: 781, pid: 14760) terminated with signal 2 -> abort job
>> [parapluie-27.rennes.grid5000.fr:mpispawn_23][child_handler] MPI process
>> (rank: 560, pid: 16163) terminated with signal 2 -> abort job
>> [parapluie-25.rennes.grid5000.fr:mpispawn_21][child_handler] MPI process
>> (rank: 519, pid: 16193) terminated with signal 2 -> abort job
>> [parapluie-31.rennes.grid5000.fr:mpispawn_27][read_size] Unexpected
>> End-Of-File on file descriptor 29. MPI process died?
>> [parapluie-31.rennes.grid5000.fr:mpispawn_27][handle_mt_peer] Error
>> while reading PMI socket. MPI process died?
>> [parapluie-29.rennes.grid5000.fr:mpispawn_25][read_size] Unexpected
>> End-Of-File on file descriptor 29. MPI process died?
>> [parapluie-29.rennes.grid5000.fr:mpispawn_25][handle_mt_peer] Error
>> while reading PMI socket. MPI process died?
>> [parapluie-28.rennes.grid5000.fr:mpispawn_24][child_handler] MPI process
>> (rank: 578, pid: 16153) terminated with signal 2 -> abort job
>> [parapluie-29.rennes.grid5000.fr:mpispawn_25][child_handler] MPI process
>> (rank: 605, pid: 16225) terminated with signal 2 -> abort job
>> [parapluie-26.rennes.grid5000.fr:mpispawn_22][child_handler] MPI process
>> (rank: 531, pid: 16209) terminated with signal 2 -> abort job
>> [parapluie-31.rennes.grid5000.fr:mpispawn_27][child_handler] MPI process
>> (rank: 663, pid: 16254) terminated with signal 2 -> abort job
>> [parapluie-23.rennes.grid5000.fr:mpispawn_20][child_handler] MPI process
>> (rank: 482, pid: 16203) terminated with signal 2 -> abort job
>> [parapluie-32.rennes.grid5000.fr:mpispawn_28][read_size] Unexpected
>> End-Of-File on file descriptor 29. MPI process died?
>> [parapluie-32.rennes.grid5000.fr:mpispawn_28][handle_mt_peer] Error
>> while reading PMI socket. MPI process died?
>> [parapluie-32.rennes.grid5000.fr:mpispawn_28][child_handler] MPI process
>> (rank: 680, pid: 16072) terminated with signal 2 -> abort job
>> [parapluie-38.rennes.grid5000.fr:mpispawn_34][child_handler] MPI process
>> (rank: 837, pid: 14366) terminated with signal 2 -> abort job
>> [parapluie-20.rennes.grid5000.fr:mpispawn_17][child_handler] MPI process
>> (rank: 417, pid: 16256) terminated with signal 2 -> abort job
>>
>>
>