[mvapich-discuss] error when extending jobs on 37 nodes.

Krishna Kandalla kandalla at cse.ohio-state.edu
Fri Nov 4 14:58:00 EDT 2011


Hi Teng,
           Thanks for reporting this issue. We used the exact same
configuration and run-time options and ran IMB on our systems (almost 1K
cores). It ran cleanly across several runs. Could you please make sure
that your nodes are healthy and that there are no other hardware-related
issues? Could you also make sure that the corresponding LIMIC2 modules are
installed correctly on all nodes?
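One way to check the LIMIC2 point across all 37 nodes is sketched below. It is a minimal sketch, not an MVAPICH2 tool: it visits each distinct host in the mpirun_rsh hostfile over ssh and checks whether a kernel module appears in /proc/modules. It assumes that the LiMIC2 kernel module is named `limic`, that the hostfile has one hostname per line with an optional `:N` suffix, and that passwordless ssh works. The script name `check_limic.sh` and its function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical check_limic.sh: report whether a LiMIC2 kernel module appears
# loaded on every node listed in an mpirun_rsh hostfile.
# ASSUMPTIONS: the module is named "limic"; hostfile lines look like
# "hostname" or "hostname:N"; passwordless ssh to every node works.

# Print each distinct hostname once, in file order, ignoring blank lines,
# "#" comments, and ":N" process-count suffixes.
unique_hosts() {
  sed -e 's/#.*//' -e 's/:.*//' -e 's/[[:space:]]//g' "$1" |
    awk 'NF && !seen[$0]++'
}

# Succeed if /proc/modules on the remote host lists a module named "limic".
# -n keeps ssh from reading stdin; BatchMode avoids hanging on password prompts.
check_host() {
  ssh -n -o BatchMode=yes -o ConnectTimeout=10 "$1" \
    'grep -q "^limic " /proc/modules'
}

main() {
  local hostfile=$1 host failed=0
  for host in $(unique_hosts "$hostfile"); do
    if check_host "$host"; then
      echo "OK      $host"
    else
      echo "PROBLEM $host (module not listed, or ssh failed)"
      failed=1
    fi
  done
  return "$failed"
}

# Only run the remote checks when a hostfile argument is given.
if [ "$#" -gt 0 ]; then
  main "$1"
fi
```

Usage would be `bash check_limic.sh ~/rankfile`. Any host reported as PROBLEM is worth inspecting, or temporarily dropping from the hostfile, before re-running the 888-process Bcast test.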

Thanks,
Krishna



On Wed, Nov 2, 2011 at 10:23 PM, teng ma <xiaok1981 at gmail.com> wrote:

> I used MVAPICH2 1.7,
> configured as:
>
> $ ./configure --prefix /home/tma/opt/mvapich217-limic2 --with-limic2 \
> LDFLAGS=-Wl,-rpath=/usr/local/lib
>
> LiMIC2 version: 0.5.5
>
> Hardware: 20 Gb/s InfiniBand, 24 cores per node. Benchmark: IMB.
>
> I bound one process to each core with this command:
>
> mpirun_rsh -np 888 -hostfile ~/rankfile MV2_CPU_BINDING_POLICY=bunch \
> ./IMB-MPI1 Bcast -npmin 888
>
> It reports the following errors:
>
> There are no problems when the test runs on fewer than 35 nodes. With
> more than 35 nodes, it sometimes works and sometimes reports the errors
> below. It fails every time once it reaches 37 nodes (888 processes).
>
> Thanks for your help
> Teng
>
> [parapluie-19.rennes.grid5000.fr:mpi_rank_388][error_sighandler] Caught
> error: Bus error (signal 7)
> [parapluie-19.rennes.grid5000.fr:mpi_rank_392][error_sighandler] Caught
> error: Bus error (signal 7)
> [parapluie-19.rennes.grid5000.fr:mpi_rank_403][error_sighandler] Caught
> error: Bus error (signal 7)
> [parapluie-19.rennes.grid5000.fr:mpi_rank_386][error_sighandler] Caught
> error: Bus error (signal 7)
> [parapluie-19.rennes.grid5000.fr:mpi_rank_385][error_sighandler] Caught
> error: Bus error (signal 7)
> [parapluie-19.rennes.grid5000.fr:mpi_rank_397][error_sighandler] Caught
> error: Bus error (signal 7)
> [parapluie-19.rennes.grid5000.fr:mpi_rank_396][error_sighandler] Caught
> error: Bus error (signal 7)
> [parapluie-19.rennes.grid5000.fr:mpi_rank_407][error_sighandler] Caught
> error: Bus error (signal 7)
> [parapluie-19.rennes.grid5000.fr:mpi_rank_404][error_sighandler] Caught
> error: Bus error (signal 7)
> [parapluie-19.rennes.grid5000.fr:mpi_rank_393][error_sighandler] Caught
> error: Bus error (signal 7)
> [parapluie-19.rennes.grid5000.fr:mpi_rank_395][error_sighandler] Caught
> error: Bus error (signal 7)
> [parapluie-19.rennes.grid5000.fr:mpi_rank_391][error_sighandler] Caught
> error: Bus error (signal 7)
> [parapluie-19.rennes.grid5000.fr:mpi_rank_394][error_sighandler] Caught
> error: Bus error (signal 7)
> [parapluie-19.rennes.grid5000.fr:mpi_rank_401][error_sighandler] Caught
> error: Bus error (signal 7)
> [parapluie-19.rennes.grid5000.fr:mpi_rank_399][error_sighandler] Caught
> error: Bus error (signal 7)
> [parapluie-19.rennes.grid5000.fr:mpi_rank_390][error_sighandler] Caught
> error: Bus error (signal 7)
> [parapluie-19.rennes.grid5000.fr:mpi_rank_389][error_sighandler] Caught
> error: Bus error (signal 7)
> [parapluie-19.rennes.grid5000.fr:mpispawn_16][readline] Unexpected
> End-Of-File on file descriptor 8. MPI process died?
> [parapluie-19.rennes.grid5000.fr:mpispawn_16][mtpmi_processops] Error
> while reading PMI socket. MPI process died?
> [parapluie-19.rennes.grid5000.fr:mpispawn_16][child_handler] MPI process
> (rank: 392, pid: 16153) terminated with signal 7 -> abort job
> [parapluie-2.rennes.grid5000.fr:mpirun_rsh][process_mpispawn_connection]
> mpispawn_16 from node parapluie-26.rennes.grid5000.fr aborted: MPI
> process error (1)
> [parapluie-34.rennes.grid5000.fr:mpi_rank_723][error_sighandler] Caught
> error: Bus error (signal 7)
> [parapluie-34.rennes.grid5000.fr:mpi_rank_725][error_sighandler] Caught
> error: Bus error (signal 7)
> [parapluie-34.rennes.grid5000.fr:mpispawn_30][child_handler] MPI process
> (rank: 725, pid: 15984) terminated with signal 7 -> abort job
> [parapluie-34.rennes.grid5000.fr:mpispawn_30][readline] Unexpected
> End-Of-File on file descriptor 7. MPI process died?
> [parapluie-34.rennes.grid5000.fr:mpispawn_30][mtpmi_processops] Error
> while reading PMI socket. MPI process died?
> [parapluie-9.rennes.grid5000.fr:mpispawn_7][read_size] Unexpected
> End-Of-File on file descriptor 31. MPI process died?
> [parapluie-9.rennes.grid5000.fr:mpispawn_7][handle_mt_peer] Error while
> reading PMI socket. MPI process died?
> [parapluie-9.rennes.grid5000.fr:mpispawn_7][child_handler] MPI process
> (rank: 182, pid: 15057) terminated with signal 2 -> abort job
> [parapluie-5.rennes.grid5000.fr:mpispawn_3][read_size] Unexpected
> End-Of-File on file descriptor 33. MPI process died?
> [parapluie-5.rennes.grid5000.fr:mpispawn_3][handle_mt_peer] Error while
> reading PMI socket. MPI process died?
> [parapluie-33.rennes.grid5000.fr:mpispawn_29][read_size] Unexpected
> End-Of-File on file descriptor 29. MPI process died?
> [parapluie-33.rennes.grid5000.fr:mpispawn_29][handle_mt_peer] Error while
> reading PMI socket. MPI process died?
> [parapluie-2.rennes.grid5000.fr:mpispawn_1][read_size] Unexpected
> End-Of-File on file descriptor 34. MPI process died?
> [parapluie-2.rennes.grid5000.fr:mpispawn_1][handle_mt_peer] Error while
> reading PMI socket. MPI process died?
> [parapluie-35.rennes.grid5000.fr:mpispawn_31][read_size] Unexpected
> End-Of-File on file descriptor 29. MPI process died?
> [parapluie-35.rennes.grid5000.fr:mpispawn_31][handle_mt_peer] Error while
> reading PMI socket. MPI process died?
> [parapluie-36.rennes.grid5000.fr:mpispawn_32][read_size] Unexpected
> End-Of-File on file descriptor 29. MPI process died?
> [parapluie-36.rennes.grid5000.fr:mpispawn_32][handle_mt_peer] Error while
> reading PMI socket. MPI process died?
> [parapluie-14.rennes.grid5000.fr:mpi_rank_265][error_sighandler] Caught
> error: Bus error (signal 7)
> [parapluie-14.rennes.grid5000.fr:mpi_rank_285][error_sighandler] Caught
> error: Bus error (signal 7)
> [parapluie-14.rennes.grid5000.fr:mpi_rank_282][error_sighandler] Caught
> error: Bus error (signal 7)
> [parapluie-14.rennes.grid5000.fr:mpi_rank_268][error_sighandler] Caught
> error: Bus error (signal 7)
> [parapluie-14.rennes.grid5000.fr:mpi_rank_281][error_sighandler] Caught
> error: Bus error (signal 7)
> [parapluie-14.rennes.grid5000.fr:mpispawn_11][readline] Unexpected
> End-Of-File on file descriptor 21. MPI process died?
> [parapluie-14.rennes.grid5000.fr:mpispawn_11][mtpmi_processops] Error
> while reading PMI socket. MPI process died?
> [parapluie-14.rennes.grid5000.fr:mpispawn_11][child_handler] MPI process
> (rank: 282, pid: 16217) terminated with signal 7 -> abort job
> [parapluie-15.rennes.grid5000.fr:mpispawn_12][read_size] Unexpected
> End-Of-File on file descriptor 29. MPI process died?
> [parapluie-15.rennes.grid5000.fr:mpispawn_12][handle_mt_peer] Error while
> reading PMI socket. MPI process died?
> [parapluie-13.rennes.grid5000.fr:mpispawn_10][read_size] Unexpected
> End-Of-File on file descriptor 29. MPI process died?
> [parapluie-13.rennes.grid5000.fr:mpispawn_10][handle_mt_peer] Error while
> reading PMI socket. MPI process died?
> [parapluie-15.rennes.grid5000.fr:mpispawn_12][child_handler] MPI process
> (rank: 306, pid: 16126) terminated with signal 2 -> abort job
> [parapluie-13.rennes.grid5000.fr:mpispawn_10][child_handler] MPI process
> (rank: 258, pid: 16181) terminated with signal 2 -> abort job
> [parapluie-12.rennes.grid5000.fr:mpispawn_9][read_size] Unexpected
> End-Of-File on file descriptor 29. MPI process died?
> [parapluie-12.rennes.grid5000.fr:mpispawn_9][handle_mt_peer] Error while
> reading PMI socket. MPI process died?
> [parapluie-2.rennes.grid5000.fr:mpispawn_1][child_handler] MPI process
> (rank: 42, pid: 20462) terminated with signal 2 -> abort job
> [parapluie-36.rennes.grid5000.fr:mpispawn_32][child_handler] MPI process
> (rank: 776, pid: 14656) terminated with signal 2 -> abort job
> [parapluie-6.rennes.grid5000.fr:mpispawn_4][read_size] Unexpected
> End-Of-File on file descriptor 32. MPI process died?
> [parapluie-6.rennes.grid5000.fr:mpispawn_4][handle_mt_peer] Error while
> reading PMI socket. MPI process died?
> [parapluie-22.rennes.grid5000.fr:mpispawn_19][readline] Unexpected
> End-Of-File on file descriptor 19. MPI process died?
> [parapluie-22.rennes.grid5000.fr:mpispawn_19][mtpmi_processops] Error
> while reading PMI socket. MPI process died?
> [parapluie-8.rennes.grid5000.fr:mpispawn_6][read_size] Unexpected
> End-Of-File on file descriptor 29. MPI process died?
> [parapluie-8.rennes.grid5000.fr:mpispawn_6][handle_mt_peer] Error while
> reading PMI socket. MPI process died?
> [parapluie-10.rennes.grid5000.fr:mpispawn_8][read_size] Unexpected
> End-Of-File on file descriptor 29. MPI process died?
> [parapluie-10.rennes.grid5000.fr:mpispawn_8][handle_mt_peer] Error while
> reading PMI socket. MPI process died?
> [parapluie-1.rennes.grid5000.fr:mpispawn_0][read_size] Unexpected
> End-Of-File on file descriptor 31. MPI process died?
> [parapluie-1.rennes.grid5000.fr:mpispawn_0][handle_mt_peer] Error while
> reading PMI socket. MPI process died?
> [parapluie-4.rennes.grid5000.fr:mpispawn_2][read_size] Unexpected
> End-Of-File on file descriptor 32. MPI process died?
> [parapluie-4.rennes.grid5000.fr:mpispawn_2][handle_mt_peer] Error while
> reading PMI socket. MPI process died?
> [parapluie-4.rennes.grid5000.fr:mpispawn_2][child_handler] MPI process
> (rank: 52, pid: 14639) terminated with signal 2 -> abort job
> [parapluie-1.rennes.grid5000.fr:mpispawn_0][child_handler] MPI process
> (rank: 19, pid: 15701) terminated with signal 2 -> abort job
> [parapluie-8.rennes.grid5000.fr:mpispawn_6][child_handler] MPI process
> (rank: 154, pid: 14891) terminated with signal 2 -> abort job
> [parapluie-35.rennes.grid5000.fr:mpispawn_31][child_handler] MPI process
> (rank: 747, pid: 15915) terminated with signal 2 -> abort job
> [parapluie-16.rennes.grid5000.fr:mpispawn_13][read_size] Unexpected
> End-Of-File on file descriptor 29. MPI process died?
> [parapluie-16.rennes.grid5000.fr:mpispawn_13][handle_mt_peer] Error while
> reading PMI socket. MPI process died?
> [parapluie-6.rennes.grid5000.fr:mpispawn_4][child_handler] MPI process
> (rank: 104, pid: 14631) terminated with signal 2 -> abort job
> [parapluie-16.rennes.grid5000.fr:mpispawn_13][child_handler] MPI process
> (rank: 327, pid: 16129) terminated with signal 2 -> abort job
> [parapluie-17.rennes.grid5000.fr:mpispawn_14][read_size] Unexpected
> End-Of-File on file descriptor 29. MPI process died?
> [parapluie-17.rennes.grid5000.fr:mpispawn_14][handle_mt_peer] Error while
> reading PMI socket. MPI process died?
> [parapluie-23.rennes.grid5000.fr:mpispawn_20][readline] Unexpected
> End-Of-File on file descriptor 6. MPI process died?
> [parapluie-23.rennes.grid5000.fr:mpispawn_20][mtpmi_processops] Error
> while reading PMI socket. MPI process died?
> [parapluie-27.rennes.grid5000.fr:mpispawn_23][read_size] Unexpected
> End-Of-File on file descriptor 29. MPI process died?
> [parapluie-27.rennes.grid5000.fr:mpispawn_23][handle_mt_peer] Error while
> reading PMI socket. MPI process died?
> [parapluie-25.rennes.grid5000.fr:mpispawn_21][read_size] Unexpected
> End-Of-File on file descriptor 29. MPI process died?
> [parapluie-25.rennes.grid5000.fr:mpispawn_21][handle_mt_peer] Error while
> reading PMI socket. MPI process died?
> [parapluie-20.rennes.grid5000.fr:mpispawn_17][read_size] Unexpected
> End-Of-File on file descriptor 29. MPI process died?
> [parapluie-20.rennes.grid5000.fr:mpispawn_17][handle_mt_peer] Error while
> reading PMI socket. MPI process died?
> [parapluie-7.rennes.grid5000.fr:mpispawn_5][read_size] Unexpected
> End-Of-File on file descriptor 29. MPI process died?
> [parapluie-7.rennes.grid5000.fr:mpispawn_5][handle_mt_peer] Error while
> reading PMI socket. MPI process died?
> [parapluie-7.rennes.grid5000.fr:mpispawn_5][child_handler] MPI process
> (rank: 125, pid: 14833) terminated with signal 2 -> abort job
> [parapluie-21.rennes.grid5000.fr:mpispawn_18][readline] Unexpected
> End-Of-File on file descriptor 5. MPI process died?
> [parapluie-21.rennes.grid5000.fr:mpispawn_18][mtpmi_processops] Error
> while reading PMI socket. MPI process died?
> [parapluie-26.rennes.grid5000.fr:mpispawn_22][read_size] Unexpected
> End-Of-File on file descriptor 29. MPI process died?
> [parapluie-26.rennes.grid5000.fr:mpispawn_22][handle_mt_peer] Error while
> reading PMI socket. MPI process died?
> [parapluie-18.rennes.grid5000.fr:mpispawn_15][read_size] Unexpected
> End-Of-File on file descriptor 29. MPI process died?
> [parapluie-18.rennes.grid5000.fr:mpispawn_15][handle_mt_peer] Error while
> reading PMI socket. MPI process died?
> [parapluie-27.rennes.grid5000.fr:mpispawn_23][child_handler] MPI process
> (rank: 556, pid: 16057) terminated with signal 2 -> abort job
> [parapluie-29.rennes.grid5000.fr:mpispawn_25][read_size] Unexpected
> End-Of-File on file descriptor 29. MPI process died?
> [parapluie-29.rennes.grid5000.fr:mpispawn_25][handle_mt_peer] Error while
> reading PMI socket. MPI process died?
> [parapluie-10.rennes.grid5000.fr:mpispawn_8][child_handler] MPI process
> (rank: 203, pid: 15479) terminated with signal 2 -> abort job
> [parapluie-32.rennes.grid5000.fr:mpispawn_28][read_size] Unexpected
> End-Of-File on file descriptor 29. MPI process died?
> [parapluie-32.rennes.grid5000.fr:mpispawn_28][handle_mt_peer] Error while
> reading PMI socket. MPI process died?
> [parapluie-40.rennes.grid5000.fr:mpispawn_36][read_size] Unexpected
> End-Of-File on file descriptor 29. MPI process died?
> [parapluie-40.rennes.grid5000.fr:mpispawn_36][handle_mt_peer] Error while
> reading PMI socket. MPI process died?
> [parapluie-26.rennes.grid5000.fr:mpispawn_22][child_handler] MPI process
> (rank: 547, pid: 16123) terminated with signal 2 -> abort job
> [parapluie-33.rennes.grid5000.fr:mpispawn_29][child_handler] MPI process
> (rank: 707, pid: 16064) terminated with signal 2 -> abort job
> [parapluie-32.rennes.grid5000.fr:mpispawn_28][child_handler] MPI process
> (rank: 679, pid: 15969) terminated with signal 2 -> abort job
> [parapluie-30.rennes.grid5000.fr:mpispawn_26][read_size] Unexpected
> End-Of-File on file descriptor 29. MPI process died?
> [parapluie-30.rennes.grid5000.fr:mpispawn_26][handle_mt_peer] Error while
> reading PMI socket. MPI process died?
> [parapluie-29.rennes.grid5000.fr:mpispawn_25][child_handler] MPI process
> (rank: 602, pid: 16120) terminated with signal 2 -> abort job
> [parapluie-39.rennes.grid5000.fr:mpispawn_35][read_size] Unexpected
> End-Of-File on file descriptor 29. MPI process died?
> [parapluie-39.rennes.grid5000.fr:mpispawn_35][handle_mt_peer] Error while
> reading PMI socket. MPI process died?
> [parapluie-38.rennes.grid5000.fr:mpispawn_34][read_size] Unexpected
> End-Of-File on file descriptor 29. MPI process died?
> [parapluie-38.rennes.grid5000.fr:mpispawn_34][handle_mt_peer] Error while
> reading PMI socket. MPI process died?
> [parapluie-31.rennes.grid5000.fr:mpispawn_27][read_size] Unexpected
> End-Of-File on file descriptor 29. MPI process died?
> [parapluie-31.rennes.grid5000.fr:mpispawn_27][handle_mt_peer] Error while
> reading PMI socket. MPI process died?
> [parapluie-30.rennes.grid5000.fr:mpispawn_26][child_handler] MPI process
> (rank: 638, pid: 16126) terminated with signal 2 -> abort job
> [parapluie-38.rennes.grid5000.fr:mpispawn_34][child_handler] MPI process
> (rank: 836, pid: 14263) terminated with signal 2 -> abort job
> [parapluie-5.rennes.grid5000.fr:mpispawn_3][child_handler] MPI process
> (rank: 77, pid: 14608) terminated with signal 2 -> abort job
> [parapluie-28.rennes.grid5000.fr:mpispawn_24][read_size] Unexpected
> End-Of-File on file descriptor 29. MPI process died?
> [parapluie-28.rennes.grid5000.fr:mpispawn_24][handle_mt_peer] Error while
> reading PMI socket. MPI process died?
> [parapluie-28.rennes.grid5000.fr:mpispawn_24][child_handler] MPI process
> (rank: 591, pid: 16064) terminated with signal 2 -> abort job
> [parapluie-20.rennes.grid5000.fr:mpispawn_17][child_handler] MPI process
> (rank: 410, pid: 16147) terminated with signal 2 -> abort job
> [parapluie-31.rennes.grid5000.fr:mpispawn_27][child_handler] MPI process
> (rank: 651, pid: 16140) terminated with signal 2 -> abort job
> [parapluie-39.rennes.grid5000.fr:mpispawn_35][child_handler] MPI process
> (rank: 861, pid: 13947) terminated with signal 2 -> abort job
> [parapluie-25.rennes.grid5000.fr:mpispawn_21][child_handler] MPI process
> (rank: 524, pid: 16096) terminated with signal 2 -> abort job
> [parapluie-40.rennes.grid5000.fr:mpispawn_36][child_handler] MPI process
> (rank: 872, pid: 13689) terminated with signal 2 -> abort job
> [parapluie-18.rennes.grid5000.fr:mpispawn_15][child_handler] MPI process
> (rank: 380, pid: 16144) terminated with signal 2 -> abort job
> [parapluie-37.rennes.grid5000.fr:mpispawn_33][read_size] Unexpected
> End-Of-File on file descriptor 29. MPI process died?
> [parapluie-37.rennes.grid5000.fr:mpispawn_33][handle_mt_peer] Error while
> reading PMI socket. MPI process died?
> [parapluie-15.rennes.grid5000.fr:mpi_rank_308][error_sighandler] Caught
> error: Bus error (signal 7)
> [parapluie-15.rennes.grid5000.fr:mpi_rank_289][error_sighandler] Caught
> error: Bus error (signal 7)
> [parapluie-15.rennes.grid5000.fr:mpi_rank_305][error_sighandler] Caught
> error: Bus error (signal 7)
> [parapluie-15.rennes.grid5000.fr:mpi_rank_294][error_sighandler] Caught
> error: Bus error (signal 7)
> [parapluie-15.rennes.grid5000.fr:mpi_rank_307][error_sighandler] Caught
> error: Bus error (signal 7)
> [parapluie-15.rennes.grid5000.fr:mpi_rank_311][error_sighandler] Caught
> error: Bus error (signal 7)
> [parapluie-15.rennes.grid5000.fr:mpi_rank_300][error_sighandler] Caught
> error: Bus error (signal 7)
> [parapluie-15.rennes.grid5000.fr:mpi_rank_301][error_sighandler] Caught
> error: Bus error (signal 7)
> [parapluie-15.rennes.grid5000.fr:mpi_rank_292][error_sighandler] Caught
> error: Bus error (signal 7)
> [parapluie-15.rennes.grid5000.fr:mpi_rank_303][error_sighandler] Caught
> error: Bus error (signal 7)
> [parapluie-15.rennes.grid5000.fr:mpi_rank_309][error_sighandler] Caught
> error: Bus error (signal 7)
> [parapluie-15.rennes.grid5000.fr:mpi_rank_295][error_sighandler] Caught
> error: Bus error (signal 7)
> [parapluie-15.rennes.grid5000.fr:mpispawn_12][readline] Unexpected
> End-Of-File on file descriptor 20. MPI process died?
> [parapluie-15.rennes.grid5000.fr:mpispawn_12][mtpmi_processops] Error
> while reading PMI socket. MPI process died?
> [parapluie-33.rennes.grid5000.fr:mpi_rank_697][error_sighandler] Caught
> error: Bus error (signal 7)
> [parapluie-33.rennes.grid5000.fr:mpi_rank_698][error_sighandler] Caught
> error: Bus error (signal 7)
> [parapluie-33.rennes.grid5000.fr:mpi_rank_702][error_sighandler] Caught
> error: Bus error (signal 7)
> [parapluie-33.rennes.grid5000.fr:mpi_rank_717][error_sighandler] Caught
> error: Bus error (signal 7)
> [parapluie-33.rennes.grid5000.fr:mpi_rank_715][error_sighandler] Caught
> error: Bus error (signal 7)
> [parapluie-33.rennes.grid5000.fr:mpi_rank_707][error_sighandler] Caught
> error: Bus error (signal 7)
> [parapluie-33.rennes.grid5000.fr:mpi_rank_713][error_sighandler] Caught
> error: Bus error (signal 7)
> [parapluie-33.rennes.grid5000.fr:mpi_rank_700][error_sighandler] Caught
> error: Bus error (signal 7)
> [parapluie-33.rennes.grid5000.fr:mpispawn_29][readline] Unexpected
> End-Of-File on file descriptor 8. MPI process died?
> [parapluie-33.rennes.grid5000.fr:mpispawn_29][mtpmi_processops] Error
> while reading PMI socket. MPI process died?
> [parapluie-15.rennes.grid5000.fr:mpispawn_12][child_handler] MPI process
> (rank: 303, pid: 16225) terminated with signal 7 -> abort job
> [parapluie-2.rennes.grid5000.fr:mpirun_rsh][process_mpispawn_connection]
> mpispawn_12 from node parapluie-21.rennes.grid5000.fr aborted: MPI
> process error (1)
> [parapluie-4.rennes.grid5000.fr:mpispawn_2][read_size] Unexpected
> End-Of-File on file descriptor 33. MPI process died?
> [parapluie-4.rennes.grid5000.fr:mpispawn_2][handle_mt_peer] Error while
> reading PMI socket. MPI process died?
> [parapluie-4.rennes.grid5000.fr:mpispawn_2][child_handler] MPI process
> (rank: 61, pid: 14747) terminated with signal 2 -> abort job
> [parapluie-14.rennes.grid5000.fr:mpispawn_11][read_size] Unexpected
> End-Of-File on file descriptor 29. MPI process died?
> [parapluie-14.rennes.grid5000.fr:mpispawn_11][handle_mt_peer] Error while
> reading PMI socket. MPI process died?
> [parapluie-12.rennes.grid5000.fr:mpispawn_9][read_size] Unexpected
> End-Of-File on file descriptor 29. MPI process died?
> [parapluie-12.rennes.grid5000.fr:mpispawn_9][handle_mt_peer] Error while
> reading PMI socket. MPI process died?
> [parapluie-1.rennes.grid5000.fr:mpispawn_0][read_size] Unexpected
> End-Of-File on file descriptor 31. MPI process died?
> [parapluie-1.rennes.grid5000.fr:mpispawn_0][handle_mt_peer] Error while
> reading PMI socket. MPI process died?
> [parapluie-13.rennes.grid5000.fr:mpispawn_10][read_size] Unexpected
> End-Of-File on file descriptor 29. MPI process died?
> [parapluie-13.rennes.grid5000.fr:mpispawn_10][handle_mt_peer] Error while
> reading PMI socket. MPI process died?
> [parapluie-18.rennes.grid5000.fr:mpi_rank_367][error_sighandler] Caught
> error: Bus error (signal 7)
> [parapluie-18.rennes.grid5000.fr:mpi_rank_364][error_sighandler] Caught
> error: Bus error (signal 7)
> [parapluie-18.rennes.grid5000.fr:mpi_rank_366][error_sighandler] Caught
> error: Bus error (signal 7)
> [parapluie-18.rennes.grid5000.fr:mpi_rank_374][error_sighandler] Caught
> error: Bus error (signal 7)
> [parapluie-18.rennes.grid5000.fr:mpi_rank_371][error_sighandler] Caught
> error: Bus error (signal 7)
> [parapluie-18.rennes.grid5000.fr:mpi_rank_378][error_sighandler] Caught
> error: Bus error (signal 7)
> [parapluie-18.rennes.grid5000.fr:mpi_rank_377][error_sighandler] Caught
> error: Bus error (signal 7)
> [parapluie-18.rennes.grid5000.fr:mpi_rank_381][error_sighandler] Caught
> error: Bus error (signal 7)
> [parapluie-18.rennes.grid5000.fr:mpi_rank_379][error_sighandler] Caught
> error: Bus error (signal 7)
> [parapluie-18.rennes.grid5000.fr:mpi_rank_365][error_sighandler] Caught
> error: Bus error (signal 7)
> [parapluie-18.rennes.grid5000.fr:mpi_rank_375][error_sighandler] Caught
> error: Bus error (signal 7)
> [parapluie-18.rennes.grid5000.fr:mpi_rank_372][error_sighandler] Caught
> error: Bus error (signal 7)
> [parapluie-18.rennes.grid5000.fr:mpi_rank_368][error_sighandler] Caught
> error: Bus error (signal 7)
> [parapluie-18.rennes.grid5000.fr:mpi_rank_380][error_sighandler] Caught
> error: Bus error (signal 7)
> [parapluie-18.rennes.grid5000.fr:mpi_rank_382][error_sighandler] Caught
> error: Bus error (signal 7)
> [parapluie-18.rennes.grid5000.fr:mpi_rank_361][error_sighandler] Caught
> error: Bus error (signal 7)
> [parapluie-18.rennes.grid5000.fr:mpispawn_15][readline] Unexpected
> End-Of-File on file descriptor 6. MPI process died?
> [parapluie-18.rennes.grid5000.fr:mpispawn_15][mtpmi_processops] Error
> while reading PMI socket. MPI process died?
> [parapluie-14.rennes.grid5000.fr:mpispawn_11][child_handler] MPI process
> (rank: 281, pid: 16276) terminated with signal 2 -> abort job
> [parapluie-13.rennes.grid5000.fr:mpispawn_10][child_handler] MPI process
> (rank: 253, pid: 16278) terminated with signal 2 -> abort job
> [parapluie-12.rennes.grid5000.fr:mpispawn_9][child_handler] MPI process
> (rank: 218, pid: 15584) terminated with signal 2 -> abort job
> [parapluie-1.rennes.grid5000.fr:mpispawn_0][child_handler] MPI process
> (rank: 4, pid: 15788) terminated with signal 2 -> abort job
> [parapluie-5.rennes.grid5000.fr:mpispawn_3][read_size] Unexpected
> End-Of-File on file descriptor 29. MPI process died?
> [parapluie-5.rennes.grid5000.fr:mpispawn_3][handle_mt_peer] Error while
> reading PMI socket. MPI process died?
> [parapluie-6.rennes.grid5000.fr:mpispawn_4][read_size] Unexpected
> End-Of-File on file descriptor 29. MPI process died?
> [parapluie-6.rennes.grid5000.fr:mpispawn_4][handle_mt_peer] Error while
> reading PMI socket. MPI process died?
> [parapluie-18.rennes.grid5000.fr:mpispawn_15][child_handler] MPI process
> (rank: 378, pid: 16225) terminated with signal 7 -> abort job
> [parapluie-33.rennes.grid5000.fr:mpispawn_29][child_handler] MPI process
> (rank: 707, pid: 16163) terminated with signal 7 -> abort job
> [parapluie-2.rennes.grid5000.fr:mpispawn_1][read_size] Unexpected
> End-Of-File on file descriptor 31. MPI process died?
> [parapluie-2.rennes.grid5000.fr:mpispawn_1][handle_mt_peer] Error while
> reading PMI socket. MPI process died?
> [parapluie-5.rennes.grid5000.fr:mpispawn_3][child_handler] MPI process
> (rank: 73, pid: 14706) terminated with signal 2 -> abort job
> [parapluie-17.rennes.grid5000.fr:mpispawn_14][read_size] Unexpected
> End-Of-File on file descriptor 29. MPI process died?
> [parapluie-17.rennes.grid5000.fr:mpispawn_14][handle_mt_peer] Error while
> reading PMI socket. MPI process died?
> [parapluie-19.rennes.grid5000.fr:mpispawn_16][read_size] Unexpected
> End-Of-File on file descriptor 29. MPI process died?
> [parapluie-19.rennes.grid5000.fr:mpispawn_16][handle_mt_peer] Error while
> reading PMI socket. MPI process died?
> [parapluie-16.rennes.grid5000.fr:mpispawn_13][read_size] Unexpected
> End-Of-File on file descriptor 29. MPI process died?
> [parapluie-16.rennes.grid5000.fr:mpispawn_13][handle_mt_peer] Error while
> reading PMI socket. MPI process died?
> [parapluie-2.rennes.grid5000.fr:mpispawn_1][child_handler] MPI process
> (rank: 27, pid: 20617) terminated with signal 2 -> abort job
> [parapluie-20.rennes.grid5000.fr:mpispawn_17][read_size] Unexpected
> End-Of-File on file descriptor 29. MPI process died?
> [parapluie-20.rennes.grid5000.fr:mpispawn_17][handle_mt_peer] Error while
> reading PMI socket. MPI process died?
> [parapluie-34.rennes.grid5000.fr:mpispawn_30][read_size] Unexpected
> End-Of-File on file descriptor 29. MPI process died?
> [parapluie-34.rennes.grid5000.fr:mpispawn_30][handle_mt_peer] Error while
> reading PMI socket. MPI process died?
> [parapluie-39.rennes.grid5000.fr:mpispawn_35][read_size] Unexpected
> End-Of-File on file descriptor 29. MPI process died?
> [parapluie-39.rennes.grid5000.fr:mpispawn_35][handle_mt_peer] Error while
> reading PMI socket. MPI process died?
> [parapluie-37.rennes.grid5000.fr:mpispawn_33][read_size] Unexpected
> End-Of-File on file descriptor 29. MPI process died?
> [parapluie-37.rennes.grid5000.fr:mpispawn_33][handle_mt_peer] Error while
> reading PMI socket. MPI process died?
> [parapluie-10.rennes.grid5000.fr:mpispawn_8][read_size] Unexpected
> End-Of-File on file descriptor 29. MPI process died?
> [parapluie-10.rennes.grid5000.fr:mpispawn_8][handle_mt_peer] Error while
> reading PMI socket. MPI process died?
> [parapluie-10.rennes.grid5000.fr:mpispawn_8][child_handler] MPI process
> (rank: 205, pid: 15577) terminated with signal 2 -> abort job
> [parapluie-23.rennes.grid5000.fr:mpispawn_20][read_size] Unexpected
> End-Of-File on file descriptor 29. MPI process died?
> [parapluie-23.rennes.grid5000.fr:mpispawn_20][handle_mt_peer] Error while
> reading PMI socket. MPI process died?
> [parapluie-38.rennes.grid5000.fr:mpispawn_34][read_size] Unexpected
> End-Of-File on file descriptor 29. MPI process died?
> [parapluie-38.rennes.grid5000.fr:mpispawn_34][handle_mt_peer] Error while
> reading PMI socket. MPI process died?
> [parapluie-26.rennes.grid5000.fr:mpispawn_22][read_size] Unexpected
> End-Of-File on file descriptor 29. MPI process died?
> [parapluie-26.rennes.grid5000.fr:mpispawn_22][handle_mt_peer] Error while
> reading PMI socket. MPI process died?
> [parapluie-6.rennes.grid5000.fr:mpispawn_4][child_handler] MPI process
> (rank: 100, pid: 14729) terminated with signal 2 -> abort job
> [parapluie-22.rennes.grid5000.fr:mpispawn_19][read_size] Unexpected
> End-Of-File on file descriptor 29. MPI process died?
> [parapluie-22.rennes.grid5000.fr:mpispawn_19][handle_mt_peer] Error while
> reading PMI socket. MPI process died?
> [parapluie-22.rennes.grid5000.fr:mpispawn_19][child_handler] MPI process
> (rank: 456, pid: 16225) terminated with signal 2 -> abort job
> [parapluie-17.rennes.grid5000.fr:mpispawn_14][child_handler] MPI process
> (rank: 342, pid: 16240) terminated with signal 2 -> abort job
> [parapluie-40.rennes.grid5000.fr:mpispawn_36][read_size] Unexpected
> End-Of-File on file descriptor 29. MPI process died?
> [parapluie-40.rennes.grid5000.fr:mpispawn_36][handle_mt_peer] Error while
> reading PMI socket. MPI process died?
> [parapluie-40.rennes.grid5000.fr:mpispawn_36][child_handler] MPI process
> (rank: 873, pid: 13792) terminated with signal 2 -> abort job
> [parapluie-21.rennes.grid5000.fr:mpispawn_18][read_size] Unexpected
> End-Of-File on file descriptor 29. MPI process died?
> [parapluie-21.rennes.grid5000.fr:mpi_rank_435][error_sighandler] Caught
> error: Bus error (signal 7)
> [parapluie-21.rennes.grid5000.fr:mpi_rank_433][error_sighandler] Caught
> error: Bus error (signal 7)
> [parapluie-21.rennes.grid5000.fr:mpi_rank_451][error_sighandler] Caught
> error: Bus error (signal 7)
> [parapluie-21.rennes.grid5000.fr:mpi_rank_434][error_sighandler] Caught
> error: Bus error (signal 7)
> [parapluie-21.rennes.grid5000.fr:mpi_rank_445][error_sighandler] Caught
> error: Bus error (signal 7)
> [parapluie-21.rennes.grid5000.fr:mpi_rank_436][error_sighandler] Caught
> error: Bus error (signal 7)
> [parapluie-21.rennes.grid5000.fr:mpi_rank_443][error_sighandler] Caught
> error: Bus error (signal 7)
> [parapluie-21.rennes.grid5000.fr:mpi_rank_446][error_sighandler] Caught
> error: Bus error (signal 7)
> [parapluie-21.rennes.grid5000.fr:mpi_rank_440][error_sighandler] Caught
> error: Bus error (signal 7)
> [parapluie-21.rennes.grid5000.fr:mpi_rank_454][error_sighandler] Caught
> error: Bus error (signal 7)
> [parapluie-21.rennes.grid5000.fr:mpi_rank_441][error_sighandler] Caught
> error: Bus error (signal 7)
> [parapluie-21.rennes.grid5000.fr:mpi_rank_437][error_sighandler] Caught
> error: Bus error (signal 7)
> [parapluie-21.rennes.grid5000.fr:mpispawn_18][handle_mt_peer] Error while
> reading PMI socket. MPI process died?
> [parapluie-21.rennes.grid5000.fr:mpispawn_18][child_handler] MPI process
> (rank: 441, pid: 16143) terminated with signal 7 -> abort job
> [parapluie-7.rennes.grid5000.fr:mpispawn_5][read_size] Unexpected
> End-Of-File on file descriptor 29. MPI process died?
> [parapluie-7.rennes.grid5000.fr:mpispawn_5][handle_mt_peer] Error while
> reading PMI socket. MPI process died?
> [parapluie-7.rennes.grid5000.fr:mpispawn_5][child_handler] MPI process
> (rank: 129, pid: 14939) terminated with signal 2 -> abort job
> [parapluie-39.rennes.grid5000.fr:mpispawn_35][child_handler] MPI process
> (rank: 853, pid: 14041) terminated with signal 2 -> abort job
> [parapluie-34.rennes.grid5000.fr:mpispawn_30][child_handler] MPI process
> (rank: 722, pid: 16032) terminated with signal 2 -> abort job
> [parapluie-30.rennes.grid5000.fr:mpi_rank_639][error_sighandler] Caught
> error: Bus error (signal 7)
> [parapluie-30.rennes.grid5000.fr:mpi_rank_635][error_sighandler] Caught
> error: Bus error (signal 7)
> [parapluie-30.rennes.grid5000.fr:mpi_rank_633][error_sighandler] Caught
> error: Bus error (signal 7)
> [parapluie-30.rennes.grid5000.fr:mpi_rank_644][error_sighandler] Caught
> error: Bus error (signal 7)
> [parapluie-30.rennes.grid5000.fr:mpi_rank_628][error_sighandler] Caught
> error: Bus error (signal 7)
> [parapluie-30.rennes.grid5000.fr:mpi_rank_647][error_sighandler] Caught
> error: Bus error (signal 7)
> [parapluie-30.rennes.grid5000.fr:mpispawn_26][readline] Unexpected
> End-Of-File on file descriptor 7. MPI process died?
> [parapluie-30.rennes.grid5000.fr:mpispawn_26][mtpmi_processops] Error
> while reading PMI socket. MPI process died?
> [parapluie-27.rennes.grid5000.fr:mpispawn_23][read_size] Unexpected
> End-Of-File on file descriptor 29. MPI process died?
> [parapluie-27.rennes.grid5000.fr:mpispawn_23][handle_mt_peer] Error while
> reading PMI socket. MPI process died?
> [parapluie-9.rennes.grid5000.fr:mpispawn_7][read_size] Unexpected
> End-Of-File on file descriptor 30. MPI process died?
> [parapluie-9.rennes.grid5000.fr:mpispawn_7][handle_mt_peer] Error while
> reading PMI socket. MPI process died?
> [parapluie-9.rennes.grid5000.fr:mpispawn_7][child_handler] MPI process
> (rank: 177, pid: 15154) terminated with signal 2 -> abort job
> [parapluie-8.rennes.grid5000.fr:mpispawn_6][read_size] Unexpected
> End-Of-File on file descriptor 29. MPI process died?
> [parapluie-8.rennes.grid5000.fr:mpispawn_6][handle_mt_peer] Error while
> reading PMI socket. MPI process died?
> [parapluie-28.rennes.grid5000.fr:mpispawn_24][read_size] Unexpected
> End-Of-File on file descriptor 29. MPI process died?
> [parapluie-28.rennes.grid5000.fr:mpispawn_24][handle_mt_peer] Error while
> reading PMI socket. MPI process died?
> [parapluie-8.rennes.grid5000.fr:mpispawn_6][child_handler] MPI process
> (rank: 157, pid: 14996) terminated with signal 2 -> abort job
> [parapluie-35.rennes.grid5000.fr:mpispawn_31][read_size] Unexpected
> End-Of-File on file descriptor 29. MPI process died?
> [parapluie-35.rennes.grid5000.fr:mpispawn_31][handle_mt_peer] Error while
> reading PMI socket. MPI process died?
> [parapluie-35.rennes.grid5000.fr:mpispawn_31][child_handler] MPI process
> (rank: 751, pid: 16020) terminated with signal 2 -> abort job
> [parapluie-25.rennes.grid5000.fr:mpispawn_21][read_size] Unexpected
> End-Of-File on file descriptor 29. MPI process died?
> [parapluie-25.rennes.grid5000.fr:mpispawn_21][handle_mt_peer] Error while
> reading PMI socket. MPI process died?
> [parapluie-19.rennes.grid5000.fr:mpispawn_16][child_handler] MPI process
> (rank: 388, pid: 16245) terminated with signal 2 -> abort job
> [parapluie-37.rennes.grid5000.fr:mpispawn_33][child_handler] MPI process
> (rank: 806, pid: 14630) terminated with signal 2 -> abort job
> [parapluie-30.rennes.grid5000.fr:mpispawn_26][child_handler] MPI process
> (rank: 635, pid: 16225) terminated with signal 7 -> abort job
> [parapluie-16.rennes.grid5000.fr:mpispawn_13][child_handler] MPI process
> (rank: 316, pid: 16220) terminated with signal 2 -> abort job
> [parapluie-36.rennes.grid5000.fr:mpispawn_32][read_size] Unexpected
> End-Of-File on file descriptor 29. MPI process died?
> [parapluie-36.rennes.grid5000.fr:mpispawn_32][handle_mt_peer] Error while
> reading PMI socket. MPI process died?
> [parapluie-36.rennes.grid5000.fr:mpispawn_32][child_handler] MPI process
> (rank: 781, pid: 14760) terminated with signal 2 -> abort job
> [parapluie-27.rennes.grid5000.fr:mpispawn_23][child_handler] MPI process
> (rank: 560, pid: 16163) terminated with signal 2 -> abort job
> [parapluie-25.rennes.grid5000.fr:mpispawn_21][child_handler] MPI process
> (rank: 519, pid: 16193) terminated with signal 2 -> abort job
> [parapluie-31.rennes.grid5000.fr:mpispawn_27][read_size] Unexpected
> End-Of-File on file descriptor 29. MPI process died?
> [parapluie-31.rennes.grid5000.fr:mpispawn_27][handle_mt_peer] Error while
> reading PMI socket. MPI process died?
> [parapluie-29.rennes.grid5000.fr:mpispawn_25][read_size] Unexpected
> End-Of-File on file descriptor 29. MPI process died?
> [parapluie-29.rennes.grid5000.fr:mpispawn_25][handle_mt_peer] Error while
> reading PMI socket. MPI process died?
> [parapluie-28.rennes.grid5000.fr:mpispawn_24][child_handler] MPI process
> (rank: 578, pid: 16153) terminated with signal 2 -> abort job
> [parapluie-29.rennes.grid5000.fr:mpispawn_25][child_handler] MPI process
> (rank: 605, pid: 16225) terminated with signal 2 -> abort job
> [parapluie-26.rennes.grid5000.fr:mpispawn_22][child_handler] MPI process
> (rank: 531, pid: 16209) terminated with signal 2 -> abort job
> [parapluie-31.rennes.grid5000.fr:mpispawn_27][child_handler] MPI process
> (rank: 663, pid: 16254) terminated with signal 2 -> abort job
> [parapluie-23.rennes.grid5000.fr:mpispawn_20][child_handler] MPI process
> (rank: 482, pid: 16203) terminated with signal 2 -> abort job
> [parapluie-32.rennes.grid5000.fr:mpispawn_28][read_size] Unexpected
> End-Of-File on file descriptor 29. MPI process died?
> [parapluie-32.rennes.grid5000.fr:mpispawn_28][handle_mt_peer] Error while
> reading PMI socket. MPI process died?
> [parapluie-32.rennes.grid5000.fr:mpispawn_28][child_handler] MPI process
> (rank: 680, pid: 16072) terminated with signal 2 -> abort job
> [parapluie-38.rennes.grid5000.fr:mpispawn_34][child_handler] MPI process
> (rank: 837, pid: 14366) terminated with signal 2 -> abort job
> [parapluie-20.rennes.grid5000.fr:mpispawn_17][child_handler] MPI process
> (rank: 417, pid: 16256) terminated with signal 2 -> abort job
>
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>
>

