[mvapich-discuss] Problems with MVAPICH2

Jonathan Perkins perkinjo at cse.ohio-state.edu
Thu Jul 7 12:01:47 EDT 2011


Hi, both process managers indicate that there was a problem with the
execution of the MPI program.  To rule out any installation problems,
can you try running some simple benchmarks?

Please take a look at the following link for information on how to run
the OSU benchmarks:
http://mvapich.cse.ohio-state.edu/support/user_guide_mvapich2-1.7_alpha2.html#x1-650007
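As a rough sketch of such a sanity check, the two launchers from the
report below could each be pointed at a two-process osu_latency run.
The launcher flags are copied from the quoted commands.  The HOSTS and
OSU_DIR variables and the run_check helper are placeholders of mine,
not part of MVAPICH2, and the location of the osu_latency binary is an
assumption that depends on your installation.

```shell
#!/bin/sh
# Sketch only: HOSTS and OSU_DIR are assumed locations; adjust them.
HOSTS=${HOSTS:-./hosts}
OSU_DIR=${OSU_DIR:-.}

# run_check is a hypothetical helper: $1 is the launcher name, the
# remaining arguments are the full command line to execute.
run_check() {
    launcher=$1; shift
    if command -v "$launcher" >/dev/null 2>&1; then
        # Run the command; report it if it exits with a non-zero status.
        "$@" || echo "FAILED: $*"
    else
        echo "SKIPPED ($launcher not in PATH): $*"
    fi
}

# Same launcher flags as in the report, but with 2 processes and a
# small benchmark instead of the full application.
run_check mpirun_rsh mpirun_rsh -rsh -np 2 -hostfile "$HOSTS" "$OSU_DIR/osu_latency"
run_check mpiexec mpiexec -f "$HOSTS" -np 2 "$OSU_DIR/osu_latency"
```

If the benchmark completes under both launchers, the installation is
probably fine and the application itself becomes the more likely
suspect.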

2011/7/7 李刚 <li.larry at hotmail.com>:
> Hi,
>
>   I got two errors while running jobs with MVAPICH2 1.7a2 / MVAPICH2 1.6.
> The error messages are shown below. Thanks.
> error1:
>  mpiexec -f ./hosts -np 64 ./vasp
>  running on   64 nodes
>  distr:  one band on    1 nodes,   64 groups
>  vasp.5.2.11 18Jan11 complex
>  POSCAR found :  3 types and      36 ions
>  LDA part: xc-table for Pade appr. of Perdew
>  POSCAR, INCAR and KPOINTS ok, starting setup
>  WARNING: small aliasing (wrap around) errors must be expected
>  FFT: planning ...(           1 )
>  WAVECAR not read
>  entering main loop
>        N       E                     dE              d eps       ncg
> rms          rms(c)
> =====================================================================================
> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> =   EXIT CODE: 11
> =   CLEANING UP REMAINING PROCESSES
> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
> =====================================================================================
> [proxy:0:0 at node08] HYD_pmcd_pmip_control_cmd_cb
> (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed
> [proxy:0:0 at node08] HYDT_dmxu_poll_wait_for_event
> (./tools/demux/demux_poll.c:77): callback returned error status
> [proxy:0:0 at node08] main (./pm/pmiserv/pmip.c:222): demux engine error
> waiting for event
> [proxy:0:1 at node09] HYD_pmcd_pmip_control_cmd_cb
> (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed
> [proxy:0:1 at node09] HYDT_dmxu_poll_wait_for_event
> (./tools/demux/demux_poll.c:77): callback returned error status
> [proxy:0:1 at node09] main (./pm/pmiserv/pmip.c:222): demux engine error
> waiting for event
> [proxy:0:2 at node10] HYD_pmcd_pmip_control_cmd_cb
> (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed
> [proxy:0:2 at node10] HYDT_dmxu_poll_wait_for_event
> (./tools/demux/demux_poll.c:77): callback returned error status
> [proxy:0:2 at node10] main (./pm/pmiserv/pmip.c:222): demux engine error
> waiting for event
> [mpiexec at hpcserver] HYDT_bscu_wait_for_completion
> (./tools/bootstrap/utils/bscu_wait.c:70): one of the processes terminated
> badly; aborting
> [mpiexec at hpcserver] HYDT_bsci_wait_for_completion
> (./tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for
> completion
> [mpiexec at hpcserver] HYD_pmci_wait_for_completion
> (./pm/pmiserv/pmiserv_pmci.c:179): launcher returned error waiting for
> completion
> [mpiexec at hpcserver] main (./ui/mpich/mpiexec.c:397): process manager
> error waiting for completion
>
> error2:
> [huochunfang at hpcserver test]$ mpirun_rsh -rsh -np 64 -hostfile ./hosts
> ~/bin/vasp
>  running on   64 nodes
>  distr:  one band on   64 nodes,    1 groups
>  vasp.5.2.11 18Jan11 complex
>  POSCAR found :  2 types and     514 ions
>  LDA part: xc-table for Ceperly-Alder, standard interpolation
>  POSCAR, INCAR and KPOINTS ok, starting setup
>  FFT: planning ...(          16 )
>  WAVECAR not read
> MPI process (rank: 32) terminated unexpectedly on node02
> Exit code -5 signaled from node02
> forrtl: error (69): process interrupted (SIGINT)
> Image              PC                Routine            Line
> Source
> libmkl_mc3.so      00002AAAB1ECFA80  Unknown               Unknown  Unknown
> forrtl: error (69): process interrupted (SIGINT)
> Image              PC                Routine            Line
> Source
> libmkl_mc3.so      00002AAAB1ECF99F  Unknown               Unknown  Unknown
> forrtl: error (69): process interrupted (SIGINT)
> Image              PC                Routine            Line
> Source
> libmkl_mc3.so      00002AAAB1ECF8C0  Unknown               Unknown  Unknown
> forrtl: error (69): process interrupted (SIGINT)
> Image              PC                Routine            Line
> Source
> libmkl_mc3.so      00002AAAB1ECF97E  Unknown               Unknown  Unknown
> forrtl: error (69): process interrupted (SIGINT)
> Image              PC                Routine            Line
> Source
> libmkl_mc3.so      00002AAAB1ECF96E  Unknown               Unknown  Unknown
> forrtl: error (69): process interrupted (SIGINT)
> Image              PC                Routine            Line
> Source
> libmkl_mc3.so      00002AAAB1ECF7A2  Unknown               Unknown  Unknown
> forrtl: error (69): process interrupted (SIGINT)
> Image              PC                Routine            Line
> Source
> libmkl_mc3.so      00002AAAB1ECFAA2  Unknown               Unknown  Unknown
> forrtl: error (69): process interrupted (SIGINT)
> Image              PC                Routine            Line
> Source
> libmkl_mc3.so      00002AAAB1ECF997  Unknown               Unknown  Unknown
> forrtl: error (69): process interrupted (SIGINT)
> Image              PC                Routine            Line
> Source
> libmkl_mc3.so      00002AAAB1ECF862  Unknown               Unknown  Unknown
> forrtl: error (69): process interrupted (SIGINT)
> Image              PC                Routine            Line
> Source
> libmkl_mc3.so      00002AAAB1ECF9AF  Unknown               Unknown  Unknown
> forrtl: error (69): process interrupted (SIGINT)
> Image              PC                Routine            Line
> Source
> libmkl_mc3.so      00002AAAB1ECF93A  Unknown               Unknown  Unknown
> forrtl: error (69): process interrupted (SIGINT)
> Image              PC                Routine            Line
> Source
> libmkl_mc3.so      00002AAAB1ECF7EF  Unknown               Unknown  Unknown
> forrtl: error (69): process interrupted (SIGINT)
> Image              PC                Routine            Line
> Source
> libc.so.6          00000031634306F7  Unknown               Unknown  Unknown
> vasp               0000000000EACF22  Unknown               Unknown  Unknown
> vasp               000000000057F100  Unknown               Unknown  Unknown
> vasp               0000000000428490  Unknown               Unknown  Unknown
> vasp               000000000040870C  Unknown               Unknown  Unknown
> libc.so.6          000000316341D994  Unknown               Unknown  Unknown
> vasp               0000000000408619  Unknown               Unknown  Unknown
> forrtl: error (69): process interrupted (SIGINT)
> Image              PC                Routine            Line
> Source
> libc.so.6          00000031634306F7  Unknown               Unknown  Unknown
> vasp               0000000000EACF22  Unknown               Unknown  Unknown
> vasp               000000000057F100  Unknown               Unknown  Unknown
> vasp               0000000000428490  Unknown               Unknown  Unknown
> vasp               000000000040870C  Unknown               Unknown  Unknown
> libc.so.6          000000316341D994  Unknown               Unknown  Unknown
> vasp               0000000000408619  Unknown               Unknown  Unknown
> forrtl: error (69): process interrupted (SIGINT)
> Image              PC                Routine            Line
> Source
> libc.so.6          00000031634306F7  Unknown               Unknown  Unknown
> vasp               0000000000EACF22  Unknown               Unknown  Unknown
> vasp               000000000057F100  Unknown               Unknown  Unknown
> vasp               0000000000428490  Unknown               Unknown  Unknown
> vasp               000000000040870C  Unknown               Unknown  Unknown
> libc.so.6          000000316341D994  Unknown               Unknown  Unknown
> vasp               0000000000408619  Unknown               Unknown  Unknown
> MPI process (rank: 43) terminated unexpectedly on node11
> forrtl: error (69): process interrupted (SIGINT)
> Image              PC                Routine            Line
> Source
> libmkl_mc3.so      00002AAAB1ECF867  Unknown               Unknown  Unknown
> forrtl: error (69): process interrupted (SIGINT)
> Image              PC                Routine            Line
> Source
> libmkl_mc3.so      00002AAAB1ECF889  Unknown               Unknown  Unknown
> forrtl: error (69): process interrupted (SIGINT)
> Image              PC                Routine            Line
> Source
> libmkl_mc3.so      00002AAAB1ECFA39  Unknown               Unknown  Unknown
> forrtl: error (69): process interrupted (SIGINT)
> Image              PC                Routine            Line
> Source
> libmkl_mc3.so      00002AAAB1ECF82E  Unknown               Unknown  Unknown
> forrtl: error (69): process interrupted (SIGINT)
> Image              PC                Routine            Line
> Source
> libmkl_mc3.so      00002AAAB1ECF99F  Unknown               Unknown  Unknown
> forrtl: error (69): process interrupted (SIGINT)
> Image              PC                Routine            Line
> Source
> libmkl_mc3.so      00002AAAB1D339AC  Unknown               Unknown  Unknown
> forrtl: error (69): process interrupted (SIGINT)
> Image              PC                Routine            Line
> Source
> libmkl_mc3.so      00002AAAB1ECF9F3  Unknown               Unknown  Unknown
> forrtl: error (69): process interrupted (SIGINT)
> Image              PC                Routine            Line
> Source
> libmkl_mc3.so      00002AAAB1ECFA57  Unknown               Unknown  Unknown
> forrtl: error (69): process interrupted (SIGINT)
> Image              PC                Routine            Line
> Source
> libmkl_mc3.so      00002AAAB1ECF8CB  Unknown               Unknown  Unknown
> forrtl: error (69): process interrupted (SIGINT)
> Image              PC                Routine            Line
> Source
> libmkl_mc3.so      00002AAAB1ECF9F3  Unknown               Unknown  Unknown
> forrtl: error (69): process interrupted (SIGINT)
> Image              PC                Routine            Line
> Source
> libmkl_mc3.so      00002AAAB1ECF97E  Unknown               Unknown  Unknown
> forrtl: error (69): process interrupted (SIGINT)
> Image              PC                Routine            Line
> Source
> libmkl_mc3.so      00002AAAB1ECF99F  Unknown               Unknown  Unknown
> forrtl: error (69): process interrupted (SIGINT)
> Image              PC                Routine            Line
> Source
> libmkl_mc3.so      00002AAAB1ECF83E  Unknown               Unknown  Unknown
> forrtl: error (69): process interrupted (SIGINT)
> Image              PC                Routine            Line
> Source
> libmkl_mc3.so      00002AAAB1ECFAAF  Unknown               Unknown  Unknown
> forrtl: error (69): process interrupted (SIGINT)
> Image              PC                Routine            Line
> Source
> libmkl_mc3.so      00002AAAB1ECF80F  Unknown               Unknown  Unknown
> forrtl: error (69): process interrupted (SIGINT)
> Image              PC                Routine            Line
> Source
> libmkl_mc3.so      00002AAAB1ECF95C  Unknown               Unknown  Unknown
> forrtl: error (69): process interrupted (SIGINT)
> Image              PC                Routine            Line
> Source
> libmkl_mc3.so      00002AAAB8BF87B4  Unknown               Unknown  Unknown
> forrtl: error (69): process interrupted (SIGINT)
> Image              PC                Routine            Line
> Source
> libmkl_mc3.so      00002AAAB1ECF857  Unknown               Unknown  Unknown
> forrtl: error (69): process interrupted (SIGINT)
> Image              PC                Routine            Line
> Source
> libc.so.6          00000039110306F7  Unknown               Unknown  Unknown
> vasp               0000000000EACF22  Unknown               Unknown  Unknown
> vasp               000000000057F100  Unknown               Unknown  Unknown
> vasp               0000000000428490  Unknown               Unknown  Unknown
> vasp               000000000040870C  Unknown               Unknown  Unknown
> libc.so.6          000000391101D994  Unknown               Unknown  Unknown
> vasp               0000000000408619  Unknown               Unknown  Unknown
> forrtl: error (69): process interrupted (SIGINT)
> Image              PC                Routine            Line
> Source
> libc.so.6          00000039DFA306F7  Unknown               Unknown  Unknown
> vasp               0000000000EACF22  Unknown               Unknown  Unknown
> vasp               000000000057F100  Unknown               Unknown  Unknown
> vasp               0000000000428490  Unknown               Unknown  Unknown
> vasp               000000000040870C  Unknown               Unknown  Unknown
> libc.so.6          00000039DFA1D994  Unknown               Unknown  Unknown
> vasp               0000000000408619  Unknown               Unknown  Unknown
> forrtl: error (69): process interrupted (SIGINT)
> Image              PC                Routine            Line
> Source
> libc.so.6          00000039DFA306F7  Unknown               Unknown  Unknown
> vasp               0000000000EACF22  Unknown               Unknown  Unknown
> vasp               000000000057F100  Unknown               Unknown  Unknown
> vasp               0000000000428490  Unknown               Unknown  Unknown
> vasp               000000000040870C  Unknown               Unknown  Unknown
> libc.so.6          00000039DFA1D994  Unknown               Unknown  Unknown
> vasp               0000000000408619  Unknown               Unknown  Unknown
> forrtl: error (69): process interrupted (SIGINT)
> Image              PC                Routine            Line
> Source
> libc.so.6          00000039DFA306F7  Unknown               Unknown  Unknown
> vasp               0000000000EACF22  Unknown               Unknown  Unknown
> vasp               000000000057F100  Unknown               Unknown  Unknown
> vasp               0000000000428490  Unknown               Unknown  Unknown
> vasp               000000000040870C  Unknown               Unknown  Unknown
> libc.so.6          00000039DFA1D994  Unknown               Unknown  Unknown
> vasp               0000000000408619  Unknown               Unknown  Unknown
> forrtl: error (69): process interrupted (SIGINT)
> Image              PC                Routine            Line
> Source
> libc.so.6          00000039DFA306F7  Unknown               Unknown  Unknown
> vasp               0000000000EACF22  Unknown               Unknown  Unknown
> vasp               000000000057F100  Unknown               Unknown  Unknown
> vasp               0000000000428490  Unknown               Unknown  Unknown
> vasp               000000000040870C  Unknown               Unknown  Unknown
> libc.so.6          00000039DFA1D994  Unknown               Unknown  Unknown
> vasp               0000000000408619  Unknown               Unknown  Unknown
> forrtl: error (69): process interrupted (SIGINT)
> Image              PC                Routine            Line
> Source
> libc.so.6          00000039DFA306F7  Unknown               Unknown  Unknown
> vasp               0000000000EACF22  Unknown               Unknown  Unknown
> vasp               000000000057F100  Unknown               Unknown  Unknown
> vasp               0000000000428490  Unknown               Unknown  Unknown
> vasp               000000000040870C  Unknown               Unknown  Unknown
> libc.so.6          00000039DFA1D994  Unknown               Unknown  Unknown
> vasp               0000000000408619  Unknown               Unknown  Unknown
> forrtl: error (69): process interrupted (SIGINT)
> Image              PC                Routine            Line
> Source
> libc.so.6          00000039DFA306F7  Unknown               Unknown  Unknown
> vasp               0000000000EACF22  Unknown               Unknown  Unknown
> vasp               000000000057F100  Unknown               Unknown  Unknown
> vasp               0000000000428490  Unknown               Unknown  Unknown
> vasp               000000000040870C  Unknown               Unknown  Unknown
> libc.so.6          00000039DFA1D994  Unknown               Unknown  Unknown
> vasp               0000000000408619  Unknown               Unknown  Unknown
> forrtl: error (69): process interrupted (SIGINT)
> Image              PC                Routine            Line
> Source
> libc.so.6          00000039DFA306F7  Unknown               Unknown  Unknown
> vasp               0000000000EACF22  Unknown               Unknown  Unknown
> vasp               000000000057F100  Unknown               Unknown  Unknown
> vasp               0000000000428490  Unknown               Unknown  Unknown
> vasp               000000000040870C  Unknown               Unknown  Unknown
> libc.so.6          00000039DFA1D994  Unknown               Unknown  Unknown
> vasp               0000000000408619  Unknown               Unknown  Unknown
> forrtl: error (69): process interrupted (SIGINT)
> Image              PC                Routine            Line
> Source
> libmkl_mc3.so      00002AAAB1ECF97E  Unknown               Unknown  Unknown
> forrtl: error (69): process interrupted (SIGINT)
> Image              PC                Routine            Line
> Source
> libmkl_mc3.so      00002AAAB1ECF78D  Unknown               Unknown  Unknown
> forrtl: error (69): process interrupted (SIGINT)
> Image              PC                Routine            Line
> Source
> libmkl_mc3.so      00002AAAB1ECF8C0  Unknown               Unknown  Unknown
> forrtl: error (69): process interrupted (SIGINT)
> Image              PC                Routine            Line
> Source
> libmkl_mc3.so      00002AAAB1ECF84F  Unknown               Unknown  Unknown
> forrtl: error (69): process interrupted (SIGINT)
> Image              PC                Routine            Line
> Source
> libmkl_mc3.so      00002AAAB1ECF8C0  Unknown               Unknown  Unknown
> forrtl: error (69): process interrupted (SIGINT)
> Image              PC                Routine            Line
> Source
> libmkl_mc3.so      00002AAAB1ECFAC2  Unknown               Unknown  Unknown
> forrtl: error (69): process interrupted (SIGINT)
> Image              PC                Routine            Line
> Source
> libmkl_mc3.so      00002AAAB1ECFCB5  Unknown               Unknown  Unknown
> forrtl: error (69): process interrupted (SIGINT)
> Image              PC                Routine            Line
> Source
> libmkl_mc3.so      00002AAAB1D339A7  Unknown               Unknown  Unknown
> forrtl: error (69): process interrupted (SIGINT)
> Image              PC                Routine            Line
> Source
> libc.so.6          00000039110306F7  Unknown               Unknown  Unknown
> vasp               0000000000EACF22  Unknown               Unknown  Unknown
> vasp               000000000057F100  Unknown               Unknown  Unknown
> vasp               0000000000428490  Unknown               Unknown  Unknown
> vasp               000000000040870C  Unknown               Unknown  Unknown
> libc.so.6          000000391101D994  Unknown               Unknown  Unknown
> vasp               0000000000408619  Unknown               Unknown  Unknown
> forrtl: error (69): process interrupted (SIGINT)
> Image              PC                Routine            Line
> Source
> libc.so.6          00000034270306F7  Unknown               Unknown  Unknown
> vasp               0000000000EACF22  Unknown               Unknown  Unknown
> vasp               000000000057F100  Unknown               Unknown  Unknown
> vasp               0000000000428490  Unknown               Unknown  Unknown
> vasp               000000000040870C  Unknown               Unknown  Unknown
> libc.so.6          000000342701D994  Unknown               Unknown  Unknown
> vasp               0000000000408619  Unknown               Unknown  Unknown
> forrtl: error (69): process interrupted (SIGINT)
> Image              PC                Routine            Line
> Source
> libc.so.6          00000034270306F7  Unknown               Unknown  Unknown
> vasp               0000000000EACF22  Unknown               Unknown  Unknown
> vasp               000000000057F100  Unknown               Unknown  Unknown
> vasp               0000000000428490  Unknown               Unknown  Unknown
> vasp               000000000040870C  Unknown               Unknown  Unknown
> libc.so.6          000000342701D994  Unknown               Unknown  Unknown
> vasp               0000000000408619  Unknown               Unknown  Unknown
> forrtl: error (69): process interrupted (SIGINT)
> Image              PC                Routine            Line
> Source
> libc.so.6          00000034270306F7  Unknown               Unknown  Unknown
> vasp               0000000000EACF22  Unknown               Unknown  Unknown
> vasp               000000000057F100  Unknown               Unknown  Unknown
> vasp               0000000000428490  Unknown               Unknown  Unknown
> vasp               000000000040870C  Unknown               Unknown  Unknown
> libc.so.6          000000342701D994  Unknown               Unknown  Unknown
> vasp               0000000000408619  Unknown               Unknown  Unknown
> forrtl: error (69): process interrupted (SIGINT)
> Image              PC                Routine            Line
> Source
> libc.so.6          00000034270306F7  Unknown               Unknown  Unknown
> vasp               0000000000EACF22  Unknown               Unknown  Unknown
> vasp               000000000057F100  Unknown               Unknown  Unknown
> vasp               0000000000428490  Unknown               Unknown  Unknown
> vasp               000000000040870C  Unknown               Unknown  Unknown
> libc.so.6          000000342701D994  Unknown               Unknown  Unknown
> vasp               0000000000408619  Unknown               Unknown  Unknown
> forrtl: error (69): process interrupted (SIGINT)
> Image              PC                Routine            Line
> Source
> libc.so.6          00000034270306F7  Unknown               Unknown  Unknown
> vasp               0000000000EACF22  Unknown               Unknown  Unknown
> vasp               000000000057F100  Unknown               Unknown  Unknown
> vasp               0000000000428490  Unknown               Unknown  Unknown
> vasp               000000000040870C  Unknown               Unknown  Unknown
> libc.so.6          000000342701D994  Unknown               Unknown  Unknown
> vasp               0000000000408619  Unknown               Unknown  Unknown
> forrtl: error (69): process interrupted (SIGINT)
> Image              PC                Routine            Line
> Source
> libc.so.6          00000034270306F7  Unknown               Unknown  Unknown
> vasp               0000000000EACF22  Unknown               Unknown  Unknown
> vasp               000000000057F100  Unknown               Unknown  Unknown
> vasp               0000000000428490  Unknown               Unknown  Unknown
> vasp               000000000040870C  Unknown               Unknown  Unknown
> libc.so.6          000000342701D994  Unknown               Unknown  Unknown
> vasp               0000000000408619  Unknown               Unknown  Unknown
>
> larry
>
>
>
>
>



-- 
Jonathan Perkins
http://www.cse.ohio-state.edu/~perkinjo


