[mvapich-discuss] What's a cause?

Dhabaleswar Panda panda at cse.ohio-state.edu
Thu Jun 18 07:18:28 EDT 2009


Please check that all nodes are configured uniformly with respect to
libraries and drivers. Also, make sure that all nodes are accessible
through the rsh/ssh mechanism you are using with mpirun. Otherwise, your
job might be aborted during the job launch phase or immediately after
it. Something like this seems to be happening here.
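
A minimal sketch of such a check, assuming Python 3.7+ on the launch
node and password-less ssh to the compute nodes. The function names and
the library path used in the usage note are hypothetical and are not
part of MVAPICH. The sketch runs one command on every node over ssh,
lists the nodes that could not be reached, and groups the remaining
nodes by the output they returned, so a node with a different library
build stands out.

```python
import subprocess
from collections import defaultdict


def run_on_host(host, command, timeout=15):
    """Run `command` on `host` via ssh; return (ok, stripped stdout)."""
    try:
        proc = subprocess.run(
            # BatchMode=yes makes ssh fail instead of prompting for a password.
            ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5",
             host, command],
            capture_output=True, text=True, timeout=timeout,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False, ""
    return proc.returncode == 0, proc.stdout.strip()


def check_nodes(hosts, command, runner=run_on_host):
    """Return (unreachable_hosts, {command_output: [hosts]})."""
    unreachable = []
    by_output = defaultdict(list)
    for host in hosts:
        ok, out = runner(host, command)
        if ok:
            by_output[out].append(host)
        else:
            unreachable.append(host)
    return unreachable, dict(by_output)
```

For example, calling check_nodes(hosts, "md5sum /usr/lib64/libibverbs.so.1")
with the host names from your machine file (the path is only a guess;
adjust it to your installation) should return an empty list of
unreachable nodes and a single output group if the nodes are uniform.
More than one group means the library differs between nodes.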

I would also like to point out that the MVAPICH 1.0.1 version you are
using is more than one year old. Please use the latest 1.1 branch
version from the following location. This version has multiple bug fixes
and additional optimizations/features compared to the 1.0.1 version.

http://mvapich.cse.ohio-state.edu/nightly/mvapich/branches/1.1/

DK

On Thu, 18 Jun 2009, Satoshi Isono wrote:

> Hello everyone,
>
> When I used MVAPICH 1.0.1, I got the errors below after two minutes.
> The MPI job size is 2,560 processes. I think this problem was caused
> by system trouble on each compute node. I would like to hear
> everyone's thoughts. The messages suggest that some shared libraries
> cannot be loaded. Are there any key clues in the error messages below?
>
> MPI process terminated unexpectedly
> MPI process terminated unexpectedly
> MPI process terminated unexpectedly
> MPI process terminated unexpectedly
> MPI process terminated unexpectedly
> MPI process terminated unexpectedly
> MPI process terminated unexpectedly
> forrtl: error (69): process interrupted (SIGINT)
> Image              PC                Routine            Line        Source
> libpthread.so.0    0000003D3FE0DE60  Unknown               Unknown  Unknown
> libpthread.so.0    0000003D3FE0CC79  Unknown               Unknown  Unknown
> libibverbs.so.1    0000003D3F606B2F  Unknown               Unknown  Unknown
> nhm_driver-2       0000000000CB3642  Unknown               Unknown  Unknown
> libpthread.so.0    0000003D3FE062E7  Unknown               Unknown  Unknown
> libc.so.6          0000003D3F2CE3BD  Unknown               Unknown  Unknown
> forrtl: error (69): process interrupted (SIGINT)
> Image              PC                Routine            Line        Source
> nhm_driver-2       000000000068AC82  Unknown               Unknown  Unknown
> nhm_driver-2       00000000004059D6  Unknown               Unknown  Unknown
> nhm_driver-2       0000000000405942  Unknown               Unknown  Unknown
> libc.so.6          0000003D3F21D8A4  Unknown               Unknown  Unknown
> nhm_driver-2       0000000000405869  Unknown               Unknown  Unknown
> MPI process terminated unexpectedly
>
>
> Regards,
> Satoshi Isono
>
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>


