[mvapich-discuss] What's a cause?

Satoshi Isono isono at cray.com
Thu Jun 18 09:52:23 EDT 2009


Dear DK Panda,

Thank you for your advice. First, I will make sure that the MPI
environment is configured correctly on all nodes.

Regards,
Satoshi Isono

-----Original Message-----
From: Dhabaleswar Panda [mailto:panda at cse.ohio-state.edu] 
Sent: Thursday, June 18, 2009 8:18 PM
To: Satoshi Isono
Cc: mvapich-discuss at cse.ohio-state.edu
Subject: Re: [mvapich-discuss] What's a cause?

Please see that all nodes are configured uniformly with respect to
libraries and drivers. Also, make sure that all nodes are accessible
through the rsh/ssh mechanism you are using with mpirun. Otherwise, your
job might be getting aborted during the job launch phase or immediately
after it. Something like this seems to be happening here.
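
As an illustration of the uniformity check described above, here is a minimal Python sketch. It assumes passwordless ssh to every node; the host names, the probe command, and the helper names (run_on_host, group_hosts_by_output, minority_hosts) are hypothetical and are not part of MVAPICH or mpirun. It runs the same command on each node and reports the nodes whose output differs from the majority.

```python
import subprocess
from collections import defaultdict


def run_on_host(host, command, timeout=10):
    """Run `command` on `host` over ssh and return (ok, output)."""
    try:
        proc = subprocess.run(
            # BatchMode=yes makes ssh fail instead of prompting for a password.
            ["ssh", "-o", "BatchMode=yes", host, command],
            capture_output=True, text=True, timeout=timeout,
        )
    except (subprocess.TimeoutExpired, OSError) as exc:
        return False, str(exc)
    return proc.returncode == 0, proc.stdout.strip()


def group_hosts_by_output(results):
    """Map each distinct output string to the sorted list of hosts that produced it."""
    groups = defaultdict(list)
    for host, output in results.items():
        groups[output].append(host)
    return {output: sorted(hosts) for output, hosts in groups.items()}


def minority_hosts(results):
    """Return the hosts whose output differs from the most common output."""
    groups = group_hosts_by_output(results)
    if not groups:
        return []
    majority = max(groups.values(), key=len)
    return sorted(h for hosts in groups.values() if hosts is not majority for h in hosts)


if __name__ == "__main__":
    hosts = ["node001", "node002", "node003"]  # hypothetical node names
    probe = "ldd /path/to/your/mpi_binary"     # hypothetical probe command
    results = {}
    for host in hosts:
        ok, output = run_on_host(host, probe)
        results[host] = output if ok else "UNREACHABLE"
    print("Nodes that differ from the majority:", minority_hosts(results))
```

Any node reported here is either unreachable through ssh or resolves its shared libraries differently from the others, so it would be the first place to look.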

I would also like to point out that the MVAPICH 1.0.1 version you are
using is more than one year old. Please use the latest 1.1 branch version
from the following location. This version has multiple bug fixes and
additional optimizations/features compared to the 1.0.1 version.

http://mvapich.cse.ohio-state.edu/nightly/mvapich/branches/1.1/

DK

On Thu, 18 Jun 2009, Satoshi Isono wrote:

> Hello everyone,
>
> When I used MVAPICH 1.0.1, I got the errors below after two minutes.
> The MPI job size is 2,560 processes. I think this problem was caused by
> system trouble on each compute node, and I would like to hear everyone's
> thoughts. The messages show that some of the shared libraries cannot be
> loaded. Are there any key items in the error messages below?
>
> MPI process terminated unexpectedly
> MPI process terminated unexpectedly
> MPI process terminated unexpectedly
> MPI process terminated unexpectedly
> MPI process terminated unexpectedly
> MPI process terminated unexpectedly
> MPI process terminated unexpectedly
> forrtl: error (69): process interrupted (SIGINT)
> Image              PC                Routine            Line        Source
> libpthread.so.0    0000003D3FE0DE60  Unknown               Unknown  Unknown
> libpthread.so.0    0000003D3FE0CC79  Unknown               Unknown  Unknown
> libibverbs.so.1    0000003D3F606B2F  Unknown               Unknown  Unknown
> nhm_driver-2       0000000000CB3642  Unknown               Unknown  Unknown
> libpthread.so.0    0000003D3FE062E7  Unknown               Unknown  Unknown
> libc.so.6          0000003D3F2CE3BD  Unknown               Unknown  Unknown
> forrtl: error (69): process interrupted (SIGINT)
> Image              PC                Routine            Line        Source
> nhm_driver-2       000000000068AC82  Unknown               Unknown  Unknown
> nhm_driver-2       00000000004059D6  Unknown               Unknown  Unknown
> nhm_driver-2       0000000000405942  Unknown               Unknown  Unknown
> libc.so.6          0000003D3F21D8A4  Unknown               Unknown  Unknown
> nhm_driver-2       0000000000405869  Unknown               Unknown  Unknown
> MPI process terminated unexpectedly
>
>
> Regards,
> Satoshi Isono
>
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>



