[mvapich-discuss] Re: Architecture compatibility or maybe something else

Laurence Marks L-marks at northwestern.edu
Fri Apr 18 13:31:27 EDT 2008


I have an issue running MPI tasks on a new cluster, which may be due to any of
(or a combination of)
a) The Intel ScaLAPACK libraries
b) The version of OFED and the InfiniBand cards
c) The compiler (ifort) and the architecture
d) Something I've not thought of.

It's not the code; that is stable and runs fine on other systems.

Running MVAPICH on a dual quad-core Intel(R) Xeon(R) CPU E5410,
everything works if I run only one MPI task per quad-core processor.
If I run two or more, I get a SIGSEGV within the ScaLAPACK call PDSYEVX,
which looks like it is associated with threading:
libpthread.so.0 00000030D9C0DD40
libpthread.so.0 00000030D9C0DC1D
libiomp5.so 00002AAAAB4C1511
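
The libiomp5.so frame makes me suspect the threaded (OpenMP) layer of the
Intel libraries. One check I plan to try (just a sketch, assuming the crash
really comes from the threaded MKL/OpenMP layer and not from my own code,
which is not OpenMP-parallel) is forcing a single thread per MPI task before
the ScaLAPACK calls, instead of relying only on the OMP_NUM_THREADS and
MKL_NUM_THREADS environment variables:

      PROGRAM THREAD_CHECK
      ! Sketch only: restrict the OpenMP/MKL runtime to one thread per
      ! MPI task, to test whether the PDSYEVX SIGSEGV is a threading
      ! interaction when more than one task runs per quad-core.
      USE OMP_LIB
      IMPLICIT NONE
      CALL MKL_SET_NUM_THREADS(1)   ! MKL service routine
      CALL OMP_SET_NUM_THREADS(1)   ! OpenMP runtime routine
      ! ... MPI/BLACS initialization and the PDSYEVX driver would go here ...
      END PROGRAM THREAD_CHECK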

Running MVAPICH2 and/or Intel MPI, I get a different error:
Fatal error in MPI_Comm_size: Invalid communicator, error stack:
MPI_Comm_size(110): MPI_Comm_size(comm=0x5b, size=0xa4ee68) failed
MPI_Comm_size(69).: Invalid communicator

which I can trace to the ScaLAPACK call CALL SL_INIT(ICTXTALL, 1, NPE).
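
To isolate this, I intend to build a stand-alone grid-initialization test
against the same ScaLAPACK/BLACS and MPI libraries (a minimal sketch only;
it assumes SL_INIT is the usual ScaLAPACK TOOLS wrapper around
BLACS_GET/BLACS_GRIDINIT). If the BLACS layer was built against a different
MPI than the one it is linked with at run time, I would expect this to fail
with the same invalid-communicator error in MPI_Comm_size:

      PROGRAM GRID_TEST
      ! Minimal BLACS/ScaLAPACK grid initialization test (sketch).
      IMPLICIT NONE
      INTEGER IAM, NPROCS, ICTXT, NPROW, NPCOL, MYROW, MYCOL
      CALL BLACS_PINFO(IAM, NPROCS)        ! task id and number of tasks
      NPROW = 1
      NPCOL = NPROCS
      CALL SL_INIT(ICTXT, NPROW, NPCOL)    ! same call that fails in the code
      CALL BLACS_GRIDINFO(ICTXT, NPROW, NPCOL, MYROW, MYCOL)
      WRITE(*,*) 'task', IAM, 'of', NPROCS, 'at grid position', MYROW, MYCOL
      CALL BLACS_GRIDEXIT(ICTXT)
      CALL BLACS_EXIT(0)
      END PROGRAM GRID_TEST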

The code ran fine when it was benchmarked a few months ago, and so far it
has been tested (by Intel) on a dual dual-core system without problems; the
engineer is going to try a dual quad-core.

I would appreciate any suggestions as to where to look to try to
understand what is going on.

-- 
Laurence Marks
Department of Materials Science and Engineering
MSE Rm 2036 Cook Hall
2220 N Campus Drive
Northwestern University
Evanston, IL 60208, USA
Tel: (847) 491-3996 Fax: (847) 491-7820
email: L-marks at northwestern dot edu
Web: www.numis.northwestern.edu
Commission on Electron Diffraction of IUCR
www.numis.northwestern.edu/IUCR_CED
