[mvapich-discuss] Re: Architecture compatibility or maybe something else

Laurence Marks L-marks at northwestern.edu
Fri Apr 18 18:10:36 EDT 2008


It's a fairly common code (http://www.wien2k.at/) that has been used on
many computers, but I don't know for certain whether it has been run on
InfiniBand or on the more recent dual quad-core systems (I may be the
first). It does not make any system calls or fork UNLESS the most recent
Intel ScaLAPACK/MKL does, which it might (I have no idea; I don't know
how they are doing their multithreading).
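
If the threading inside MKL is the suspect, one possible check is to
limit each MPI task to a single thread and see whether the crash in
PDSYEVX persists. A minimal sketch, assuming the standard
OMP_NUM_THREADS variable and MKL's MKL_NUM_THREADS variable are honored
by this MKL build (an assumption, not something tested here); the helper
name single_thread_env is made up for illustration:

```python
import os


def single_thread_env(base=None):
    """Return a copy of an environment with OpenMP/MKL limited to one thread.

    `base` defaults to the current process environment. The function name
    is hypothetical; only the two variable names come from OpenMP and MKL.
    """
    env = dict(os.environ if base is None else base)
    env["OMP_NUM_THREADS"] = "1"  # standard OpenMP thread-count variable
    env["MKL_NUM_THREADS"] = "1"  # MKL-specific override (assumed supported)
    return env


if __name__ == "__main__":
    env = single_thread_env()
    for name in ("OMP_NUM_THREADS", "MKL_NUM_THREADS"):
        print(f"{name}={env[name]}")
```

How these variables reach the remote MPI tasks depends on the launcher,
so that part is deliberately left out of the sketch.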

I have a subset of the code which can be used for debugging purposes;
about 3.5 MB in total. It would need to be compiled with "appropriate"
options, but this is fast. I can send this directly to you; sending it
to the list would be inappropriate.

On Fri, Apr 18, 2008 at 4:10 PM, Matthew Koop <koop at cse.ohio-state.edu> wrote:
> Laurence,
>
>  Has the code been run on any other InfiniBand cluster using MVAPICH? Does
>  your code make any sort of system calls or fork?
>
>  Also, is this code available so that we can try to reproduce and debug?
>
>  Matt
>
>
>
>  On Fri, 18 Apr 2008, Laurence Marks wrote:
>
>  > I have an issue running mpi tasks on a new cluster which may be any of
>  > (or a combination of)
>  > a) The intel scalapack libraries
>  > b) The version of OFED and infiniband cards
>  > c) The compiler (ifort) and the architecture
>  > d) Something I've not thought of.
>  >
>  > It's not the code, that is stable and runs fine on other systems.
>  >
>  > Running mvapich, on a dual-quadcore Intel(R) Xeon(R) CPU E5410
>  > everything works if I run only 1 mpi task per quadcore.
>  > If I do 2 or more I get a SIGSEGV within the scalapack call PDSYEVX
>  > which looks like it is associated with threading:
>  > libpthread.so.0 00000030D9C0DD40
>  > libpthread.so.0 00000030D9C0DC1D
>  > libiomp5.so 00002AAAAB4C1511
>  >
>  > Running mvapich2 and/or intelmpi I get a different error,
>  > Fatal error in MPI_Comm_size: Invalid communicator, error stack:
>  > MPI_Comm_size(110): MPI_Comm_size(comm=0x5b, size=0xa4ee68) failed
>  > MPI_Comm_size(69).: Invalid communicator
>  >
>  > which I can trace to the scalapack call CALL SL_INIT(ICTXTALL, 1, NPE)
>  >
>  > The code ran fine when it was benchmarked a few months ago, and so far
>  > has been tested (by Intel) on a dual dual-core system without problems;
>  > the engineer is going to try a dual quad-core.
>  >
>  > I would appreciate any suggestions as to where to look to try and
>  > understand what is going on.
>  >



-- 
Laurence Marks
Department of Materials Science and Engineering
MSE Rm 2036 Cook Hall
2220 N Campus Drive
Northwestern University
Evanston, IL 60208, USA
Tel: (847) 491-3996 Fax: (847) 491-7820
email: L-marks at northwestern dot edu
Web: www.numis.northwestern.edu
Commission on Electron Diffraction of IUCR
www.numis.northwestern.edu/IUCR_CED

