[mvapich-discuss] Problem with MVAPICH and Intel compilers on Glenn and BALE

Dhabaleswar Panda panda at cse.ohio-state.edu
Tue Jun 16 15:17:14 EDT 2009


Jim,

The following note was posted on the mvapich-discuss list on June 9th as a
response to another query. I am not sure whether the symptoms you are seeing
are caused by the back-to-back allocation and deallocation, with memory not
being returned properly.

A patch was applied to the 1.1 branch version last month to solve this
problem.

Can you try the 1.1 branch version (at the URL specified below) and see
whether the problem goes away?

Thanks,

DK

==============

In the 1.1 branch we have made changes to help with issues where the MPI
library was shown to be using too much memory (and failing in some cases).
In this case a 'free' was not really unpinning memory immediately, so a
'free' immediately followed by a large malloc (and touching of that
buffer) could run the machine out of memory.
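
For illustration only (the buffer size and names below are mine, not taken
from either message, and I am assuming, as the note implies, that the
library's malloc/free handling is active once MPI_INIT has been called),
the pattern being described corresponds to something like the following
Fortran, where ALLOCATE and DEALLOCATE reach the underlying malloc and free:

  program free_then_malloc
    implicit none
    include 'mpif.h'
    integer :: ierr
    real(kind=8), allocatable :: a(:), b(:)
    integer, parameter :: n = 100000000   ! illustrative size, roughly 800 MB

    call MPI_INIT(ierr)

    allocate(a(n))
    a = 1.0d0            ! touch the buffer so its pages are actually allocated
    deallocate(a)        ! the 'free'; pinned pages may not be released right away

    allocate(b(n))       ! large 'malloc' immediately after the free
    b = 2.0d0            ! touching it can exhaust physical memory if the
                         ! first buffer's pages are still held
    deallocate(b)

    call MPI_FINALIZE(ierr)
  end program free_then_malloc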

If you think this might be the problem, you can download the latest tarball
from:

http://mvapich.cse.ohio-state.edu/nightly/mvapich/branches/1.1

=================================

On Tue, 16 Jun 2009, James Giuliani wrote:

> I have finally been able to reproduce a memory problem that occurs with MVAPICH and the Intel compilers.
>
> I have been able to narrow the problem down to a series of F90 allocates and deallocates in the attached F90 code [a rough reconstruction of such a loop is sketched after the quoted message].
>
> To summarize the behavior, with the following environment:
>
> mvapich (0.9.9 or 1.1)
> Intel compilers (v10)
>
> the attached program will hang after 70,520 loop iterations.  If you set the loop logic to exit after 500 iterations, MPI_FINALIZE dies with a segmentation violation.
>
> mpiexec -n 1 ./test
> going back in loop         1
> going back in loop         2
> ..
> ..
> going back in loop         499
> going back in loop         500
> forrtl: severe (174): SIGSEGV, segmentation fault occurred
> Image              PC                Routine            Line        Source
> libpthread.so.0    00002AC8C53FFE70  Unknown               Unknown  Unknown
> libmpich.so.1.0    00002AC8C4D0AB91  Unknown               Unknown  Unknown
> libmpich.so.1.0    00002AC8C4D0940D  Unknown               Unknown  Unknown
> libmpich.so.1.0    00002AC8C4D22A91  Unknown               Unknown  Unknown
> libmpich.so.1.0    00002AC8C4CE0BBB  Unknown               Unknown  Unknown
> libmpich.so.1.0    00002AC8C4CE0912  Unknown               Unknown  Unknown
> test               0000000000405A17  Unknown               Unknown  Unknown
> test               0000000000403D42  Unknown               Unknown  Unknown
> libc.so.6          00002AC8C5AB68B4  Unknown               Unknown  Unknown
> test               0000000000403C69  Unknown               Unknown  Unknown
>
> This problem does not appear if:
>
> 1) You compile with the Portland Group compiler
> 2) You move the MPI_INIT call until after the loop
>
> Is anyone aware of why this may be a problem?
>
>
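
The attachment is not included in the archive. Purely as a sketch based on
the description above (a loop of F90 allocates and deallocates after
MPI_INIT, cut off at 500 iterations), the test program may have looked
roughly like this; the array size and variable names are guesses:

  program test
    implicit none
    include 'mpif.h'
    integer :: ierr, i
    real(kind=8), allocatable :: work(:)

    call MPI_INIT(ierr)

    ! Repeated allocate/deallocate; the report says the real code hangs
    ! after 70,520 iterations, or segfaults in MPI_FINALIZE if the loop
    ! is cut off at 500. The size below is a placeholder.
    do i = 1, 500
       allocate(work(1000000))
       work = 0.0d0
       deallocate(work)
       print *, 'going back in loop', i
    end do

    call MPI_FINALIZE(ierr)
  end program test

Built with the Intel compiler and linked against MVAPICH 0.9.9 or 1.1, this
is the kind of loop the report says hangs or dies in MPI_FINALIZE.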


