[mvapich-discuss] MPIDI_CH3I_CM_Init: Error initializing MVAPICH2 malloc library

Gilles Civario gilles.civario at ichec.ie
Thu Sep 8 06:43:44 EDT 2011


Hi Devendar

Thank you for your answer.

First of all, I want to apologise, as I realise I mixed up the information I gave:
the error I encountered actually arises with both mvapich2 versions I tried, namely 1.5.1 and 1.7rc1.
The message coming from 1.5.1 is indeed:

Fatal error in MPI_Init:
Other MPI error, error stack:
MPIR_Init_thread(310)..: Initialization failed
MPID_Init(113).........: channel initialization failed
MPIDI_CH3_Init(161)....:
MPIDI_CH3I_CM_Init(828): Error initializing MVAPICH2 malloc library

But the one corresponding to 1.7rc1 is actually:

Fatal error in MPI_Init:
Other MPI error

And the return code is 1.


Regarding the library's installation, AFAICR I didn't do anything fancy with the configuration: I just set CC, CXX, F77 and FC to the Intel compilers, and set the --prefix...
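
To be concrete, here is a minimal sketch of what I believe the build looked like. It is reconstructed from memory rather than from my shell history, so take it as an assumption; the compilers were set through the environment, which would explain why only --prefix shows up in the recorded configure line below:

export CC=icc CXX=icpc F77=ifort FC=ifort    # Intel compilers, picked up by configure
./configure --prefix=/ichec/home/staff/gcivario/pakages/mvapich2/1.7rc1-intel
make && make install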
Here is what mpich2version gives me:

[gcivario at stoney4 bin]$ ./mpich2version
MPICH2 Version:        1.7rc1
MPICH2 Release date:    Wed Jul 20 10:30:09 EDT 2011
MPICH2 Device:        ch3:mrail
MPICH2 configure:     --prefix=/ichec/home/staff/gcivario/pakages/mvapich2/1.7rc1-intel
MPICH2 CC:     icc    -DNDEBUG -DNVALGRIND -O2
MPICH2 CXX:     icpc   -DNDEBUG -DNVALGRIND -O2
MPICH2 F77:     ifort   -O2
MPICH2 FC:     ifort   -O2

As for the specific application I'm using, I'm afraid it's not one I can talk about. The only thing I can say is that it is the only code I know of that triggers this error on this machine: we mainly use
mvapich2-1.5.1 on our clusters, with hundreds of jobs running every day against that library, many of them much larger than 128 processes.
So the problem is clearly related to this particular code. However, the very same code runs fine with a different MPI flavour, such as Intel MPI (albeit slower ;) ).
I don't know what makes this code atypical and leads to the observed behaviour.
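
For reference, the MV2_ON_DEMAND_THRESHOLD workaround from my first mail (quoted below) boils down to something like the following in the job script. This is only a sketch: the value 256 is simply a number comfortably above my 128-process runs rather than a recommended setting, and the application name is a placeholder:

# Raise the on-demand connection threshold above the job size
# (it appears to default to 64, per the tests described below)
export MV2_ON_DEMAND_THRESHOLD=256
mpiexec -n 128 ./the_application    # placeholder for the code in question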
Please let me know if there is anything else I can do that might be helpful.

Cheers.

Gilles

On 07/09/11 16:43, Devendar Bureddy wrote:
> Hi Gilles
>
> Can you please share your configuration? This will help us look into
> the specific details of the issue.
>
> Can you let us know what application you're trying? If possible, can you
> please try osu_alltoall, located at
> $PREFIX/libexec/osu-micro-benchmarks/osu_alltoall, and see if you are
> seeing the same error.
>
>
> -Devendar
>
> On Wed, Sep 7, 2011 at 10:48 AM, Gilles Civario<gilles.civario at ichec.ie>  wrote:
>> Hi guys,
>>
>> I installed mvapich2-1.7-rc1 on one of our clusters to test some features,
>> and with the code I was playing with, I encountered a problem: while scaling
>> my runs over powers-of-two process counts, the code ran beautifully up to 64
>> processes, but with 128 each process gave me the following message:
>>
>> Fatal error in MPI_Init:
>> Other MPI error, error stack:
>> MPIR_Init_thread(310)..: Initialization failed
>> MPID_Init(113).........: channel initialization failed
>> MPIDI_CH3_Init(161)....:
>> MPIDI_CH3I_CM_Init(828): Error initializing MVAPICH2 malloc library
>>
>> After googling and browsing the sources, I suspected an issue of some
>> sort with my max locked memory limit, but I managed to rule it out.
>> Then I suspected some issue with our mpiexec (which is not the one that
>> comes with the library), but here again I ruled it out.
>> So I tried a more systematic approach: I searched the online documentation
>> for all the environment variables I could play with that default to 128,
>> and set them one after the other to 256... Still no joy. Then I did the
>> same with the ones that default to 64, setting them to 128. And then,
>> MV2_ON_DEMAND_THRESHOLD did the trick!
>> So my issue is now solved. However, I have a few questions / requests:
>>
>> What is the impact of setting this specific environment variable to a
>> value large enough that I don't experience those crashes in the future?
>> Would it be possible to either document the error message, or to make it
>> more explicit, so that people experiencing the same issue can sort it out
>> more easily than I did?
>>
>> Thank you for your great job guys.
>>
>> Cheers.
>>
>> Gilles Civario
>>
>> _______________________________________________
>> mvapich-discuss mailing list
>> mvapich-discuss at cse.ohio-state.edu
>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>>
>>

