[mvapich-discuss] parallel file read -> "cannot allocate memory for the file buffer"
Nathan Dauchy
Nathan.Dauchy at noaa.gov
Thu Oct 18 17:47:49 EDT 2007
Abhinav Vishnu wrote:
> From MVAPICH 0.9.9 onwards, we have converted almost all features
> into run-time variables. With this support, you will not need to change
> the MPI source to enable or disable features. For MVAPICH, SRQ usage
> can be controlled with VIADEV_USE_SRQ:
>
> http://mvapich.cse.ohio-state.edu/support/mvapich_user_guide.html#x1-1100009.4.1
>
>> I have not yet figured out how to disable SRQ in MVAPICH2.
>>
> In MVAPICH2, the usage of SRQ can be controlled by using MV2_USE_SRQ.
> Details with respect to this variable are present here:
> http://mvapich.cse.ohio-state.edu/support/user_guide_mvapich2.html#x1-12000010.46
>
Thanks, those pointers are very helpful, and certainly preferable to
modifying the source code.
>>
>> 1) Can anyone duplicate our problem with the above code?
>>
> We are taking a look at it.
>> 2) Does the code violate MPI standards or exceed MVAPICH limitations?
>>
> From MVAPICH/MVAPICH2 perspective, there should be no limitations IMO.
>> 3) Is there a change to the MPI stack or runtime environment that will
>> avoid the problem?
>>
> As you have mentioned earlier, disabling SRQ seems to solve the problem
> for you. Unfortunately, at this point, we do not have much insight with
> respect to the root cause of the problem. Please let us know the outcome
> of your experimentation by disabling SRQ for MVAPICH/MVAPICH2.
>
It may be that the problem was introduced with MVAPICH-0.9.9, not
necessarily with SRQ. We have not run this test on MVAPICH-0.9.8 WITH
SRQ. I can build and test that, but it may take a while.
In the meantime, I tested the following:

* kernel-2.6.9-55.ELsmp, OFED-1.2, MVAPICH-0.9.9
    mpirun_rsh -paramfile mpirun.params
  where mpirun.params includes "VIADEV_USE_SRQ=0"

* kernel-2.6.20.20, OFED-1.2.5.1, MVAPICH2-1.0
    mpiexec -env MV2_USE_SRQ 0

I got the "forrtl: severe (98): cannot allocate memory for the file
buffer" error in both cases, so disabling SRQ at run time did not avoid
the problem in either configuration.
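
For anyone who wants to repeat the two runs, here is a rough sketch of how
the SRQ setting is passed in each case. The process count, host file name,
and executable name (NP, HOSTFILE, APP) are placeholders made up for
illustration, not the values from the actual runs, and the exact option
order may differ on your installation. The script only writes the
parameter file and prints the launch commands; it does not run them.

```shell
#!/bin/sh
# Sketch only: NP, HOSTFILE, and APP are hypothetical placeholders.
NP=4
HOSTFILE=hosts
APP=./read_test

# MVAPICH: run-time parameters go in a file passed with -paramfile.
printf 'VIADEV_USE_SRQ=0\n' > mpirun.params

# MVAPICH launch line, with SRQ disabled through the parameter file.
MVAPICH_CMD="mpirun_rsh -np $NP -hostfile $HOSTFILE -paramfile mpirun.params $APP"

# MVAPICH2 launch line, with SRQ disabled through an environment variable.
MVAPICH2_CMD="mpiexec -n $NP -env MV2_USE_SRQ 0 $APP"

# Print the commands instead of running them, so no MPI install is needed.
echo "$MVAPICH_CMD"
echo "$MVAPICH2_CMD"
```

Replace the echo lines with the commands themselves to launch the job.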
Thanks,
Nathan