[mvapich-discuss] open non-existing files on Lustre expecting MPI_ERR_NO_SUCH_FILE but got MPI_SUCCESS

Mark Dixon m.c.dixon at leeds.ac.uk
Fri Feb 10 04:06:17 EST 2017


Hi all,

Yes, will test - really sorry for the delay, am out of the office most of 
this week.

Cheers,

Mark

On Thu, 9 Feb 2017, Wei-keng Liao wrote:

> Hi, Hari
>
> I do not have access to a machine with InfiniBand and Lustre.
> I have forwarded your patch to Mark Dixon and he will give it a try
> and get back to you soon. Please keep him in the loop. Thanks.
>
>
> Wei-keng
>
> On Feb 7, 2017, at 4:09 PM, Hari Subramoni wrote:
>
>> Hello Wei-keng,
>>
>> Can you please try the attached patch and see if things work for you?
>>
>> This patch will be available by default with future releases of the MVAPICH2 library.
>>
>> Best Regards,
>> Hari.
>>
>> On Tue, Feb 7, 2017 at 4:50 AM, Hari Subramoni <subramoni.1 at osu.edu> wrote:
>> Hello Wei-keng,
>>
>> Many thanks for the report. We will take a look at it and get back to you soon.
>>
>> Best Regards,
>> Hari.
>>
>> On Feb 7, 2017 1:12 AM, "Wei-keng Liao" <wkliao at eecs.northwestern.edu> wrote:
>> Hi,
>>
>> I am a developer of the PnetCDF library and would like to file a bug report.
>> Mark Dixon, a PnetCDF user cc-ed on this email, reported an error when
>> building PnetCDF with mvapich2-2.2. The root cause is that mvapich2-2.2
>> fails to return the expected MPI error class MPI_ERR_NO_SUCH_FILE
>> when a test program tries to open a non-existent file on Lustre.
>>
>> After some digging, I found that the Lustre driver in mvapich2 adds the
>> O_CREAT flag to the mode argument of the open call (line 50 in file
>> src/mpi/romio/adio/ad_lustre/ad_lustre_open.c). Because of that, the file is
>> mistakenly created and MPI_SUCCESS is returned, which is not the outcome
>> expected under the MPI standard.
>>
>> Attached is a small MPI program that reproduces the error. Mark has used it to
>> verify the issue described above. He also ran it against an ext4 (UFS) file
>> system, and the correct error class was returned. So the problem is specific
>> to the Lustre driver. Please see the discussion thread:
>> http://lists.mcs.anl.gov/pipermail/parallel-netcdf/2017-February/001897.html
>>
>>
>> Wei-keng
>>
>>
>> _______________________________________________
>> mvapich-discuss mailing list
>> mvapich-discuss at cse.ohio-state.edu
>> http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>>
>>
>> <lustre_open.patch>
>
>

-- 
-------------------------------------------------------------------
Mark Dixon                         Email    : m.c.dixon at leeds.ac.uk
Advanced Research Computing (ARC)  Tel (int): 35429
IT Services building               Tel (ext): +44(0)113 343 5429
University of Leeds, LS2 9JT, UK
-------------------------------------------------------------------
