[mvapich-discuss] open non-existing files on Lustre expecting MPI_ERR_NO_SUCH_FILE but got MPI_SUCCESS

Mark Dixon m.c.dixon at leeds.ac.uk
Tue Feb 14 10:14:25 EST 2017


Hi all,

I have rebuilt mvapich2 with the patch to ad_lustre_open.c and retested with 
Wei-keng's test program. Both the ext4 and the Lustre case now return 
MPI_ERR_NO_SUCH_FILE. Things look much better now, thanks very much! :)

$ mpicc -o test_open_no_such_file test_open_no_such_file.c

$ mpiexec -n 1 ./test_open_no_such_file /tmp/non-exist-file-on-ext4
non-exiting file "/tmp/non-exist-file-on-ext4"
MPI error string: File does not exist, error stack:
ADIOI_UFS_OPEN(69): File /tmp/non-exist-file-on-ext4 does not exist
Error class = MPI_ERR_NO_SUCH_FILE

$ mpiexec -n 1 ./test_open_no_such_file /nobackup/non-exist-file-on-lustre
non-exiting file "/nobackup/non-exist-file-on-lustre"
MPI error string: File does not exist, error stack:
ADIOI_LUSTRE_OPEN(69): File /nobackup/non-exist-file-on-lustre does not exist
Error class = MPI_ERR_NO_SUCH_FILE

All the best,

Mark


On Fri, 10 Feb 2017, Mark Dixon wrote:

> Hi all,
>
> Yes, will test - really sorry for the delay, am out of the office most of 
> this week.
>
> Cheers,
>
> Mark
>
> On Thu, 9 Feb 2017, Wei-keng Liao wrote:
>
>>  Hi, Hari
>>
>>  I do not have access to a machine with InfiniBand and Lustre.
>>  I have forwarded your patch to Mark Dixon and he will give it a try
>>  and get back to you soon. Please keep him in the loop. Thanks.
>> 
>>
>>  Wei-keng
>>
>>  On Feb 7, 2017, at 4:09 PM, Hari Subramoni wrote:
>> 
>> >  Hello Wei-keng,
>> > 
>> >  Can you please try the attached patch and see if things work for you?
>> > 
>> >  This patch will be available by default with future releases of the 
>> >  MVAPICH2 library.
>> > 
>> >  Best Regards,
>> >  Hari.
>> > 
>> >  On Tue, Feb 7, 2017 at 4:50 AM, Hari Subramoni <subramoni.1 at osu.edu> 
>> >  wrote:
>> >  Hello Wei-keng,
>> > 
>> >  Many thanks for the report. We will take a look at it and get back to 
>> >  you soon.
>> > 
>> >  Best Regards,
>> >  Hari.
>> > 
>> >  On Feb 7, 2017 1:12 AM, "Wei-keng Liao" <wkliao at eecs.northwestern.edu> 
>> >  wrote:
>> >  Hi,
>> > 
>> >  I am a developer of the PnetCDF library and would like to file a bug
>> >  report. Mark Dixon, a PnetCDF user cc-ed on this email, reported an error
>> >  when building PnetCDF with mvapich2-2.2. The root cause is that
>> >  mvapich2-2.2 fails to return the expected MPI error class
>> >  MPI_ERR_NO_SUCH_FILE when a test program tries to open a non-existent
>> >  file on Lustre.
>> > 
>> >  After some digging, I found that the Lustre driver in mvapich2 adds the
>> >  O_CREAT flag to the mode argument of the open call (line 50 of
>> >  src/mpi/romio/adio/ad_lustre/ad_lustre_open.c). Because of that, the file
>> >  is mistakenly created and MPI_SUCCESS is returned, which is not the
>> >  outcome the MPI standard expects.
>> > 
>> >  Attached is a small MPI program that reproduces the error. Mark has used
>> >  it to verify the issue described above. He also ran it against ext4
>> >  (UFS), where the correct error class was returned, so the problem is
>> >  specific to the Lustre driver. Please see the discussion thread:
>> >  http://lists.mcs.anl.gov/pipermail/parallel-netcdf/2017-February/001897.html
>> > 
>> > 
>> >  Wei-keng
>> > 
>> > 
>> > 
>> > 
>> >  <lustre_open.patch>
>> 
>> 
>
> -- 
> -------------------------------------------------------------------
> Mark Dixon                         Email    : m.c.dixon at leeds.ac.uk
> Advanced Research Computing (ARC)  Tel (int): 35429
> IT Services building               Tel (ext): +44(0)113 343 5429
> University of Leeds, LS2 9JT, UK
> -------------------------------------------------------------------
>


