[mvapich-discuss] Re: MPI-IO Inconsistency over Lustre using
MVAPICH (Nathan Baca)
Terrence.LIAO at total.com
Wed Mar 4 10:54:20 EST 2009
This is not a solution, but I want to report a similar observation on "MPI-IO
Inconsistency over Lustre using MVAPICH": we have been seeing our MPI-IO code
produce wrong results from time to time. This happens not only with MVAPICH,
but also with other MPI implementations, such as SGI's MPT. We have rewritten
our code to avoid MPI-IO.
Thank you very much.
-- Terrence
--------------------------------------------------------
Terrence Liao, Ph.D.
Research Computer Scientist
TOTAL E&P RESEARCH & TECHNOLOGY USA, LLC
1201 Louisiana, Suite 1800, Houston, TX 77002
Tel: 713.647.3498 Fax: 713.647.3638
Email: terrence.liao at total.com
Houston HPC site: http://us-hou-spt01/sites/rt/hpc/default.aspx
Pau HPC site: http://collaboratif.ep.corp.local/sites/hpc/hpc/RD.aspx
mvapich-discuss-request at cse.ohio-state.edu
Sent by: mvapich-discuss-bounces at cse.ohio-state.edu
03/04/2009 08:14 AM
Please respond to
mvapich-discuss at cse.ohio-state.edu
To
mvapich-discuss at cse.ohio-state.edu
cc
Subject
mvapich-discuss Digest, Vol 39, Issue 2
Send mvapich-discuss mailing list submissions to
mvapich-discuss at cse.ohio-state.edu
To subscribe or unsubscribe via the World Wide Web, visit
http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
or, via email, send a message with subject or body 'help' to
mvapich-discuss-request at cse.ohio-state.edu
You can reach the person managing the list at
mvapich-discuss-owner at cse.ohio-state.edu
When replying, please edit your Subject line so it is more specific
than "Re: Contents of mvapich-discuss digest..."
Today's Topics:
1. Re: Mvapich2-1.2 for OpenFabrics IB/iWARP : Job terminates
with error (Jonathan Perkins)
2. MPI-IO Inconsistency over Lustre using MVAPICH (Nathan Baca)
3. (no subject) (nilesh awate)
----------------------------------------------------------------------
Message: 1
Date: Mon, 2 Mar 2009 13:17:41 -0500
From: Jonathan Perkins <perkinjo at cse.ohio-state.edu>
Subject: Re: [mvapich-discuss] Mvapich2-1.2 for OpenFabrics IB/iWARP :
Job terminates with error
To: Vivek Gavane <vivekg at cdac.in>
Cc: mvapich-discuss at cse.ohio-state.edu
Message-ID: <20090302181740.GK2993 at cse.ohio-state.edu>
Content-Type: text/plain; charset=us-ascii
Vivek:
We do not have an environment set up that can easily support the
installation of this MEME Suite. Is there a simpler MPI program with which
this error can be reproduced? That would greatly assist us in debugging
this issue.
On Fri, Feb 20, 2009 at 11:32:30AM +0530, Vivek Gavane wrote:
> Sir,
> I have tried different sets of nodes for various runs; the same
> error is reported. But when I tried a small number of cores, i.e. 8, the
> job never came out even though it was complete and the output file was
> generated. Also, the processes were showing 99.9% CPU usage even after
> the complete output was generated.
>
> The application code I am using is MEME version meme3.0.3
> http://meme.nbcr.net/downloads/old_versions/
>
> Also I installed the newer version of MEME version meme_4.1.0
> http://meme.nbcr.net/downloads/
>
> It is also giving the following error every time on different sets of
> nodes:
> -----------------------------------
> Exit code -5 signaled from ibc0-27
> Killing remote processes...MPI process terminated unexpectedly
> DONE
> -----------------------------------
>
> The redirected output file of the application contains:
> -----------------------------
> cleanupSignal 15 received.
> -----------------------------
>
> Thanks.
> --
> Regards,
> Vivek Gavane
>
> Member Technical Staff
> Bioinformatics team,
> Scientific & Engineering Computing Group,
> National PARAM Supercomputing Facility,
> Centre for Development of Advanced Computing,
> Pune-411007.
>
> Phone: +91 20 25704100 ext. 195
> Direct Line: +91 20 25704195
>
> On Thu, Feb 19, 2009, Dhabaleswar Panda <panda at cse.ohio-state.edu> said:
>
> > Vivek,
> >
> > Do you see this error always when you run this application? Do you see
> > this error when you run your application on a different set of nodes? If
> > this happens always (irrespective of runs and nodes), would it be possible
> > for you to send us a code snippet which reproduces this problem? This will
> > help us investigate this issue further.
> >
> > Thanks,
> >
> > DK
> >
> >> Sir,
> >> Thank you for the reply, but the cable and switch seem to be fine. Is
> >> there any other reason/solution for the errors? Also, the application
> >> program is giving complete and correct output except for the errors at
> >> the end.
> >>
> >> Thanks.
> >> --
> >> Regards,
> >> Vivek Gavane
> >>
> >> Member Technical Staff
> >> Bioinformatics team,
> >> Scientific & Engineering Computing Group,
> >> National PARAM Supercomputing Facility,
> >> Centre for Development of Advanced Computing,
> >> Pune-411007.
> >>
> >> Phone: +91 20 25704100 ext. 195
> >> Direct Line: +91 20 25704195
> >>
> >> On Tue, Feb 17, 2009, Dhabaleswar Panda <panda at cse.ohio-state.edu>
said:
> >>
> >> > Code 12 is a timeout -- could be a bad cable/HCA/switch leaf. If the
> >> > system is really large then it could be congestion.
> >> >
> >> > Thanks,
> >> >
> >> > DK
> >> >
> >> > On Tue, 17 Feb 2009, Vivek Gavane wrote:
> >> >
> >> >> Hello,
> >> >> I have mvapich2-1.2 compiled with the following options:
> >> >>
> >> >>
> >> >> ./configure --with-rdma=gen2 --enable-sharedlibs=gcc --enable-g=dbg
> >> >> --enable-debuginfo --with-ib-include=/opt/OFED/include
> >> >> --with-ib-libpath=/opt/OFED/lib64 --prefix=/home/apps/mvapich2-1.2
> >> >>
> >> >> After I submit a job, the job completes but the following errors are
> >> >> reported on the console:
> >> >>
> >> >> -------------------------------------------------------------
> >> >> send desc error
> >> >> Exit code -5 signaled from ibc0-16
> >> >> Killing remote processes...[14] Abort: [] Got completion with error 12,
> >> >> vendor code=81, dest rank=0
> >> >> at line 553 in file ibv_channel_manager.c
> >> >> MPI process terminated unexpectedly
> >> >> DONE
> >> >> ------------------------------------------------------------
> >> >>
> >> >> And in the redirected output file, the following errors are reported
> >> >> at the end:
> >> >> -----------------------------------------
> >> >> cleanupSignal 15 received.
> >> >> Signal 15 received.
> >> >> Signal 15 received.
> >> >> Signal 15 received.
> >> >> -----------------------------------------
> >> >>
> >> >> Does anyone know the reason for this?
> >> >>
> >> >> Thanks in advance.
> >> >> --
> >> >> Regards,
> >> >> Vivek Gavane
> >> >> _______________________________________________
> >> >> mvapich-discuss mailing list
> >> >> mvapich-discuss at cse.ohio-state.edu
> >> >> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
> >> >>
> >> >
> >>
> >>
> >
>
>
>
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
--
Jonathan Perkins
http://www.cse.ohio-state.edu/~perkinjo
------------------------------
Message: 2
Date: Tue, 3 Mar 2009 20:45:16 -0700
From: Nathan Baca <nathan.baca at gmail.com>
Subject: [mvapich-discuss] MPI-IO Inconsistency over Lustre using
MVAPICH
To: mvapich-discuss at cse.ohio-state.edu
Message-ID:
<d1196de80903031945k3e7ac0c4yc04f2fad7f1a8b3b at mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
Hello,
I am seeing inconsistent MPI-IO behavior when writing to a Lustre file
system using mvapich2 1.2p1 and mvapich 1.1, both with ROMIO. What follows
is a simple reproducer and its output. Essentially, one or more of the
running processes does not read or write the correct amount of data to its
part of a file residing on a Lustre (parallel) file system.
I have tried both isolating the output to a single OST and striping across
multiple OSTs; both reproduce the same result. I have tried compiling with
multiple versions of both the PathScale and Intel compilers, all with the
same result.
The odd thing is that this seems to work using HP-MPI 2.03 with PathScale
3.2 and Intel 10.1.018. The operating system is XC 3.2.1, which is
essentially RHEL 4.5. The kernel is 2.6.9-67.9hp.7sp.XCsmp. The Lustre
version is lustre-1.4.11-2.3_0.6_xc3.2.1_k2.6.9_67.9hp.7sp.XCsmp.
Any help figuring out what is happening is greatly appreciated. Thanks,
Nate
program gcrm_test_io
    implicit none
    include "mpif.h"

    integer X_SIZE
    integer w_me, w_nprocs
    integer my_info
    integer i
    integer (kind=4) :: ierr
    integer (kind=4) :: fileID
    integer (kind=MPI_OFFSET_KIND) :: mylen
    integer (kind=MPI_OFFSET_KIND) :: offset
    integer status(MPI_STATUS_SIZE)
    integer count
    integer ncells
    real (kind=4), allocatable, dimension (:) :: array2
    logical sync

    call mpi_init(ierr)
    call MPI_COMM_SIZE(MPI_COMM_WORLD, w_nprocs, ierr)
    call MPI_COMM_RANK(MPI_COMM_WORLD, w_me, ierr)
    call mpi_info_create(my_info, ierr)

    ! optional ways to set things in mpi-io
    ! call mpi_info_set (my_info, "romio_ds_read" , "enable", ierr)
    ! call mpi_info_set (my_info, "romio_ds_write", "enable", ierr)
    ! call mpi_info_set (my_info, "romio_cb_write", "enable", ierr)

    x_size = 410011   ! A 'big' number; with bigger numbers it is more likely to fail
    sync = .true.     ! Extra file synchronization
    ncells = (X_SIZE * w_nprocs)

    ! Use node zero to fill it with nines
    if (w_me .eq. 0) then
        call MPI_FILE_OPEN (MPI_COMM_SELF, "output.dat", &
             MPI_MODE_CREATE+MPI_MODE_WRONLY, my_info, fileID, ierr)
        allocate (array2(ncells))
        array2(:) = 9.0
        mylen = ncells
        offset = 0 * 4
        call MPI_FILE_SET_VIEW(fileID, offset, MPI_REAL, MPI_REAL, &
             "native", MPI_INFO_NULL, ierr)
        call MPI_File_write(fileID, array2, mylen, MPI_REAL, status, ierr)
        call MPI_Get_count(status, MPI_INTEGER, count, ierr)
        if (count .ne. mylen) print*, "Wrong initial write count:", count, mylen
        deallocate(array2)
        if (sync) call MPI_FILE_SYNC (fileID, ierr)
        call MPI_FILE_CLOSE (fileID, ierr)
    endif

    ! All nodes now fill their area with ones
    call MPI_BARRIER(MPI_COMM_WORLD, ierr)
    allocate (array2(X_SIZE))
    array2(:) = 1.0
    offset = (w_me * X_SIZE) * 4   ! multiply by four, since it is real*4
    mylen = X_SIZE
    call MPI_FILE_OPEN (MPI_COMM_WORLD, "output.dat", MPI_MODE_WRONLY, &
         my_info, fileID, ierr)
    print*, "node", w_me, "starting", (offset/4) + 1, "ending", (offset/4) + mylen
    call MPI_FILE_SET_VIEW(fileID, offset, MPI_REAL, MPI_REAL, &
         "native", MPI_INFO_NULL, ierr)
    call MPI_File_write(fileID, array2, mylen, MPI_REAL, status, ierr)
    call MPI_Get_count(status, MPI_INTEGER, count, ierr)
    if (count .ne. mylen) print*, "Wrong write count:", count, mylen, w_me
    deallocate(array2)
    if (sync) call MPI_FILE_SYNC (fileID, ierr)
    call MPI_FILE_CLOSE (fileID, ierr)

    ! Read it back on node zero to see if the data is ok
    if (w_me .eq. 0) then
        call MPI_FILE_OPEN (MPI_COMM_SELF, "output.dat", MPI_MODE_RDONLY, &
             my_info, fileID, ierr)
        mylen = ncells
        allocate (array2(ncells))
        call MPI_File_read(fileID, array2, mylen, MPI_REAL, status, ierr)
        call MPI_Get_count(status, MPI_INTEGER, count, ierr)
        if (count .ne. mylen) print*, "Wrong read count:", count, mylen
        do i = 1, ncells
            if (array2(i) .ne. 1) then
                print*, "ERROR", i, array2(i), ((i-1)*4), &
                     ((i-1)*4)/(1024d0*1024d0)   ! Index, value, # of good bytes, MB
                goto 999
            end if
        end do
        print*, "All done with nothing wrong"
999     deallocate(array2)
        call MPI_FILE_CLOSE (fileID, ierr)
        call MPI_file_delete ("output.dat", MPI_INFO_NULL, ierr)
    endif

    call mpi_finalize(ierr)
end program gcrm_test_io
MVAPICH2 1.2p1:
node 1 starting 410012 ending 820022
node 2 starting 820023 ending 1230033
node 3 starting 1230034 ending 1640044
node 4 starting 1640045 ending 2050055
node 5 starting 2050056 ending 2460066
node 0 starting 1 ending 410011
All done with nothing wrong
node 1 starting 410012 ending 820022
node 4 starting 1640045 ending 2050055
node 3 starting 1230034 ending 1640044
node 5 starting 2050056 ending 2460066
node 2 starting 820023 ending 1230033
Wrong write count: 228554 410011 2
node 0 starting 1 ending 410011
Wrong read count: 1048576 2460066
ERROR 1048577 0.E+0 4194304 4.
node 1 starting 410012 ending 820022
node 3 starting 1230034 ending 1640044
node 4 starting 1640045 ending 2050055
node 2 starting 820023 ending 1230033
node 5 starting 2050056 ending 2460066
node 0 starting 1 ending 410011
Wrong read count: 1048576 2460066
ERROR 1048577 0.E+0 4194304 4.
MVAPICH 1.1:
node 0 starting 1 ending 410011
node 4 starting 1640045 ending 2050055
node 3 starting 1230034 ending 1640044
node 2 starting 820023 ending 1230033
node 1 starting 410012 ending 820022
node 5 starting 2050056 ending 2460066
All done with nothing wrong
node 0 starting 1 ending 410011
node 5 starting 2050056 ending 2460066
node 2 starting 820023 ending 1230033
node 1 starting 410012 ending 820022
Wrong write count: 228554 410011 2
node 3 starting 1230034 ending 1640044
node 4 starting 1640045 ending 2050055
Wrong read count: 1048576 2460066
ERROR 1048577 0.0000000E+00 4194304 4.00000000000000
node 0 starting 1 ending 410011
node 3 starting 1230034 ending 1640044
node 4 starting 1640045 ending 2050055
node 1 starting 410012 ending 820022
node 5 starting 2050056 ending 2460066
node 2 starting 820023 ending 1230033
Wrong read count: 1229824 2460066
ERROR 1229825 0.0000000E+00 4919296 4.69140625000000
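[Editor's note: the figures in the failing runs above are mutually consistent,
which is worth checking: rank 2's short write of 228554 elements stops exactly
where the reader finds the first bad index, and the good-byte count works out
to exactly 4 MB. A quick sketch of the arithmetic, in Python since the
reproducer itself is Fortran:]

```python
# Cross-check the reproducer's reported numbers against its offset arithmetic.
X_SIZE = 410011   # elements per rank, from the test program
NPROCS = 6        # ranks in the runs shown

# Each rank writes X_SIZE reals at byte offset rank * X_SIZE * 4.
for rank in range(NPROCS):
    start = rank * X_SIZE + 1          # 1-based first element of this rank
    end = start + X_SIZE - 1
    print(f"node {rank} starting {start} ending {end}")

# Rank 2 reported "Wrong write count: 228554 410011 2" -- a short write.
rank2_start = 2 * X_SIZE               # 0-based element index where rank 2 begins
first_bad = rank2_start + 228554 + 1   # 1-based index of first unwritten element
print("first bad index:", first_bad)   # matches "ERROR 1048577" above

good_bytes = (first_bad - 1) * 4
print("good bytes:", good_bytes, "=", good_bytes / 2**20, "MB")  # 4194304 = 4.0 MB
```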
--
Nathan Baca
nathan.baca at gmail.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20090303/ee3f6aa0/attachment-0001.html
------------------------------
Message: 3
Date: Wed, 4 Mar 2009 19:22:17 +0530 (IST)
From: nilesh awate <nilesh_awate at yahoo.com>
Subject: [mvapich-discuss] (no subject)
To: MVAPICH2 <mvapich-discuss at cse.ohio-state.edu>
Cc: Nilesh Awate <nilesha at cdac.in>
Message-ID: <22080.56777.qm at web94104.mail.in2.yahoo.com>
Content-Type: text/plain; charset="utf-8"
Hi all,
I am using mvapich2-1.2p1 with the uDAPL ADI over a proprietary
interconnect; previously we were using mvapich2-1.0.3.
I am running it on an Intel Xeon X5472 @ 3.00GHz cluster (8 nodes, 8
cores each).
But I am not able to launch more than 6 processes over the cluster;
the run just gets stuck after connection establishment (we used debug
messages in our library to confirm this).
I have observed the same thing over a Mellanox card (OpenIB-cma).
The mpdboot way of launching processes works fine (but it is not
recommended, as you say).
The following are the environment variables that I set before the run
(set in ~/.bashrc):
$ export PATH=/home/user/pn_mpi/mpi-bin1.2/bin/:$PATH
$ which env
/usr/bin/env
$ which mpispawn
/home/user/pn_mpi/mpi-bin1.2/bin/mpispawn
I have read the FAQ for this but did not find much information.
Waiting for a reply,
Nilesh
------------------------------
_______________________________________________
mvapich-discuss mailing list
mvapich-discuss at cse.ohio-state.edu
http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
End of mvapich-discuss Digest, Vol 39, Issue 2
**********************************************