[mvapich-discuss] viacheck.c error?

Aquarijen aquarijen at gmail.com
Thu Feb 22 16:06:03 EST 2007


Hi Sayantan and everyone,

I was pulled onto other projects for a while - sorry it has taken so
long to send an update!  I'm now back on this, and getting it working
is my first priority.  I've tried a few things.
osu_latency, osu_bw and osu_bibw still fail with 2 processors - it is
the same problem. :(  No user can run IMB - we get the same viacheck.c
error for it as well.

Your suggestions for cpilog and simpleio worked fine and these run
without problems now.  Just none of the benchmarks...

So I thought I would try out the new mvapich 0.9.9 beta and see how it
went.  I am having trouble compiling it and I think it may be a
related problem?
I have tried with icc and gcc.  We have gcc (GCC) 4.0.2 20051125 (Red
Hat 4.0.2-8) and icc (ICC) 9.1 20061101.

There are warnings in viainit.c and viarecv.c, but there is an error
in viacheck.c.  Here is the error in icc:
--------------------------------------------------------------------------------
icc -DHAVE_CONFIG_H -I.
-I/root/mvapich/mvapich-0.9.9-beta/mpid/ch_gen2
-I/root/mvapich/mvapich-0.9.9-beta/include
-I/root/mvapich/mvapich-0.9.9-beta/include
-I/root/mvapich/mvapich-0.9.9-beta/mpid/ch_gen2
-I/root/mvapich/mvapich-0.9.9-beta/mpid/util -DMPID_DEVICE_CODE
-DHAVE_UNAME=1 -DHAVE_NETDB_H=1 -DHAVE_GETHOSTBYNAME=1
-DMPID_DEBUG_NONE -DMPID_STAT_NONE  -D_GNU_SOURCE -fPIC -D_EM64T_
-DEARLY_SEND_COMPLETION -DMEMORY_RELIABLE -DVIADEV_RPUT_SUPPORT
-D_SMP_ -D_SMP_RNDV_ -DCH_GEN2 -D_ICC_  -I/usr/ofed/include -O3
-DHAVE_MPICHCONF_H -I/root/mvapich/mvapich-0.9.9-beta
-I/root/mvapich/mvapich-0.9.9-beta/mpid/ch_gen2 -I.   -c viacheck.c
viacheck.c(1036): warning #167: argument of type "unsigned char *" is
incompatible with parameter of type "char *"
                  update_crc(1, v->buffer, header->dma_len),
                                ^

viacheck.c(1557): warning #188: enumerated type mixed with another type
                          rhandle->protocol);
                          ^

viacheck.c(1749): warning #188: enumerated type mixed with another type
                                              rhandle->protocol);
                                              ^

viacheck.c(2570): error: identifier "IBV_EVENT_CLIENT_REREGISTER" is undefined
              case IBV_EVENT_CLIENT_REREGISTER:
                   ^

cm_user.h(6): warning #864: extern inline function
"odu_test_new_connection" was referenced but not defined
  inline void odu_test_new_connection(void);
              ^

compilation aborted for viacheck.c (code 2)
make[3]: *** [viacheck.o] Error 2
Exit status from make was 2
make[2]: *** [mpilib] Error 1
make[1]: *** [mpi-modules] Error 2
make: *** [mpi] Error 2
----------------------------------------------------------------------------------------------------------

and the error in gcc:
----------------------------------------------------------------------------------------------------------
gcc -DHAVE_CONFIG_H -I.
-I/root/mvapich/mvapich-0.9.9-beta/mpid/ch_gen2
-I/root/mvapich/mvapich-0.9.9-beta/include
-I/root/mvapich/mvapich-0.9.9-beta/include
-I/root/mvapich/mvapich-0.9.9-beta/mpid/ch_gen2
-I/root/mvapich/mvapich-0.9.9-beta/mpid/util -DMPID_DEVICE_CODE
-DHAVE_UNAME=1 -DHAVE_NETDB_H=1 -DHAVE_GETHOSTBYNAME=1
-DMPID_DEBUG_NONE -DMPID_STAT_NONE  -fPIC -D_EM64T_
-DEARLY_SEND_COMPLETION -DMEMORY_RELIABLE -DVIADEV_RPUT_SUPPORT
-D_SMP_ -D_SMP_RNDV_ -DCH_GEN2   -I/usr/ofed/include -O3
-DHAVE_MPICHCONF_H -D_GNU_SOURCE -I/root/mvapich/mvapich-0.9.9-beta
-I/root/mvapich/mvapich-0.9.9-beta/mpid/ch_gen2 -I.  -Wall  -c
viacheck.c
viacheck.c: In function 'viadev_process_recv':
viacheck.c:1036: warning: pointer targets in passing argument 2 of
'update_crc' differ in signedness
viacheck.c: In function 'async_thread':
viacheck.c:2570: error: 'IBV_EVENT_CLIENT_REREGISTER' undeclared
(first use in this function)
viacheck.c:2570: error: (Each undeclared identifier is reported only once
viacheck.c:2570: error: for each function it appears in.)
make[3]: *** [viacheck.o] Error 1
Exit status from make was 2
make[2]: *** [mpilib] Error 1
make[1]: *** [mpi-modules] Error 2
make: *** [mpi] Error 2
-----------------------------------------------------------------------------------------------------------------

What am I doing wrong?
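
Both compilers stop at the same place: the enumerator
IBV_EVENT_CLIENT_REREGISTER is not declared by the verbs header found
under /usr/ofed/include.  One plausible reading (an assumption, not
something confirmed in this thread) is that the installed OFED headers
predate that event.  Since an enumerator is invisible to the
preprocessor, `#ifdef IBV_EVENT_CLIENT_REREGISTER` cannot detect it, so
a build would need its own feature macro.  The sketch below shows that
guard pattern with stand-in names; `demo_event_type`,
`DEMO_HAVE_CLIENT_REREGISTER` and `demo_event_name` are hypothetical and
are not part of MVAPICH or libibverbs.

```c
/* Hypothetical stand-in for the async event enum in <infiniband/verbs.h>.
 * DEMO_HAVE_CLIENT_REREGISTER is an assumed macro that a configure-style
 * compile probe would define only when the header declares the event. */
enum demo_event_type {
    DEMO_EVENT_PORT_ACTIVE,
    DEMO_EVENT_PORT_ERR
#ifdef DEMO_HAVE_CLIENT_REREGISTER
    , DEMO_EVENT_CLIENT_REREGISTER
#endif
};

/* Map an event value to a label. The guarded case is compiled only when
 * the feature macro is defined, so an older header still builds. */
const char *demo_event_name(int ev)
{
    switch (ev) {
    case DEMO_EVENT_PORT_ACTIVE:
        return "port active";
    case DEMO_EVENT_PORT_ERR:
        return "port error";
#ifdef DEMO_HAVE_CLIENT_REREGISTER
    case DEMO_EVENT_CLIENT_REREGISTER:
        return "client reregister";
#endif
    default:
        return "unhandled event";
    }
}
```

Compiled without -DDEMO_HAVE_CLIENT_REREGISTER, the guarded case is left
out and any unrecognised value falls through to the default branch.
Whether patching the source this way or updating the OFED installation
is the right fix here is a separate question for the list.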

Thanks so much for any help you can give!!!!

-Jen
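
The abort messages quoted below report code=1, code=4 and code=12.
Assuming MVAPICH prints the libibverbs work-completion status there
(an assumption; the thread does not say so), the numbers can be decoded
with a small lookup like this sketch.  The names mirror the first
entries of `enum ibv_wc_status` in <infiniband/verbs.h>; the helper
`wc_status_name` is hypothetical.

```c
#include <stddef.h>

/* Names mirroring enum ibv_wc_status (values 0..13).
 * Assumption: "code=N" in the abort message is this status value. */
static const char *const wc_status_names[] = {
    "IBV_WC_SUCCESS",           /* 0 */
    "IBV_WC_LOC_LEN_ERR",       /* 1 */
    "IBV_WC_LOC_QP_OP_ERR",     /* 2 */
    "IBV_WC_LOC_EEC_OP_ERR",    /* 3 */
    "IBV_WC_LOC_PROT_ERR",      /* 4 */
    "IBV_WC_WR_FLUSH_ERR",      /* 5 */
    "IBV_WC_MW_BIND_ERR",       /* 6 */
    "IBV_WC_BAD_RESP_ERR",      /* 7 */
    "IBV_WC_LOC_ACCESS_ERR",    /* 8 */
    "IBV_WC_REM_INV_REQ_ERR",   /* 9 */
    "IBV_WC_REM_ACCESS_ERR",    /* 10 */
    "IBV_WC_REM_OP_ERR",        /* 11 */
    "IBV_WC_RETRY_EXC_ERR",     /* 12 */
    "IBV_WC_RNR_RETRY_EXC_ERR"  /* 13 */
};

/* Return the status name for a numeric code, or "unknown" when the
 * code is outside the range covered by the table above. */
const char *wc_status_name(int code)
{
    size_t n = sizeof(wc_status_names) / sizeof(wc_status_names[0]);
    if (code < 0 || (size_t)code >= n)
        return "unknown";
    return wc_status_names[code];
}
```

Under that assumption, code=1 would be a local length error, code=4 a
local protection error, and code=12 a transport retry count exceeded
error, which on its own does not say why the peer stopped responding.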



On 1/23/07, Sayantan Sur <surs at cse.ohio-state.edu> wrote:
> Hello Jen,
>
> The OSU benchmarks should ideally be run with 2 processes. Can you try
> osu_latency, osu_bw, osu_bibw with just 2 processes? I have a feeling
> that the cluster isn't set up quite right, otherwise these simple
> benchmarks wouldn't fail. Are other users able to run IMB on the cluster?
>
> cpilog might have compilation problems since MPE might not have been
> compiled in when MVAPICH was built. To enable MPE, use --with-mpe as a
> configure parameter in make.mvapich.gen2. (assuming you have downloaded
> MVAPICH-0.9.8 from our website).
>
> Similarly, for simpleio, the MPI-IO component needs to be compiled in when
> building MVAPICH. Use --with-romio as a configure parameter.
>
> Thanks,
> Sayantan.
>
> Aquarijen wrote:
> > Hi Sayantan,
> >
> > Thank you for your help. :)
> >
> > A few things about my environment.  The compute nodes are 64 bit, so I
> > pointed the mvapich compilation to /usr/ofed/lib64 - I have no 32 bit
> > libs for ofed.  The compute nodes have 2 processors each.  When I have
> > tried jobs, I have submitted them through pbs (torque) and specified
> > that I want 1 processor per node - maui enforces this.  I have tried
> > all my runs with 40 nodes, one processor per node.
> >
> > cpi, cpip and hello++ run without problems.
> >
> > osu_bw fails with the error:
> > Connection closed by 172.16.4.36
> > [0] Abort: [b09n040.oic.ornl.gov:0] Got completion with error, code=1
> > at line 2355 in file viacheck.c
> > done.
> >
> > osu_bcast fails with error:
> > [39] Abort: [b09n001.oic.ornl.gov:39] Got completion with error, code=1
> > at line 2355 in file viacheck.c
> > done.
> >
> > osu_bibw fails with:
> > [0] Abort: [b09n040.oic.ornl.gov:0] Got completion with error, code=1
> > at line 2355 in file viacheck.c
> > [39] Abort: [32] Abort: [24] Abort: [38] Abort:
> > [b09n001.oic.ornl.gov:39] Got completion with error, code=12, dest
> > rank=0
> > at line 397 in file viacheck.c
> > [b09n008.oic.ornl.gov:32] Got completion with error, code=12, dest rank=0
> > at line 397 in file viacheck.c
> > [b09n016.oic.ornl.gov:24] Got completion with error, code=12, dest rank=0
> > at line 397 in file viacheck.c
> > [b09n002.oic.ornl.gov:38] Got completion with error, code=12, dest rank=0
> > at line 397 in file viacheck.c
> > [36] Abort: [b09n004.oic.ornl.gov:36] Got completion with error,
> > code=12, dest rank=0
> > at line 397 in file viacheck.c
> > done.
> >
> > osu_latency fails with:
> > [0] Abort: [b09n040.oic.ornl.gov:0] Got completion with error, code=1
> > at line 2355 in file viacheck.c
> > done.
> >
> > I can't compile cpilog.c, I get:
> > [2vt at b09l02 osu_benchmarks-mvapich]$ which mpicc
> > /opt/mvapich-gcc-0.9.8/bin/mpicc
> > [2vt at b09l02 osu_benchmarks-mvapich]$ mpicc cpilog.c -o cpilog
> > cpilog.o(.text+0xd2): In function `main':
> > cpilog.c: undefined reference to `MPE_Init_log'
> > cpilog.o(.text+0xd7):cpilog.c: undefined reference to
> > `MPE_Log_get_event_number'
> > cpilog.o(.text+0xdf):cpilog.c: undefined reference to
> > `MPE_Log_get_event_number'
> > cpilog.o(.text+0xe7):cpilog.c: undefined reference to
> > `MPE_Log_get_event_number'
> > cpilog.o(.text+0xef):cpilog.c: undefined reference to
> > `MPE_Log_get_event_number'
> > cpilog.o(.text+0xf7):cpilog.c: undefined reference to
> > `MPE_Log_get_event_number'
> > cpilog.o(.text+0xff):cpilog.c: more undefined references to
> > `MPE_Log_get_event_number' follow
> > cpilog.o(.text+0x12e): In function `main':
> > cpilog.c: undefined reference to `MPE_Describe_state'
> > cpilog.o(.text+0x143):cpilog.c: undefined reference to
> > `MPE_Describe_state'
> > cpilog.o(.text+0x158):cpilog.c: undefined reference to
> > `MPE_Describe_state'
> > cpilog.o(.text+0x16d):cpilog.c: undefined reference to
> > `MPE_Describe_state'
> > cpilog.o(.text+0x1a2):cpilog.c: undefined reference to `MPE_Start_log'
> > cpilog.o(.text+0x1c0):cpilog.c: undefined reference to `MPE_Log_event'
> > cpilog.o(.text+0x1f0):cpilog.c: undefined reference to `MPE_Log_event'
> > cpilog.o(.text+0x202):cpilog.c: undefined reference to `MPE_Log_event'
> > cpilog.o(.text+0x21e):cpilog.c: undefined reference to `MPE_Log_event'
> > cpilog.o(.text+0x230):cpilog.c: undefined reference to `MPE_Log_event'
> > cpilog.o(.text+0x2da):cpilog.c: more undefined references to
> > `MPE_Log_event' follow
> > cpilog.o(.text+0x342): In function `main':
> > cpilog.c: undefined reference to `MPE_Finish_log'
> > collect2: ld returned 1 exit status
> >
> > I also can't compile simpleio.c.  I get:
> > [2vt at b09l02 osu_benchmarks-mvapich]$ mpicc simpleio.c
> > simpleio.o(.text+0x252): In function `main':
> > simpleio.c: undefined reference to `MPI_File_open'
> > simpleio.o(.text+0x26e):simpleio.c: undefined reference to
> > `MPI_File_write'
> > simpleio.o(.text+0x277):simpleio.c: undefined reference to
> > `MPI_File_close'
> > simpleio.o(.text+0x2c0):simpleio.c: undefined reference to
> > `MPI_File_open'
> > simpleio.o(.text+0x2dc):simpleio.c: undefined reference to
> > `MPI_File_read'
> > simpleio.o(.text+0x2e5):simpleio.c: undefined reference to
> > `MPI_File_close'
> > collect2: ld returned 1 exit status
> >
> > Intel MPI benchmarks (IMB-MPI1) fail with:
> >
> > [0] Abort: [b09n040.oic.ornl.gov:0] Got completion with error, code=4
> > at line 2355 in file viacheck.c
> > done.
> >
> >
> > I'd be happy to provide any other logs or any other info you might
> > think would help!  Sorry it took me so long for this - I had a few
> > fires to put out.  Now, this is #1 priority.
> >
> > Thanks for all your help!!!!
> > Jen
> >
> >
> >
> > On 1/19/07, Sayantan Sur <surs at cse.ohio-state.edu> wrote:
> >> Hello Jen,
> >>
> > >> > I am new.  And a little frustrated. :)
> >>
> >> Thanks for your post ... Hope your problems are short-lived :-)
> >>
> >> > I have compiled/installed mvapich 0.9.8 using ofed/gen2.
> >> >
> >> > I can run cpi on all my nodes just fine.  The problem comes in when I
> >> > try to use any of the osu benchmark programs.  They seem to compile
> >> > just fine, but when I try to run osu_bcast, I get the following error:
> >> >
> > >> > [39] Abort: [b09n001.oic.ornl.gov:39] Got completion with error, code=1
> >> > at line 2355 in file viacheck.c
> >> > done.
> >> >
> >> > Where is this viacheck.c?  Has anyone seen this before?  I'd be happy
> >> > to provide more details if you tell me what to provide.
> >>
> >> viacheck.c is an internal file in the MVAPICH implementation. I have a
> >> couple of questions which you could answer ...
> >>
> > >> 1) On how many nodes was this run attempted? I have run osu_bcast on
> >> 64-nodes/128 processes and it seems to be OK.
> >>
> >> 2) Can you run IMB (Intel MPI benchmarks)? They will also call
> >> MPI_Bcast.
> >>
> >> 3) I'm wondering if you could run the other OSU benchmarks, such as
> >> latency, bandwidth, bi-directional bandwidth?
> >>
> >> Thanks,
> >> Sayantan.
> >>
> >> --
> >> http://www.cse.ohio-state.edu/~surs
> >>
> >
> >
>
> --
> http://www.cse.ohio-state.edu/~surs
>
>


-- 
When I play with my cat, who knows whether she is not amusing herself
with me more than I with her.
Michel de Montaigne

