[mvapich-discuss] Verify the application is really running

wgy at altair.com.cn wgy at altair.com.cn
Wed Sep 5 22:21:51 EDT 2007


Hello:
I am away from the machine these days but I will do as you request when I
can.
Thanks.
Henry, Wu.

| Can you send us the output from
|
| /usr/local/topspin/mpi/mpich/bin/mpicc -v
|
| This will let us know what compiler and which version of it you're
| using.  We do not see this problem when trying in our environment using
| a somewhat recent version of gcc.
|
| Below I pasted line 66 with a few lines of context around it.  There is
| no for loop here so I'm a bit confused as to why you're getting the
| errors that you posted.
|
|
| int main(int argc, char *argv[])
| {
|
|     int myid, numprocs, i;
|     int size;
|     MPI_Status reqstat;
|     char *s_buf, *r_buf;
|     int align_size;
|
|
| Also, as a sanity check, can you download the osu_latency.c file again
| from
|
| https://mvapich.cse.ohio-state.edu/svn/mpi/mvapich/trunk/osu_benchmarks/osu_latency.c
|
| to verify that we are referencing the same file.  Thanks for your input
| and we hope with further information we can solve this compilation issue
| that you're having.
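|
| (A quick way to rule out a corrupted download -- the kind of syntax
| errors posted earlier often come from saving an HTML page instead of the
| raw .c file.  This is only a sketch, assuming wget, file, and head are
| available on the node; the exact "file" output wording varies by system:
|
| $ wget -O osu_latency.c \
|     https://mvapich.cse.ohio-state.edu/svn/mpi/mvapich/trunk/osu_benchmarks/osu_latency.c
| $ file osu_latency.c     # should report C/ASCII source text, not HTML
| $ head -3 osu_latency.c  # should start with a C comment or #include,
|                          # not <html> or <!DOCTYPE>
| )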
|
|
| wgy at altair.com.cn wrote:
|> Hello:
|> I am quite sure I used the one you referred to and got the compile
|> errors you can see in the message. I just renamed osu_latency.c to
|> latency.c while uploading...
|> Thanks.
|> Henry, Wu.
|>
|> | Are you sure you are using the osu_latency.c file from the mvapich web
|> | site?  Your e-mail indicates you are using a `latency.c' file.
|> |
|> | FYI, the osu_latency.c benchmark (latest version v2.2) is available
|> | from the following URL:
|> |
|> |
|> | https://mvapich.cse.ohio-state.edu/svn/mpi/mvapich/trunk/osu_benchmarks/osu_latency.c
|> |
|> | DK
|> |
|> |>
|> |> Hello, Jeff and Dr. Panda:
|> |> I am getting back to you with the following results and further questions:
|> |> 1) latency test result:
|> |> the latency.c I downloaded from the mvapich website fails to compile
|> |> on the cluster, with these errors:
|> |> [radioss at hpc-node-01 job1]$ /usr/local/topspin/mpi/mpich/bin/mpicc
|> |> latency.c -o lat
|> |> latency.c:66: error: syntax error before "for"
|> |> latency.c:68: error: `i' undeclared here (not in a function)
|> |> latency.c:68: warning: data definition has no type or storage class
|> |> latency.c:69: error: syntax error before '}' token
|> |> latency.c:73: error: `skip_large' undeclared here (not in a function)
|> |> latency.c:73: warning: data definition has no type or storage class
|> |> latency.c:74: error: syntax error before '}' token
|> |> latency.c:76: warning: parameter names (without types) in function
|> |> declaration
|> |> latency.c:76: warning: data definition has no type or storage class
|> |> latency.c:78: error: syntax error before "if"
|> |> latency.c:82: error: syntax error before numeric constant
|> |> latency.c:82: warning: data definition has no type or storage class
|> |> latency.c:83: error: syntax error before numeric constant
|> |> latency.c:84: warning: data definition has no type or storage class
|> |> latency.c:86: error: initializer element is not constant
|> |> latency.c:86: warning: data definition has no type or storage class
|> |> latency.c:88: error: syntax error before '}' token
|> |> latency.c:92: error: syntax error before numeric constant
|> |> latency.c:92: warning: data definition has no type or storage class
|> |> latency.c:98: error: `t_start' undeclared here (not in a function)
|> |> latency.c:98: error: `loop' undeclared here (not in a function)
|> |> latency.c:98: warning: data definition has no type or storage class
|> |> latency.c:99: error: syntax error before string constant
|> |> latency.c:99: warning: conflicting types for built-in function 'fprintf'
|> |> latency.c:99: warning: data definition has no type or storage class
|> |> latency.c:104: warning: data definition has no type or storage class
|> |> latency.c:105: error: syntax error before "return"
|> |> latency.c:107:2: warning: no newline at end of file
|> |> latency.c:68: error: storage size of `r_buf' isn't known
|> |>
|> |> I had to use mpi_latency.c shipped with mvapich in the cluster and got
|> |> the following latency test results.
|> |>
|> |> [radioss at hpc-node-01 job1]$ /usr/local/topspin/mpi/mpich/bin/mpirun_rsh
|> |> -np 2 -hostfile appfile ./lat 10000 1
|> |> 1       6.288650
|> |> [radioss at hpc-node-01 job1]$ /usr/local/topspin/mpi/mpich/bin/mpirun_rsh
|> |> -np 2 -hostfile appfile ./lat 10000 4
|> |> 4       6.410350
|> |> while Topspin's Host-Side Drivers User Guide for Linux Release 3.1.0
|> |> gives the following latency test figure as an example:
|> |> [root at qa-bc1-blade2 root]# /usr/local/topspin/mpi/mpich/bin/mpirun_ssh
|> |> -np 2 qa-bc1-blade2 qa-bc1-blade3
|> |> /usr/local/topspin/mpi/mpich/bin/mpi_latency 10000 1
|> |> 1 6.684000
|> |> 2) Jeff Squyres once asked me:
|> |> >> I have 4-cores nodes here..
|> |> >> I would expect to run it as:
|> |> >> /usr/local/topspin/mpi/mpich/bin/mpirun_ssh -np 2 -hostfile hosts
|> |>
|> |> > ^^ Is that the right path?  Or is it "mvapich"?  Regardless, I think
|> |> > wherever you find mpirun_ssh under /usr/local/topspin/mpi is probably
|> |> > the right one.
|> |> The path is right, and I am pretty sure it is mvapich because:
|> |> i)rpm -qf /usr/local/topspin/mpi/mpich/bin/mpirun_ssh gives:
|> |> topspin-ib-mpi-rhel4-3.2.0-118
|> |> ii)[radioss at hpc-node-01 local]$
|> |> /usr/local/topspin/mpi/mpich/bin/mpirun_rsh -v
|> |> OSU MVAPICH VERSION 0.9.5-SingleRail
|> |>
|> |> 3) When I try to use HP MPI 2.2.5 over the IB network I get the
|> |> following:
|> |> [radioss at hpc-node-01 job1]$ /opt/hpmpi/bin/mpirun -stdio=i0
|> |> -cpu_bind=cyclic -VAPI  -f appfile < PFTANKD01
|> |> dlopen test for MPI_ICLIB_VAPI__VAPI_MAIN could not open libs in list
|> |> libmtl_common.so   libmpga.so      libmosal.so     libvapi.so:
|> |> /usr/local/topspin/lib64/libmosal.so: undefined symbol: pthread_create
|> |> dlopen test for MPI_ICLIB_VAPI__VAPI_CISCO could not open libs in list
|> |> libpthread.so     libmosal.so     libvapi.so: /usr/lib64/libpthread.so:
|> |> invalid ELF header
|> |> mpid: MPI BUG: VAPI requested but not available
|> |> What does this probably indicate? Is anything wrong with the IB
|> |> configuration?
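|> |>
|> |> (One way to narrow this down -- a diagnostic sketch only, assuming the
|> |> standard file/nm/ldd tools are installed on the node:
|> |>
|> |> $ file /usr/lib64/libpthread.so
|> |>     # an "invalid ELF header" from dlopen often means this path is a
|> |>     # linker script rather than a real shared object
|> |> $ nm -D /usr/local/topspin/lib64/libmosal.so | grep pthread_create
|> |>     # shows whether the symbol is listed as undefined (U) in libmosal.so
|> |> $ ldd /usr/local/topspin/lib64/libvapi.so
|> |>     # shows which libraries the VAPI stack expects to resolve
|> |> )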
|> |>
|> |> RPM packages installed there:
|> |> [radioss at hpc-node-01 job1]$ rpm -qa|grep topspin
|> |> topspin-ib-rhel4-3.2.0-118
|> |> topspin-ib-mpi-rhel4-3.2.0-118
|> |> topspin-ib-mod-rhel4-2.6.9-42.ELsmp-3.2.0-118
|> |>
|> |> 4) You suggested that I use HP MPI (not native mvapich) and the OFED
|> |> IB stack if possible.
|> |> Now I have some questions; I hope you can give a quick comment or
|> |> refer me to a website link that I can read through:
|> |> i) How do I verify which IB stack is used here, OFED or the
|> |> Cisco/Topspin IB stack? What are the advantages of the OFED IB stack
|> |> over the Cisco/Topspin IB stack? (see the sketch after these questions)
|> |> ii) What are the advantages of HP MPI over "native mvapich"? What is
|> |> meant by "native mvapich" -- the one shipped with Cisco/Topspin? Is it
|> |> enough to upgrade mvapich to the latest version available on the
|> |> mvapich website?
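|> |>
|> |> (For question i), a rough check -- the command and module names here
|> |> are assumptions and may differ on your installation:
|> |>
|> |> $ rpm -qa | grep -i -e topspin -e ofed   # which vendor packages are installed
|> |> $ lsmod | grep -e '^ib_' -e '^ts_'       # OFED kernel modules are usually
|> |>                                          # named ib_*, Topspin ones ts_*
|> |> $ ofed_info 2>/dev/null | head -1        # prints a version line only on
|> |>                                          # an OFED installation
|> |> )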
|> |>
|> |> Thanks a lot to all of you for your kind help!
|> |>
|> |> Henry, Wu.
|> |>
|> |>
|> |>
|> |>
|> |> | On Aug 29, 2007, at 12:54 PM, wgy at altair.com.cn wrote:
|> |> |
|> |> |> Yes, I think I used the mvapich shipped with Topspin, but I am not
|> |> |> sure unless I know how to verify it.
|> |> |
|> |> | If it's in the /usr/local/topspin directory, it's the Topspin (later
|> |> | Cisco) MVAPICH.
|> |> |
|> |> |> about the latency test, I downloaded
|> |> |> https://mvapich.cse.ohio-state.edu/svn/mpi/mvapich/trunk/osu_benchmarks/osu_latency.c
|> |> |> and will compile it to run a benchmark. Can you please tell me how I
|> |> |> should run it? How many nodes should be used and how many CPUs
|> |> |> should be involved?
|> |> |
|> |> | You typically run it with 2 MPI processes; one on each host.  It
|> |> | measures the MPI network latency between those two hosts.
|> |> |
|> |> |> I have 4-cores nodes here..
|> |> |> I would expect to run it as:
|> |> |> /usr/local/topspin/mpi/mpich/bin/mpirun_ssh -np 2 -hostfile hosts
|> |> |
|> |> | ^^ Is that the right path?  Or is it "mvapich"?  Regardless, I think
|> |> | wherever you find mpirun_ssh under /usr/local/topspin/mpi is probably
|> |> | the right one.
|> |> |
|> |> |> osu_latency.o
|> |> |
|> |> | Is your executable really named osu_latency.o?  That's uncommon.
|> |> | Regardless, run the executable that you got when you compiled
|> |> | osu_latency.c with mpicc.
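|> |> |
|> |> | For example (a sketch only -- the mpirun_ssh arguments are taken from
|> |> | earlier in this thread, and the executable name is just whatever -o
|> |> | sets):
|> |> |
|> |> | $ /usr/local/topspin/mpi/mpich/bin/mpicc osu_latency.c -o osu_latency
|> |> | $ /usr/local/topspin/mpi/mpich/bin/mpirun_ssh -np 2 -hostfile hosts \
|> |> |     ./osu_latency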
|> |> |
|> |> |> and include the following in the hosts file
|> |> |> hpc-node-01
|> |> |> hpc-node-02
|> |> |
|> |> | Sounds right.  I'm not an MVAPICH expert, though -- so I defer to the
|> |> | maintainers here on this list for the finer details...
|> |> |
|> |> |> Is it right?
|> |> |> Thanks a  lot, I am really a newbie with Infiniband....
|> |> |
|> |> | If this is your own system, I do want to stress that OFED is really
|> |> | the way to go with HPC InfiniBand installations these days.  The
|> |> | MPIs that are included are much more recent, and all new development
|> |> | work is happening in the OFED arena.
|> |> |
|> |> | I recommend that you upgrade if you can.
|> |> |
|> |> |
|> |> |> Henry, Wu
|> |> |>
|> |> |>
|> |> |> | On Aug 29, 2007, at 12:25 PM, wgy at altair.com.cn wrote:
|> |> |> |
|> |> |> |> Hello, Jeff:
|> |> |> |> The mvapich version is OSU MVAPICH 0.9.5.
|> |> |> |> Does it mean that it is the Cisco IB stack and therefore the
|> |> |> |> application I run with mvapich is really running over the IB
|> |> |> |> network?
|> |> |> |
|> |> |> | The version of MVAPICH, by itself, does not mean that it is or is
|> |> |> | not running over IB.
|> |> |> |
|> |> |> | What *implies* that you are running over IB is:
|> |> |> |
|> |> |> | - You implied that you are using the MVAPICH shipped with the
|> |> |> | Topspin IB stack (which is not OFED).  Is that correct?
|> |> |> | - I *believe* that the Topspin MVAPICH did not have TCP support
|> |> |> | compiled into it (Topspin was before my time, but I am pretty sure
|> |> |> | that the Cisco MVAPICH shipped with the Cisco IB stack does not)
|> |> |> |
|> |> |> | What would *prove* that you are using IB (vs. gige) is:
|> |> |> |
|> |> |> | - Run a simple latency test, as Dr. Panda suggested.  Your latency
|> |> |> | should be single-digit microseconds (exact numbers depend on your
|> |> |> | hardware -- this might be all older stuff since you mentioned
|> |> |> | "Topspin", not "Cisco"; Topspin was acquired by Cisco quite a while
|> |> |> | ago...).  If your latency is much higher than that (e.g., 50 us),
|> |> |> | you're using gige.
|> |> |> |
|> |> |> |
|> |> |> |
|> |> |> |> Thanks.
|> |> |> |>
|> |> |> |> Henry, Wu.
|> |> |> |> | In addition to what Dr. Panda said, Cisco recommends that all
|> |> |> |> | HPC customers upgrade to the OFED IB driver stack if possible
|> |> |> |> | (some customers cannot upgrade for various reasons).  FWIW: all
|> |> |> |> | new HPC/MPI work is occurring in the OFED arena.
|> |> |> |> |
|> |> |> |> | I bring this up because you specifically mention Topspin
|> |> |> |> | Infiniband, which I'm *assuming* is the Cisco IB stack (not the
|> |> |> |> | OFED IB stack), and is therefore shipping with a somewhat older
|> |> |> |> | version of MVAPICH that was derived from the OSU MVAPICH.  The
|> |> |> |> | Cisco MVAPICH should only be compiled with IB support enabled; a
|> |> |> |> | simple latency test should prove that you're running over IB and
|> |> |> |> | not ethernet.
|> |> |> |> |
|> |> |> |> | Much more recent versions of MPI implementations are included
|> |> |> |> | with the OFED stack (Cisco provides binary distributions of OFED
|> |> |> |> | on www.cisco.com).
|> |> |> |> |
|> |> |> |> |
|> |> |> |> | On Aug 29, 2007, at 11:44 AM, Dhabaleswar Panda wrote:
|> |> |> |> |
|> |> |> |> |>
|> |> |> |> |>
|> |> |> |> |> On Wed, 29 Aug 2007 wgy at altair.com.cn wrote:
|> |> |> |> |>
|> |> |> |> |>> Hello, list:
|> |> |> |> |>> It might be a silly question, but I wonder how to verify that
|> |> |> |> |>> a run with mvapich (which comes with Topspin Infiniband) goes
|> |> |> |> |>> over Infiniband, NOT the Gigabit network.
|> |> |> |> |>> Is there an option to force mvapich to use the IB network and
|> |> |> |> |>> otherwise just exit?
|> |> |> |> |>
|> |> |> |> |> MVAPICH has several underlying interfaces: Gen2, uDAPL, VAPI,
|> |> |> |> |> TCP/IP and shared memory. Please take a look at the user guide
|> |> |> |> |> (available from the mvapich project page) to see the
|> |> |> |> |> differences and capabilities of these interfaces. The Gen2
|> |> |> |> |> interface (corresponding to OFED) will give you the best
|> |> |> |> |> performance and scalability. If you have the OFED stack
|> |> |> |> |> installed, you should be able to configure mvapich to run over
|> |> |> |> |> the Gen2 interface (as per the instructions indicated in the
|> |> |> |> |> user guide). During OFED installation, you can also select
|> |> |> |> |> mvapich from the package.
|> |> |> |> |>
|> |> |> |> |> On your existing installation, you can also run the OSU
|> |> |> |> |> benchmarks (such as OSU latency). If you get a latency number
|> |> |> |> |> in the range of 2~4 microsec for short messages (say 4 bytes),
|> |> |> |> |> it is already running over the native IB.
|> |> |> |> |>
|> |> |> |> |> Hope this helps.
|> |> |> |> |>
|> |> |> |> |> DK
|> |> |> |> |>
|> |> |> |> |>> Thanks for your suggestion.
|> |> |> |> |>> Rdgs.
|> |> |> |> |>> Henry, Wu
|> |> |> |> |>>
|> |> |> |> |>> _______________________________________________
|> |> |> |> |>> mvapich-discuss mailing list
|> |> |> |> |>> mvapich-discuss at cse.ohio-state.edu
|> |> |> |> |>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
|> |> |> |> |>>
|> |> |> |> |>
|> |> |> |> |> _______________________________________________
|> |> |> |> |> mvapich-discuss mailing list
|> |> |> |> |> mvapich-discuss at cse.ohio-state.edu
|> |> |> |> |> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
|> |> |> |> |
|> |> |> |> |
|> |> |> |> | --
|> |> |> |> | Jeff Squyres
|> |> |> |> | Cisco Systems
|> |> |> |> |
|> |> |> |> |
|> |> |> |
|> |> |> |
|> |> |> | --
|> |> |> | Jeff Squyres
|> |> |> | Cisco Systems
|> |> |> |
|> |> |> |
|> |> |
|> |> |
|> |> | --
|> |> | Jeff Squyres
|> |> | Cisco Systems
|> |> |
|> |> |
|> |>
|> |>
|> |> _______________________________________________
|> |> mvapich-discuss mailing list
|> |> mvapich-discuss at cse.ohio-state.edu
|> |> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
|> |>
|> |
|> |
|>
|>
|> _______________________________________________
|> mvapich-discuss mailing list
|> mvapich-discuss at cse.ohio-state.edu
|> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
|
|
| --
| Jonathan Perkins
| http://www.cse.ohio-state.edu/~perkinjo
|
