[mvapich-discuss] Verify the application is really running

wgy at altair.com.cn wgy at altair.com.cn
Tue Sep 4 02:22:47 EDT 2007


Hello:
I am quite sure I used the one you referred to, and I got the compile errors
you can see in the message. I just renamed osu_latency.c to latency.c while
uploading...
Thanks.
Henry, Wu.

| Are you sure you are using the osu_latency.c file from the mvapich web
| site?  Your e-mail mentions using a `latency.c' file.
|
| FYI, the osu_latency.c benchmark (latest version v2.2) is available
| from the following URL:
|
|
| https://mvapich.cse.ohio-state.edu/svn/mpi/mvapich/trunk/osu_benchmarks/osu_latency.c
|
| DK
|
|>
|> Hello, Jeff and Dr. Panda:
|> I am getting back with the following results and further questions:
|> 1) latency test result:
|> The latency.c I downloaded from the mvapich website fails to compile on the
|> cluster, with these errors:
|> [radioss at hpc-node-01 job1]$ /usr/local/topspin/mpi/mpich/bin/mpicc
|> latency.c -o lat
|> latency.c:66: error: syntax error before "for"
|> latency.c:68: error: `i' undeclared here (not in a function)
|> latency.c:68: warning: data definition has no type or storage class
|> latency.c:69: error: syntax error before '}' token
|> latency.c:73: error: `skip_large' undeclared here (not in a function)
|> latency.c:73: warning: data definition has no type or storage class
|> latency.c:74: error: syntax error before '}' token
|> latency.c:76: warning: parameter names (without types) in function
|> declaration
|> latency.c:76: warning: data definition has no type or storage class
|> latency.c:78: error: syntax error before "if"
|> latency.c:82: error: syntax error before numeric constant
|> latency.c:82: warning: data definition has no type or storage class
|> latency.c:83: error: syntax error before numeric constant
|> latency.c:84: warning: data definition has no type or storage class
|> latency.c:86: error: initializer element is not constant
|> latency.c:86: warning: data definition has no type or storage class
|> latency.c:88: error: syntax error before '}' token
|> latency.c:92: error: syntax error before numeric constant
|> latency.c:92: warning: data definition has no type or storage class
|> latency.c:98: error: `t_start' undeclared here (not in a function)
|> latency.c:98: error: `loop' undeclared here (not in a function)
|> latency.c:98: warning: data definition has no type or storage class
|> latency.c:99: error: syntax error before string constant
|> latency.c:99: warning: conflicting types for built-in function 'fprintf'
|> latency.c:99: warning: data definition has no type or storage class
|> latency.c:104: warning: data definition has no type or storage class
|> latency.c:105: error: syntax error before "return"
|> latency.c:107:2: warning: no newline at end of file
|> latency.c:68: error: storage size of `r_buf' isn't known
|>
|> I had to use mpi_latency.c shipped with mvapich in the cluster and got the
|> following latency test results.
|>
|> [radioss at hpc-node-01 job1]$ /usr/local/topspin/mpi/mpich/bin/mpirun_rsh
|> -np 2 -hostfile appfile ./lat 10000 1
|> 1       6.288650
|> [radioss at hpc-node-01 job1]$ /usr/local/topspin/mpi/mpich/bin/mpirun_rsh
|> -np 2 -hostfile appfile ./lat 10000 4
|> 4       6.410350
|> while Topspin's Host-Side Drivers User Guide for Linux Release 3.1.0 gives
|> the following latency test figure as an example:
|> [root at qa-bc1-blade2 root]# /usr/local/topspin/mpi/mpich/bin/mpirun_ssh \
|>   -np 2 qa-bc1-blade2 qa-bc1-blade3 \
|>   /usr/local/topspin/mpi/mpich/bin/mpi_latency 10000 1
|> 1 6.684000
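
For reference, the core of a two-process ping-pong latency test (which is what
mpi_latency.c and osu_latency.c measure) looks roughly like the sketch below.
This is not the actual benchmark source; the iteration count and message size
are hard-coded here only for illustration, and the real benchmarks add warm-up
iterations and take the loop count and size from the command line.

    /* Minimal ping-pong latency sketch (illustration only, not the
     * osu_latency.c or mpi_latency.c source).  Rank 0 and rank 1 bounce
     * a small message; one-way latency is half the average round trip. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        const int iters = 10000;   /* assumed iteration count */
        const int size  = 1;       /* assumed message size in bytes */
        char buf[4];
        int rank, i;
        double t0, t1;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        MPI_Barrier(MPI_COMM_WORLD);
        t0 = MPI_Wtime();
        for (i = 0; i < iters; i++) {
            if (rank == 0) {
                MPI_Send(buf, size, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, size, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &status);
            } else if (rank == 1) {
                MPI_Recv(buf, size, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &status);
                MPI_Send(buf, size, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        t1 = MPI_Wtime();

        if (rank == 0)   /* print size and one-way latency in microseconds */
            printf("%d\t%f\n", size, (t1 - t0) * 1e6 / (2.0 * iters));

        MPI_Finalize();
        return 0;
    }

Compiled with the Topspin mpicc and run with two processes (one per host),
results in the single-digit microsecond range, like the 6.3-6.7 usec figures
above, are consistent with IB; tens of microseconds would point to GigE.
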
|> 2) Jeff Squyres once asked me:
|> >> I have 4-core nodes here..
|> >> I would expect to run it as:
|> >> /usr/local/topspin/mpi/mpich/bin/mpirun_ssh -np 2 -hostfile hosts
|>
|> >^^ Is that the right path?  Or is it "mvapich"?  Regardless, I think
|> > wherever you find mpirun_ssh under /usr/local/topspin/mpi is probably
|> > the right one.
|> the path is right, and I am pretty sure it is mvapich because:
|> i)rpm -qf /usr/local/topspin/mpi/mpich/bin/mpirun_ssh gives:
|> topspin-ib-mpi-rhel4-3.2.0-118
|> ii)[radioss at hpc-node-01 local]$
|> /usr/local/topspin/mpi/mpich/bin/mpirun_rsh -v
|> OSU MVAPICH VERSION 0.9.5-SingleRail
|>
|> 3) When I try to use HP MPI 2.2.5 over the IB network, I get the
|> following:
|> [radioss at hpc-node-01 job1]$ /opt/hpmpi/bin/mpirun -stdio=i0
|> -cpu_bind=cyclic -VAPI  -f appfile < PFTANKD01
|> dlopen test for MPI_ICLIB_VAPI__VAPI_MAIN could not open libs in list
|> libmtl_common.so   libmpga.so      libmosal.so     libvapi.so:
|> /usr/local/topspin/lib64/libmosal.so: undefined symbol: pthread_create
|> dlopen test for MPI_ICLIB_VAPI__VAPI_CISCO could not open libs in list
|> libpthread.so     libmosal.so     libvapi.so: /usr/lib64/libpthread.so:
|> invalid ELF header
|> mpid: MPI BUG: VAPI requested but not available
|> What does this probably indicate? Is something wrong with the IB
|> configuration?
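
As a side note on the dlopen errors above: they can be reproduced outside of
HP MPI with a small test program such as the sketch below (a diagnostic aid
only, not HP MPI code; the library paths are copied from the error output).
On RHEL-style systems /usr/lib64/libpthread.so is usually a linker script
rather than an ELF object, which would explain the "invalid ELF header"
message; the loadable object is /lib64/libpthread.so.0, and libmosal.so
generally needs libpthread's symbols available before it can resolve
pthread_create.

    /* Diagnostic sketch: open the VAPI libraries by hand and print why
     * any of them fails to load.  Build with: gcc dltest.c -o dltest -ldl */
    #include <dlfcn.h>
    #include <stdio.h>

    static void try_open(const char *name)
    {
        /* RTLD_GLOBAL makes earlier libraries' symbols (e.g. pthread_create)
         * visible to the libraries opened after them. */
        void *h = dlopen(name, RTLD_NOW | RTLD_GLOBAL);
        if (h)
            printf("OK    %s\n", name);
        else
            printf("FAIL  %s: %s\n", name, dlerror());
    }

    int main(void)
    {
        try_open("/lib64/libpthread.so.0");
        try_open("/usr/local/topspin/lib64/libmosal.so");
        try_open("/usr/local/topspin/lib64/libvapi.so");
        return 0;
    }

If libmosal.so and libvapi.so load cleanly once libpthread.so.0 has been
opened globally, the VAPI libraries themselves are probably intact and the
failure is in how HP MPI's dlopen list picks up libpthread.
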
|>
|> RPM packages installed there:
|> [radioss at hpc-node-01 job1]$ rpm -qa|grep topspin
|> topspin-ib-rhel4-3.2.0-118
|> topspin-ib-mpi-rhel4-3.2.0-118
|> topspin-ib-mod-rhel4-2.6.9-42.ELsmp-3.2.0-118
|>
|> 4) You suggested that I use HP MPI (not native mvapich) and the OFED IB
|> stack if possible.
|> Now I have some questions; I hope you can give a quick comment or point me
|> to a website link that I can read through:
|> i) How can I verify which IB stack is used here, OFED or the Cisco/Topspin
|> IB stack? What are the advantages of the OFED IB stack over the
|> Cisco/Topspin IB stack? (See the sketch after these questions.)
|> ii) What are the advantages of HP MPI over "native mvapich"? What is meant
|> by "native mvapich"? The one shipped with Cisco/Topspin? Is it enough to
|> upgrade mvapich to the latest one available on the mvapich website?
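
For question i), one rough way to tell which stack is installed is to look
for the sysfs directory that the in-kernel verbs drivers (used by OFED)
create, and for the Topspin install prefix that appears throughout this
thread. The sketch below is only a heuristic under those assumptions, not an
authoritative test.

    /* Heuristic check for which IB stack is present (sketch; the paths are
     * assumptions based on typical installations). */
    #include <stdio.h>
    #include <sys/stat.h>

    static int exists(const char *path)
    {
        struct stat st;
        return stat(path, &st) == 0;
    }

    int main(void)
    {
        printf("/sys/class/infiniband (OFED/kernel verbs): %s\n",
               exists("/sys/class/infiniband") ? "present" : "absent");
        printf("/usr/local/topspin (Topspin/Cisco stack):  %s\n",
               exists("/usr/local/topspin") ? "present" : "absent");
        return 0;
    }

The rpm output is the more definitive check: the topspin-ib-* packages listed
above point to the Cisco/Topspin stack, while an OFED install would show
packages such as libibverbs.
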
|>
|> Thanks a lot to all of you for your kind help!
|>
|> Henry, Wu.
|>
|>
|>
|>
|> | On Aug 29, 2007, at 12:54 PM, wgy at altair.com.cn wrote:
|> |
|> |> |> Yes, I think I used the mvapich shipped with Topspin, but I am not
|> |> |> sure unless I know how to verify it.
|> |
|> | If it's in the /usr/local/topspin directory, it's the Topspin (later
|> | Cisco) MVAPICH.
|> |
|> |> |> about the latency test, I downloaded
|> |> |> https://mvapich.cse.ohio-state.edu/svn/mpi/mvapich/trunk/osu_benchmarks/osu_latency.c
|> |> |> and will compile it to run a benchmark. Can you please tell me how I
|> |> |> should run it? How many nodes should be used, and how many CPUs
|> |> |> should be involved?
|> |
|> | You typically run it with 2 MPI processes; one on each host.  It
|> | measures the MPI network latency between those two hosts.
|> |
|> |> I have 4-core nodes here..
|> |> I would expect to run it as:
|> |> /usr/local/topspin/mpi/mpich/bin/mpirun_ssh -np 2 -hostfile hosts
|> |
|> | ^^ Is that the right path?  Or is it "mvapich"?  Regardless, I think
|> | wherever you find mpirun_ssh under /usr/local/topspin/mpi is probably
|> | the right one.
|> |
|> |> osu_latency.o
|> |
|> | Is your executable really named osu_latency.o?  That's uncommon.
|> | Regardless, run the executable that you got when you compiled
|> | osu_latency.c with mpicc.
|> |
|> |> and include the following in the hosts file
|> |> hpc-node-01
|> |> hpc-node-02
|> |
|> | Sounds right.  I'm not an MVAPICH expert, though -- so I defer to the
|> | maintainers here on this list for the finer details...
|> |
|> |> Is it right?
|> |> Thanks a lot, I am really a newbie with InfiniBand....
|> |
|> | If this is your own system, I do want to stress that OFED is really
|> | the way to go with HPC InfiniBand installations these days.  The
|> | MPI's that are included are much more recent, and all new development
|> | work is happening in the OFED arena.
|> |
|> | I recommend that you upgrade if you can.
|> |
|> |
|> |> Henry, Wu
|> |>
|> |>
|> |> | On Aug 29, 2007, at 12:25 PM, wgy at altair.com.cn wrote:
|> |> |
|> |> |> Hello, Jeff:
|> |> |> The mvapich version is OSU MVAPICH 0.9.5.
|> |> |> Does it mean that it is the Cisco IB stack and therefore the
|> |> |> application I run with mvapich is really running over the IB network?
|> |> |
|> |> | The version of MVAPICH, by itself, does not mean that it is or is not
|> |> | running over IB.
|> |> |
|> |> | What *implies* that you are running over IB is:
|> |> |
|> |> | - You implied that you are using the MVAPICH shipped with the Topspin
|> |> | IB stack (which is not OFED).  Is that correct?
|> |> | - I *believe* that the Topspin MVAPICH did not have TCP support
|> |> | compiled into it (Topspin was before my time, but I am pretty sure
|> |> | that the Cisco MVAPICH shipped with the Cisco IB stack does not)
|> |> |
|> |> | What would *prove* that you are using IB (vs. gige) is:
|> |> |
|> |> | - Run a simple latency test, as Dr. Panda suggested.  Your latency
|> |> | should be single-digit microseconds (exact numbers depend on your
|> |> | hardware -- this might be all older stuff since you mentioned
|> |> | "Topspin", not "Cisco"; Topspin was acquired by Cisco quite a while
|> |> | ago...).  If your latency is much higher than that (e.g., 50 us),
|> |> | you're using gige.
|> |> |
|> |> |
|> |> |
|> |> |> Thanks.
|> |> |>
|> |> |> Henry, Wu.
|> |> |> | In addition to what Dr. Panda said, Cisco recommends that all HPC
|> |> |> | customers upgrade to the OFED IB driver stack if possible (some
|> |> |> | customers cannot upgrade for various reasons).  FWIW: all new
|> |> |> | HPC/MPI work is occurring in the OFED arena.
|> |> |> |
|> |> |> | I bring this up because you specifically mention Topspin
|> |> |> | Infiniband, which I'm *assuming* is the Cisco IB stack (not the
|> |> |> | OFED IB stack), and is therefore shipping with a somewhat older
|> |> |> | version of MVAPICH that was derived from the OSU MVAPICH.  The
|> |> |> | Cisco MVAPICH should only be compiled with IB support enabled; a
|> |> |> | simple latency test should prove that you're running over IB and
|> |> |> | not ethernet.
|> |> |> |
|> |> |> | Much more recent versions of MPI implementations are included with
|> |> |> | the OFED stack (Cisco provides binary distributions of OFED on
|> |> |> | www.cisco.com).
|> |> |> |
|> |> |> |
|> |> |> | On Aug 29, 2007, at 11:44 AM, Dhabaleswar Panda wrote:
|> |> |> |
|> |> |> |>
|> |> |> |>
|> |> |> |> On Wed, 29 Aug 2007 wgy at altair.com.cn wrote:
|> |> |> |>
|> |> |> |>> Hello, list:
|> |> |> |>> It might be a silly question, but I wonder how to verify that a
|> |> |> |>> run with mvapich (which comes with Topspin InfiniBand) goes over
|> |> |> |>> InfiniBand, NOT the Gigabit network.
|> |> |> |>> Is there an option to force mvapich to use the IB network and
|> |> |> |>> otherwise just exit?
|> |> |> |>
|> |> |> |> MVAPICH has several underlying interfaces: Gen2, uDAPL, VAPI,
|> |> |> |> TCP/IP and shared memory. Please take a look at the user guide
|> |> |> |> (available from the mvapich project page) to see the differences
|> |> |> |> and capabilities of these interfaces. The Gen2 interface
|> |> |> |> (corresponding to OFED) will give you the best performance and
|> |> |> |> scalability. If you have the OFED stack installed, you should be
|> |> |> |> able to configure mvapich to run over the Gen2 interface (as per
|> |> |> |> the instructions indicated in the user guide). During OFED
|> |> |> |> installation, you can also select mvapich from the package.
|> |> |> |>
|> |> |> |> On your existing installation, you can also run the OSU benchmarks
|> |> |> |> (such as OSU latency). If you get a latency number in the range of
|> |> |> |> 2~4 microsec for short messages (say 4 bytes), it is already
|> |> |> |> running over the native IB.
|> |> |> |>
|> |> |> |> Hope this helps.
|> |> |> |>
|> |> |> |> DK
|> |> |> |>
|> |> |> |>> Thanks for your suggestion.
|> |> |> |>> Rdgs.
|> |> |> |>> Henry, Wu
|> |> |> |>>
|> |> |> |>> _______________________________________________
|> |> |> |>> mvapich-discuss mailing list
|> |> |> |>> mvapich-discuss at cse.ohio-state.edu
|> |> |> |>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
|> |> |> |>>
|> |> |> |>
|> |> |> |> _______________________________________________
|> |> |> |> mvapich-discuss mailing list
|> |> |> |> mvapich-discuss at cse.ohio-state.edu
|> |> |> |> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
|> |> |> |
|> |> |> |
|> |> |> | --
|> |> |> | Jeff Squyres
|> |> |> | Cisco Systems
|> |> |> |
|> |> |> |
|> |> |
|> |> |
|> |> | --
|> |> | Jeff Squyres
|> |> | Cisco Systems
|> |> |
|> |> |
|> |
|> |
|> | --
|> | Jeff Squyres
|> | Cisco Systems
|> |
|> |
|>
|>
|> _______________________________________________
|> mvapich-discuss mailing list
|> mvapich-discuss at cse.ohio-state.edu
|> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
|>
|
|




More information about the mvapich-discuss mailing list