[mvapich-discuss] Problem with linpack/mvapich2/BLCR (fwd)
wei huang
huanwei at cse.ohio-state.edu
Wed Oct 3 16:53:04 EDT 2007
Hi Patrice,
I forgot to mention in my last email that we are using BLCR-0.6.1 (the
latest version) and OFED-1.2.5.1.
Thanks.
Regards,
Wei Huang
774 Dreese Lab, 2015 Neil Ave,
Dept. of Computer Science and Engineering
Ohio State University
OH 43210
Tel: (614)292-8501
---------- Forwarded message ----------
Date: Wed, 3 Oct 2007 16:31:11 -0400 (EDT)
From: wei huang <huanwei at cse.ohio-state.edu>
To: Patrice Martinez <patrice.martinez at bull.net>
Cc: mvapich-discuss at cse.ohio-state.edu
Subject: Re: [mvapich-discuss] Problem with linpack/mvapich2/BLCR
Hi Patrice,
We tried running HPL with BLCR support and it looks fine. We have run the
test on two sets of machines. The first is a set of dual-processor Intel
Xeon nodes, where we ran 4 processes with 2 processes on each node. We also
ran the test on an Opteron 880 system (four dual-core processors), hosting
all 4 processes on one node.
We tried to use an HPL input as similar as possible to yours; see below:
============================================================================
T/V            N    NB     P     Q       Time      Gflops
----------------------------------------------------------------------------
WR00C2L4    5000   112     4     1       6.64   1.255e+01
----------------------------------------------------------------------------
||Ax-b||_oo / ( eps * ||A||_1 * N ) = 0.0355903 ...... PASSED
||Ax-b||_oo / ( eps * ||A||_1 * ||x||_1 ) = 0.0234950 ...... PASSED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 0.0045862 ...... PASSED
The difference is that we don't have Intel MKL installed, so we are using
HPL with the Goto BLAS library. Could you let us know whether you can
reproduce the problem with Goto BLAS?
Thanks.
Regards,
Wei Huang
On Mon, 1 Oct 2007, Patrice Martinez wrote:
>
> Hello,
>
> I have encountered a problem running the Linpack benchmark with mvapich2 configured with BLCR support: the computations are sometimes right and sometimes wrong.
> Let me describe the context:
>
> Hardware used:
>
> 1. Bull Novascale R422, 2 x Xeon Core 2 Duo 5150 @ 2.66 GHz, 8 GB of RAM
> 2. Mellanox MT25208 dual-port IB HCA
>
> Software used:
>
> 1. RHEL4 U4, kernel 2.6.9-42.ELsmp
> 2. gcc-3.4.6
> 3. Intel MKL 9.1
> 4. blcr-0.6.0
> 5. mvapich2-1.0
> 6. OFED-1.2.5.1
> 7. linpack-9.1
>
> Tests:
>
> - For this test, the two ports of the IB HCA are connected to each other.
>
> - I created the following symbolic link to avoid problems with forwarding environment variables:
>
> #l /lib64/libcr.so.0
> lrwxrwxrwx 1 root root 23 Sep 21 11:19 /lib64/libcr.so.0 -> /usr/local/lib/libcr.so
>
> - The BLCR kernel modules are loaded:
>
> service blcr start
>
> - The mpd daemon is started:
>
> mpdboot --ncpus=4
>
> - Finally, Linpack is configured to solve a small linear system (N=5000) and is executed:
>
> mpiexec -n 4 ./xhpl
>
>
> Analysis
>
> For a given choice of the parameters P and Q in the HPL.dat file, the computations are either consistently right or consistently wrong.
> With P=4, Q=1:
> ============================================================================
> T/V            N    NB     P     Q       Time      Gflops
> ----------------------------------------------------------------------------
> W00C2L4     5000   112     4     1       4.28   1.948e+01
> ----------------------------------------------------------------------------
> ||Ax-b||_oo / ( eps * ||A||_1 * N ) = 25110713646301407346688.0000000 ...... FAILED
> ||Ax-b||_oo / ( eps * ||A||_1 * ||x||_1 ) = 155458419119.8088379 ...... FAILED
> ||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 17288875125.5442734 ...... FAILED
> ||Ax-b||_oo . . . . . . . . . . . . . . . . . = 17973740643825.015625
> ||A||_oo . . . . . . . . . . . . . . . . . . . = 1283.266028
> ||A||_1 . . . . . . . . . . . . . . . . . . . = 1289.434188
> ||x||_oo . . . . . . . . . . . . . . . . . . . = 1459401545070.356201
> ||x||_1 . . . . . . . . . . . . . . . . . . . = 807634407595160.750000
> ============================================================================
>
> With P=2, Q=2:
>
> ============================================================================
> T/V            N    NB     P     Q       Time      Gflops
> ----------------------------------------------------------------------------
> W00C2L4     5000   112     2     2       3.39   2.459e+01
> ----------------------------------------------------------------------------
> ||Ax-b||_oo / ( eps * ||A||_1 * N ) = 0.0420265 ...... PASSED
> ||Ax-b||_oo / ( eps * ||A||_1 * ||x||_1 ) = 0.0277438 ...... PASSED
> ||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 0.0054156 ...... PASSED
> ============================================================================
>
>
> Interestingly, the run is faster when the results are right (3.39 s with P=2, Q=2 versus 4.28 s with P=4, Q=1).
>
> When using mvapich2 compiled without BLCR support, computations are always right, of course.
> Any ideas?
>
> --
>
> Cordialement/Best regards
>
> Patrice Martinez
>
> Linux Kernel Architect.
>
> OFFICE : B1-405
> PHONE : +33 (0)4 76 29 74 69
> EMAIL : Patrice.martinez at bull.net
> ADDR : BULL, 1 rue de Provence, BP 208, 38432 Echirolles Cedex, FRANCE
>
>
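The FAILED verdicts in the P=4, Q=1 run quoted above can be cross-checked from the norms HPL prints. The Python sketch below recomputes the first two scaled residuals from those printed values. It assumes that HPL's eps is the IEEE-754 double-precision unit roundoff 2^-53 (this value reproduces the printed residuals) and that the pass/fail threshold is 16.0, a value commonly found in HPL.dat; neither is stated in the original post, and the `verdict` helper is hypothetical.

```python
# Recompute two of HPL's scaled residual checks from the norms printed in
# the FAILED (P=4, Q=1) run quoted above.
# Assumptions (not taken from the post): eps is the IEEE-754 double unit
# roundoff 2**-53, and the pass/fail threshold is 16.0.

EPS = 2.0 ** -53        # assumed machine epsilon used by HPL
THRESHOLD = 16.0        # assumed HPL.dat "threshold" value
N = 5000                # matrix order used in the run

# Norms copied verbatim from the FAILED output
r_inf = 17973740643825.015625    # ||Ax-b||_oo
a_1 = 1289.434188                # ||A||_1
x_1 = 807634407595160.750000     # ||x||_1


def verdict(resid: float) -> str:
    """Hypothetical helper: map a scaled residual to a PASSED/FAILED label."""
    return "PASSED" if resid < THRESHOLD else "FAILED"


# ||Ax-b||_oo / ( eps * ||A||_1 * N )
resid1 = r_inf / (EPS * a_1 * N)
# ||Ax-b||_oo / ( eps * ||A||_1 * ||x||_1 )
resid2 = r_inf / (EPS * a_1 * x_1)

for label, resid in (("resid1", resid1), ("resid2", resid2)):
    print(f"{label} = {resid:.6e} ...... {verdict(resid)}")
```

Both residuals come out many orders of magnitude above the assumed threshold, which points to a grossly wrong solution vector rather than a borderline rounding problem.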
_______________________________________________
mvapich-discuss mailing list
mvapich-discuss at cse.ohio-state.edu
http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss