[mvapich-discuss] Problem with linpack/mvapich2/BLCR
wei huang
huanwei at cse.ohio-state.edu
Thu Oct 11 22:20:32 EDT 2007
Hi Patrice,
Could you please apply this patch and see whether it solves the problem? It
fixes a possible miscalculation of the receive size when complex datatypes
are used.
Modified:
mvapich2/trunk/src/mpid/osu_ch3/channels/mrail/src/rdma/ch3_rndvtransfer.c
===================================================================
--- mvapich2/trunk/src/mpid/osu_ch3/channels/mrail/src/rdma/ch3_rndvtransfer.c 2007-10-10 17:42:09 UTC (rev 1570)
+++ mvapich2/trunk/src/mpid/osu_ch3/channels/mrail/src/rdma/ch3_rndvtransfer.c 2007-10-11 18:03:21 UTC (rev 1571)
@@ -141,7 +141,7 @@
MPIDI_CH3I_CR_lock();
#endif
- if (rreq->dev.iov_count == 1)
+ if (rreq->dev.iov_count == 1 && rreq->dev.OnDataAvail == NULL)
cts_pkt->recv_sz = rreq->dev.iov[0].MPID_IOV_LEN;
else
cts_pkt->recv_sz = rreq->dev.segment_size;
Thanks.
Regards,
Wei Huang
774 Dreese Lab, 2015 Neil Ave,
Dept. of Computer Science and Engineering
Ohio State University
OH 43210
Tel: (614)292-8501
On Fri, 5 Oct 2007, Patrice Martinez wrote:
> wei huang wrote:
>
> Hi Patrice,
>
> We tried running HPL with BLCR support and it looks fine. We ran the
> test on two sets of machines. One set consists of dual-processor Intel Xeon
> nodes, on which we ran 4 processes, 2 per node. We also ran the test on an
> Opteron 880 machine (four dual-core processors), hosting all 4 processes on one node.
>
> We tried to make our HPL input as similar as possible to yours; see below:
>
> ============================================================================
> T/V N NB P Q Time Gflops
> ----------------------------------------------------------------------------
> WR00C2L4 5000 112 4 1 6.64 1.255e+01
> ----------------------------------------------------------------------------
> ||Ax-b||_oo / ( eps * ||A||_1 * N ) = 0.0355903 ...... PASSED
> ||Ax-b||_oo / ( eps * ||A||_1 * ||x||_1 ) = 0.0234950 ...... PASSED
> ||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 0.0045862 ...... PASSED
>
>
> The difference is that we don't have Intel MKL installed, so we are using
> HPL with the Goto BLAS library. Could you let us know whether you can
> reproduce the problem with Goto?
>
> Thanks.
>
> Regards,
> Wei Huang
>
>
>
> On Mon, 1 Oct 2007, Patrice Martinez wrote:
>
>
>
> Hello,
>
> I have run into a problem running the Linpack benchmark with MVAPICH2 configured for BLCR support: the results are sometimes right, sometimes wrong.
> Let me describe the context:
>
>
> Hardware used:
>
> 1. Bull Novascale R422, 2x Xeon Core 2 Duo 5150 @ 2.66 GHz, 8 GB of RAM
> 2. IB HCA Mellanox MT25208, dual-port
>
> Software used:
>
> 1. RHEL4 U4, kernel 2.6.9-42.ELsmp
> 2. gcc-3.4.6
> 3. Intel MKL 9.1
> 4. blcr-0.6.0
> 5. mvapich2-1.0
> 6. OFED-1.2.5.1
> 7. linpack-9.1
>
>
>
> Tests
>
>
> - For this test, the two ports of the IB HCA are connected to each other.
>
> - I created the following symbolic link to avoid problems with forwarding environment variables:
>
> #l /lib64/libcr.so.0
> lrwxrwxrwx 1 root root 23 Sep 21 11:19 /lib64/libcr.so.0 -> /usr/local/lib/libcr.so
>
> - The BLCR kernel modules are loaded:
>
> service blcr start
>
> - The mpd daemon is started:
>
> mpdboot --ncpus=4
>
> - Finally, Linpack is configured to solve a small system (N=5000) and executed:
>
> mpiexec -n 4 ./xhpl
>
>
> Analysis
>
>
> Depending on the parameters P and Q given in the HPL.dat file, the computations are either always right or always wrong.
> With P=4, Q=1:
> ============================================================================
> T/V N NB P Q Time Gflops
> ----------------------------------------------------------------------------
> W00C2L4 5000 112 4 1 4.28 1.948e+01
> ----------------------------------------------------------------------------
> ||Ax-b||_oo / ( eps * ||A||_1 * N ) = 25110713646301407346688.0000000 ...... FAILED
> ||Ax-b||_oo / ( eps * ||A||_1 * ||x||_1 ) = 155458419119.8088379 ...... FAILED
> ||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 17288875125.5442734 ...... FAILED
> ||Ax-b||_oo . . . . . . . . . . . . . . . . . = 17973740643825.015625
> ||A||_oo . . . . . . . . . . . . . . . . . . . = 1283.266028
> ||A||_1 . . . . . . . . . . . . . . . . . . . = 1289.434188
> ||x||_oo . . . . . . . . . . . . . . . . . . . = 1459401545070.356201
> ||x||_1 . . . . . . . . . . . . . . . . . . . = 807634407595160.750000
> ============================================================================
>
> With P=2, Q=2:
>
> ============================================================================
> T/V N NB P Q Time Gflops
> ----------------------------------------------------------------------------
> W00C2L4 5000 112 2 2 3.39 2.459e+01
> ----------------------------------------------------------------------------
> ||Ax-b||_oo / ( eps * ||A||_1 * N ) = 0.0420265 ...... PASSED
> ||Ax-b||_oo / ( eps * ||A||_1 * ||x||_1 ) = 0.0277438 ...... PASSED
> ||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 0.0054156 ...... PASSED
> ============================================================================
>
>
> Interestingly, the runs are faster when the results are right (3.39 s with P=2, Q=2 versus 4.28 s with P=4, Q=1).
>
> When MVAPICH2 is compiled without BLCR support, the results are always right, of course.
> Any ideas?
>
> --
>
> Cordialement/Best regards
>
> Patrice Martinez
>
> Linux Kernel Architect.
>
> OFFICE : B1-405
> PHONE : +33 (0)4 76 29 74 69
> EMAIL : Patrice.martinez at bull.net
> ADDR : BULL, 1 rue de Provence, BP 208, 38432 Echirolles Cedex, FRANCE
>
>
>
>
> Hi Huang,
>
> Following your advice, I built Linpack with libgoto (1.19).
> Then I connected one port of my HCA to an IB switch and ran Linpack again.
>
> Alas, the results are still the same:
>
> With P=4, Q=1:
>
> ============================================================================
> T/V N NB P Q Time Gflops
> ----------------------------------------------------------------------------
> W00C2L4 5000 112 4 1 4.15 2.008e+01
> ----------------------------------------------------------------------------
> ||Ax-b||_oo / ( eps * ||A||_1 * N ) = 760941379859326173184.0000000 ...... FAILED
> ||Ax-b||_oo / ( eps * ||A||_1 * ||x||_1 ) = 117700692102.2814941 ...... FAILED
> ||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 12754053137.7746639 ...... FAILED
> ||Ax-b||_oo . . . . . . . . . . . . . . . . . = 544666439966.367126
> ||A||_oo . . . . . . . . . . . . . . . . . . . = 1283.266028
> ||A||_1 . . . . . . . . . . . . . . . . . . . = 1289.434188
> ||x||_oo . . . . . . . . . . . . . . . . . . . = 59949485905.747444
> ||x||_1 . . . . . . . . . . . . . . . . . . . = 32325272106219.675781
> ============================================================================
> T/V N NB P Q Time Gflops
> ----------------------------------------------------------------------------
> W00C2L2 5000 112 4 1 4.12 2.024e+01
> ----------------------------------------------------------------------------
> ||Ax-b||_oo / ( eps * ||A||_1 * N ) = 98412506262587736064.0000000 ...... FAILED
> ||Ax-b||_oo / ( eps * ||A||_1 * ||x||_1 ) = 185983098083.3132324 ...... FAILED
> ||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 15844011655.4434471 ...... FAILED
> ||Ax-b||_oo . . . . . . . . . . . . . . . . . = 70441680335.640015
> ||A||_oo . . . . . . . . . . . . . . . . . . . = 1283.266028
> ||A||_1 . . . . . . . . . . . . . . . . . . . = 1289.434188
> ||x||_oo . . . . . . . . . . . . . . . . . . . = 6241193140.782863
> ||x||_1 . . . . . . . . . . . . . . . . . . . = 2645737899755.351562
> ============================================================================
>
> Finished 2 tests with the following results:
> 0 tests completed and passed residual checks,
> 2 tests completed and failed residual checks,
> 0 tests skipped because of illegal input values.
> ----------------------------------------------------------------------------
>
> End of Tests.
> ============================================================================
>
> With P=Q=2, computations are still right:
>
> ============================================================================
> T/V N NB P Q Time Gflops
> ----------------------------------------------------------------------------
> W00C2L4 5000 112 2 2 3.36 2.479e+01
> ----------------------------------------------------------------------------
> ||Ax-b||_oo / ( eps * ||A||_1 * N ) = 0.0144981 ...... PASSED
> ||Ax-b||_oo / ( eps * ||A||_1 * ||x||_1 ) = 0.0095710 ...... PASSED
> ||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 0.0018682 ...... PASSED
> ============================================================================
> T/V N NB P Q Time Gflops
> ----------------------------------------------------------------------------
> W00C2L2 5000 112 2 2 3.35 2.492e+01
> ----------------------------------------------------------------------------
> ||Ax-b||_oo / ( eps * ||A||_1 * N ) = 0.0153810 ...... PASSED
> ||Ax-b||_oo / ( eps * ||A||_1 * ||x||_1 ) = 0.0101538 ...... PASSED
> ||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 0.0019820 ...... PASSED
> ============================================================================
>
> The only good news is that libgoto seems slightly faster than MKL when Linpack works well, but that doesn't help much with this problem ;-)
>
> If you have any ideas, that would be great, because I really don't know what to do next!
> --
>
> Cordialement/Best regards
>
> Patrice Martinez
>
>
>
>