[mvapich-discuss] Problem with linpack/mvapich2/BLCR

wei huang huanwei at cse.ohio-state.edu
Thu Oct 11 22:20:32 EDT 2007


Hi Patrice,

Could you please apply this patch and see if it solves the problem? This
patch fixes a possible miscalculation of the receive size when using complex
datatypes.

Modified:
mvapich2/trunk/src/mpid/osu_ch3/channels/mrail/src/rdma/ch3_rndvtransfer.c
===================================================================
--- mvapich2/trunk/src/mpid/osu_ch3/channels/mrail/src/rdma/ch3_rndvtransfer.c  2007-10-10 17:42:09 UTC (rev 1570)
+++ mvapich2/trunk/src/mpid/osu_ch3/channels/mrail/src/rdma/ch3_rndvtransfer.c  2007-10-11 18:03:21 UTC (rev 1571)
@@ -141,7 +141,7 @@
     MPIDI_CH3I_CR_lock();
 #endif

-    if (rreq->dev.iov_count == 1)
+    if (rreq->dev.iov_count == 1 && rreq->dev.OnDataAvail == NULL)
        cts_pkt->recv_sz = rreq->dev.iov[0].MPID_IOV_LEN;
     else
        cts_pkt->recv_sz = rreq->dev.segment_size;
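
For readers who want to see the effect of the changed condition in isolation,
here is a minimal sketch. It is not MVAPICH2 code: the types toy_iov and
toy_request, the field on_data_avail, and the handler toy_reload_iov are
hypothetical stand-ins, and the reading that a non-NULL OnDataAvail means the
single IOV entry covers only part of the message is an assumption based on
the description above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the real request structures (not MVAPICH2 types). */
typedef struct {
    void  *base;
    size_t len;
} toy_iov;

typedef struct toy_request {
    toy_iov iov[4];
    int     iov_count;
    size_t  segment_size;                          /* full message size in bytes */
    int   (*on_data_avail)(struct toy_request *);  /* assumed: non-NULL means more
                                                      work follows this IOV */
} toy_request;

/* Rule before the patch: one IOV entry is taken to cover the whole message. */
static size_t recv_sz_before(const toy_request *req)
{
    assert(req != NULL);
    return (req->iov_count == 1) ? req->iov[0].len : req->segment_size;
}

/* Rule after the patch: trust the single IOV entry only if no handler is pending. */
static size_t recv_sz_after(const toy_request *req)
{
    assert(req != NULL);
    if (req->iov_count == 1 && req->on_data_avail == NULL)
        return req->iov[0].len;
    return req->segment_size;
}

/* Hypothetical handler standing in for a callback that reloads the IOV. */
static int toy_reload_iov(toy_request *req)
{
    (void)req;
    return 0;
}
```

With iov_count = 1, iov[0].len = 1024, segment_size = 4096 and on_data_avail
set to toy_reload_iov, recv_sz_before() reports 1024 while recv_sz_after()
reports 4096. With on_data_avail set to NULL, both report 1024.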


Thanks.

Regards,
Wei Huang

774 Dreese Lab, 2015 Neil Ave,
Dept. of Computer Science and Engineering
Ohio State University
OH 43210
Tel: (614)292-8501


On Fri, 5 Oct 2007, Patrice Martinez wrote:

> wei huang wrote:
>
>  Hi Patrice,
>
> We tried running hpl with BLCR support and it looks fine. We have run the
> test on two sets of machines. One is dual-processor Intel Xeon nodes, where
> we ran 4 processes with 2 processes on each node. We also ran the test on an
> 880 Opteron (quad dual-core), hosting all 4 processes on one node.
>
> We tried to use HPL input as similar as possible to yours, see below:
>
> ============================================================================
> T/V                N    NB     P     Q               Time             Gflops
> ----------------------------------------------------------------------------
> WR00C2L4        5000   112     4     1               6.64          1.255e+01
> ----------------------------------------------------------------------------
> ||Ax-b||_oo / ( eps * ||A||_1  * N        ) =        0.0355903 ...... PASSED
> ||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) =        0.0234950 ...... PASSED
> ||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) =        0.0045862 ...... PASSED
>
>
> The difference is that we don't have intel-mkl installed, so we are using
> HPL with the goto library. Could you let us know if you can reproduce the
> problem with goto?
>
> Thanks.
>
> Regards,
> Wei Huang
>
> 774 Dreese Lab, 2015 Neil Ave,
> Dept. of Computer Science and Engineering
> Ohio State University
> OH 43210
> Tel: (614)292-8501
>
>
> On Mon, 1 Oct 2007, Patrice Martinez wrote:
>
>
>
>  Hello,
>
> I am encountering a problem running the linpack benchmark with mvapich2 configured for BLCR support: computations are sometimes right, sometimes wrong.
> Let me describe the context:
>
>
>             Hardware used:
>
>  1. Bull Novascale R422, 2x Xeon Core 2 Duo 5150 @ 2.66 GHz, 8 GB of RAM
>  2. IB HCA Mellanox MT25208 dual-port
>
>             Software used:
>
>  1. RHEL4 U4, kernel 2.6.9.42-ELSmp
>  2. gcc-3.4.6
>  3. intel mkl 9.1
>  4. blcr-0.6.0
>  5. mvapich2-1.0
>  6. OFED-1.2.5.1
>  7. linpack-9.1
>
>
>
>             Tests
>
>
> -For this test, the two ports of the  IB HCA are connected together.
>
> -I made the following link to avoid problems forwarding environment variables:
>
> #l /lib64/libcr.so.0
> lrwxrwxrwx  1 root root 23 Sep 21 11:19 /lib64/libcr.so.0 -> /usr/local/lib/libcr.so
>
> - blcr modules are loaded:
>
> service blcr start
>
> - mpd daemon is run:
>
> mpdboot --ncpus=4
>
> - And finally, linpack is configured to invert a small matrix (N=5000), and linpack is executed:
>
> mpiexec -n 4 ./xhpl
>
>
> Analysis
>
>
> Depending on the parameters P and Q given in the HPL.dat file, computations are always right or always wrong...
> With  P=4, Q=1:
> ============================================================================
> T/V                N    NB     P     Q               Time             Gflops
> ----------------------------------------------------------------------------
> W00C2L4         5000   112     4     1               4.28          1.948e+01
> ----------------------------------------------------------------------------
> ||Ax-b||_oo / ( eps * ||A||_1  * N        ) = 25110713646301407346688.0000000 ...... FAILED
> ||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) = 155458419119.8088379 ...... FAILED
> ||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 17288875125.5442734 ...... FAILED
> ||Ax-b||_oo  . . . . . . . . . . . . . . . . . = 17973740643825.015625
> ||A||_oo . . . . . . . . . . . . . . . . . . . =        1283.266028
> ||A||_1  . . . . . . . . . . . . . . . . . . . =        1289.434188
> ||x||_oo . . . . . . . . . . . . . . . . . . . = 1459401545070.356201
> ||x||_1  . . . . . . . . . . . . . . . . . . . = 807634407595160.750000
> ============================================================================
>
> With  P=2, Q=2
>
> ============================================================================
> T/V                N    NB     P     Q               Time             Gflops
> ----------------------------------------------------------------------------
> W00C2L4         5000   112     2     2               3.39          2.459e+01
> ----------------------------------------------------------------------------
> ||Ax-b||_oo / ( eps * ||A||_1  * N        ) =        0.0420265 ...... PASSED
> ||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) =        0.0277438 ...... PASSED
> ||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) =        0.0054156 ...... PASSED
> ============================================================================
>
>
> It is interesting to see that computations are faster when they're right...
>
> When using mvapich2 compiled without BLCR support, computations are always right, of course.
> Any idea?
>
>  --
>
> Cordialement/Best regards
>
> Patrice Martinez
>
> Linux Kernel Architect.
>
> OFFICE : B1-405
> PHONE  : +33 (0)4 76 29 74 69
> EMAIL  : Patrice.martinez at bull.net
> ADDR   : BULL, 1 rue de Provence, BP 208, 38432 Echirolles Cedex, FRANCE
>
>
>
>
> Hi Huang,
>
> Following your advice, I built linpack with libgoto (1.19).
> Then I connected one port of my HCA to an IB switch and tried running linpack again.
>
> Alas, the results are still the same:
>
> With P=4, Q=1:
>
> ============================================================================
> T/V                N    NB     P     Q               Time             Gflops
> ----------------------------------------------------------------------------
> W00C2L4         5000   112     4     1               4.15          2.008e+01
> ----------------------------------------------------------------------------
> ||Ax-b||_oo / ( eps * ||A||_1  * N        ) = 760941379859326173184.0000000 ...... FAILED
> ||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) = 117700692102.2814941 ...... FAILED
> ||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 12754053137.7746639 ...... FAILED
> ||Ax-b||_oo  . . . . . . . . . . . . . . . . . = 544666439966.367126
> ||A||_oo . . . . . . . . . . . . . . . . . . . =        1283.266028
> ||A||_1  . . . . . . . . . . . . . . . . . . . =        1289.434188
> ||x||_oo . . . . . . . . . . . . . . . . . . . = 59949485905.747444
> ||x||_1  . . . . . . . . . . . . . . . . . . . = 32325272106219.675781
> ============================================================================
> T/V                N    NB     P     Q               Time             Gflops
> ----------------------------------------------------------------------------
> W00C2L2         5000   112     4     1               4.12          2.024e+01
> ----------------------------------------------------------------------------
> ||Ax-b||_oo / ( eps * ||A||_1  * N        ) = 98412506262587736064.0000000 ...... FAILED
> ||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) = 185983098083.3132324 ...... FAILED
> ||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 15844011655.4434471 ...... FAILED
> ||Ax-b||_oo  . . . . . . . . . . . . . . . . . = 70441680335.640015
> ||A||_oo . . . . . . . . . . . . . . . . . . . =        1283.266028
> ||A||_1  . . . . . . . . . . . . . . . . . . . =        1289.434188
> ||x||_oo . . . . . . . . . . . . . . . . . . . =  6241193140.782863
> ||x||_1  . . . . . . . . . . . . . . . . . . . = 2645737899755.351562
> ============================================================================
>
> Finished      2 tests with the following results:
>               0 tests completed and passed residual checks,
>               2 tests completed and failed residual checks,
>               0 tests skipped because of illegal input values.
> ----------------------------------------------------------------------------
>
> End of Tests.
> ============================================================================
>
> With P=Q=2, computations are still right:
>
> ============================================================================
> T/V                N    NB     P     Q               Time             Gflops
> ----------------------------------------------------------------------------
> W00C2L4         5000   112     2     2               3.36          2.479e+01
> ----------------------------------------------------------------------------
> ||Ax-b||_oo / ( eps * ||A||_1  * N        ) =        0.0144981 ...... PASSED
> ||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) =        0.0095710 ...... PASSED
> ||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) =        0.0018682 ...... PASSED
> ============================================================================
> T/V                N    NB     P     Q               Time             Gflops
> ----------------------------------------------------------------------------
> W00C2L2         5000   112     2     2               3.35          2.492e+01
> ----------------------------------------------------------------------------
> ||Ax-b||_oo / ( eps * ||A||_1  * N        ) =        0.0153810 ...... PASSED
> ||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) =        0.0101538 ...... PASSED
> ||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) =        0.0019820 ...... PASSED
> ============================================================================
>
> The only positive fact is that libgoto seems slightly faster than mkl when linpack works well, but it doesn't help much for this problem ;-)!
>
> If you have any idea, it would be great, because I really don't know what to do now!
> --
>
>  Cordialement/Best regards
>
> Patrice Martinez
>
> Linux Kernel Architect.
> Bull, Architect of an Open World
>
> OFFICE : B1-405
> PHONE  : +33 (0)4 76 29 74 69
> EMAIL  : Patrice.martinez at bull.net
> ADDR   : BULL, 1 rue de Provence, BP 208, 38432 Echirolles Cedex, FRANCE
>
> Bull is hiring: http://www.bull.fr/emploi
>
>
>
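
As a cross-check of the figures quoted above, the first scaled residual of the
failed P=4, Q=1 run (1 Oct message) can be recomputed from the norms HPL
printed. The sketch below is not HPL code: the function names are hypothetical,
eps is assumed to be the double-precision unit roundoff 2^-53 (which is
consistent with the printed values), and the pass threshold is left as a
parameter because the HPL.dat used is not shown in the thread.

```c
#include <assert.h>
#include <float.h>

/* Scaled residual in the form HPL prints it: ||Ax-b||_oo / (eps * ||A||_1 * N). */
static double scaled_residual(double resid_inf, double a_norm1, int n)
{
    /* Assumption: eps is the double-precision unit roundoff, 2^-53. */
    const double eps = DBL_EPSILON / 2.0;

    assert(n > 0 && a_norm1 > 0.0);
    return resid_inf / (eps * a_norm1 * (double)n);
}

/* A check passes when the scaled residual is below the configured threshold. */
static int residual_passes(double scaled, double threshold)
{
    return scaled < threshold;
}
```

Calling scaled_residual(17973740643825.015625, 1289.434188, 5000) gives roughly
2.51e22, the same order as the FAILED value in the report. Any threshold of
ordinary size (16.0 is a common HPL.dat setting, assumed here) rejects it,
while the PASSED values such as 0.0355903 are accepted.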



