[mvapich-discuss] global summation very slow

Greipel.Joachim at mh-hannover.de
Mon Jun 7 05:23:18 EDT 2010


Dear all,
 
I compiled CPMD for use with mvapich2 (version 1.4.1) over InfiniBand.
The program does not scale at all, because global summation and, to some
extent, all-to-all communication are exceedingly slow when the processes
run on different nodes. See the output of CPMD's wat32 test below as an
example.
 
 ================================================================
 = COMMUNICATION TASK  AVERAGE MESSAGE LENGTH  NUMBER OF CALLS  =
 = SEND/RECEIVE               34813. BYTES               3855.  =
 = BROADCAST                  12720. BYTES                267.  =
 = GLOBAL SUMMATION           14293. BYTES                644.  =
 = GLOBAL MULTIPLICATION          0. BYTES                  1.  =
 = ALL TO ALL COMM           419075. BYTES               6258.  =
 =                             PERFORMANCE          TOTAL TIME  =
 = SEND/RECEIVE              265.685  MB/S           0.505 SEC  =
 = BROADCAST                  32.876  MB/S           0.103 SEC  =
 = GLOBAL SUMMATION            0.609  MB/S          60.432 SEC  =
 = GLOBAL MULTIPLICATION       0.000  MB/S           0.001 SEC  =
 = ALL TO ALL COMM            44.715  MB/S          58.651 SEC  =
 = SYNCHRONISATION                                   0.007 SEC  =
 ================================================================

When I place as many processes as possible on a single node, the results
are somewhat better:
 
 ================================================================
 = COMMUNICATION TASK  AVERAGE MESSAGE LENGTH  NUMBER OF CALLS  =
 = SEND/RECEIVE               69639. BYTES               1799.  =
 = BROADCAST                  13112. BYTES                259.  =
 = GLOBAL SUMMATION           14293. BYTES                644.  =
 = GLOBAL MULTIPLICATION          0. BYTES                  1.  =
 = ALL TO ALL COMM           834296. BYTES               6258.  =
 =                             PERFORMANCE          TOTAL TIME  =
 = SEND/RECEIVE              309.035  MB/S           0.405 SEC  =
 = BROADCAST                  89.488  MB/S           0.038 SEC  =
 = GLOBAL SUMMATION           17.658  MB/S           1.564 SEC  =
 = GLOBAL MULTIPLICATION       0.000  MB/S           0.001 SEC  =
 = ALL TO ALL COMM           173.341  MB/S          30.120 SEC  =
 = SYNCHRONISATION                                   0.010 SEC  =
 ================================================================

But in neither case is the global summation performance anywhere near
satisfactory. It should be at least 50 to 100 times higher than it is.
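
To put the two tables side by side, here is a minimal Python sketch that
computes how much each task's total time shrinks between the multi-node run
and the single-node run. The times are copied from the tables above; the
dictionary names and the helper speedup() are only illustrative.

```python
# Total times in seconds, copied from the two CPMD timing tables above.
multi_node = {
    "SEND/RECEIVE": 0.505,
    "BROADCAST": 0.103,
    "GLOBAL SUMMATION": 60.432,
    "ALL TO ALL COMM": 58.651,
}
single_node = {
    "SEND/RECEIVE": 0.405,
    "BROADCAST": 0.038,
    "GLOBAL SUMMATION": 1.564,
    "ALL TO ALL COMM": 30.120,
}


def speedup(task):
    """Ratio of multi-node to single-node total time for one task."""
    return multi_node[task] / single_node[task]


for task in multi_node:
    print(f"{task:<18} {speedup(task):6.1f}x faster on one node")
```

Global summation stands out: both runs report the same 644 calls with an
average length of 14293 bytes, yet the multi-node run spends roughly 39
times longer in them.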
 
Does anyone have a clue what is wrong?
 
Regards,
Joachim
 
 
--
Dr. rer. nat. Joachim Greipel
Med. Hochschule Hannover
Biophys. Chem. OE 4350
Carl-Neuberg-Str. 1
30625 Hannover
Germany
 
Phone: +49-511-532-3718
Fax: +49-511-532-8924
 
 
 