[mvapich-discuss] Allreduce performance issue on OSC Ruby system

Bayatpour, Mohammadreza bayatpour.1 at buckeyemail.osu.edu
Mon Feb 4 16:07:07 EST 2019


We had an offline discussion with the reporter. The following two runtime environment parameters fix the performance on the Ruby cluster:

MV2_ENABLE_AFFINITY=1
MV2_ALLREDUCE_RED_SCAT_ALLGATHER_ALGO_THRESHOLD=4M
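
For readers who want to try the same settings, here is a minimal sketch of applying them in a job script. It assumes a POSIX shell and a launcher that forwards the caller's environment to the MPI ranks; some launchers expect the variables on their own command line instead, so check how yours propagates them. The two names and values are the ones quoted above; nothing else is implied about the site configuration.

```shell
# Settings quoted from the message above; export them before launching the job.
export MV2_ENABLE_AFFINITY=1
export MV2_ALLREDUCE_RED_SCAT_ALLGATHER_ALGO_THRESHOLD=4M

# Confirm what the launcher will inherit (assumption: it forwards the environment).
env | grep '^MV2_' | sort
```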

We are closing this report.

Thanks,
Mamzi
________________________________
From: mvapich-discuss <mvapich-discuss-bounces at cse.ohio-state.edu> on behalf of You, Zhi-Qiang <zyou at osc.edu>
Sent: Wednesday, January 9, 2019 12:03 PM
To: mvapich-discuss at cse.ohio-state.edu
Subject: [mvapich-discuss] Allreduce performance issue on OSC Ruby system


Dear Developers,



I benchmarked mvapich2 2.2 and 2.3 using allreduce from omb-5.4.3. For each version, I ran ‘osu_allreduce’ 10 times with a maximum message size of 1048576 bytes. With 2.3, a few runs show very large latencies at random, whereas 2.2 yields a reasonable median:

> grep '1048576'  omb_ruby_allreduce/2.3/*.out
1048576              2047.85
1048576             16742.61
1048576              2105.28
1048576             66418.19
1048576             32853.53
1048576              2199.08
1048576              2133.44
1048576              3069.55
1048576              2116.68
1048576              2057.53
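
One way to summarize the ten runs above is to compute their median and count the runs that sit far above it. The sketch below does that with `sort` and `awk`. The latencies are copied from the grep output (osu_allreduce reports them in microseconds); the `FACTOR` cutoff of 2x the median is a hypothetical choice for illustration, not something the benchmark or MVAPICH2 defines.

```shell
#!/bin/sh
# Latencies for the 1048576-byte message size, copied from the ten 2.3 runs above.
latencies="2047.85 16742.61 2105.28 66418.19 32853.53 2199.08 2133.44 3069.55 2116.68 2057.53"

# Hypothetical cutoff: a run counts as an outlier if it exceeds FACTOR x median.
FACTOR=2

summary=$(printf '%s\n' $latencies | LC_ALL=C sort -n | awk -v f="$FACTOR" '
    { v[NR] = $1 + 0 }   # store each sorted value as a number
    END {
        # Median of a sorted list: the middle value, or the mean of the two middle values.
        if (NR % 2) med = v[(NR + 1) / 2]
        else        med = (v[NR / 2] + v[NR / 2 + 1]) / 2
        n = 0
        for (i = 1; i <= NR; i++) if (v[i] > f * med) n++
        printf "median=%.2f outliers=%d", med, n
    }')
echo "$summary"
```

With this cutoff, three of the ten runs (16742.61, 32853.53, and 66418.19) are flagged, which matches the handful of large latencies described in the report.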



The same benchmark run on the OSC Owens system always gets a good median. The major differences between Ruby and Owens are:



OS: RHEL 6.10 (Ruby), RHEL 7.4 (Owens)
InfiniBand: Mellanox FDR (Ruby), Mellanox EDR (Owens)



Is there any variable I need to set to fix this problem?



I have provided the build information and verbose output. Please find them in the attachment.



Thank you,

ZQ

--

Zhi-Qiang You
Scientific Applications Engineer
Ohio Supercomputer Center (OSC) <https://osc.edu/>
A member of the Ohio Technology Consortium <https://oh-tech.org/>
1224 Kinnear Road, Columbus, Ohio 43212
Office: (614) 292-8492 • Fax: (614) 292-7168
zyou at osc.edu

