[mvapich-discuss] Allreduce performance issue on OSC Ruby system
Bayatpour, Mohammadreza
bayatpour.1 at buckeyemail.osu.edu
Mon Feb 4 16:07:07 EST 2019
We had an offline discussion with the reporter. The following two runtime environment parameters fix the performance issue on the Ruby cluster:
MV2_ENABLE_AFFINITY=1
MV2_ALLREDUCE_RED_SCAT_ALLGATHER_ALGO_THRESHOLD=4M
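For anyone applying the same fix, here is a minimal sketch of one way to set the two parameters before launching a job. The values are the ones given above. The process count and hostfile in the commented launch line are placeholders, not taken from this report, and whether exported variables reach the MPI ranks depends on the launcher, so passing them explicitly is the safer assumption.

```shell
# Export the two parameters in the job script or interactive shell.
export MV2_ENABLE_AFFINITY=1
export MV2_ALLREDUCE_RED_SCAT_ALLGATHER_ALGO_THRESHOLD=4M

# Alternative: mpirun_rsh accepts VAR=value pairs ahead of the executable.
# The process count (4) and hostfile name (./hosts) are hypothetical.
# mpirun_rsh -np 4 -hostfile ./hosts \
#     MV2_ENABLE_AFFINITY=1 \
#     MV2_ALLREDUCE_RED_SCAT_ALLGATHER_ALGO_THRESHOLD=4M \
#     ./osu_allreduce

# Confirm what is currently set in this shell.
env | grep '^MV2_'
```

Rerunning osu_allreduce several times with these settings and comparing the 1048576-byte line is a simple way to check that the outliers are gone on a given system.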
We are closing this report.
Thanks,
Mamzi
________________________________
From: mvapich-discuss <mvapich-discuss-bounces at cse.ohio-state.edu> on behalf of You, Zhi-Qiang <zyou at osc.edu>
Sent: Wednesday, January 9, 2019 12:03 PM
To: mvapich-discuss at cse.ohio-state.edu
Subject: [mvapich-discuss] Allreduce performance issue on OSC Ruby system
Dear Developers,
I ran an allreduce benchmark for mvapich2 2.2 and 2.3 using omb-5.4.3. For each test, I ran ‘osu_allreduce’ 10 times with a maximum message size of 1048576 bytes. With 2.3, a few runs randomly show very large latencies, whereas 2.2 yields a reasonable median:
> grep '1048576' omb_ruby_allreduce/2.3/*.out
1048576 2047.85
1048576 16742.61
1048576 2105.28
1048576 66418.19
1048576 32853.53
1048576 2199.08
1048576 2133.44
1048576 3069.55
1048576 2116.68
1048576 2057.53
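The spread across the ten 2.3 runs listed above can be summarized with a short shell sketch. The `latencies` variable and the `summarize` helper are illustrative names, not part of OMB or MVAPICH2, and the numbers are assumed to be the average latency in microseconds that osu_allreduce prints next to the message size.

```shell
# Latencies for the 1048576-byte message size, copied from the ten 2.3 runs above.
latencies="2047.85 16742.61 2105.28 66418.19 32853.53 2199.08 2133.44 3069.55 2116.68 2057.53"

# Print the minimum, median, and maximum of a whitespace-separated list of numbers.
summarize() {
  # $1 is left unquoted on purpose so the shell splits it into one value per line.
  printf '%s\n' $1 | sort -n | awk '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      # Odd count: middle value. Even count: mean of the two middle values.
      mid = (NR % 2) ? v[(NR + 1) / 2] : (v[NR / 2] + v[NR / 2 + 1]) / 2
      printf "min=%.2f median=%.2f max=%.2f\n", v[1], mid, v[NR]
    }'
}

summarize "$latencies"
# prints: min=2047.85 median=2166.26 max=66418.19
```

Three of the ten runs (16742.61, 32853.53, and 66418.19) are roughly 8 to 32 times slower than the fastest run, and these are the random outliers the report is about.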
The same benchmark on the OSC Owens system always gets a good median. The major differences between Ruby and Owens are:
OS: RHEL 6.10 (Ruby), RHEL 7.4 (Owens)
Infiniband: Mellanox FDR (Ruby), Mellanox EDR (Owens)
Is there any variable I need to set to fix this problem?
I have provided the build information and verbose output. Please find them in the attachment.
Thank you,
ZQ
--
Zhi-Qiang You
Scientific Applications Engineer
Ohio Supercomputer Center (OSC)<https://osc.edu/>
A member of the Ohio Technology Consortium<https://oh-tech.org/>
1224 Kinnear Road, Columbus, Ohio 43212
Office: (614) 292-8492 • Fax: (614) 292-7168
zyou at osc.edu