[mvapich-discuss] Allreduce performance issue on OSC Ruby system
You, Zhi-Qiang
zyou at osc.edu
Wed Jan 9 12:03:27 EST 2019
Dear Developers,
I ran a benchmark for mvapich2 2.2 and 2.3 using allreduce from omb-5.4.3. For each test, I ran 'osu_allreduce' 10 times with a maximum message size of 1048576 bytes. With 2.3, I randomly get a few large latencies, while 2.2 consistently yields a reasonable median:
> grep '1048576' omb_ruby_allreduce/2.3/*.out
1048576 2047.85
1048576 16742.61
1048576 2105.28
1048576 66418.19
1048576 32853.53
1048576 2199.08
1048576 2133.44
1048576 3069.55
1048576 2116.68
1048576 2057.53
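(For reference, here is a minimal sketch of how I compare the medians; the pipeline below is only an illustration with the ten 2.3 values pasted inline, but the same sort/awk can be fed directly from the grep above.)

```shell
# Illustration only: compute the median of the ten 2.3 latencies quoted above.
# In practice, replace the printf with:
#   grep '1048576' omb_ruby_allreduce/2.3/*.out | awk '{print $2}'
printf '%s\n' 2047.85 16742.61 2105.28 66418.19 32853.53 \
              2199.08 2133.44 3069.55 2116.68 2057.53 |
  sort -n |
  awk '{a[NR]=$1} END {
    if (NR % 2) print a[(NR+1)/2];                      # odd count: middle value
    else printf "%.2f\n", (a[NR/2] + a[NR/2+1]) / 2     # even count: mean of middle two
  }'
# prints 2166.26
```

So even the 2.3 median is in the expected range; the problem is the random outliers an order of magnitude or more above it.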
The same benchmark performed on the OSC Owens system always yields a good median. The major differences between Ruby and Owens are:
OS: RHEL 6.10 (Ruby), RHEL 7.4 (Owens)
Infiniband: Mellanox FDR (Ruby), Mellanox EDR (Owens)
Is there any environment variable I need to set to fix this problem? The build information and verbose output are attached.
Thank you,
ZQ
--
Zhi-Qiang You
Scientific Applications Engineer
Ohio Supercomputer Center (OSC)<https://osc.edu/>
A member of the Ohio Technology Consortium<https://oh-tech.org/>
1224 Kinnear Road, Columbus, Ohio 43212
Office: (614) 292-8492<tel:+16142928492> • Fax: (614) 292-7168<tel:+16142927168>
zyou at osc.edu<mailto:zyou at osc.edu>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mailman.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20190109/56270685/attachment.html>