[mvapich-discuss] Allreduce performance issue on OSC Ruby system

You, Zhi-Qiang zyou at osc.edu
Wed Jan 9 12:03:27 EST 2019


Dear Developers,

I benchmarked mvapich2 2.2 and 2.3 with osu_allreduce from OMB 5.4.3. For each version, I ran 'osu_allreduce' 10 times with a maximum message size of 1048576 bytes. With 2.3 I occasionally get a few very large latencies, while 2.2 consistently yields a reasonable median:
> grep '1048576'  omb_ruby_allreduce/2.3/*.out
1048576              2047.85
1048576             16742.61
1048576              2105.28
1048576             66418.19
1048576             32853.53
1048576              2199.08
1048576              2133.44
1048576              3069.55
1048576              2116.68
1048576              2057.53
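To quantify the spread, here is a quick summary of the 2.3 numbers quoted above (OMB reports average latency in microseconds); this is only a restatement of the data already shown, not new measurements:

```python
# Summarize the mvapich2 2.3 osu_allreduce latencies at 1048576 bytes
# (values in microseconds, copied from the grep output above).
import statistics

latencies_23 = [2047.85, 16742.61, 2105.28, 66418.19, 32853.53,
                2199.08, 2133.44, 3069.55, 2116.68, 2057.53]

print(f"median: {statistics.median(latencies_23):.2f} us")  # 2166.26 us
print(f"max:    {max(latencies_23):.2f} us")                # 66418.19 us
```

So the median itself is close to the 2.2 results, but four of the ten runs are 7x to 30x slower, which is the anomaly in question.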

The same benchmark run on the OSC Owens system always yields a good median. The major differences between Ruby and Owens are:

OS: RHEL 6.10 (Ruby), RHEL 7.4 (Owens)
InfiniBand: Mellanox FDR (Ruby), Mellanox EDR (Owens)

Is there any environment variable I should set to fix this problem?

I have attached the build information and the verbose output.

Thank you,
ZQ
--
Zhi-Qiang You
Scientific Applications Engineer
Ohio Supercomputer Center (OSC)<https://osc.edu/>
A member of the Ohio Technology Consortium<https://oh-tech.org/>
1224 Kinnear Road, Columbus, Ohio 43212
Office: (614) 292-8492<tel:+16142928492> • Fax: (614) 292-7168<tel:+16142927168>
zyou at osc.edu<mailto:zyou at osc.edu>
