[mvapich-discuss] mvapich compilation error with psm

Cabral, Matias A matias.a.cabral at intel.com
Wed Nov 30 19:19:21 EST 2016


Hello Chulwoo,

> Just to remind you, our ultimate goal is to saturate the bandwidth of dual-rail OPA

In summary, we ran some tests on the setup you mention (KNL with dual Omni-Path cards) and achieved nearly link speed. We ran several osu_ bandwidth and message-rate benchmarks with 4 ranks per node on a pair of nodes, with no special build flags:

./configure --without-mpe --disable-mcast --enable-shared --with-device=ch3:psm \
    --with-psm2 --with-ch3-rank-bits=32 --disable-maintainer-mode
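
The runs themselves were of the following shape (a sketch only: the hostnames are placeholders and the exact launcher flags may differ on your system):

mpiexec -n 8 -ppn 4 -hosts node01,node02 ./osu_bw        # point-to-point bandwidth
mpiexec -n 8 -ppn 4 -hosts node01,node02 ./osu_mbw_mr    # multi-pair bandwidth / message rate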

Taking the above as a baseline, there are many variables that determine whether a given run reaches link speed: how CPU-intensive the workload is, message size, number of ranks, number of nodes, build-time options, etc.

There is a general tuning guide for OPA with a section (5.3) on Xeon Phi that you may want to look at (extending the hfi1 driver's eager_buffer_size may also help on KNL): http://www.intel.com/content/dam/support/us/en/documents/network-and-i-o/fabric-products/Intel_OP_Performance_Tuning_UG_H93143_v5_0.pdf
Note that PSM2 multirail will *not* help in this setup. The multirail use case is a single process sending over multiple HFI cards. Since this is a KNL, you definitely want to run many ranks rather than a single one, and PSM2 (and the driver) by default associates processes with HFI cards in round-robin fashion, so both cards get used across ranks.
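
If you experiment with these knobs, the sketch below shows their general shape; the eager buffer value is purely illustrative, so take actual numbers from the tuning guide:

# illustrative values only -- see section 5.3 of the tuning guide
echo "options hfi1 eager_buffer_size=8388608" | sudo tee /etc/modprobe.d/hfi1.conf
# (takes effect after the hfi1 module is reloaded)

# bind a rank to a specific card instead of the default round-robin assignment
HFI_UNIT=0 ./osu_bw    # first HFI
HFI_UNIT=1 ./osu_bw    # second HFI

# PSM2 multirail (one process striping over both cards) -- again, not what you want here
PSM2_MULTIRAIL=1 ./osu_bw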

Regards,

_MAC

From: mvapich-discuss [mailto:mvapich-discuss-bounces at cse.ohio-state.edu] On Behalf Of Hari Subramoni
Sent: Saturday, November 26, 2016 5:27 PM
To: Jung <chulwoo at quark.phy.bnl.gov>
Cc: mvapich-discuss at cse.ohio-state.edu
Subject: Re: [mvapich-discuss] mvapich compilation error with psm


Hello Chulwoo,

We did some more investigation internally. It looks like the error stems from the "--enable-thread-cs=per-object" configure option. We are able to build it fine if that option is removed.

It looks like we inherited the error from MPICH (of which we are a derivative). Further investigation revealed that the error also exists in the latest MPICH release.

While we follow up with the MPICH team on this issue, may I request that you try a build without the --enable-thread-cs=per-object option (see the configure line below)? If needed, feel free to post the issue directly on the MPICH discuss list as well.
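
For concreteness, that would be the configure line from your original report with only that option dropped:

../mvapich2-2.2/configure --with-device=ch3:psm \
    --with-psm2-include=/usr/include --with-psm2-lib=/usr/lib64 \
    --enable-mcast --enable-threads=multiple \
    --prefix=/share/test/chulwoo/mvapich2_test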

Regards,
Hari.

On Nov 26, 2016 7:40 PM, "Jung" <chulwoo at quark.phy.bnl.gov> wrote:
Dear Dhabaleswar,

Thank you very much for the quick reply. Could you confirm which --enable-threads and --enable-thread-cs options are available for single OPA, if you know? We are still deciding on our MPI strategy, and the biggest open question is whether we can expect, in the near future if not at present, to truly drive communication from multiple threads within a single MPI rank.

Best,
Chulwoo



On Fri, 25 Nov 2016, Panda, Dhabaleswar wrote:
Hi Jung,
The main unknown factor here is whether the PSM layer can handle dual
Omni-Path (or QLogic TrueScale) adapters or not. We have asked some Intel
folks about it and have not received any reply. Please check with your
Intel representatives about it.

Thanks,

DK
________________________________________
From: mvapich-discuss-bounces at cse.ohio-state.edu on behalf of Jung [chulwoo at quark.phy.bnl.gov]
Sent: Friday, November 25, 2016 1:23 AM
To: mvapich-discuss at cse.ohio-state.edu
Subject: [mvapich-discuss] mvapich compilation error with psm

Hi,

I am getting an error when trying to compile MVAPICH2 on a KNL machine
with dual Omni-Path; the configure line and error output are below. A
build with --enable-thread-cs=lock-free also fails to compile.

Is there something wrong with the configure options, or is
--enable-thread-cs limited to global on OPA?

Just to remind you, our ultimate goal is to saturate the bandwidth of
dual-rail OPA.

Best,
Chulwoo Jung

  $ ../mvapich2-2.2/configure --with-device=ch3:psm \
        --with-psm2-include=/usr/include --with-psm2-lib=/usr/lib64 \
        --enable-mcast --enable-threads=multiple --enable-thread-cs=per-object \
        --prefix=/share/test/chulwoo/mvapich2_test


  CC       src/mpid/ch3/src/lib_libmpi_la-ch3u_rndv.lo
In file included from ../mvapich2-2.2/src/include/mpiimpl.h(3878),
                 from ../mvapich2-2.2/src/mpid/ch3/include/mpidimpl.h(36),
                 from ../mvapich2-2.2/src/mpid/ch3/src/ch3u_rndv.c(18):
../mvapich2-2.2/src/include/mpiimplthreadpost.h(27): warning #159:
declaration is incompatible with previous
"MPIU_Thread_CS_enter_lockname_impl_" (declared at line 1343 of
"../mvapich2-2.2/src/include/mpiimpl.h")
  MPIU_Thread_CS_enter_lockname_impl_(enum MPIU_Nest_mutexes kind,
  ^

In file included from ../mvapich2-2.2/src/include/mpiimpl.h(3878),
                 from ../mvapich2-2.2/src/mpid/ch3/include/mpidimpl.h(36),
                 from ../mvapich2-2.2/src/mpid/ch3/src/ch3u_rndv.c(18):
../mvapich2-2.2/src/include/mpiimplthreadpost.h(46): warning #159:
declaration is incompatible with previous
"MPIU_Thread_CS_exit_lockname_impl_" (declared at line 1343 of
"../mvapich2-2.2/src/include/mpiimpl.h")
  MPIU_Thread_CS_exit_lockname_impl_(enum MPIU_Nest_mutexes kind,
  ^

../mvapich2-2.2/src/mpid/ch3/src/ch3u_rndv.c(210): error: expression must
have arithmetic or pointer type
      if (!found && rreq->cc == 0) {
                    ^

compilation aborted for ../mvapich2-2.2/src/mpid/ch3/src/ch3u_rndv.c (code
2)
make[2]: *** [src/mpid/ch3/src/lib_libmpi_la-ch3u_rndv.lo] Error 1
make[2]: Leaving directory `/root/chulwoo/mvapich2/testbuild'
make[1]: *** [all-recursive] Error 1


Chulwoo Jung
Physics Department
Brookhaven National Laboratory
U.S.A.
chulwoo at bnl.gov
1-631-344-5254
_______________________________________________
mvapich-discuss mailing list
mvapich-discuss at cse.ohio-state.edu
http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss



