[Mvapich-discuss] Error compiling with CUDA 11.4 and GNU 11.1.0

Subramoni, Hari subramoni.1 at osu.edu
Tue Aug 16 15:33:42 EDT 2022


Dear Dr. Levi,

We got the request and are working on it. We should be able to get back to you soon.

Best,
Hari.

From: Levi, Mariana <m.levi at northeastern.edu>
Sent: Tuesday, August 16, 2022 3:26 PM
To: Subramoni, Hari <subramoni.1 at osu.edu>; mvapich-discuss at lists.osu.edu
Cc: Subramoni, Hari <subramoni.1 at osu.edu>
Subject: Re: Error compiling with CUDA 11.4 and GNU 11.1.0

Hi Hari,

Thanks for the quick response.
I wasn’t able to find the particular combination I’m using for the GDR rpm here:
http://mvapich.cse.ohio-state.edu/downloads/
I did submit a request using the link you shared, though the Tesla GPU architecture was not in the option list.

I look forward to seeing that RPM package available soon.

Thanks again for your help.

Best,

Mariana Levi, Ph.D.
Computational Scientist
Research Computing, Information Technology Services
Northeastern University
617-470-4022


From: Subramoni, Hari <subramoni.1 at osu.edu>
Date: Tuesday, August 16, 2022 at 10:00 PM
To: Levi, Mariana <m.levi at northeastern.edu>, mvapich-discuss at lists.osu.edu
Cc: Subramoni, Hari <subramoni.1 at osu.edu>
Subject: RE: Error compiling with CUDA 11.4 and GNU 11.1.0
Hello, Dr. Levi.

We only have basic support for GPU-enabled clusters in MVAPICH2 2.3.7.

For best performance, functionality, and the latest features on GPU-enabled clusters, we strongly recommend using MVAPICH2-GDR. It is available as an RPM package from our download page.

If you do not find the exact version you’re looking for, kindly fill out this form and we can build it for you.

http://mvapich.cse.ohio-state.edu/GDRform/

Best,
Hari.

From: Mvapich-discuss <mvapich-discuss-bounces at lists.osu.edu> On Behalf Of Levi, Mariana via Mvapich-discuss
Sent: Tuesday, August 16, 2022 2:48 PM
To: mvapich-discuss at lists.osu.edu
Subject: [Mvapich-discuss] Error compiling with CUDA 11.4 and GNU 11.1.0

Hi MVAPICH2 team,

I’m trying to install MVAPICH2 2.3.7 with GNU 11.1.0 and CUDA 11.4 on an HPC cluster (CentOS 7). I’m using the following commands:

FFLAGS="-w -fallow-argument-mismatch -O2" ../configure \
    --prefix=/shared/centos7/mvapich2/2.3.7-gcc11.1-cuda11.4 \
    --with-device=ch3:mrail --with-rdma=gen2 \
    --enable-threads=multiple --enable-fortran=all --enable-fast \
    --with-pmi=pmi2 --with-pm=slurm --enable-slurm=yes \
    --with-libcuda=/shared/centos7/cuda/11.4/targets/x86_64-linux/lib/stubs \
    --with-libcudart=/shared/centos7/cuda/11.4/targets/x86_64-linux/lib \
    --with-cuda=/shared/centos7/cuda/11.4 \
    --with-cuda-include=/shared/centos7/cuda/11.4/include \
    --with-cuda-libpath=/shared/centos7/cuda/11.4/lib64

make -j

The architecture I’m building on is Intel(R) Xeon(R) Gold 6132 CPU @ 2.60GHz (Skylake_avx512 microarchitecture) with NVIDIA V100 GPUs and an InfiniBand network (Mellanox OFED version 5.3).

Could you please assist with the following errors I’m getting:

../../../../contrib/hwloc_v1/src/topology-opencl.c: In function ‘hwloc_opencl_query_devices’:
../../../../contrib/hwloc_v1/src/topology-opencl.c:108:5: error: unknown type name ‘cl_device_topology_amd’
  108 |     cl_device_topology_amd amdtopo;
      |     ^~~~~~~~~~~~~~~~~~~~~~
../../../../contrib/hwloc_v1/src/topology-opencl.c:171:9: error: ‘CL_DEVICE_TOPOLOGY_TYPE_PCIE_AMD’ undeclared (first use in this function); did you mean ‘CL_DEVICE_TOPOLOGY_AMD’?
  171 |     if (CL_DEVICE_TOPOLOGY_TYPE_PCIE_AMD != amdtopo.raw.type) {
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      |         CL_DEVICE_TOPOLOGY_AMD
../../../../contrib/hwloc_v1/src/topology-opencl.c:171:9: note: each undeclared identifier is reported only once for each function it appears in
../../../../contrib/hwloc_v1/src/topology-opencl.c:171:52: error: request for member ‘raw’ in something not a structure or union
  171 |     if (CL_DEVICE_TOPOLOGY_TYPE_PCIE_AMD != amdtopo.raw.type) {
      |                                                    ^
../../../../contrib/hwloc_v1/src/topology-opencl.c:172:53: error: request for member ‘raw’ in something not a structure or union
  172 |       hwloc_debug("not a PCIe device: %u\n", amdtopo.raw.type);
      |                                                     ^
../../../../contrib/hwloc_v1/src/topology-opencl.c:178:40: error: request for member ‘pcie’ in something not a structure or union
  178 |     info->specific.amd.pcibus = amdtopo.pcie.bus;
      |                                        ^
../../../../contrib/hwloc_v1/src/topology-opencl.c:179:40: error: request for member ‘pcie’ in something not a structure or union
  179 |     info->specific.amd.pcidev = amdtopo.pcie.device;
      |                                        ^
../../../../contrib/hwloc_v1/src/topology-opencl.c:180:41: error: request for member ‘pcie’ in something not a structure or union
  180 |     info->specific.amd.pcifunc = amdtopo.pcie.function;
      |                                         ^
../../../../contrib/hwloc_v1/src/topology-opencl.c:183:35: error: request for member ‘pcie’ in something not a structure or union
  183 |                 (unsigned) amdtopo.pcie.bus, (unsigned) amdtopo.pcie.device, (unsigned) amdtopo.pcie.function);
      |                                   ^
../../../../contrib/hwloc_v1/src/topology-opencl.c:183:64: error: request for member ‘pcie’ in something not a structure or union
  183 |                 (unsigned) amdtopo.pcie.bus, (unsigned) amdtopo.pcie.device, (unsigned) amdtopo.pcie.function);
      |                                                                ^
../../../../contrib/hwloc_v1/src/topology-opencl.c:183:96: error: request for member ‘pcie’ in something not a structure or union
  183 |                 (unsigned) amdtopo.pcie.bus, (unsigned) amdtopo.pcie.device, (unsigned) amdtopo.pcie.function);
      |                                                                                                ^
make[3]: *** [topology-opencl.lo] Error 1
make[3]: *** Waiting for unfinished jobs....
make[3]: Leaving directory `/shared/centos7/mvapich2/src/mvapich2-2.3.7/build-gcc11.1-cuda11.4/contrib/hwloc_v1/src'
make[2]: *** [all-recursive] Error 1
make[2]: Leaving directory `/shared/centos7/mvapich2/src/mvapich2-2.3.7/build-gcc11.1-cuda11.4/contrib/hwloc_v1'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/shared/centos7/mvapich2/src/mvapich2-2.3.7/build-gcc11.1-cuda11.4'
make: *** [all] Error 2

I’ve also attached the config.log file for reference.

Thanks in advance for your assistance.

Best,

Mariana Levi, Ph.D.
Computational Scientist
Research Computing, Information Technology Services
Northeastern University
617-470-4022
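
The compile errors in the log above trace back to two AMD-specific declarations, cl_device_topology_amd and CL_DEVICE_TOPOLOGY_TYPE_PCIE_AMD, that the compiler did not find in the OpenCL headers it picked up; the "request for member" errors follow from the first. A minimal shell sketch for checking whether a given include tree declares the type is shown here. It assumes the headers in question sit in a CL/ subdirectory of the CUDA include path quoted in the configure line, which the thread does not confirm, and the function name has_amd_topology is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: report whether any header under DIR/CL declares the
# AMD topology type that hwloc's topology-opencl.c refers to.
has_amd_topology() {
    dir="$1"
    # -r: recurse, -q: quiet, -s: stay silent if DIR/CL does not exist
    if grep -rqs 'cl_device_topology_amd' "$dir/CL"; then
        echo "declared"
    else
        echo "missing"
    fi
}

# Default include path is the one from the configure line in this thread;
# set CUDA_INCLUDE to check a different tree (assumed variable name).
has_amd_topology "${CUDA_INCLUDE:-/shared/centos7/cuda/11.4/include}"
```

If the type turns out to be missing, one option that may be worth trying is hwloc's own --disable-opencl configure switch, which turns off its OpenCL device discovery. Whether passing it to the top-level MVAPICH2 configure resolves this particular build has not been verified in this thread.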
