[mvapich-discuss] Which mvapich version to install on a GPUcluster without Mellanox OFED ?
Yussuf Ali
Yussuf.ali at jaea.go.jp
Wed Apr 18 21:57:22 EDT 2018
Hi Hari,
Thank you for your answer!
What exactly do you mean by interconnect? I can definitely say it has neither a Mellanox InfiniBand nor an Intel Omni-Path adapter.
It is a two-socket CPU system, with four GPUs attached to each CPU, so I think the interconnect between the sockets is QPI/UPI.
The server seems to be some kind of “PCI-Express GPU server”.
According to nvidia-smi topo -m, GPU pairs are connected through a single PCIe switch (PIX).
If it helps, it is a box like this (page is in Japanese: http://www.gdep.co.jp/products/list/v/56a9c55e26b1c , serial nr: MAS-XE5-SV4U/8X)
I don’t think OFED is present on this system. I downloaded MVAPICH2-GDR and tried to compile a simple program, but it fails with the following error message:
“libibumad.so.3, needed by /opt/mvapich2/gdr/2.3a/mcast/no-openacc/cuda8.0/mofed3.4/pbs/gnu4.8.5/lib64/libmpi.so, not found”
Is this error message related to OFED?
GDRCopy is currently not installed on the system.
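For what it’s worth, this is roughly how I checked whether the InfiniBand userspace library from the error message is known to the system (a minimal sketch; exact package names such as rdma-core vary by distribution, and ofed_info only exists on machines where Mellanox OFED was installed):

```shell
# Does the dynamic linker cache know about libibumad, the library that
# the MOFED build of libmpi.so is linked against?
if ldconfig -p | grep -q libibumad; then
    echo "libibumad is installed"
else
    echo "libibumad is missing (provided by rdma-core or an OFED install)"
fi

# ofed_info ships only with the Mellanox OFED package, so its absence
# is a strong hint that MOFED is not present on the machine.
command -v ofed_info >/dev/null 2>&1 && ofed_info -s || echo "no Mellanox OFED detected"
```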
Thank you for your help,
Yussuf
From: Subramoni, Hari
Sent: Wednesday, April 18, 2018 7:22 PM
To: Yussuf Ali; mvapich-discuss at cse.ohio-state.edu
Cc: Subramoni, Hari
Subject: RE: [mvapich-discuss] Which mvapich version to install on a GPUcluster without Mellanox OFED ?
Hi, Yussuf.
You can use the optimized MVAPICH2-GDR for single node application runs even if Mellanox hardware is not present. This should give you the best performance within one node.
Could you please let us know the answers to the following questions?
a. What sort of interconnect does the system have?
b. What version of OFED is available on the system?
c. Is GDRCopy available on the system?
   (see https://github.com/NVIDIA/gdrcopy)
This will enable us to help you better.
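For anyone reading this thread later, the three questions above can usually be answered with a few standard commands (a sketch, not a definitive procedure; tool availability varies by distribution, and the GDRCopy kernel module name gdrdrv is taken from the gdrcopy project):

```shell
# a. Interconnect: look for network / fabric adapters on the PCI bus.
lspci 2>/dev/null | grep -i -E 'infiniband|omni-path|ethernet'

# b. OFED version: ofed_info is installed only by a Mellanox OFED stack.
command -v ofed_info >/dev/null 2>&1 && ofed_info -s || echo "OFED not installed"

# c. GDRCopy: its kernel module is called gdrdrv.
grep -q gdrdrv /proc/modules 2>/dev/null && echo "gdrcopy module loaded" \
                                         || echo "gdrcopy module not loaded"
```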
Best Regards,
Hari.
From: mvapich-discuss-bounces at cse.ohio-state.edu On Behalf Of Yussuf Ali
Sent: Wednesday, April 18, 2018 12:20 AM
To: mvapich-discuss at cse.ohio-state.edu <mvapich-discuss at mailman.cse.ohio-state.edu>
Subject: [mvapich-discuss] Which mvapich version to install on a GPU cluster without Mellanox OFED ?
Dear MVAPICH user group,
We have the following GPU cluster system with 8 GPUs (GeForce 1080 Ti) but without any Mellanox hardware.
Our goal is to use MPI to send data between different GPUs directly from CUDA device buffers.
Is this possible with any MVAPICH version on this particular GPU hardware?
Output from: nvidia-smi topo -m
        GPU0  GPU1  GPU2  GPU3  GPU4  GPU5  GPU6  GPU7  CPU Affinity
GPU0     X    PIX   PHB   PHB   SYS   SYS   SYS   SYS   0-7,16-23
GPU1    PIX    X    PHB   PHB   SYS   SYS   SYS   SYS   0-7,16-23
GPU2    PHB   PHB    X    PIX   SYS   SYS   SYS   SYS   0-7,16-23
GPU3    PHB   PHB   PIX    X    SYS   SYS   SYS   SYS   0-7,16-23
GPU4    SYS   SYS   SYS   SYS    X    PIX   PHB   PHB   8-15,24-31
GPU5    SYS   SYS   SYS   SYS   PIX    X    PHB   PHB   8-15,24-31
GPU6    SYS   SYS   SYS   SYS   PHB   PHB    X    PIX   8-15,24-31
GPU7    SYS   SYS   SYS   SYS   PHB   PHB   PIX    X    8-15,24-31
X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe switches (without traversing the PCIe Host Bridge)
PIX = Connection traversing a single PCIe switch
NV# = Connection traversing a bonded set of # NVLinks
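To make the question concrete, this is roughly what the device-buffer usage would look like with a CUDA-aware MPI such as MVAPICH2-GDR (a sketch only, untested on this system; it assumes the library was built CUDA-aware, is run with MV2_USE_CUDA=1, and that each rank is mapped to its own GPU):

```c
/* Sketch: pass CUDA device pointers straight to MPI calls.
 * Requires a CUDA-aware MPI (e.g. MVAPICH2-GDR with MV2_USE_CUDA=1);
 * compile with mpicc, link the CUDA runtime, run with >= 2 ranks. */
#include <mpi.h>
#include <cuda_runtime.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Simplest mapping on an 8-GPU node: one GPU per rank. */
    int ndev = 0;
    cudaGetDeviceCount(&ndev);
    cudaSetDevice(rank % ndev);

    const size_t n = 1 << 20;
    double *dbuf;                     /* device buffer, no host staging */
    cudaMalloc((void **)&dbuf, n * sizeof(double));

    if (rank == 0) {
        cudaMemset(dbuf, 0, n * sizeof(double));
        /* A CUDA-aware MPI recognizes dbuf as device memory and moves
         * it GPU-to-GPU (over the PCIe switch for the PIX pairs above). */
        MPI_Send(dbuf, n, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(dbuf, n, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 received %zu doubles into device memory\n", n);
    }

    cudaFree(dbuf);
    MPI_Finalize();
    return 0;
}
```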
Thank you for your help,
Yussuf