[Mvapich-discuss] mvapich/3.0rc: Disabling CUDA at runtime?

Shineman, Nat shineman.5 at osu.edu
Wed Feb 14 15:02:30 EST 2024


Hi Ben,

No, this configuration is not supported. First, MVAPICH 3.0rc does not officially support GPU buffers. Because we are an MPICH derivative, some support can be enabled through MPICH's GPU support, but we do not support those configurations. For GPU support, use MVAPICH-Plus 3.0, which includes our complete set of GPU optimizations. In either case, building on a CUDA-capable system and then running on a system without CUDA drivers will cause errors. This behavior comes from the Yaksa datatype engine from MPICH, which we depend on.

Since we do not maintain Yaksa in any way, I cannot provide much guidance on that front. If you would like Yaksa to support building on a CUDA-capable system and running on a non-CUDA system, please contact the MPICH developers on GitHub; they may be able to provide more assistance there.
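
Until something like that exists, one possible workaround (an assumption on my part, not something MVAPICH provides or supports) is to keep two separate builds and have the job script choose between them after probing the node for a usable CUDA driver. The sketch below is illustrative only: `cuda_device_count` and `pick_build` are hypothetical helper names, and the "cuda"/"cpu" labels stand in for whatever module or install prefix a site actually uses. The only real interfaces it touches are the CUDA driver library `libcuda.so.1` and its `cuInit` and `cuDeviceGetCount` entry points.

```python
import ctypes


def cuda_device_count(loader=ctypes.CDLL):
    """Return the number of CUDA devices visible on this node.

    Returns 0 when the driver library is missing or initialization fails.
    `loader` is injectable only so the probe can be exercised without a GPU.
    """
    try:
        lib = loader("libcuda.so.1")  # CUDA driver library on Linux
    except OSError:
        return 0  # no driver installed on this node
    # cuInit returns CUDA_SUCCESS (0) on success; a node with a driver but
    # no GPU typically reports CUDA_ERROR_NO_DEVICE (100) here.
    if lib.cuInit(0) != 0:
        return 0
    count = ctypes.c_int(0)
    if lib.cuDeviceGetCount(ctypes.pointer(count)) != 0:
        return 0
    return count.value


def pick_build(device_count):
    """Map the probe result to a build label (labels are hypothetical)."""
    return "cuda" if device_count > 0 else "cpu"


if __name__ == "__main__":
    # A job script could use this output to load the matching MPI build.
    print(pick_build(cuda_device_count()))
```

This only selects which build to run; it does not make a CUDA-enabled build usable on a node without CUDA drivers.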

Thanks,
Nat
________________________________
From: Mvapich-discuss <mvapich-discuss-bounces+shineman.5=osu.edu at lists.osu.edu> on behalf of Ben Kirk via Mvapich-discuss <mvapich-discuss at lists.osu.edu>
Sent: Monday, February 12, 2024 10:33
To: mvapich-discuss at lists.osu.edu <mvapich-discuss at lists.osu.edu>
Subject: [Mvapich-discuss] mvapich/3.0rc: Disabling CUDA at runtime?

Hi, we're experimenting with mvapich/3.0rc on a Cray-EX SS11 machine with a hybrid configuration: 2488 CPU nodes and 82 4-way GPU nodes.

We build mvapich/3.0rc with CUDA support but would like to be able to disable it at runtime on our CPU-only nodes. Is this a supported configuration, and if so, how? I checked for various MV2_*/MVP_* environment variables but have not had any success yet.

Thanks!!
--
Ben Kirk
NCAR Computational & Information Systems Laboratory


$ mpiexec -n 2 ./hello_world_mpi.mvapich
pbs_attach: process 115770 attached to job: 3047325.desched1
CUDA Error (yaksuri_cuda_init_hook:src/backend/cuda/hooks/yaksuri_cuda_init_hooks.c,114): no CUDA-capable device is detected

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 116214 RUNNING AT dec0001.hsn.de.hpc.ucar.edu
=   EXIT CODE: 11
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions



