[mvapich-discuss] MVAPICH2-GDR LD_PRELOAD Bug with Tensorflow

Awan, Ammar Ahmad awan.10 at buckeyemail.osu.edu
Wed Apr 15 12:13:17 EDT 2020


Dear Andreas,

Thank you for your question regarding TensorFlow.

Please refer to section 7.2 of the MVAPICH2-GDR User Guide, linked below.

http://mvapich.cse.ohio-state.edu/userguide/gdr/#_example_running_tensorflow_tf_cnn_benchmarks_with_mvapich2_gdr

7.2. Example running TensorFlow (tf_cnn_benchmarks) with MVAPICH2-GDR

MVAPICH2-GDR supports TensorFlow through the Horovod/MPI design, but a special flag is needed for jobs to run properly. Set the runtime variable MV2_SUPPORT_TENSOR_FLOW=1 and do not set LD_PRELOAD.

Example:

    $ export MV2_PATH=/opt/mvapich2/gdr/2.3.3/gnu
    $ export MV2_USE_CUDA=1
    $ export MV2_SUPPORT_TENSOR_FLOW=1

    $ $MV2_PATH/bin/mpirun_rsh -export -np 2 hostA hostB \
            python tf_cnn_benchmarks.py --model=resnet50 \
                               --variable_update=horovod
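
If the environment already has LD_PRELOAD set (for example by a module file, as in the report quoted below), the same setup could be sketched roughly as follows. This is only an illustration: prepare_mv2_tf_env is a hypothetical helper name, not part of MVAPICH2-GDR, and it assumes that nothing else in the job relies on LD_PRELOAD.

```shell
#!/bin/sh
# Sketch only: prepare_mv2_tf_env is a hypothetical helper, not an MVAPICH2-GDR tool.
prepare_mv2_tf_env() {
    # Assumption: no other library in this job needs LD_PRELOAD.
    unset LD_PRELOAD
    # Runtime variables taken from the user guide example above.
    export MV2_USE_CUDA=1
    export MV2_SUPPORT_TENSOR_FLOW=1
}

prepare_mv2_tf_env

# Print the resulting settings so they can be checked before launching the job.
echo "LD_PRELOAD=${LD_PRELOAD-<unset>}"
echo "MV2_USE_CUDA=$MV2_USE_CUDA"
echo "MV2_SUPPORT_TENSOR_FLOW=$MV2_SUPPORT_TENSOR_FLOW"
```

After this, the mpirun_rsh command from the example can be run unchanged; its -export option passes the current environment on to the launched processes.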

Please let us know if this resolves your issue.

Regards,
Ammar

________________________________________
From: mvapich-discuss <mvapich-discuss-bounces at cse.ohio-state.edu> on behalf of Herten, Andreas <a.herten at fz-juelich.de>
Sent: Wednesday, April 15, 2020 12:04 PM
To: mvapich-discuss at cse.ohio-state.edu
Subject: [mvapich-discuss] MVAPICH2-GDR LD_PRELOAD Bug with Tensorflow

Dear all,

On our HPC system JUWELS, we see another bug with MVAPICH2-GDR 2.3.3.
As soon as MVAPICH2 is introduced into the environment (and with it, the recommended LD_PRELOAD variable), even a simple TensorFlow program segfaults.

Please see here for some more description:
https://gist.github.com/AndiH/4f29c4b2d1a21a115580086223bbb2d5

What do you recommend to debug this further? Any ideas?

Best,

-Andreas

—
NVIDIA Application Lab // POWER Acceleration and Design Centre
Jülich Supercomputing Centre
Forschungszentrum Jülich, Germany
+49 2461 61 1825

