[mvapich-discuss] MVAPICH2-GDR LD_PRELOAD Bug with Tensorflow
Awan, Ammar Ahmad
awan.10 at buckeyemail.osu.edu
Wed Apr 15 12:13:17 EDT 2020
Dear Andreas,
Thank you for your question regarding TensorFlow.
Please refer to section 7.2 of the MVAPICH2-GDR user guide at the link below.
http://mvapich.cse.ohio-state.edu/userguide/gdr/#_example_running_tensorflow_tf_cnn_benchmarks_with_mvapich2_gdr
7.2. Example running TensorFlow (tf_cnn_benchmarks) with MVAPICH2-GDR
MVAPICH2-GDR supports TensorFlow with the Horovod/MPI design, but a special flag is needed to run the jobs properly. Please set the MV2_SUPPORT_TENSOR_FLOW=1 runtime variable, and do not use the LD_PRELOAD option.
Example:
$ export MV2_PATH=/opt/mvapich2/gdr/2.3.3/gnu
$ export MV2_USE_CUDA=1
$ export MV2_SUPPORT_TENSOR_FLOW=1

$ $MV2_PATH/bin/mpirun_rsh -export -np 2 hostA hostB \
    python tf_cnn_benchmarks.py --model=resnet50 \
    --variable_update=horovod
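If the site environment (e.g. a module) has already exported LD_PRELOAD, the steps above can be wrapped so that it is cleared before launch. This is a minimal sketch under assumptions: the function names setup_mv2_tf_env and run_tf_benchmark are hypothetical, the default MV2_PATH is just the example prefix from the user guide, and hostA/hostB are placeholders. Only the variables and the mpirun_rsh command line come from the guide.

```shell
#!/bin/sh
# Sketch only: hypothetical helper functions wrapping the user-guide example.

# Prepare the environment recommended for TensorFlow/Horovod jobs.
setup_mv2_tf_env() {
    # The guide advises against LD_PRELOAD for TensorFlow jobs,
    # so drop it if something earlier in the environment set it.
    unset LD_PRELOAD
    # Example install prefix from the guide (assumption: adjust per site).
    MV2_PATH="${MV2_PATH:-/opt/mvapich2/gdr/2.3.3/gnu}"
    export MV2_PATH
    export MV2_USE_CUDA=1
    export MV2_SUPPORT_TENSOR_FLOW=1
}

# Launch the benchmark on two placeholder hosts, as in the guide.
run_tf_benchmark() {
    setup_mv2_tf_env
    "$MV2_PATH/bin/mpirun_rsh" -export -np 2 hostA hostB \
        python tf_cnn_benchmarks.py --model=resnet50 \
        --variable_update=horovod
}
```

A job script would source this file and call run_tf_benchmark. Clearing LD_PRELOAD inside the function keeps the change local to the job rather than editing the site-wide module.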
Please let us know if this resolves your issue.
Regards,
Ammar
________________________________________
From: mvapich-discuss <mvapich-discuss-bounces at cse.ohio-state.edu> on behalf of Herten, Andreas <a.herten at fz-juelich.de>
Sent: Wednesday, April 15, 2020 12:04 PM
To: mvapich-discuss at cse.ohio-state.edu
Subject: [mvapich-discuss] MVAPICH2-GDR LD_PRELOAD Bug with Tensorflow
Dear all,
On our HPC system JUWELS, we see another bug with MVAPICH2-GDR 2.3.3.
As soon as MVAPICH2 is loaded into the environment (and with it the recommended LD_PRELOAD variable), even a simple TensorFlow program segfaults.
Please see the following gist for a more detailed description:
https://gist.github.com/AndiH/4f29c4b2d1a21a115580086223bbb2d5
How would you recommend debugging this further? Any ideas?
Best,
-Andreas
--
NVIDIA Application Lab // POWER Acceleration and Design Centre
Jülich Supercomputing Centre
Forschungszentrum Jülich, Germany
+49 2461 61 1825