[Mvapich-discuss] Issue with GPCNET in UD mode with mvapich2-2.3.7 + patch for Rockport

Tony Niro tniro at rockportnetworks.com
Wed May 11 09:41:26 EDT 2022


Hi all,

We have built the GPCNET 1.2 application against mvapich2-2.3.7 (+ patch for Rockport). Frequently, when we run network_load_test with MV2_USE_UD_ONLY=1, the application hangs. When we try to get back-traces of all the running processes, the test unhangs and completes. We get the back-traces by calling gdb -batch -ex "attach $pid" -ex "bt" -ex "detach" on every network_load_test process on every server.
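
For reference, the per-process gdb invocation described above could be wrapped roughly as follows. This is only a sketch: the function name dump_backtraces, the GDB override variable, and the log file naming are illustrative assumptions, and the PIDs are expected to be supplied by the caller.

```shell
#!/bin/sh
# Sketch: write one back-trace log per PID on the local host.
# Assumes gdb is installed and that we are allowed to attach to the PIDs.
# dump_backtraces, GDB and the log naming are hypothetical, for illustration.

dump_backtraces() {
    outdir=$1          # directory that receives the per-process logs
    shift              # remaining arguments are the PIDs to inspect
    mkdir -p "$outdir" || return 1
    for pid in "$@"; do
        # Same gdb flags as in the message: attach, print the stack, detach.
        "${GDB:-gdb}" -batch -ex "attach $pid" -ex "bt" -ex "detach" \
            > "$outdir/bt.$(uname -n).$pid.log" 2>&1
    done
}
```

Calling dump_backtraces /tmp/bt followed by the PIDs of the local network_load_test processes leaves one log per process, so the traces from all servers can be collected and compared afterwards.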

We are looking for guidance on how to debug this problem. For example, are there any other options that we should be setting?

Note that the application runs with no issues if we use MV2_USE_UD_HYBRID=0 instead of MV2_USE_UD_ONLY=1. Also note that the results of the test that hung seem reasonable. I've included output for the hang scenario as well as one where the application ran normally.
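
To count how often each setting hangs without watching every run, something like the following might help. It is a rough sketch that assumes GNU coreutils timeout is available; the function name classify_run and the labels it prints are illustrative. Note that timeout terminates the command when the limit is reached, so the hung state is lost and this is only useful for measuring frequency, not for inspecting a hang.

```shell
#!/bin/sh
# Sketch: run a command under a time limit and label the outcome.
# Assumes GNU coreutils `timeout`, which exits with status 124 when the
# limit is reached. classify_run and its labels are hypothetical.

classify_run() {
    limit=$1           # time limit in seconds
    shift              # remaining arguments are the command to run
    timeout "$limit" "$@" > /dev/null 2>&1
    status=$?
    if [ "$status" -eq 124 ]; then
        echo hang                  # limit reached, command was terminated
    elif [ "$status" -eq 0 ]; then
        echo ok                    # command finished normally
    else
        echo "failed($status)"     # command exited with an error
    fi
}
```

The limit is a judgment call. The normal run below finished in about 205 seconds, so a limit a few times larger than that would be one plausible choice for the mpiexec command line.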

Tony Niro

Example of run that hung/resumed:
________________________________________________

user at dell-s13-h1[11:41:04] ~>/usr/bin/time  --format="%e seconds" /opt/bm/hpc/mvapich2-2.3.7-wc-patched-ng/bin/mpiexec -np 1504 -f /home/user/mpi-host.cfg  -env MV2_HOMOGENEOUS_CLUSTER=1 -env MV2_NDREG_ENTRIES_MAX=100000 -env MV2_NDREG_ENTRIES=50000 -env MV2_IBA_HCA=mlx5_0 -env MV2_SHMEM_COLL_NUM_COMM=64 -env MV2_UD_ZCOPY_NUM_RETRY=1000000  -env MV2_NUM_QP_PER_PORT=1  -env MV2_USE_UD_ONLY=1 /opt/bm/hpc/mvapich-GPCNET-1.2/network_load_test_tn
[dell-s13-h21:mpi_rank_0][rdma_get_user_parameters] Cannot have more than one QP with UD_ONLY / Hybrid mode.
[dell-s13-h21:mpi_rank_0][rdma_get_user_parameters] Resetting MV2_NUM_QP_PER_PORT to 1.
NetworkLoad Tests v1.2
  Test with 1504 MPI ranks (47 nodes)
  10 nodes running Network Tests
  37 nodes running Congestion Tests (min 9 nodes per congestor)

  Legend
   RR = random ring communication pattern
   Lat = latency
   BW = bandwidth
   BW+Sync = bandwidth with barrier
+------------------------------------------------------------------------------+
|                            Isolated Network Tests                            |
+---------------------------------+--------------+--------------+--------------+
|                            Name |          Avg |          99% |        Units |
+---------------------------------+--------------+--------------+--------------+
|          RR Two-sided Lat (8 B) |          3.1 |          6.8 |         usec |
+---------------------------------+--------------+--------------+--------------+
| RR Two-sided BW+Sync (131072 B) |        282.1 |        189.9 |   MiB/s/rank |
+---------------------------------+--------------+--------------+--------------+
|        Multiple Allreduce (8 B) |         18.2 |         29.3 |         usec |
+---------------------------------+--------------+--------------+--------------+

+------------------------------------------------------------------------------+
|                 Network Tests running with Congestion Tests                  |
+---------------------------------+--------------+--------------+--------------+
|                            Name |          Avg |          99% |        Units |
+---------------------------------+--------------+--------------+--------------+
|          RR Two-sided Lat (8 B) |          3.7 |         11.5 |         usec |
+---------------------------------+--------------+--------------+--------------+


<HANG>
<attach debugger to all processes to get stack trace>


| RR Two-sided BW+Sync (131072 B) |        201.7 |        130.8 |   MiB/s/rank |
+---------------------------------+--------------+--------------+--------------+
|        Multiple Allreduce (8 B) |         21.0 |         38.1 |         usec |
+---------------------------------+--------------+--------------+--------------+

+------------------------------------------------------------------------------+
|          Network Tests running with Congestion Tests - Key Results           |
+---------------------------------+--------------------------------------------+
|                            Name |                   Congestion Impact Factor |
+---------------------------------+----------------------+---------------------+
|                                 |                  Avg |                 99% |
+---------------------------------+----------------------+---------------------+
|          RR Two-sided Lat (8 B) |                 1.2X |                1.7X |
+---------------------------------+----------------------+---------------------+
| RR Two-sided BW+Sync (131072 B) |                 1.4X |                1.5X |
+---------------------------------+----------------------+---------------------+
|        Multiple Allreduce (8 B) |                 1.2X |                1.3X |
+---------------------------------+----------------------+---------------------+
4851.38 seconds
user at dell-s13-h1[13:07:00] ~>

Example of successful run

user at dell-s13-h1[11:20:50] ~>/usr/bin/time  --format="%e seconds" /opt/bm/hpc/mvapich2-2.3.7-wc-patched-ng/bin/mpiexec -np 1504 -f /home/user/mpi-host.cfg  -env MV2_HOMOGENEOUS_CLUSTER=1 -env MV2_NDREG_ENTRIES_MAX=100000 -env MV2_NDREG_ENTRIES=50000 -env MV2_IBA_HCA=mlx5_0 -env MV2_SHMEM_COLL_NUM_COMM=64 -env MV2_UD_ZCOPY_NUM_RETRY=1000000  -env MV2_NUM_QP_PER_PORT=1  -env MV2_USE_UD_ONLY=1 /opt/bm/hpc/mvapich-GPCNET-1.2/network_load_test_tn
[dell-s13-h21:mpi_rank_0][rdma_get_user_parameters] Cannot have more than one QP with UD_ONLY / Hybrid mode.
[dell-s13-h21:mpi_rank_0][rdma_get_user_parameters] Resetting MV2_NUM_QP_PER_PORT to 1.
NetworkLoad Tests v1.2
  Test with 1504 MPI ranks (47 nodes)
  10 nodes running Network Tests
  37 nodes running Congestion Tests (min 9 nodes per congestor)

  Legend
   RR = random ring communication pattern
   Lat = latency
   BW = bandwidth
   BW+Sync = bandwidth with barrier
+------------------------------------------------------------------------------+
|                            Isolated Network Tests                            |
+---------------------------------+--------------+--------------+--------------+
|                            Name |          Avg |          99% |        Units |
+---------------------------------+--------------+--------------+--------------+
|          RR Two-sided Lat (8 B) |          3.0 |          6.8 |         usec |
+---------------------------------+--------------+--------------+--------------+
| RR Two-sided BW+Sync (131072 B) |        270.8 |        182.0 |   MiB/s/rank |
+---------------------------------+--------------+--------------+--------------+
|        Multiple Allreduce (8 B) |         19.0 |         29.8 |         usec |
+---------------------------------+--------------+--------------+--------------+

+------------------------------------------------------------------------------+
|                 Network Tests running with Congestion Tests                  |
+---------------------------------+--------------+--------------+--------------+
|                            Name |          Avg |          99% |        Units |
+---------------------------------+--------------+--------------+--------------+
|          RR Two-sided Lat (8 B) |          3.7 |         11.9 |         usec |
+---------------------------------+--------------+--------------+--------------+
| RR Two-sided BW+Sync (131072 B) |        188.0 |        122.5 |   MiB/s/rank |
+---------------------------------+--------------+--------------+--------------+
|        Multiple Allreduce (8 B) |         21.6 |         39.5 |         usec |
+---------------------------------+--------------+--------------+--------------+

+------------------------------------------------------------------------------+
|          Network Tests running with Congestion Tests - Key Results           |
+---------------------------------+--------------------------------------------+
|                            Name |                   Congestion Impact Factor |
+---------------------------------+----------------------+---------------------+
|                                 |                  Avg |                 99% |
+---------------------------------+----------------------+---------------------+
|          RR Two-sided Lat (8 B) |                 1.2X |                1.8X |
+---------------------------------+----------------------+---------------------+
| RR Two-sided BW+Sync (131072 B) |                 1.4X |                1.5X |
+---------------------------------+----------------------+---------------------+
|        Multiple Allreduce (8 B) |                 1.1X |                1.3X |
+---------------------------------+----------------------+---------------------+
204.81 seconds

