[mvapich-discuss] Segmentation fault at some MPI functions after MPI_Put

khaled hamidouche khaledhamidouche at gmail.com
Wed Nov 4 14:11:41 EST 2015


Dear Akihiro,

There seems to be some interaction between IPC and MPI_Put; we are taking a
look at this and will get back to you.
In the meantime, could you please try using Flush synchronization before
the unlock? This should help with the behavior you are seeing.
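For example, something along these lines (an untested sketch, not your exact
code; I am assuming MPI_Win_flush here, and the lock type, target rank, and
buffer are only illustrative):

    /* Sketch: complete the Put with MPI_Win_flush before MPI_Win_unlock.
     * Lock type, target rank, and buffer are illustrative only. */
    #include <mpi.h>

    static void put_with_flush(MPI_Win win, const double *buf, int count,
                               int target)
    {
        MPI_Win_lock(MPI_LOCK_SHARED, target, 0, win);
        MPI_Put(buf, count, MPI_DOUBLE, target, 0, count, MPI_DOUBLE, win);
        MPI_Win_flush(target, win);  /* force completion of the Put at the target */
        MPI_Win_unlock(target, win);
    }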

Thanks



Dear Khaled and Jiri,

Thank you for your reply.
I forgot to mention that I set MV2_USE_GPUDIRECT_GDRCOPY=0 because GDRCOPY
for CUDA 7.5 is not installed on the cluster.
osu_put_latency passed, but the result is unreasonable when
MV2_CUDA_IPC=1.

When MV2_CUDA_IPC=1
("mpirun_rsh -np 2 -hostfile $PBS_NODEFILE MV2_NUM_PORTS=2
MV2_USE_CUDA=1 MV2_CUDA_IPC=1 MV2_USE_GPUDIRECT_GDRCOPY=0
./local_rank.sh osu_put_latency -d cuda -w create -s lock D D")
(local_rank.sh is used to set LOCAL_RANK=$MV2_COMM_WORLD_LOCAL_RANK for GPU
selection)
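
For context, the per-rank GPU selection this enables looks roughly like the
sketch below. This is not the actual benchmark code; the round-robin mapping
over visible devices is just an assumption.

    /* Sketch only: pick a GPU from LOCAL_RANK, which local_rank.sh sets
     * from MV2_COMM_WORLD_LOCAL_RANK.  Round-robin over the visible
     * devices is an assumption. */
    #include <stdlib.h>
    #include <cuda_runtime.h>

    static void select_gpu_by_local_rank(void)
    {
        const char *s = getenv("LOCAL_RANK");
        int local_rank = (s != NULL) ? atoi(s) : 0;
        int ndev = 1;
        cudaGetDeviceCount(&ndev);
        cudaSetDevice(local_rank % ndev);  /* one GPU per local rank */
    }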
############################################################
# OSU MPI_Put-CUDA Latency Test v5.0
# Window creation: MPI_Win_create
# Synchronization: MPI_Win_lock/unlock
# Rank 0 Memory on DEVICE (D) and Rank 1 Memory on DEVICE (D)
# Size          Latency (us)
0                       0.05
1                       3.30
2                       3.29
4                       3.29
8                       3.30
16                      3.31
32                      3.30
64                      3.26
128                     3.30
256                     3.29
512                     3.27
1024                    3.34
2048                    3.25
4096                    3.26
8192                    3.46
16384                   3.25
32768                   3.21
65536                   3.18
131072                  3.33
262144                  3.22
524288                  3.14
1048576                 3.20
2097152                 3.17
4194304                 3.21
############################################################

When MV2_CUDA_IPC=0
############################################################
# OSU MPI_Put-CUDA Latency Test v5.0
# Window creation: MPI_Win_create
# Synchronization: MPI_Win_lock/unlock
# Rank 0 Memory on DEVICE (D) and Rank 1 Memory on DEVICE (D)
# Size          Latency (us)
0                       0.05
1                       4.41
2                       4.40
4                       4.41
8                       4.41
16                      4.40
32                      4.41
64                      4.48
128                     4.80
256                     5.38
512                     6.47
1024                    8.94
2048                   13.58
4096                   21.33
8192                   36.63
16384                  38.95
32768                  55.44
65536                  82.53
131072                 65.37
262144                 94.06
524288                143.40
1048576               252.99
2097152               493.56
4194304               976.52
############################################################


nvidia-smi topo -m
############################################################
        GPU0    GPU1    GPU2    GPU3    mlx4_0  CPU Affinity
GPU0     X      PHB     SOC     SOC     SOC     0-9
GPU1    PHB      X      SOC     SOC     SOC     0-9
GPU2    SOC     SOC      X      PHB     PHB     10-19
GPU3    SOC     SOC     PHB      X      PHB     10-19
mlx4_0  SOC     SOC     PHB     PHB      X

Legend:

   X   = Self
   SOC = Path traverses a socket-level link (e.g. QPI)
   PHB = Path traverses a PCIe host bridge
   PXB = Path traverses multiple PCIe internal switches
   PIX = Path traverses a PCIe internal switch
############################################################


The system configuration is as follows.
######################################
CPU: Intel Xeon E5-2680 v2 x 2 sockets
GPU: NVIDIA K20X x 4
IB:  Mellanox ConnectX-3 dual-port QDR
######################################

Best regards,
Akihiro Tabuchi

-- 
 K.H