[mvapich-discuss] MVAPICH2-GDR latency test return bad result

Panda, Dhabaleswar panda at cse.ohio-state.edu
Wed Jul 22 12:34:13 EDT 2015


The following reply was sent to you by Khaled on July 10th. Please follow these steps.

DK
-----

Hi Makai,

Thank you for the information on your HCA installation. However, you still need to find out the placement of the GPU and the HCA on your PCI slots. To do so, please run the command

lspci -tv

and then check whether the NVIDIA GPU and the Mellanox HCA are under the same PCI root complex or different ones. This will tell you whether or not they are on the same socket.
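As a rough sketch (the bus IDs below are placeholders, not taken from your system), you can also cross-check the NUMA placement through sysfs once you know each device's PCI bus ID from lspci:

lspci | grep -iE "nvidia|mellanox"                         # note each device's PCI bus ID
cat /sys/bus/pci/devices/0000:<gpu-bus-id>/numa_node       # NUMA node (socket) of the GPU
cat /sys/bus/pci/devices/0000:<hca-bus-id>/numa_node       # NUMA node (socket) of the HCA

If both reads report the same NUMA node, the GPU and the HCA sit on the same socket (a value of -1 means the kernel has no NUMA information for that device).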

Also run nvidia-smi to find out how many GPUs you have on your system.
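For example (standard nvidia-smi options; the exact output format may differ by driver version):

nvidia-smi -L                       # list every GPU with its index
nvidia-smi -q | grep -i "bus id"    # print each GPU's PCI bus ID, to match against lspci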

Note that I'm moving this discussion to our internal list of developers.

Thanks


________________________________
From: makai [makailove123 at 163.com]
Sent: Wednesday, July 22, 2015 12:07 PM
To: Panda, Dhabaleswar
Cc: mvapich-discuss at cse.ohio-state.edu
Subject: Re: [mvapich-discuss] MVAPICH2-GDR latency test return bad result

I'm sorry, but Khaled Hamidouche had already given me some suggestions for my first two questions, as follows:

Regarding the performance behavior, it might be due to the QPI bottleneck, as your GPU and HCA are on different sockets. Can you please provide the details of your node configuration? In the meantime, can you please explicitly select an HCA and GPU on the same socket and try again? You can use the runtime parameter MV2_IBA_HCA.
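If I understand the suggestion correctly, the run would look roughly like the following (GPU 0 here is only an illustration; the correct index would have to come from the lspci/nvidia-smi topology check):

$MV2_PATH/bin/mpiexec -hosts 192.168.2.3,192.168.2.4 -n 2 \
    -env MV2_USE_CUDA 1 \
    -env MV2_IBA_HCA mlx4_0 \
    -env CUDA_VISIBLE_DEVICES 0 \
    /opt/mvapich2/gdr/2.1/cuda6.5/gnu/libexec/mvapich2/osu_latency D D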

After that, I provided my IB device information to him and asked the last two questions, but I have not received his reply.
I am wondering whether my questions were not phrased properly and could not be understood easily.


On July 22, 2015, at 11:19 PM, Panda, Dhabaleswar <panda at cse.ohio-state.edu> wrote:

One of the MVAPICH team members had already corresponded with you on this issue a few
weeks back outside of this list. Please follow those instructions.

Thanks,

DK
________________________________
From: mvapich-discuss-bounces at cse.ohio-state.edu on behalf of makai [makailove123 at 163.com]
Sent: Tuesday, July 21, 2015 11:50 PM
To: mvapich-discuss at cse.ohio-state.edu
Subject: [mvapich-discuss] MVAPICH2-GDR latency test return bad result

I have installed MVAPICH2-GDR and gdrcopy, but when I run osu_latency, the results turn out to be weird.



makai at gpu-cluster-3:~$ $MV2_PATH/bin/mpiexec -hosts 192.168.2.3,192.168.2.4 -n 2 -env MV2_USE_CUDA 1 /opt/mvapich2/gdr/2.1/cuda6.5/gnu/libexec/mvapich2/osu_latency D D
# OSU MPI-CUDA Latency Test
# Send Buffer on DEVICE (D) and Receive Buffer on DEVICE (D)
# Size            Latency (us)
Warning *** The GPU and IB selected are not on the same socket. Do not delever the best performance
Warning *** The GPU and IB selected are not on the same socket. Do not delever the best performance
0                         1.67
1                         2.91
2                         3.92
4                         3.99
8                         3.92
16                        3.97
32                      160.67
64                      161.51
128                     162.05
256                     165.20
512                     165.88
1024                    168.92
2048                    176.08
4096                    185.95
8192                     72.63
16384                   261.26
32768                   148.08
65536                   518.37
131072                  143.93
262144                  260.03
524288                  254.41
1048576                 393.54
2097152                 672.47
4194304                1244.69

Why did the result become so bad once the message size grew larger than 32 bytes?
Also, I find there is only one CA on my node, so why did it tell me "The GPU and IB selected are not on the same socket. Do not delever the best performance"?

Below is my IB information.

root at gpu-cluster-4:/mnt/docs/readme_and_user_manual# ibstat
CA 'mlx4_0'
    CA type: MT4099
    Number of ports: 2
    Firmware version: 2.11.500
    Hardware version: 0
    Node GUID: 0x00e08100002ae95b
    System image GUID: 0x00e08100002ae95e
    Port 1:
        State: Active
        Physical state: LinkUp
        Rate: 40
        Base lid: 7
        LMC: 0
        SM lid: 1
        Capability mask: 0x02514868
        Port GUID: 0x00e08100002ae95c
        Link layer: InfiniBand
    Port 2:
        State: Active
        Physical state: LinkUp
        Rate: 40
        Base lid: 8
        LMC: 0
        SM lid: 1
        Capability mask: 0x02514868
        Port GUID: 0x00e08100002ae95d
        Link layer: InfiniBand

makai at gpu-cluster-4:~$ ibv_devinfo
hca_id: mlx4_0
    transport:          InfiniBand (0)
    fw_ver:             2.11.500
    node_guid:          00e0:8100:002a:e95b
    sys_image_guid:     00e0:8100:002a:e95e
    vendor_id:          0x02c9
    vendor_part_id:     4099
    hw_ver:             0x0
    board_id:           MITAC_QDR
    phys_port_cnt:      2
        port:   1
            state:          PORT_ACTIVE (4)
            max_mtu:        4096 (5)
            active_mtu:     4096 (5)
            sm_lid:         1
            port_lid:       7
            port_lmc:       0x00
            link_layer:     InfiniBand

        port:   2
            state:          PORT_ACTIVE (4)
            max_mtu:        4096 (5)
            active_mtu:     4096 (5)
            sm_lid:         1
            port_lid:       8
            port_lmc:       0x00
            link_layer:     InfiniBand

I don't quite understand which sockets the GPU and the HCA belong to. How can I know whether the GPU and the HCA belong to the same socket?
According to the IB device information, there is only one HCA on my node. If the GPU and the HCA belong to different sockets, can I do some configuration to change their sockets?
I have used the runtime parameter MV2_IBA_HCA=mlx4_0, but the result did not get better.

Could you give me some help?
Thanks!