[mvapich-discuss] Bad performance of IVSHMEM with OSU benchmarks

Maksym Planeta mplaneta at os.inf.tu-dresden.de
Tue Jul 11 17:25:23 EDT 2017



On 07/11/2017 11:14 PM, Xiaoyi Lu wrote:
> OK. Let us know the numbers you get if you disable hyper threads.
> 
> For the same NUMA domain, you can also bind each VM to two cores.


> Show us the Inter-VM-Intra-Node numbers for MV2-Virt (def), MV2-Virt (explicitly setting IVSHMEM=1), and MV2-Virt (explicitly setting IVSHMEM=0).

Here are the results with the default settings:

Latency (us, message size in bytes):

|    Size | Intranode | Internode |
|---------+-----------+-----------|
|       0 |      2.55 |      2.92 |
|       1 |      2.47 |      2.71 |
|       2 |      2.38 |      2.55 |
|       4 |      2.33 |      2.45 |
|       8 |      2.30 |      2.39 |
|      16 |      2.30 |      2.36 |
|      32 |      2.35 |      2.39 |
|      64 |      2.42 |      2.46 |
|     128 |      2.61 |      2.66 |
|     256 |      3.75 |      3.78 |
|     512 |      4.21 |      4.19 |
|    1024 |      5.09 |      5.02 |
|    2048 |      5.91 |      5.76 |
|    4096 |      8.02 |      7.72 |
|    8192 |     11.75 |     11.09 |
|   16384 |     21.64 |     20.44 |
|   32768 |     34.25 |     31.38 |
|   65536 |     59.45 |     52.88 |
|  131072 |    109.42 |     96.09 |
|  262144 |    209.41 |    182.30 |
|  524288 |    409.13 |    354.58 |
| 1048576 |    808.61 |    698.58 |
| 2097152 |   1619.47 |   1386.03 |
| 4194304 |   3428.21 |   2769.11 |

Bandwidth (MB/s, message size in bytes):

|    Size | Intranode | Internode |
|---------+-----------+-----------|
|       1 |      1.37 |      1.51 |
|       2 |      2.71 |      3.02 |
|       4 |      5.61 |      6.12 |
|       8 |     11.31 |     12.32 |
|      16 |     22.66 |     24.93 |
|      32 |     44.82 |     49.83 |
|      64 |     86.44 |     98.55 |
|     128 |    173.66 |    185.80 |
|     256 |    255.93 |    319.40 |
|     512 |    434.19 |    537.56 |
|    1024 |    650.43 |    800.53 |
|    2048 |    921.19 |   1078.84 |
|    4096 |   1157.34 |   1351.85 |
|    8192 |   1219.67 |   1416.09 |
|   16384 |   1203.76 |   1398.91 |
|   32768 |   1254.92 |   1456.86 |
|   65536 |   1282.28 |   1489.19 |
|  131072 |   1297.96 |   1507.05 |
|  262144 |   1305.38 |   1515.66 |
|  524288 |   1308.71 |   1521.66 |
| 1048576 |   1310.76 |   1523.69 |
| 2097152 |   1311.05 |   1524.68 |
| 4194304 |   1307.02 |   1525.13 |

The results for the explicit settings (IVSHMEM=1 and IVSHMEM=0) are the ones I sent in the original email. Note that in this default run, too, the intra-node numbers are no better than the inter-node ones, and for larger messages they are clearly worse.
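
For reference, these runs can be reproduced along the following lines. This is only a sketch: the hostnames and binary paths are examples, and I am assuming MV2_VIRT_USE_IVSHMEM is the runtime parameter behind the "IVSHMEM=1/0" settings mentioned above.

# Inter-VM intra-node pair (two VMs on the same host), default settings
$ mpirun_rsh -np 2 host11 host12 ./osu_latency
$ mpirun_rsh -np 2 host11 host12 ./osu_bw

# Inter-node pair (VMs on different hosts) for comparison
$ mpirun_rsh -np 2 host11 host21 ./osu_latency

# Same intra-node pair with IVSHMEM explicitly enabled/disabled
$ mpirun_rsh -np 2 host11 host12 MV2_VIRT_USE_IVSHMEM=1 ./osu_latency
$ mpirun_rsh -np 2 host11 host12 MV2_VIRT_USE_IVSHMEM=0 ./osu_latency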

Just as additional information: a normal ping is a bit faster to the local VM:

[user at host11 ~]$ ping -c 4 host21 
PING host21 (141.76.84.9) 56(84) bytes of data.
64 bytes from host21 (141.76.84.9): icmp_seq=1 ttl=64 time=0.487 ms
64 bytes from host21 (141.76.84.9): icmp_seq=2 ttl=64 time=0.480 ms
64 bytes from host21 (141.76.84.9): icmp_seq=3 ttl=64 time=0.498 ms
64 bytes from host21 (141.76.84.9): icmp_seq=4 ttl=64 time=0.478 ms

--- host21 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3000ms
rtt min/avg/max/mdev = 0.478/0.485/0.498/0.028 ms
[user at host11 ~]$ ping -c 4 host12
PING host12 (141.76.84.2) 56(84) bytes of data.
64 bytes from host12 (141.76.84.2): icmp_seq=1 ttl=64 time=0.447 ms
64 bytes from host12 (141.76.84.2): icmp_seq=2 ttl=64 time=0.429 ms
64 bytes from host12 (141.76.84.2): icmp_seq=3 ttl=64 time=0.555 ms
64 bytes from host12 (141.76.84.2): icmp_seq=4 ttl=64 time=0.432 ms

--- host12 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3000ms
rtt min/avg/max/mdev = 0.429/0.465/0.555/0.058 ms
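
For tomorrow's run, disabling the hyperthread siblings and pinning the VMs would look roughly like this. Again only a sketch: the sibling CPU IDs and libvirt domain names below are examples; the real sibling IDs are listed in /sys/devices/system/cpu/cpu*/topology/thread_siblings_list.

# Offline the SMT siblings so only the 4 physical cores remain online
$ for cpu in 4 5 6 7; do echo 0 | sudo tee /sys/devices/system/cpu/cpu$cpu/online; done

# Pin the single vCPU of each VM to its own physical core (domain names are examples)
$ virsh vcpupin vm11 0 2
$ virsh vcpupin vm12 0 3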


> 
> We can then discuss from there.
> 
> Thanks,
> Xiaoyi
> 
>> On Jul 11, 2017, at 4:57 PM, Maksym Planeta <mplaneta at os.inf.tu-dresden.de> wrote:
>>
>>
>>
>> On 07/11/2017 10:44 PM, Xiaoyi Lu wrote:
>>> Hi,
>>>
>>> Thanks for sending us the details.
>>>
>>> Can you first disable hyper threads?
>>>
>>
>> I can try to repeat the same thing tomorrow with hyperthreads disabled, but nothing else was actually running on them (I checked with htop).
>>
>>> And you can try to bind each VM to two cores in the same node.
>>
>> Do you mean the same NUMA domain? If so, I have only one NUMA domain. If not, I already did this for the previous email and saw no difference.
>>
>>> After that, let us know how the numbers look for the cases of
>>> IVSHMEM enabled and disabled.
>>
>>
>>
>>> Thanks,
>>> Xiaoyi
>>>
>>>> On Jul 11, 2017, at 4:20 PM, Maksym Planeta <mplaneta at os.inf.tu-dresden.de> wrote:
>>>>
>>>>
>>>>
>>>> On 07/11/2017 10:08 PM, Xiaoyi Lu wrote:
>>>>> Hi, Maksym,
>>>>>
>>>>> Thanks for your feedback.
>>>>>
>>>>> Can you please let us know your system configuration, such as KVM and QEMU versions, CPU info, memory size, HCA, etc.? How many VMs, and how many processes per VM, are run on your system?
>>>>>
>>>>
>>>> QEMU/KVM:
>>>> $ qemu-system-x86_64 --version
>>>> QEMU emulator version 2.8.1(Debian 1:2.8+dfsg-6)
>>>>
>>>> Kernel:
>>>> $ uname -a
>>>> Linux ib1 4.9.0-3-amd64 #1 SMP Debian 4.9.30-2+deb9u1 (2017-06-18) x86_64 GNU/Linux
>>>>
>>>> CPU: 4 cores + 4 hyperthreads (8 logical CPUs):
>>>>
>>>> $ cat /proc/cpuinfo
>>>> processor	: 0
>>>> vendor_id	: GenuineIntel
>>>> cpu family	: 6
>>>> model		: 60
>>>> model name	: Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
>>>> stepping	: 3
>>>> microcode	: 0x9
>>>> cpu MHz		: 799.987
>>>> cache size	: 8192 KB
>>>> physical id	: 0
>>>> siblings	: 8
>>>> core id		: 0
>>>> cpu cores	: 4
>>>> apicid		: 0
>>>> initial apicid	: 0
>>>> fpu		: yes
>>>> fpu_exception	: yes
>>>> cpuid level	: 13
>>>> wp		: yes
>>>> flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm xsaveopt dtherm arat pln pts
>>>> bugs		:
>>>> bogomips	: 6799.83
>>>> clflush size	: 64
>>>> cache_alignment	: 64
>>>> address sizes	: 39 bits physical, 48 bits virtual
>>>> power management:
>>>>
>>>> $ free -h
>>>>             total        used        free      shared  buff/cache   available
>>>> Mem:            15G        4.6G         10G        270M        814M         10G
>>>> Swap:          3.7G          0B        3.7G
>>>>
>>>> HCA with 4 virtual functions:
>>>>
>>>> 05:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]
>>>>
>>>> # ibv_devinfo
>>>> hca_id:	mlx4_0
>>>> 	transport:			InfiniBand (0)
>>>> 	fw_ver:				2.34.5000
>>>> 	node_guid:			f452:1403:0010:a4a0
>>>> 	sys_image_guid:			f452:1403:0010:a4a3
>>>> 	vendor_id:			0x02c9
>>>> 	vendor_part_id:			4099
>>>> 	hw_ver:				0x0
>>>> 	board_id:			MT_1090120019
>>>> 	phys_port_cnt:			2
>>>> 		port:	1
>>>> 			state:			PORT_DOWN (1)
>>>> 			max_mtu:		2048 (4)
>>>> 			active_mtu:		2048 (4)
>>>> 			sm_lid:			0
>>>> 			port_lid:		0
>>>> 			port_lmc:		0x00
>>>> 			link_layer:		InfiniBand
>>>>
>>>> 		port:	2
>>>> 			state:			PORT_ACTIVE (4)
>>>> 			max_mtu:		2048 (4)
>>>> 			active_mtu:		2048 (4)
>>>> 			sm_lid:			3
>>>> 			port_lid:		3
>>>> 			port_lmc:		0x00
>>>> 			link_layer:		InfiniBand
>>>>
>>>>
>>>> For this test I had two VMs on each host. For each test only two VMs were used; the rest were idle. I didn't do any pinning.
>>>>
>>>> When I pinned the VMs, there was no change. Pinning inside the VMs is disabled, but each VM has only one vCPU anyway.
>>>>
>>>> The attachment contains the virsh dumpxml output for one of the VMs.
>>>>
>>>> Motherboard:
>>>>
>>>> 	Manufacturer: Gigabyte Technology Co., Ltd.
>>>> 	Product Name: Z87-HD3
>>>>
>>>>
>>>>> Thanks,
>>>>> Xiaoyi
>>>>>
>>>>
>>>>
>>>> --
>>>> Regards,
>>>> Maksym Planeta
>>>> <dump.xml>
>>>
>>
>> -- 
>> Regards,
>> Maksym Planeta
>>
> 

-- 
Regards,
Maksym Planeta
