[mvapich-discuss] Bad performance of IVSHMEM with OSU benchmarks

Xiaoyi Lu lu.932 at osu.edu
Tue Jul 11 17:14:07 EDT 2017


OK. Let us know the numbers you get with hyper-threading disabled.
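In case it is useful, one way to take the hyper-thread siblings offline at runtime on a Linux host is via the sysfs CPU hotplug interface. This is only a sketch: the sibling numbering 4-7 is the usual layout on a 4-core/8-thread i7-4770, but please confirm it first against /sys/devices/system/cpu/cpu*/topology/thread_siblings_list.

```shell
# Each logical CPU lists its SMT siblings in sysfs; on this box the pairs
# are typically 0-4, 1-5, 2-6, 3-7, so offlining CPUs 4-7 leaves one
# hardware thread per physical core.
for cpu in 4 5 6 7; do
    echo 0 | sudo tee /sys/devices/system/cpu/cpu$cpu/online
done

# Verify: the "Off-line CPU(s) list" row should now show 4-7.
lscpu | grep -i 'off-line'
```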

Within the same NUMA domain, you can also bind each VM to two cores. Then show us the inter-VM, intra-node numbers for three configurations: MV2-Virt (default), MV2-Virt with IVSHMEM explicitly enabled (IVSHMEM=1), and MV2-Virt with IVSHMEM explicitly disabled (IVSHMEM=0).
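To make this concrete, here is a sketch of the pinning and of the three runs. The VM names, core ranges, and hostfile-style invocation are placeholders, and please double-check the MV2_VIRT_USE_IVSHMEM parameter name against the MVAPICH2-Virt userguide for your release:

```shell
# Pin each single-vCPU VM to its own pair of physical cores on the host
# (vm1/vm2 and the core ranges 0-1 / 2-3 are illustrative).
virsh vcpupin vm1 0 0-1
virsh vcpupin vm2 0 2-3

# Inter-VM, intra-node osu_latency under the three configurations:
mpirun_rsh -np 2 vm1 vm2 ./osu_latency                         # MV2-Virt default
mpirun_rsh -np 2 vm1 vm2 MV2_VIRT_USE_IVSHMEM=1 ./osu_latency  # IVSHMEM on
mpirun_rsh -np 2 vm1 vm2 MV2_VIRT_USE_IVSHMEM=0 ./osu_latency  # IVSHMEM off
```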

We can then discuss from there.

Thanks,
Xiaoyi

> On Jul 11, 2017, at 4:57 PM, Maksym Planeta <mplaneta at os.inf.tu-dresden.de> wrote:
> 
> 
> 
> On 07/11/2017 10:44 PM, Xiaoyi Lu wrote:
>> Hi,
>> 
>> Thanks for sending us the details.
>> 
>> Can you first disable hyper threads?
>> 
> 
> I can repeat the same experiment tomorrow with hyper-threading disabled, but nothing else was actually running on those threads (I checked with htop).
> 
> > And you can try to bind each VM to two cores in the same node.
> 
> Do you mean the same NUMA domain? If so, I have only one NUMA domain. If not, I already did this for my previous email and saw no difference.
> 
> > After that, let us know how the numbers look for the cases of
> > IVSHMEM enabled and disabled.
> 
> 
> 
>> Thanks,
>> Xiaoyi
>> 
>>> On Jul 11, 2017, at 4:20 PM, Maksym Planeta <mplaneta at os.inf.tu-dresden.de> wrote:
>>> 
>>> 
>>> 
>>> On 07/11/2017 10:08 PM, Xiaoyi Lu wrote:
>>>> Hi, Maksym,
>>>> 
>>>> Thanks for your feedback.
>>>> 
>>>> Can you please let us know your system configurations? Like KVM, QEMU versions, cpu info, memory size, HCA, etc. How many VMs and number of processes per VM are run on your system?
>>>> 
>>> 
>>> QEMU/KVM:
>>> $ qemu-system-x86_64 --version
>>> QEMU emulator version 2.8.1(Debian 1:2.8+dfsg-6)
>>> 
>>> Kernel:
>>> $ uname -a
>>> Linux ib1 4.9.0-3-amd64 #1 SMP Debian 4.9.30-2+deb9u1 (2017-06-18) x86_64 GNU/Linux
>>> 
>>> CPU 4 cores + 4 hyperthreads:
>>> 
>>> $ cat /proc/cpuinfo
>>> processor	: 0
>>> vendor_id	: GenuineIntel
>>> cpu family	: 6
>>> model		: 60
>>> model name	: Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
>>> stepping	: 3
>>> microcode	: 0x9
>>> cpu MHz		: 799.987
>>> cache size	: 8192 KB
>>> physical id	: 0
>>> siblings	: 8
>>> core id		: 0
>>> cpu cores	: 4
>>> apicid		: 0
>>> initial apicid	: 0
>>> fpu		: yes
>>> fpu_exception	: yes
>>> cpuid level	: 13
>>> wp		: yes
>>> flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm xsaveopt dtherm arat pln pts
>>> bugs		:
>>> bogomips	: 6799.83
>>> clflush size	: 64
>>> cache_alignment	: 64
>>> address sizes	: 39 bits physical, 48 bits virtual
>>> power management:
>>> 
>>> $ free -h
>>>                total        used        free      shared  buff/cache   available
>>> Mem:            15G        4.6G         10G        270M        814M         10G
>>> Swap:          3.7G          0B        3.7G
>>> 
>>> HCA with 4 virtual functions:
>>> 
>>> 05:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]
>>> 
>>> # ibv_devinfo
>>> hca_id:	mlx4_0
>>> 	transport:			InfiniBand (0)
>>> 	fw_ver:				2.34.5000
>>> 	node_guid:			f452:1403:0010:a4a0
>>> 	sys_image_guid:			f452:1403:0010:a4a3
>>> 	vendor_id:			0x02c9
>>> 	vendor_part_id:			4099
>>> 	hw_ver:				0x0
>>> 	board_id:			MT_1090120019
>>> 	phys_port_cnt:			2
>>> 		port:	1
>>> 			state:			PORT_DOWN (1)
>>> 			max_mtu:		2048 (4)
>>> 			active_mtu:		2048 (4)
>>> 			sm_lid:			0
>>> 			port_lid:		0
>>> 			port_lmc:		0x00
>>> 			link_layer:		InfiniBand
>>> 
>>> 		port:	2
>>> 			state:			PORT_ACTIVE (4)
>>> 			max_mtu:		2048 (4)
>>> 			active_mtu:		2048 (4)
>>> 			sm_lid:			3
>>> 			port_lid:		3
>>> 			port_lmc:		0x00
>>> 			link_layer:		InfiniBand
>>> 
>>> 
>>> For this test there were two VMs on each host. Each run used only two of them; the rest were idle. I did not pin the VMs.
>>> 
>>> When I did pin the VMs, there was no change. Pinning inside the VMs is disabled, but each VM has only one vCPU anyway.
>>> 
>>> Attached is the virsh dumpxml output for one of the VMs.
>>> 
>>> Motherboard:
>>> 
>>> 	Manufacturer: Gigabyte Technology Co., Ltd.
>>> 	Product Name: Z87-HD3
>>> 
>>> 
>>>> Thanks,
>>>> Xiaoyi
>>>> 
>>> 
>>> 
>>> --
>>> Regards,
>>> Maksym Planeta
>>> <dump.xml>
>> 
> 
> -- 
> Regards,
> Maksym Planeta
> 



