[mvapich-discuss] Difference between MVAPICH2 and MVAPICH2-GDR

makai makailove123 at 163.com
Thu Jul 9 23:34:55 EDT 2015


Hi Khaled,

Below is my IB information.

root at gpu-cluster-4:/mnt/docs/readme_and_user_manual# ibstat
CA 'mlx4_0'
	CA type: MT4099
	Number of ports: 2
	Firmware version: 2.11.500
	Hardware version: 0
	Node GUID: 0x00e08100002ae95b
	System image GUID: 0x00e08100002ae95e
	Port 1:
		State: Active
		Physical state: LinkUp
		Rate: 40
		Base lid: 7
		LMC: 0
		SM lid: 1
		Capability mask: 0x02514868
		Port GUID: 0x00e08100002ae95c
		Link layer: InfiniBand
	Port 2:
		State: Active
		Physical state: LinkUp
		Rate: 40
		Base lid: 8
		LMC: 0
		SM lid: 1
		Capability mask: 0x02514868
		Port GUID: 0x00e08100002ae95d
		Link layer: InfiniBand

makai at gpu-cluster-4:~$ ibv_devinfo 
hca_id:	mlx4_0
	transport:			InfiniBand (0)
	fw_ver:				2.11.500
	node_guid:			00e0:8100:002a:e95b
	sys_image_guid:			00e0:8100:002a:e95e
	vendor_id:			0x02c9
	vendor_part_id:			4099
	hw_ver:				0x0
	board_id:			MITAC_QDR
	phys_port_cnt:			2
		port:	1
			state:			PORT_ACTIVE (4)
			max_mtu:		4096 (5)
			active_mtu:		4096 (5)
			sm_lid:			1
			port_lid:		7
			port_lmc:		0x00
			link_layer:		InfiniBand

		port:	2
			state:			PORT_ACTIVE (4)
			max_mtu:		4096 (5)
			active_mtu:		4096 (5)
			sm_lid:			1
			port_lid:		8
			port_lmc:		0x00
			link_layer:		InfiniBand

I don't quite understand which sockets the GPU and the HCA belong to. How can I find out whether the GPU and the HCA are on the same socket?
According to the IB device information, there is only one HCA on my node. If the GPU and the HCA belong to different sockets, is there some configuration I can apply to change that?
I have used the runtime parameter MV2_IBA_HCA=mlx4_0, but the result did not improve.
Could you give me some help?
Thanks a lot.
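(For reference, here is a rough sketch of how one can check device-to-socket affinity on Linux; the HCA name and the GPU PCI bus ID below are placeholders taken from my guesswork, so substitute the values from your own node.)

```shell
#!/bin/sh
# Sketch: on most two-socket boards, the NUMA node the kernel reports
# for a PCI device tells you which socket it hangs off. Paths here are
# examples only; adjust the device name and PCI bus ID for your node.

# NUMA node the kernel reports for the HCA:
hca_node=$(cat /sys/class/infiniband/mlx4_0/device/numa_node)

# NUMA node for a GPU: find its PCI bus ID first (e.g. with
# "lspci | grep -i nvidia"), then read its numa_node entry.
# 0000:02:00.0 below is a placeholder:
gpu_node=$(cat /sys/bus/pci/devices/0000:02:00.0/numa_node)

# Helper: report whether two NUMA node ids match.
same_socket() {
    if [ "$1" = "$2" ]; then
        echo "same socket"
    else
        echo "different sockets"
    fi
}

same_socket "$hca_node" "$gpu_node"

# With the NVIDIA tools installed, "nvidia-smi topo -m" also prints a
# matrix showing how each GPU and NIC connect; hwloc's "lstopo" draws
# the full socket/PCI topology.
```

Note that a numa_node value of -1 means the kernel could not determine the affinity, in which case `lstopo` is the more reliable way to see the topology.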

> On Jul 9, 2015, at 11:27 PM, khaled hamidouche <hamidouc at cse.ohio-state.edu> wrote:
> 
> Hi Makai, 
> 
> Good to know that you have switched to MV2-GDR and were able to install it.
> 
> Regarding the performance behavior, it might be due to the QPI bottleneck, as your GPU and HCA are on different sockets. Can you please provide the details of your node configuration? In the meantime, can you please explicitly select an HCA and GPU on the same socket and try again? You can use the runtime parameter MV2_IBA_HCA.
> 
> Thanks a lot.  
> 
> On Thu, Jul 9, 2015 at 10:55 AM, makai <makailove123 at 163.com> wrote:
> I have installed MVAPICH2-GDR and gdrcopy, but when I run osu_latency, the result looks weird.
> 
> makai at gpu-cluster-3:~$ $MV2_PATH/bin/mpiexec -hosts 192.168.2.3,192.168.2.4 -n 2 -env MV2_USE_CUDA 1 /opt/mvapich2/gdr/2.1/cuda6.5/gnu/libexec/mvapich2/osu_latency D D
> # OSU MPI-CUDA Latency Test
> # Send Buffer on DEVICE (D) and Receive Buffer on DEVICE (D)
> # Size            Latency (us)
> Warning *** The GPU and IB selected are not on the same socket. Do not delever the best performance
> Warning *** The GPU and IB selected are not on the same socket. Do not delever the best performance
> 0                         1.67
> 1                         2.91
> 2                         3.92
> 4                         3.99
> 8                         3.92
> 16                        3.97
> 32                      160.67
> 64                      161.51
> 128                     162.05
> 256                     165.20
> 512                     165.88
> 1024                    168.92
> 2048                    176.08
> 4096                    185.95
> 8192                     72.63
> 16384                   261.26
> 32768                   148.08
> 65536                   518.37
> 131072                  143.93
> 262144                  260.03
> 524288                  254.41
> 1048576                 393.54
> 2097152                 672.47
> 4194304                1244.69
> 
> Why did the result become so bad once the message size grew beyond 32 bytes?
> And I found there is only one CA on my node, so why did it tell me “The GPU and IB selected are not on the same socket. Do not delever the best performance”?
> 
> Could you give me some help?
> Thanks!
> 
> > On Jun 25, 2015, at 8:56 PM, Panda, Dhabaleswar <panda at cse.ohio-state.edu> wrote:
> > 
> > Thanks for your note. You are mixing up two concepts: 1) CUDA-aware MPI and 2) GPUDirect RDMA. The
> > CUDA-aware MPI concept allows MPI_Send and MPI_Recv to use data from the GPU device directly.
> > GPUDirect RDMA (GDR) allows data to be moved from one GPU to another GPU through the PCI
> > interface using RDMA (say, over InfiniBand) without going through host memory.
> > 
> > MVAPICH2 supports only CUDA-aware MPI.
> > 
> > MVAPICH2-GDR supports CUDA-aware MPI, GPUDirect RDMA (GDR), and many other advanced designs
> > related to GPU clusters to exploit performance and scalability. For example, you can get very low
> > D-D latency (close to 2 microsec) with MVAPICH2-GDR. Thus, for GPU clusters with InfiniBand,
> > we strongly recommend that users use MVAPICH2-GDR. Please take a look at the MVAPICH2-GDR user
> > guide at the following URL for all features and usage guidelines:
> > 
> > http://mvapich.cse.ohio-state.edu/userguide/gdr/
> > 
> > Hope this helps.
> > 
> > DK
> > 
> > ________________________________________
> > From: mvapich-discuss-bounces at cse.ohio-state.edu on behalf of makai [makailove123 at 163.com]
> > Sent: Thursday, June 25, 2015 1:02 AM
> > To: mvapich-discuss at cse.ohio-state.edu
> > Subject: [mvapich-discuss] Difference between MVAPICH2 and MVAPICH2-GDR
> > 
> > I have installed MVAPICH2, and it says that it supports GPUDirect RDMA. MPI_Send and MPI_Recv can use device addresses for data transmission.
> > So, what's the difference between MVAPICH2 and MVAPICH2-GDR?
> > 
> > _______________________________________________
> > mvapich-discuss mailing list
> > mvapich-discuss at cse.ohio-state.edu
> > http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss


