[mvapich-discuss] application failing with process_vm_readv fail [was: Re: osu_mbw_mr does not recognize option -d]

Subramoni, Hari subramoni.1 at osu.edu
Fri Jun 29 09:32:42 EDT 2018


Hi, Marius.

Sorry to hear that you're facing issues. Can you please see if it can be reproduced using "osu_bcast -d cuda"? That would make it easier for us to debug. If it does not reproduce the issue, would it be possible to share your application so that we can try it locally?
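
For instance, something along these lines (reusing the install paths from your earlier mail; the exact location of osu_bcast under the osu-micro-benchmarks tree is an assumption on my part):

  mpirun -np 2 /opt/mvapich2/gdr/2.3a/libexec/osu-micro-benchmarks/get_local_rank \
    /opt/mvapich2/gdr/2.3a/libexec/osu-micro-benchmarks/mpi/collective/osu_bcast -d cuda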

Could you please let us know for what sort of communication patterns this occurs? Is it for non-contiguous send/recv of host buffers?

Can you try disabling CMA to see if the issues go away? You can do this at runtime by setting MV2_SMP_USE_CMA=0 on the command line.
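
For example, with mpirun_rsh the parameter can be passed directly on the command line (the host names and the application name below are placeholders):

  mpirun_rsh -np 2 host1 host2 MV2_SMP_USE_CMA=0 MV2_USE_CUDA=1 ./your_app

If you are launching with the Hydra mpirun/mpiexec instead, exporting MV2_SMP_USE_CMA=0 in the shell environment before launching should have the same effect.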

Best Regards,
Hari.

-----Original Message-----
From: mvapich-discuss-bounces at cse.ohio-state.edu On Behalf Of Marius Brehler
Sent: Wednesday, June 27, 2018 4:34 PM
To: Subramoni, Hari <subramoni.1 at osu.edu>
Cc: mvapich-discuss at cse.ohio-state.edu <mvapich-discuss at mailman.cse.ohio-state.edu>
Subject: [mvapich-discuss] application failing with process_vm_readv fail [was: Re: osu_mbw_mr does not recognize option -d]

Hi Hari,

On 06/27/2018 01:43 PM, Subramoni, Hari wrote:
> Hi, Marius.
>
> I apologize for the error. We will correct that ASAP. I am sure that osu_mbw_mr does not support device to device transfers. Could you please try osu_bw, osu_bibw or osu_latency instead?

No problem, the intention was just to let you know :) osu_bw and osu_latency seem to work fine and also recognize '-d cuda'.


> Could you please let me know what sort of errors you're facing with running your application with our stacks? Were you using MVAPICH2 or MVAPICH2-GDR? Did you use the LD_PRELOAD option?

We have a GPU-accelerated implementation of a Runge-Kutta in the Interaction Picture (RK4IP) algorithm, which now additionally uses MPI to support multiple GPUs. If you are interested in more details regarding the original implementation, you might take a look at [1].
As a first step, I would like to test GPU-to-GPU communication using MVAPICH2-GDR 2.3a on a single node (equipped with two GPUs). Only in a second step will a second machine be involved (both machines connected with ConnectX-4 HCAs). Currently, the application fails even on the single node.

Our prior implementation uses MPI_Bcast and fails with:

[cli_1]: aborting job:
Fatal error in PMPI_Bcast:
Other MPI error, error stack:
PMPI_Bcast(1635)......................: MPI_Bcast(buf=0xb04760000, count=8192, MPI_DOUBLE, root=0, MPI_COMM_WORLD) failed
MPIR_Bcast_impl(1471).................:
MPIR_Bcast_MV2(3928)..................:
MPIR_Bcast_index_tuned_intra_MV2(3573):
MPIR_Bcast_binomial_MV2(163)..........:
MPIC_Recv(439)........................:
MPIC_Wait(323)........................:
MPIDI_CH3I_Progress(215)..............:
MPIDI_CH3I_SMP_read_progress(1247)....:
MPIDI_CH3I_SMP_readv_rndv(4882).......: CMA: (MPIDI_CH3I_SMP_readv_rndv) process_vm_readv fail



Our new implementation uses MPI_I{send,recv} and fails with:

[cli_0]: aborting job:
Fatal error in PMPI_Test:
Other MPI error, error stack:
PMPI_Test(168)....................: MPI_Test(request=0x7f4468d3d310, flag=0x7f4468d3d330, status=0x1) failed
MPIDI_CH3I_Progress_test(503).....:
MPIDI_CH3I_SMP_read_progress(1247):
MPIDI_CH3I_SMP_readv_rndv(4882)...: CMA: (MPIDI_CH3I_SMP_readv_rndv) process_vm_readv fail

[cli_1]: aborting job:
Fatal error in PMPI_Test:
Other MPI error, error stack:
PMPI_Test(168)....................: MPI_Test(request=0x7fdb60138d90, flag=0x7fdb60138db0, status=0x1) failed
MPIDI_CH3I_Progress_test(503).....:
MPIDI_CH3I_SMP_read_progress(1247):
MPIDI_CH3I_SMP_readv_rndv(4882)...: CMA: (MPIDI_CH3I_SMP_readv_rndv) process_vm_readv fail


I use environment modules on the CentOS machine; the environment variables are set as follows:

[brehler at fermi tests]$ env|grep MV2
MV2_PATH=/opt/mvapich2/gdr/2.3a
MV2_USE_GPUDIRECT_GDRCOPY=1
MV2_USE_CUDA=1
MV2_GPUDIRECT_GDRCOPY_LIB=/opt/gdrcopy/lib64/libgdrapi.so
MV2_USE_GPUDIRECT=1

> http://mvapich.cse.ohio-state.edu/userguide/gdr/2.3a/#_example_use_of_ld_preload

Setting LD_PRELOAD fails with

[proxy:0:0 at fermi.hft.e-technik.tu-dortmund.de] HYDU_create_process
(utils/launch/launch.c:75): execvp error on file
LD_PRELOAD=/opt/mvapich2/gdr/2.3a/lib64/libmpi.so.12.0.5 (No such file or directory)

Actually, the lib is present at that location:

[brehler at fermi ~]$ ls -la /opt/mvapich2/gdr/2.3a/lib64/libmpi.so.12.0.5
-rwxr-xr-x. 1 root root 6592880 Jun  5 20:24
/opt/mvapich2/gdr/2.3a/lib64/libmpi.so.12.0.5
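
For what it's worth, a minimal sketch of how I understand the preload would normally be passed (./my_app, host1 and host2 are placeholders; that the failing mpirun is the Hydra launcher, which would then treat a leading LD_PRELOAD=... argument as the executable, is only my guess based on the HYDU_create_process/execvp error above):

  # export before launching with the Hydra mpirun
  export LD_PRELOAD=/opt/mvapich2/gdr/2.3a/lib64/libmpi.so.12.0.5
  mpirun -np 2 ./my_app

  # or pass it inline when launching with mpirun_rsh
  mpirun_rsh -np 2 host1 host2 LD_PRELOAD=/opt/mvapich2/gdr/2.3a/lib64/libmpi.so.12.0.5 ./my_app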


Our issue might not be directly related to MVAPICH2-GDR and might instead be a configuration issue on our side. Therefore, I would appreciate any hint.
Best Regards

Marius



[1] https://www.nvidia.com/content/dam/en-zz/Solutions/gtc-europe/posters/high-performance-computing/gtc-eu-europe-research-posters-11.jpg

> Best Regards,
> Hari.
>
> -----Original Message-----
> From: mvapich-discuss-bounces at cse.ohio-state.edu On Behalf Of Marius 
> Brehler
> Sent: Wednesday, June 27, 2018 1:15 PM
> To: Subramoni, Hari <subramoni.1 at osu.edu>
> Cc: mvapich-discuss at cse.ohio-state.edu 
> <mvapich-discuss at mailman.cse.ohio-state.edu>
> Subject: Re: [mvapich-discuss] osu_mbw_mr does not recognize option -d
>
> Hello Hari,
>
> actually I chose this benchmark more or less at random to verify our
> system setup, since our application is failing miserably whereas
> Open MPI works. Both processes were executed on the same node; I just
> picked osu_mbw_mr since the example is given in the docs:
>
> http://mvapich.cse.ohio-state.edu/userguide/gdr/2.3a/#_examples_using_osu_micro_benchmarks_with_multi_rail_support
>
> In this example the '-d cuda' option is passed. osu_mbw_mr reports
> 'pairs: 1', and nvidia-smi shows that both GPUs have some load, i.e. a
> process is running on each of them.
>
> Regards
>
> Marius
>
> On 06/27/2018 12:51 PM, Subramoni, Hari wrote:
>> Hello, Marius.
>>
>> The multi-pair bandwidth/message rate benchmark does not support 
>> transfer from/to device buffers. This is a known limitation. We have 
>> plans to add this support in the future.
>>
>> Could you please take a look at the numbers reported with
>> '--accelerator cuda' and compare them with the numbers seen for
>> host-to-host? It's likely that it is just running host to host and
>> not reporting the error correctly.
>>
>> Regards,
>> Hari.
>>
>> Sent from my phone
>>
>>
>> On Jun 27, 2018 12:34 PM, Marius Brehler 
>> <marius.brehler at tu-dortmund.de>
>> wrote:
>>
>>     Hi,
>>
>>     testing mvapich2-gdr on our machine, I noticed that passing '-d cuda' to
>>     the osu_mbw_mr benchmark (shipped with the RPM) fails:
>>
>>
>>     [brehler at fermi tests]$ mpirun -np 2
>>     /opt/mvapich2/gdr/2.3a/libexec/osu-micro-benchmarks/get_local_rank
>>     /opt/mvapich2/gdr/2.3a/libexec/osu-micro-benchmarks/mpi/pt2pt/osu_mbw_mr
>>     -d cuda
>>     /opt/mvapich2/gdr/2.3a/libexec/osu-micro-benchmarks/mpi/pt2pt/osu_mbw_mr:
>>     invalid option -- 'd'
>>     Invalid Option [-d]
>>
>>     /opt/mvapich2/gdr/2.3a/libexec/osu-micro-benchmarks/mpi/pt2pt/osu_mbw_mr:
>>     invalid option -- 'd'
>>     Usage: (null) [options]
>>
>>
>>     Passing the option '--accelerator cuda' works as expected.
>>     Regards
>>
>>     Marius
>>     [..]
