[mvapich-discuss] MVAPICH2-GDR on AWS EC2 instance

Ammar Ahmad Awan ammar.ahmad.awan at gmail.com
Wed Aug 22 13:20:01 EDT 2018


Hi Marius,

MVAPICH2-GDR requires the nv_peer_mem module to be installed on a machine
to utilize the GPU Direct RDMA (GDR) support. The flag MV2_USE_GPUDIRECT
only applies to the GDR support.

If you don't have nv_peer_mem installed on your machine, please set
MV2_USE_GPUDIRECT=0.

Regarding the EC2 instance: if it does not have an IB HCA, GPU Direct RDMA
is not supported, because GDR is the mechanism by which an HCA reads
directly from the GPU's memory.
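
The two prerequisites described above (nv_peer_mem loaded, an HCA present)
can be checked before launching a job. The following is only a minimal
sketch, not part of MVAPICH2-GDR: the helper names are hypothetical, and it
assumes a Linux system where loaded modules are listed in /proc/modules and
RDMA devices appear under /sys/class/infiniband. That sysfs directory can
also list non-InfiniBand RDMA devices, so treat a positive result as a hint
rather than proof that GDR will work.

```python
from pathlib import Path


def module_loaded(name, proc_modules_text):
    """Return True if `name` is the first field of a line of /proc/modules text."""
    return any(line.split()[0] == name
               for line in proc_modules_text.splitlines() if line.strip())


def has_ib_hca(sysfs_dir="/sys/class/infiniband"):
    """Return True if sysfs exposes at least one RDMA device (assumed location)."""
    path = Path(sysfs_dir)
    return path.is_dir() and any(path.iterdir())


def gpudirect_setting(nv_peer_mem_loaded, hca_present):
    """Return "1" only if both prerequisites hold, otherwise "0"."""
    return "1" if (nv_peer_mem_loaded and hca_present) else "0"


if __name__ == "__main__":
    try:
        modules = Path("/proc/modules").read_text()
    except OSError:
        modules = ""  # not Linux, or /proc is unavailable
    value = gpudirect_setting(module_loaded("nv_peer_mem", modules), has_ib_hca())
    # Print a line that can be pasted into a job script.
    print(f"export MV2_USE_GPUDIRECT={value}")
```

On a machine without nv_peer_mem or without an HCA, such as the EC2
instance discussed here, this would suggest MV2_USE_GPUDIRECT=0.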

Regards,
Ammar



On Tue, Aug 21, 2018 at 4:02 AM Marius Brehler <
marius.brehler at tu-dortmund.de> wrote:

> Hi,
>
> Yet another update: the issue occurs with MV2_USE_GPUDIRECT=1; with
> MV2_USE_GPUDIRECT=0 the application executes as expected. I am currently
> testing on a p2.xlarge with only one GPU.
> The GDR user guide mentions that "For such systems [regular OFED
> or GPUs without GPUDirect RDMA], MVAPICH2-GDR 2.3a efficiently takes
> advantage of CUDA IPC and GDRCOPY features." Should GPUDIRECT therefore
> be disabled? Following https://developer.nvidia.com/gpudirect, I assumed
> that "GPUDirect peer-to-peer transfers and memory access" between cards
> would be affected by this parameter. Or does the MV2_USE_GPUDIRECT
> parameter apply only to "GPUDirect support for RDMA"?
> Please excuse the noise and thanks in advance for clarification.
> Best Regards
>
> Marius
>
>
> On 08/21/2018 12:49 AM, Marius Brehler wrote:
> > Hi,
> > as a correction to my previous mail: the shutdown was initiated by the
> > job script and was not caused by the segfault. Sorry for the misleading
> > information. Anyway, I would appreciate any hints on whether there is
> > anything special to keep in mind when setting up MVAPICH2-GDR on an EC2
> > instance.
> > Best Regards
> >
> > Marius
> >
> > On 20 August 2018 at 23:18:22 CEST, Marius Brehler
> > <marius.brehler at tu-dortmund.de> wrote:
> >
> >     Hi,
> >
> >     I am trying to use the CUDA-aware MPI_Ibcast within an application
> >     on an AWS EC2 instance of type p2.8xlarge. This instance does not
> >     have an InfiniBand HCA. If I am informed correctly, it is nevertheless
> >     recommended to use the GDR version if GPUs are involved.
> >
> >     I noticed that libmpi.so is linked against libibmad, libibumad and
> >     libibverbs. For simplicity I installed these libraries together with
> >     the Mellanox OFED (which might not be a good idea).
> >     As stated in the GDR user guide, I also installed gdrcopy but skipped
> >     the nv_peer_mem module, since our application runs fine on a local
> >     node without loading nv_peer_mem. Our local node has two K40c cards
> >     and also a Mellanox HCA installed (not involved in any communication
> >     at the moment). Otherwise, the CentOS configuration on the local node
> >     and on the EC2 instance is quite similar.
> >
> >     Unfortunately, our application is crashing so badly that the EC2
> >     instance instantly shuts down:
> >
> >     [.. .compute.internal:mpi_rank_0][error_sighandler] Caught error:
> >     Segmentation fault (signal 11)
> >     [.. .compute.internal:mpi_rank_0][error_sighandler] Caught error:
> >     Segmentation fault (signal 11)
> >     [.. .compute.internal:mpi_rank_1][error_sighandler] Caught error:
> >     Segmentation fault (signal 11)
> >
> >     Any hint as to what might have gone wrong, or how to set up
> >     MVAPICH2-GDR correctly on an EC2 instance?
> >     Regards
> >
> >     Marius
> >
> >     --
> >     M.Sc. Marius Brehler
> >     Research Associate/Ph.D. Candidate
> >
> >     TU Dortmund University
> >     Chair for High Frequency Technology
> >     44227 Dortmund, Germany
> >     Important note: The information included in this e-mail is
> >     confidential. It is solely intended for the recipient. If you are not
> >     the intended recipient of this e-mail please contact the sender and
> >     delete this message. Thank you. Without prejudice of e-mail
> >     correspondence, our statements are only legally binding when they are
> >     made in the conventional written form (with personal signature) or
> >     when such documents are sent by fax.
> >
> >
> >     ------------------------------------------------------------------------
> >
> >     mvapich-discuss mailing list
> >     mvapich-discuss at cse.ohio-state.edu
> >     http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
> >