[mvapich-discuss] MVAPICH2-GDR on AWS EC2 instance

Marius Brehler marius.brehler at tu-dortmund.de
Mon Aug 20 18:49:43 EDT 2018


Hi,
a correction to my previous mail: the shutdown was initiated by the job script we used and was not caused by the segfault. Sorry for the misleading information. I would still appreciate any hints on whether there is anything special to keep in mind when setting up MVAPICH2-GDR on an EC2 instance.
Best Regards

Marius

On 20 August 2018 23:18:22 CEST, Marius Brehler <marius.brehler at tu-dortmund.de> wrote:

Hi,

I am trying to use the CUDA-aware MPI_Ibcast within an application on an
AWS EC2 instance of type p2.8xlarge. This instance does not have an
InfiniBand HCA. As far as I know, it is nevertheless
recommended to use the GDR version if GPUs are involved.

I noticed that libmpi.so is linked against libibmad, libibumad and
libibverbs. For simplicity I installed these libraries together with the
Mellanox OFED (which might not be a good idea).
As stated in the GDR user guide, I also installed gdrcopy but skipped the
nv_peer_mem module, since our application runs fine on a
local node without loading it. Our local node has
two K40c GPUs and also a Mellanox HCA installed (not involved in any
communication at the moment). Otherwise, the CentOS configuration on
the local node and on the EC2 instance is quite similar.
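
The library linkage described above can be checked with a short script. The following is only a minimal sketch: the helper names find_ib_libs and ldd_output are made up for illustration, and it assumes the standard ldd tool is available on the system. It lists which of the three InfiniBand libraries named above appear in the ldd output of a shared object, including entries that ldd reports as "not found".

```python
import subprocess

# Library name prefixes taken from the mail above.
IB_PREFIXES = ("libibmad", "libibumad", "libibverbs")


def find_ib_libs(ldd_text):
    """Return the InfiniBand-related library names listed in ldd output."""
    found = []
    for line in ldd_text.splitlines():
        parts = line.split()
        if not parts:
            continue
        # The first token is the library name, sometimes with a path prefix.
        name = parts[0].rsplit("/", 1)[-1]
        if name.startswith(IB_PREFIXES) and name not in found:
            found.append(name)
    return found


def ldd_output(path):
    """Run ldd on a shared object and return its text output (hypothetical helper)."""
    result = subprocess.run(["ldd", path], capture_output=True, text=True, check=True)
    return result.stdout
```

Calling find_ib_libs(ldd_output(path)) with the path of the installed libmpi.so would print nothing useful by itself; the returned list simply shows which of these dependencies the dynamic loader has to resolve on the instance.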

Unfortunately, our application is crashing so badly that the EC2
instance instantly shuts down:

[.. .compute.internal:mpi_rank_0][error_sighandler] Caught error:
Segmentation fault (signal 11)
[.. .compute.internal:mpi_rank_0][error_sighandler] Caught error:
Segmentation fault (signal 11)
[.. .compute.internal:mpi_rank_1][error_sighandler] Caught error:
Segmentation fault (signal 11)

Any hint as to what might have gone wrong, or how to set up MVAPICH2-GDR on an
EC2 instance correctly?
Regards

Marius

--
M.Sc. Marius Brehler
Research Associate/Ph.D. Candidate

TU Dortmund University
Chair for High Frequency Technology
44227 Dortmund, Germany

Important note: The information included in this e-mail is confidential. It is solely intended for the recipient. If you are not the intended recipient of this e-mail please contact the sender and delete this message. Thank you. Without prejudice of e-mail correspondence, our statements are only legally binding when they are made in the conventional written form (with personal signature) or when such documents are sent by fax.

________________________________

mvapich-discuss mailing list
mvapich-discuss at cse.ohio-state.edu
http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
