[mvapich-discuss] MPI_Ibcast issue with MVAPICH2-GDR 2.3a

Marius Brehler marius.brehler at tu-dortmund.de
Sat Sep 8 07:33:33 EDT 2018


Hi,
I am currently facing the following error while using CUDA-aware
MPI_Ibcast on an EC2 p2.8xlarge instance:

[ip-172-31-43-122.us-east-2.compute.internal:mpi_rank_0][cudaipc_register]
src/mpid/ch3/channels/mrail/src/gen2/ibv_cuda_ipc.c:283:
cudaIpcOpenMemHandle failed
: No such file or directory (2)

The application involves 8 GPUs and needs to send extremely large
messages. In the simulation, each GPU has to share 30x2^(21) real-valued
elements, i.e. 480 MiB. The application fails with the error message
above. When I halve the problem size so that each GPU needs to share
240 MiB, the algorithm passes. However, it also passes when I process
the halved problem on 4 GPUs, so that each GPU needs to share 480 MiB
again.

Since the AWS instance features no IB HCA, I set MV2_USE_GPUDIRECT=0.
Toggling MV2_CUDA_USE_IPC_BCAS has no influence on the issue. Having
compared different implementations that rely on different communication
patterns, I am quite sure that the problem is linked to MPI_Ibcast. The
version of MVAPICH2-GDR used is 2.3a, built with GNU 4.8.5 (w/o SLURM)
for MLNX-OFED 4.3 and CUDA 9.2. Any idea what may have gone wrong?
Regards


Marius

-- M.Sc. Marius Brehler
Research Associate/Ph.D. Candidate

TU Dortmund University
Chair for High Frequency Technology
44227 Dortmund, Germany

Important note: The information included in this e-mail is confidential. It is solely intended for the recipient. If you are not the intended recipient of this e-mail please contact the sender and delete this message. Thank you. Without prejudice of e-mail correspondence, our statements are only legally binding when they are made in the conventional written form (with personal signature) or when such documents are sent by fax.


