[mvapich-discuss] Some problem with osu_bw when I want to use GPU device buffer

Zhuangliang zhuangliang at huawei.com
Tue Sep 23 05:07:47 EDT 2014


To whom it may concern,

I'm trying to use the "mvapich2-gdr" library.

I am running the osu_bw example.
With "Send Buffer on HOST (H) and Receive Buffer on HOST (H)", everything is OK.
(Command line:  mpirun_rsh -np 2 linux-dell RCA61 ./osu_bw -d 'cuda' H H)
linux-dell and RCA61 are the two hosts.

But if I try to allocate one of the send/receive buffers on the GPU, errors occur.
(e.g. Command line:  mpirun_rsh -np 2 linux-dell RCA61 ./osu_bw -d 'cuda' D H)
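
(For clarity, my understanding is that the "D" case hands a cudaMalloc'd device pointer
directly to MPI and relies on the CUDA-aware support in mvapich2-gdr. A minimal sketch of
the kind of transfer I expect to work, with a hypothetical message size, would be:

    #include <mpi.h>
    #include <cuda_runtime.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        int size = 1 << 20;  /* hypothetical message size */

        if (rank == 0) {
            /* "D": the send buffer lives in GPU memory; the raw device
               pointer is passed to MPI_Send (CUDA-aware MPI). */
            char *sbuf = NULL;
            cudaMalloc((void **)&sbuf, size);
            MPI_Send(sbuf, size, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            cudaFree(sbuf);
        } else if (rank == 1) {
            /* "H": the receive buffer stays in host memory. */
            char *rbuf = malloc(size);
            MPI_Recv(rbuf, size, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            free(rbuf);
        }

        MPI_Finalize();
        return 0;
    }

This is essentially what the benchmark does for the D/H combination, and it is this path
that segfaults for me.)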

The error information is as follows:
# OSU MPI-CUDA Bandwidth Test
# Send Buffer on DEVICE (D) and Receive Buffer on HOST (H)
# Size        Bandwidth (MB/s)
[linux-dell:mpi_rank_0][error_sighandler] Caught error: Segmentation fault (signal 11)
[linux-dell:mpispawn_0][readline] Unexpected End-Of-File on file descriptor 6. MPI process died?
[linux-dell:mpispawn_0][mtpmi_processops] Error while reading PMI socket. MPI process died?
[linux-dell:mpispawn_0][child_handler] MPI process (rank: 0, pid: 14635) terminated with signal 11 -> abort job
[linux-dell:mpirun_rsh][process_mpispawn_connection] mpispawn_0 from node RCA61 aborted: Error while reading a PMI socket (4)
[RCA61:mpispawn_1][read_size] Unexpected End-Of-File on file descriptor 6. MPI process died?
[RCA61:mpispawn_1][read_size] Unexpected End-Of-File on file descriptor 6. MPI process died?
[RCA61:mpispawn_1][handle_mt_peer] Error while reading PMI socket. MPI process died?
[RCA61:mpispawn_1][report_error] connect() failed: Connection refused (111)

I would appreciate it if you could give me some support.

Thank you very much!

Jacob
