[mvapich-discuss] mvapich2 hang on startup
Chakraborty, Sourav
chakraborty.52 at buckeyemail.osu.edu
Wed Sep 26 18:18:17 EDT 2018
Hi Joe,
Can you please set the environment variable MV2_USE_RoCE=1 and see if it fixes the hang?
The user guide has more information on setting up a RoCE environment:
http://mvapich.cse.ohio-state.edu/static/media/mvapich/mvapich2-2.3-userguide.html#x1-420005.2.7
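One way to try this suggestion is to launch the job from a small wrapper that sets MV2_USE_RoCE=1 and stops waiting after a timeout, so a startup hang is reported instead of blocking the terminal. This is only a sketch under assumptions: the program path ./mpi_hello, the process count, and the 60-second timeout are hypothetical, and it assumes the Hydra mpiexec forwards the caller's environment to the MPI processes.

```python
import os
import subprocess


def roce_env(base=None):
    """Return a copy of the environment with MV2_USE_RoCE=1 added."""
    env = dict(os.environ if base is None else base)
    env["MV2_USE_RoCE"] = "1"
    return env


def launch_command(program, nprocs=2):
    """Build an mpiexec command line; program and nprocs are placeholders."""
    return ["mpiexec", "-n", str(nprocs), program]


def run_with_timeout(cmd, env, timeout_s=60):
    """Run cmd and return its exit code, or None if it exceeds timeout_s."""
    try:
        result = subprocess.run(cmd, env=env, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing is left hanging.
        return None
    return result.returncode


if __name__ == "__main__":
    # "./mpi_hello" is a hypothetical test binary built against the mrail build.
    code = run_with_timeout(launch_command("./mpi_hello"), roce_env())
    print("still hanging at startup" if code is None else f"exit code {code}")
```

If the run still times out with the variable set, the RoCE section of the user guide linked above is the next place to look.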
Thanks,
Sourav
On Wed, Sep 26, 2018 at 5:13 PM Kenny, Joseph P <jpkenny at sandia.gov> wrote:
Hi,
I’m trying to get mvapich2-2.3 up and running for testing on a Mellanox 100G Ethernet system (I’d like to test RoCE). I have a ‘--with-device=ch3:nemesis:tcp’ build that is working fine, but my ‘--with-device=ch3:mrail --with-rdma=gen2’ build hangs during startup:
PMI response: cmd=barrier_out
#0  0x00007fd946fbda20 in __poll_nocancel () from /lib64/libc.so.6
#1  0x000000000042bb55 in HYDT_dmxu_poll_wait_for_event (wtime=-1)
    at ../../../../mvapich2-2.3/src/pm/hydra/tools/demux/demux_poll.c:39
#2  0x000000000040b10b in HYD_pmci_wait_for_completion (timeout=-1)
    at ../../../../mvapich2-2.3/src/pm/hydra/pm/pmiserv/pmiserv_pmci.c:195
#3  0x0000000000403f3a in main (argc=<optimized out>, argv=<optimized out>)
    at ../../../../mvapich2-2.3/src/pm/hydra/ui/mpich/mpiexec.c:339
The behavior looks very similar to this previous thread:
http://mailman.cse.ohio-state.edu/pipermail/mvapich-discuss/2015-June/005634.html
Details on my HCA and OFED are:
02:00.0 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]
CA type: MT4119
Firmware version: 16.23.1020
Hardware version: 0
MLNX_OFED_LINUX-4.4-2.0.7.0 (OFED-4.4-2.0.7)
I imagine it’s something that I’m misconfiguring. Any pointers on debugging this?
Thanks,
Joe
_______________________________________________
mvapich-discuss mailing list
mvapich-discuss at cse.ohio-state.edu
http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss