[mvapich-discuss] Fatal error in MPI_Init: Internal MPI error!, error stack:

Hari Subramoni subramoni.1 at osu.edu
Wed Sep 20 07:15:22 EDT 2017


Hi Jason,

Can you see if there are any firewalls active between the two nodes? Could
you also send the output of ibstat or ibv_devinfo -v executed on both nodes?
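
For reference, something along the following lines should show the relevant
state on each node (the firewall commands assume a systemd/iptables-based
distribution; adjust for yours):

# systemctl status firewalld    (is a host firewall running?)
# iptables -L -n                (list active filter rules)
# ibstat                        (HCA port state, rate, and link layer)
# ibv_devinfo -v                (verbose device attributes)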

Thx,
Hari.

On Wed, Sep 20, 2017 at 5:02 AM, Jason Collins <jasoncollinsw at gmail.com>
wrote:

> Hi Hari.
>
> Thank you so much for your help.
>
> I downloaded the latest version (mvapich2_2.3b) and configured with:
> # ./configure --prefix=/export/apps/gnu/libraries/mvapich --with-device=ch3:mrail --with-rdma=gen2
>
> Everything went OK.
>
> I then ran a test to verify that everything is working
> correctly. I have 2 nodes. When only node 1 is active:
> # mpiexec -f hosts -n 4 ./cpi
> Process 3 of 4 is on compute1
> Process 0 of 4 is on compute1
> Process 1 of 4 is on compute1
> Process 2 of 4 is on compute1
> pi is approximately 3.1415926544231239, Error is 0.0000000008333307
> wall clock time = 0.000669
>
> When only node 2 is active:
> # mpiexec -f hosts -n 4 ./cpi
> Process 1 of 4 is on compute2
> Process 0 of 4 is on compute2
> Process 3 of 4 is on compute2
> Process 2 of 4 is on compute2
> pi is approximately 3.1415926544231239, Error is 0.0000000008333307
> wall clock time = 0.000222
>
> When both nodes are active, the following happens:
> # mpiexec -f hosts -n 4 ./cpi
>
> The process appears to start but performs no action and prints no error;
> it just hangs.
>
> On Tue, Sep 19, 2017 at 2:48 PM, Hari Subramoni (<subramoni.1 at osu.edu>)
> wrote:
>
>> Hi Jason,
>>
>> It looks like the maximum amount of memory that can be registered is
>> limited on your system. This limit needs to be raised for InfiniBand-enabled
>> MPI libraries to work. The following section of the MVAPICH2 userguide has
>> more information on this:
>>
>> http://mvapich.cse.ohio-state.edu/static/media/mvapich/mvapich2-2.3b-userguide.html#x1-1310009.1.4
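>>
>> For what it's worth, the usual remedy (an assumption about your setup; the
>> userguide section above is the authoritative reference) is to raise the
>> locked-memory limit on every node. A minimal sketch:
>>
>> # ulimit -l
>> (check the current limit; "unlimited" is what you want)
>>
>> To make the change persistent, add the following to
>> /etc/security/limits.conf on each node and log in again:
>>
>> * soft memlock unlimited
>> * hard memlock unlimited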
>>
>> On a different note, I see that you are using an older version of
>> MVAPICH2. I would recommend updating to the latest version. You can
>> download it from our download page at the following link:
>>
>> http://mvapich.cse.ohio-state.edu/downloads/
>>
>> Please note that the Nemesis interface has been deprecated in the latest
>> MVAPICH2 release. We recommend using the OFA-IB-CH3 interface for best
>> performance and scalability. The following section of the MVAPICH2
>> userguide has more information on how to build MVAPICH2 for the OFA-IB-CH3
>> interface.
>>
>> http://mvapich.cse.ohio-state.edu/static/media/mvapich/mvapich2-2.3b-userguide.html#x1-120004.4
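>>
>> As a rough sketch (the install prefix here is just an example), building
>> for the OFA-IB-CH3 interface looks like:
>>
>> # ./configure --prefix=/opt/mvapich2 --with-device=ch3:mrail --with-rdma=gen2
>> # make && make install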
>>
>> Please let us know if you face any other issues.
>>
>> Thx,
>> Hari.
>>
>> On Tue, Sep 19, 2017 at 9:13 AM, Jason Collins <jasoncollinsw at gmail.com>
>> wrote:
>>
>>> Hello everyone.
>>>
>>> I have compiled mvapich2-2.2 with GNU. The environment variables I set are:
>>> # export CC=gcc
>>> # export CXX=g++
>>> # export F77=gfortran
>>> # export FC=gfortran
>>> # export FCFLAGS=-fPIC
>>> # export FFLAGS=-fPIC
>>>
>>> I have configured with:
>>> # ./configure --prefix=/export/apps/gnu/libraries/mvapich --with-device=ch3:nemesis:ib
>>>
>>> And I get the following message:
>>>
>>> Hwloc optional build support status (more details can be found above):
>>> -----------------------------------------------------------------------------
>>> Probe / display I/O devices: PCI(linux)
>>> Graphical output (Cairo):
>>> XML input / output: basic
>>> libnuma memory support: no
>>> Plugin support: no
>>> -----------------------------------------------------------------------------
>>> Configuration completed.
>>>
>>> Then I ran:
>>> # make clean && make && make check && make install
>>>
>>> Up to this point everything was fine. Then I ran a test with
>>> "./cpi" and got the following output:
>>> # mpiexec -f hosts -n 4 ./cpi
>>> Fatal error in MPI_Init: Internal MPI error!, error stack:
>>> MPIR_Init_thread(514)...............:
>>> MPID_Init(365)......................: channel initialization failed
>>> MPIDI_CH3_Init(104).................:
>>> MPID_nem_init(320)..................:
>>> MPID_nem_ib_init(379)...............: Failed to setup startup ring
>>> MPID_nem_ib_setup_startup_ring(1747):
>>> rdma_ring_based_allgather(1558).....: ibv_reg_mr failed for addr_hndl
>>> =================================================================
>>> = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>>> = PID 70664 RUNNING AT compute1
>>> = EXIT CODE: 1
>>> = CLEANING UP REMAINING PROCESSES
>>> = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>>> =================================================================
>>>
>>> The hosts file is:
>>> # cat hosts
>>> compute1
>>> #compute2
>>> #compute3
>>>
>>> Could someone help me? Thank you