[mvapich-discuss] MVAPICH2-2.3.3 giving me floating point error (signal 8)

Subramoni, Hari subramoni.1 at osu.edu
Mon Jun 22 11:07:56 EDT 2020


Hi, Shaleen.

Is this a single socket system?
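
One quick way to check, as a sketch: this assumes a Linux system where lscpu (from util-linux) is installed, and the variable name "sockets" is only illustrative.

```shell
# Read the "Socket(s):" line from lscpu; LC_ALL=C keeps the field label in English.
sockets=$(LC_ALL=C lscpu 2>/dev/null | awk -F: '/^Socket\(s\)/ { gsub(/[ \t]/, "", $2); print $2 }')

# Report "unknown" if lscpu is missing or does not list the field.
echo "Sockets: ${sockets:-unknown}"
```

A value greater than 1 would mean a multi-socket node.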

We recently released a newer version of MVAPICH2 (2.3.4). Can you please try that? It fixes some issues similar to this one.

If you observe a similar issue with MVAPICH2 2.3.4, can you do the following?


  1.  Reconfigure MVAPICH2 with "./configure --with-device=ch3:mrail --with-rdma=gen2 --enable-g=all --enable-fast=none"
  2.  Add MV2_DEBUG_SHOW_BACKTRACE=2 when running it

That will tell us where the floating point exception occurs.
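
As a rough sketch, the two steps above could look like this in a shell session. It assumes the commands are run from the top of the MVAPICH2 2.3.4 source tree, and it reuses the "-np 10 ./a.out" invocation and the "-env NAME=VALUE" form from the original report. The run helper and the DRY_RUN variable are hypothetical conveniences, so by default the script only prints the commands instead of executing them.

```shell
#!/bin/sh
# Step 1: debug build. The device/RDMA flags match the original install;
# --enable-g=all turns on debugging support and --enable-fast=none turns off
# the fast-path optimizations.
CONFIGURE_FLAGS="--with-device=ch3:mrail --with-rdma=gen2 --enable-g=all --enable-fast=none"

# Step 2: ask MVAPICH2 to print a backtrace when a process crashes.
BACKTRACE_ENV="MV2_DEBUG_SHOW_BACKTRACE=2"

# Hypothetical helper: prints each command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# CONFIGURE_FLAGS is intentionally unquoted so it splits into separate arguments.
run ./configure $CONFIGURE_FLAGS
run make -j
run sudo make install
run mpirun -env "$BACKTRACE_ENV" -np 10 ./a.out
```

Setting DRY_RUN=0 in the environment makes the script actually run each command.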

Thx,
Hari.

From: mvapich-discuss-bounces at cse.ohio-state.edu <mvapich-discuss-bounces at mailman.cse.ohio-state.edu> On Behalf Of Shaleen Garg
Sent: Monday, June 22, 2020 9:48 AM
To: mvapich-discuss at cse.ohio-state.edu <mvapich-discuss at mailman.cse.ohio-state.edu>
Subject: [mvapich-discuss] MVAPICH2-2.3.3 giving me floating point error (signal 8)

Hi All,

I am trying to install mvapich on a machine with Mellanox IB:


$ lspci | grep "Mellanox"

Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]

To install, I followed the user guide. Since this is a new machine, I have installed the following packages (on Ubuntu 18.04 with Linux kernel 4.15.0-55-generic): libibmad-dev libibumad-dev libibumad3 libibverbs-dev gfortran infiniband-diags rdma-core.

Installation Method:

$ ./configure --with-device=ch3:mrail --with-rdma=gen2

$ make -j

$ sudo make install


This installs fine, but when I run a hello world program:


$ mpirun -env MV2_SMP_USE_CMA=0 -np 10 ./a.out


I get the following error:

[apt140:mpi_rank_2][error_sighandler] Caught error: Floating point exception (signal 8)
…

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 13854 RUNNING AT apt140
=   EXIT CODE: 8
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================

Is there something I am missing? I don’t know why an MPI hello world fails even within a single node. The code I am testing comes from https://mpitutorial.com/tutorials/mpi-hello-world/


Regards,
Shaleen Garg