[mvapich-discuss] MVAPICH2-2.3.3 giving me floating point error (signal 8)
Subramoni, Hari
subramoni.1 at osu.edu
Mon Jun 22 11:07:56 EDT 2020
Hi, Shaleen.
Is this a single socket system?
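One way to check the socket count, assuming lscpu (from util-linux) is available on the node. The variable names here are only illustrative:

```shell
# Report the number of CPU sockets; stays "unknown" if lscpu is missing
# or does not print a "Socket(s):" line.
sockets=unknown
if command -v lscpu >/dev/null 2>&1; then
  # lscpu prints a line such as "Socket(s):   1"
  n=$(lscpu | awk -F: '/^Socket\(s\)/ { gsub(/[ \t]/, "", $2); print $2 }')
  if [ -n "$n" ]; then
    sockets=$n
  fi
fi
echo "sockets: $sockets"
```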
We recently released a newer version of MVAPICH2 (2.3.4). Can you please try that? It fixes some issues similar to this one.
If you observe a similar issue with MVAPICH2 2.3.4, can you do the following?
1. Reconfigure MVAPICH2 with "./configure --with-device=ch3:mrail --with-rdma=gen2 --enable-g=all --enable-fast=none"
2. Set MV2_DEBUG_SHOW_BACKTRACE=2 when running the program
That will tell us where the fault occurs.
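As a rough sketch, the two steps could look like the script below. It assumes the commands are run from the MVAPICH2 source directory and that the test binary is the same ./a.out launched with the same mpirun options as in the original report. The run helper and the DRY_RUN variable are hypothetical conveniences, not part of MVAPICH2. By default the script only prints the commands; set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# Sketch only: commands are printed, and executed only when DRY_RUN=0.
DRY_RUN="${DRY_RUN:-1}"

# run is a hypothetical helper, not part of MVAPICH2.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

# Step 1: rebuild with the debug configure options from the list above.
run ./configure --with-device=ch3:mrail --with-rdma=gen2 --enable-g=all --enable-fast=none
run make -j
run sudo make install

# Step 2: rerun the failing job with backtraces enabled.
run mpirun -env MV2_SMP_USE_CMA=0 -env MV2_DEBUG_SHOW_BACKTRACE=2 -np 10 ./a.out
```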
Thx,
Hari.
From: mvapich-discuss-bounces at cse.ohio-state.edu <mvapich-discuss-bounces at mailman.cse.ohio-state.edu> On Behalf Of Shaleen Garg
Sent: Monday, June 22, 2020 9:48 AM
To: mvapich-discuss at cse.ohio-state.edu <mvapich-discuss at mailman.cse.ohio-state.edu>
Subject: [mvapich-discuss] MVAPICH2-2.3.3 giving me floating point error (signal 8)
Hi All,
I am trying to install mvapich on a machine with Mellanox IB:
$ lspci | grep "Mellanox"
Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]
To install, I followed the user guide. Since this is a new machine, I installed the following packages (on Ubuntu 18.04 with Linux kernel 4.15.0-55-generic): libibmad-dev libibumad-dev libibumad3 libibverbs-dev gfortran infiniband-diags rdma-core.
Installation Method:
$ ./configure --with-device=ch3:mrail --with-rdma=gen2
$ make -j
$ sudo make install
This installs fine, but when I run a hello world program:
$ mpirun -env MV2_SMP_USE_CMA=0 -np 10 ./a.out
I get the following error:
[apt140:mpi_rank_2][error_sighandler] Caught error: Floating point exception (signal 8)
…
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 13854 RUNNING AT apt140
= EXIT CODE: 8
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
Is there something I am missing? I don't understand why an MPI hello world fails even within a single node. The code I am testing comes from https://mpitutorial.com/tutorials/mpi-hello-world/
Regards,
Shaleen Garg