[mvapich-discuss] Problem with MPI_init

Subramoni, Hari subramoni.1 at osu.edu
Mon Feb 3 15:59:48 EST 2020


Hi, Harald.

Sorry to hear that you are facing issues with MVAPICH2.

Can you give us some more information about the following?


  1.  Version of MVAPICH2 you are using
     *   The output of mpiname -a will help
  2.  How many nodes and processes per node you are running with
  3.  Output of lscpu on your system
  4.  What version of OFED you have on your system
  5.  What network adapter your system has
     *   If InfiniBand, please send the output of ibv_devinfo
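The items above can be gathered in one pass with a small shell sketch. This assumes MVAPICH2's bin directory is on your PATH (for mpiname) and that ofed_info and ibv_devinfo are installed as part of your OFED stack; if a command is missing, its error message is simply recorded in the file instead of its output.

```shell
# Collect the requested diagnostics into a single file for the mailing list.
# Assumptions: mpiname is on PATH; ofed_info/ibv_devinfo come with OFED.
{
  echo "== mpiname -a =="
  mpiname -a
  echo "== lscpu =="
  lscpu
  echo "== OFED version =="
  ofed_info -s
  echo "== ibv_devinfo =="
  ibv_devinfo
} > mvapich_diagnostics.txt 2>&1 || true
```

You can then attach mvapich_diagnostics.txt to your reply.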

One thing I can suggest in the meantime: can you try setting MV2_ENABLE_AFFINITY to 0 and see if the program is able to execute fine?
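A quick way to try this, sketched below; the commented launch line is only illustrative (adapted from the log in your message), so substitute your own binary and arguments:

```shell
# Disable MVAPICH2's CPU affinity logic for this run.
export MV2_ENABLE_AFFINITY=0
echo "MV2_ENABLE_AFFINITY=$MV2_ENABLE_AFFINITY"

# Then launch as before, e.g. (illustrative, based on your log):
# mpirun -np 3 -env MV2_ENABLE_AFFINITY 0 ./BlueFlame_GFORTRAN_mvapich.exe ...
```

Alternatively, pass it directly on the mpirun command line with -env MV2_ENABLE_AFFINITY 0, as you already do for MV2_SMP_USE_CMA.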

Best,
Hari.

From: mvapich-discuss-bounces at cse.ohio-state.edu <mvapich-discuss-bounces at mailman.cse.ohio-state.edu> On Behalf Of alf
Sent: Monday, February 3, 2020 1:57 PM
To: mvapich-discuss at cse.ohio-state.edu <mvapich-discuss at mailman.cse.ohio-state.edu>
Subject: [mvapich-discuss] Problem with MPI_init


Hello,

When starting my MPI program, an error occurs in the MPI routine MPI_INIT, called from the Fortran source:

subroutine Initialize_MPI
!+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

use IO_PARAMETERS
use AUXILLIARY
use SYSTEM_CONSTANTS
use DOMAIN_TOPOLOGY
use MPIHEADER

!+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
!++++  MPI initialization


write(*,*) 'Initialize_MPI 111'

call MPI_INIT (ierr)

write(*,*) 'Initialize_MPI 111'


call MPI_COMM_RANK (MPI_COMM_WORLD, MPI_process_id, ierr)   !+++  get MPI process ID
call MPI_COMM_SIZE (MPI_COMM_WORLD, num_MPI_procs , ierr)   !+++  get the total no. of MPI processes
...
The error report in the console is as follows:



*** Saving parameters...
***    No changes in this job's data.
*** New start: Creating process
*** Writing status file
***    File: "%config%/Status/BlueFlame_statusfile"
***    Absolute file location: "/media/alf/BlueFlame/Projects/proj_BlueGrid/dsgn_test/conf_test/Status/BlueFlame_statusfile"
***    Content: "irest : 0"
***    OK!
*** Writing hosts file
***    File: "%job%/hostfile"
***    Absolute file location: "/media/alf/BlueFlame/BlueFlame/GUI/settings/jobs/1578073215633/hostfile"
***    OK!
*** Getting work directory
***    Directory: "%exepath%/%os%"
***    Absolute directory location: "/media/alf/BlueFlame/BlueFlame/GUI/executables/Linux"
***    OK!
*** Getting executable command
***    File: "%exepath%/%os%/BlueFlame_Run_Script.bat"
***    Absolute file location: "/media/alf/BlueFlame/BlueFlame/GUI/executables/Linux/BlueFlame_Run_Script.bat"
***    OK!
*** Loading environment settings
***    Using system, variables and blueflame environments
***    OK!
***
*** Process command: /media/alf/BlueFlame/BlueFlame/GUI/executables/Linux/BlueFlame_Run_Script.bat
***
*** Trying to start the process
*** Time: 2020-02-03 19:40:55
*** Success! Process is running, new PID is 2430
00000 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
00001
00002 ++++ BlueFlame Run  +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
00003
00004 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
00005 Exe_Dir    : /media/alf/BlueFlame/BlueFlame/GUI/executables/Linux
00006 OS    : Linux
00007 libpath    : /media/alf/BlueFlame/BlueFlame/GUI/libs
00008 config    : /media/alf/BlueFlame/Projects/proj_BlueGrid/dsgn_test/conf_test
00009 slots    : 3
00010 hostfile    : /media/alf/BlueFlame/BlueFlame/GUI/settings/jobs/1578073215633/hostfile
00011 MPI dir    : /media/alf/BlueFlame/BlueFlame/GUI/libs/Linux/mvapich_GFORTRAN
00012 PATH     : /media/alf/BlueFlame/BlueFlame/GUI/libs/Linux/mvapich_GFORTRAN/bin:/home/alf/bin:/home/alf/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games
00013 LIB_PATH : :/media/alf/BlueFlame/BlueFlame/GUI/libs/Linux/mvapich_GFORTRAN/lib
00014 Executable : /media/alf/BlueFlame/BlueFlame/GUI/executables/Linux/BlueFlame_GFORTRAN_mvapich.exe
00015 MVAPICH Processes :   /media/alf/BlueFlame/BlueFlame/GUI/libs/Linux/mvapich_GFORTRAN/bin/mpirun -np 3 -bind-to core -prepend-rank -env MV2_SMP_USE_CMA 0 /media/alf/BlueFlame/BlueFlame/GUI/executables/Linux/BlueFlame_GFORTRAN_mvapich.exe BlueFlame /media/alf/BlueFlame/Projects/proj_BlueGrid/dsgn_test/conf_test /media/alf/BlueFlame/BlueFlame/GUI/libs Linux /media/alf/BlueFlame/BlueFlame/GUI/settings/jobs/1578073215633/hostfile
00016 [0]  Initialize_MPI 111
00017 [1]  Initialize_MPI 111
00018 [2]  Initialize_MPI 111
[0]
[0] Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.
[0]
[0] Backtrace for this error:
[2]
[2] Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.
[2]
[2] Backtrace for this error:
[1]
[1] Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.
[1]
[1] Backtrace for this error:
[0] #0  0x7f23ac58f2da in ???
[0] #1  0x7f23ac58e503 in ???
[0] #2  0x7f23ab9cbf1f in ???
[0] #3  0x7f23acecf8e3 in ???
[0] #4  0x7f23ace19600 in ???
[0] #5  0x7f23acd7dc1e in ???
[0] #6  0x7f23acd7d65b in ???
[0] #7  0x7f23ad2347ce in ???
[1] #0  0x7f1613a142da in ???
[1] #1  0x7f1613a13503 in ???
[1] #2  0x7f1612e50f1f in ???
[1] #3  0x7f16143548e3 in ???
[1] #4  0x7f161429e600 in ???
[1] #5  0x7f1614202c1e in ???
[1] #6  0x7f161420265b in ???
[1] #7  0x7f16146b97ce in ???
[0] #8  0x55ec210b919a in initialize_mpi_
[0]     at /media/alf/BlueFlame/Sources/BlueFlame/BlueSystem/MPI/initialize_MPI.F90:18
[0] #9  0x55ec210389e7 in initialize_bluesystem_
[0]     at /media/alf/BlueFlame/Sources/BlueFlame/BlueSystem/Pre/initialize_BlueSystem.F90:38
[0] #10  0x55ec2103692c in bluesystem
[0]     at /media/alf/BlueFlame/Sources/BlueFlame/BlueSystem/Kernel/BlueSystem.F90:15
[2] #0  0x7f26b20392da in ???
[0] #11  0x55ec21036a0d in main
[0]     at /media/alf/BlueFlame/Sources/BlueFlame/BlueSystem/Kernel/BlueSystem.F90:7
[2] #1  0x7f26b2038503 in ???
[2] #2  0x7f26b1475f1f in ???
[2] #3  0x7f26b29798e3 in ???
[2] #4  0x7f26b28c3600 in ???
[2] #5  0x7f26b2827c1e in ???
[2] #6  0x7f26b282765b in ???
[2] #7  0x7f26b2cde7ce in ???
[1] #8  0x56141578819a in initialize_mpi_
[1]     at /media/alf/BlueFlame/Sources/BlueFlame/BlueSystem/MPI/initialize_MPI.F90:18
[1] #9  0x5614157079e7 in initialize_bluesystem_
[1]     at /media/alf/BlueFlame/Sources/BlueFlame/BlueSystem/Pre/initialize_BlueSystem.F90:38
[1] #10  0x56141570592c in bluesystem
[1]     at /media/alf/BlueFlame/Sources/BlueFlame/BlueSystem/Kernel/BlueSystem.F90:15
[1] #11  0x561415705a0d in main
[1]     at /media/alf/BlueFlame/Sources/BlueFlame/BlueSystem/Kernel/BlueSystem.F90:7
[2] #8  0x55f9a046b19a in initialize_mpi_
[2]     at /media/alf/BlueFlame/Sources/BlueFlame/BlueSystem/MPI/initialize_MPI.F90:18
[2] #9  0x55f9a03ea9e7 in initialize_bluesystem_
[2]     at /media/alf/BlueFlame/Sources/BlueFlame/BlueSystem/Pre/initialize_BlueSystem.F90:38
[2] #10  0x55f9a03e892c in bluesystem
[2]     at /media/alf/BlueFlame/Sources/BlueFlame/BlueSystem/Kernel/BlueSystem.F90:15
[2] #11  0x55f9a03e8a0d in main
[2]     at /media/alf/BlueFlame/Sources/BlueFlame/BlueSystem/Kernel/BlueSystem.F90:7
00019
00020 ===================================================================================
00021 =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
00022 =   PID 2434 RUNNING AT alf-home-pc
00023 =   EXIT CODE: 136
00024 =   CLEANING UP REMAINING PROCESSES
00025 =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
00026 ===================================================================================
00027 YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Floating point exception (signal 8)
00028 This typically refers to a problem with your application.
00029 Please see the FAQ page for debugging suggestions
*** End of process
*** Time: 2020-02-03 19:40:55
*** Total process time: 0 hours 0 minutes 0.183 seconds
*** Process terminated. Exit value: 136 (error!)
*** Caution! Some output was written to the error stream. The program may has encountered a problem.
*** The program exited irregularly!





The same program works fine with MPICH or Open MPI.



Thank you in advance for your reply.

Best regards,
Harald

