[mvapich-discuss] Benchmark results

admin at genome.arizona.edu
Thu Jan 25 19:55:05 EST 2018


Hashmi, Jahanzeb wrote on 01/25/2018 12:49 PM:
 > It appears that you are running the benchmarks without specifying the
 > -hostfile <hostsfile> parameter to the mpirun.

Thanks Jahanzeb, that helped!  I had assumed the default hostfile 
location would be used, but with -hostfile given explicitly I can now 
verify that IB is being used, with good results.
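
In case it helps anyone searching the archives later, the hostfile is 
just one node name per line (n001/n002 as in the logs below), e.g.:

$ cat /opt/machinelist
n001
n002
$ mpirun -np 2 -hostfile /opt/machinelist ./osu_bibw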

However, when I run the same benchmarks with MPICH (3.2.1) compiled 
against the MXM libraries, I get segmentation faults.  Is this expected 
because the OSU benchmarks are not compatible with other MPI 
implementations, or is our MPICH installation broken?  (I tried 
recompiling MPICH without MXM so that it would just use 1 Gb Ethernet, 
and the OSU benchmarks then worked fine, though with slower performance 
of course.)
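
For reference, this is roughly how the two builds were configured (the 
install prefixes and the MXM path below are from memory of our setup, 
so treat them as illustrative; ch3:nemesis:mxm is the device the MPICH 
README documents for MXM, if I am reading it right):

# MXM-enabled build (the one that segfaults in osu_bibw):
./configure --prefix=/opt/mpich-3.2.1-mxm \
            --with-device=ch3:nemesis:mxm \
            --with-mxm=/opt/mellanox/mxm
make && make install

# plain TCP build (works, but over 1 Gb Ethernet):
./configure --prefix=/opt/mpich-3.2.1-tcp \
            --with-device=ch3:nemesis:tcp
make && make install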

For example,

/opt/downloads/osu-micro-benchmarks-5.4/mpi/pt2pt$ mpirun -np 2 -hostfile /opt/machinelist ./osu_bibw
[1516927587.938298] [n002:26470:0]         sys.c:744  MXM  WARN  Conflicting CPU frequencies detected, using: 2101.00
[1516927588.116287] [n001:26968:0]    proto_ep.c:179  MXM  WARN  tl dc is requested but not supported
[1516927588.604548] [n002:26470:0]    proto_ep.c:179  MXM  WARN  tl dc is requested but not supported
# OSU MPI Bi-Directional Bandwidth Test v5.4.0
# Size      Bandwidth (MB/s)
1                       0.68
2                       1.41
4                       6.35
8                      12.78
16                     11.18
32                     22.04
64                     42.82
[n001:26968:0] Caught signal 11 (Segmentation fault)
==== backtrace ====
  2 0x000000000005767c mxm_handle_error() 
/scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u9-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v2.0.0-gcc-inbox-redhat6.9-x86_64/mxm-v3.6/src/mxm/util/debug/debug.c:641
  3 0x00000000000577ec mxm_error_signal_handler() 
/scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u9-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v2.0.0-gcc-inbox-redhat6.9-x86_64/mxm-v3.6/src/mxm/util/debug/debug.c:616
  4 0x0000003c80832510 killpg()  ??:0
  5 0x0000000000056258 mxm_mpool_put() 
/scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u9-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v2.0.0-gcc-inbox-redhat6.9-x86_64/mxm-v3.6/src/mxm/util/datatype/mpool.c:210
  6 0x00000000000689ce mxm_cib_ep_poll_tx() 
/scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u9-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v2.0.0-gcc-inbox-redhat6.9-x86_64/mxm-v3.6/src/mxm/tl/cib/cib_progress.c:527
  7 0x000000000006913d mxm_cib_ep_progress() 
/scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u9-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v2.0.0-gcc-inbox-redhat6.9-x86_64/mxm-v3.6/src/mxm/tl/cib/cib_progress.c:552
  8 0x000000000004268a mxm_notifier_chain_call() 
/scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u9-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v2.0.0-gcc-inbox-redhat6.9-x86_64/mxm-v3.6/src/./mxm/util/datatype/callback.h:74
  9 0x000000000004268a mxm_progress_internal() 
/scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u9-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v2.0.0-gcc-inbox-redhat6.9-x86_64/mxm-v3.6/src/mxm/core/mxm.c:64
10 0x000000000004268a mxm_progress() 
/scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u9-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v2.0.0-gcc-inbox-redhat6.9-x86_64/mxm-v3.6/src/mxm/core/mxm.c:346
11 0x0000000000177a49 MPID_nem_mxm_poll()  ??:0
12 0x0000000000169be8 MPIDI_CH3I_Progress()  ??:0
13 0x00000000000d0ba7 MPIR_Waitall_impl()  ??:0
14 0x00000000000d1308 PMPI_Waitall()  ??:0
15 0x00000000004016f5 main() 
/opt/downloads/osu-micro-benchmarks-5.4/mpi/pt2pt/osu_bibw.c:124
16 0x0000003c8081ed1d __libc_start_main()  ??:0
17 0x0000000000401269 _start()  ??:0
===================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 26968 RUNNING AT n001
=   EXIT CODE: 139
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
[proxy:0:1 at n002.genome.arizona.edu] HYD_pmcd_pmip_control_cmd_cb 
(pm/pmiserv/pmip_cb.c:887): assert (!closed) failed
[proxy:0:1 at n002.genome.arizona.edu] HYDT_dmxu_poll_wait_for_event 
(tools/demux/demux_poll.c:76): callback returned error status
[proxy:0:1 at n002.genome.arizona.edu] main (pm/pmiserv/pmip.c:202): demux 
engine error waiting for event
[mpiexec at pac.genome.arizona.edu] HYDT_bscu_wait_for_completion 
(tools/bootstrap/utils/bscu_wait.c:76): one of the processes terminated 
badly; aborting
[mpiexec at pac.genome.arizona.edu] HYDT_bsci_wait_for_completion 
(tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting 
for completion
[mpiexec at pac.genome.arizona.edu] HYD_pmci_wait_for_completion 
(pm/pmiserv/pmiserv_pmci.c:218): launcher returned error waiting for 
completion
[mpiexec at pac.genome.arizona.edu] main (ui/mpich/mpiexec.c:340): process 
manager error waiting for completion
/opt/downloads/osu-micro-benchmarks-5.4/mpi/pt2pt$
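
The "tl dc is requested but not supported" warnings above make me 
wonder whether MXM is selecting a transport our HCA cannot do.  Unless 
someone sees something else, my next test will be to restrict the 
transport list with the MXM_TLS environment variable (my understanding 
of the Mellanox docs, not yet verified) and rerun, something like:

MXM_TLS=self,shm,ud mpirun -np 2 -hostfile /opt/machinelist ./osu_bibw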


Thanks

