[mvapich-discuss] Benchmark results

Panda, Dhabaleswar panda at cse.ohio-state.edu
Thu Jan 25 20:56:32 EST 2018


Good to know that with Jahanzeb's suggested option you can verify that IB is being used, and with good results.

The OSU MPI benchmarks should work with any MPI library. For the specific issues you are seeing with your MPICH installation, please contact the MPICH team.
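
For reference, here is a minimal sketch of rebuilding the benchmarks against a given MPI installation so that they link against that library (the MPICH install prefix below is only a placeholder):

    cd /opt/downloads/osu-micro-benchmarks-5.4
    ./configure CC=/path/to/mpich/bin/mpicc CXX=/path/to/mpich/bin/mpicxx
    make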

Thanks, 

DK
________________________________________
From: mvapich-discuss-bounces at cse.ohio-state.edu on behalf of admin at genome.arizona.edu [admin at genome.arizona.edu]
Sent: Thursday, January 25, 2018 7:55 PM
To: mvapich-discuss at cse.ohio-state.edu
Subject: Re: [mvapich-discuss] Benchmark results

Hashmi, Jahanzeb wrote on 01/25/2018 12:49 PM:
 > It appears that you are running the benchmarks without specifying the
 > -hostfile <hostsfile> parameter to the mpirun.

Thanks Jahanzeb, that helped!  I thought the default hostfile location
would be used automatically, but now I can verify that IB is being
used, and with good results.
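
For reference, this is roughly what worked; the hostfile contents here
are an assumption based on the node names that appear in the output
below:

    $ cat /opt/machinelist
    n001
    n002
    $ mpirun -np 2 -hostfile /opt/machinelist ./osu_bibw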

However, when I try to use MPICH (3.2.1), which was compiled with the
MXM libraries, I see segmentation faults.  Is this expected because
the OSU benchmarks are not compatible with other MPI implementations,
or is our MPICH installation broken?  (I tried recompiling MPICH
without MXM so it would just use the 1 Gb Ethernet, and the OSU
benchmarks worked fine, though of course with slower performance; the
configure lines for both builds are sketched after the output below.)

For example,

/opt/downloads/osu-micro-benchmarks-5.4/mpi/pt2pt$ mpirun -np 2 -hostfile /opt/machinelist ./osu_bibw
[1516927587.938298] [n002:26470:0]         sys.c:744  MXM  WARN  Conflicting CPU frequencies detected, using: 2101.00
[1516927588.116287] [n001:26968:0]    proto_ep.c:179  MXM  WARN  tl dc is requested but not supported
[1516927588.604548] [n002:26470:0]    proto_ep.c:179  MXM  WARN  tl dc is requested but not supported
# OSU MPI Bi-Directional Bandwidth Test v5.4.0
# Size      Bandwidth (MB/s)
1                       0.68
2                       1.41
4                       6.35
8                      12.78
16                     11.18
32                     22.04
64                     42.82
[n001:26968:0] Caught signal 11 (Segmentation fault)
==== backtrace ====
  2 0x000000000005767c mxm_handle_error()
/scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u9-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v2.0.0-gcc-inbox-redhat6.9-x86_64/mxm-v3.6/src/mxm/util/debug/debug.c:641
  3 0x00000000000577ec mxm_error_signal_handler()
/scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u9-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v2.0.0-gcc-inbox-redhat6.9-x86_64/mxm-v3.6/src/mxm/util/debug/debug.c:616
  4 0x0000003c80832510 killpg()  ??:0
  5 0x0000000000056258 mxm_mpool_put()
/scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u9-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v2.0.0-gcc-inbox-redhat6.9-x86_64/mxm-v3.6/src/mxm/util/datatype/mpool.c:210
  6 0x00000000000689ce mxm_cib_ep_poll_tx()
/scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u9-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v2.0.0-gcc-inbox-redhat6.9-x86_64/mxm-v3.6/src/mxm/tl/cib/cib_progress.c:527
  7 0x000000000006913d mxm_cib_ep_progress()
/scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u9-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v2.0.0-gcc-inbox-redhat6.9-x86_64/mxm-v3.6/src/mxm/tl/cib/cib_progress.c:552
  8 0x000000000004268a mxm_notifier_chain_call()
/scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u9-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v2.0.0-gcc-inbox-redhat6.9-x86_64/mxm-v3.6/src/./mxm/util/datatype/callback.h:74
  9 0x000000000004268a mxm_progress_internal()
/scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u9-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v2.0.0-gcc-inbox-redhat6.9-x86_64/mxm-v3.6/src/mxm/core/mxm.c:64
10 0x000000000004268a mxm_progress()
/scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u9-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v2.0.0-gcc-inbox-redhat6.9-x86_64/mxm-v3.6/src/mxm/core/mxm.c:346
11 0x0000000000177a49 MPID_nem_mxm_poll()  ??:0
12 0x0000000000169be8 MPIDI_CH3I_Progress()  ??:0
13 0x00000000000d0ba7 MPIR_Waitall_impl()  ??:0
14 0x00000000000d1308 PMPI_Waitall()  ??:0
15 0x00000000004016f5 main()
/opt/downloads/osu-micro-benchmarks-5.4/mpi/pt2pt/osu_bibw.c:124
16 0x0000003c8081ed1d __libc_start_main()  ??:0
17 0x0000000000401269 _start()  ??:0
===================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 26968 RUNNING AT n001
=   EXIT CODE: 139
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
[proxy:0:1 at n002.genome.arizona.edu] HYD_pmcd_pmip_control_cmd_cb (pm/pmiserv/pmip_cb.c:887): assert (!closed) failed
[proxy:0:1 at n002.genome.arizona.edu] HYDT_dmxu_poll_wait_for_event (tools/demux/demux_poll.c:76): callback returned error status
[proxy:0:1 at n002.genome.arizona.edu] main (pm/pmiserv/pmip.c:202): demux engine error waiting for event
[mpiexec at pac.genome.arizona.edu] HYDT_bscu_wait_for_completion (tools/bootstrap/utils/bscu_wait.c:76): one of the processes terminated badly; aborting
[mpiexec at pac.genome.arizona.edu] HYDT_bsci_wait_for_completion (tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for completion
[mpiexec at pac.genome.arizona.edu] HYD_pmci_wait_for_completion (pm/pmiserv/pmiserv_pmci.c:218): launcher returned error waiting for completion
[mpiexec at pac.genome.arizona.edu] main (ui/mpich/mpiexec.c:340): process manager error waiting for completion
/opt/downloads/osu-micro-benchmarks-5.4/mpi/pt2pt$
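
For reference, the two builds were configured roughly as follows (a
sketch; the MXM install path is an assumption and may differ on our
system):

    # build used above, with the MXM netmod
    ./configure --with-device=ch3:nemesis:mxm --with-mxm=/opt/mellanox/mxm

    # rebuild without MXM (default TCP netmod, so traffic goes over the 1 Gb Ethernet)
    ./configure --with-device=ch3:nemesis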


Thanks