[mvapich-discuss] MVAPICH2 on aarch64

Sreenidhi Bharathkar Ramesh sreenira at broadcom.com
Mon Dec 21 06:22:44 EST 2015


Hello,

Thank you for the response and suggestion.  Please note:

- I had tried with MV2_SMP_USE_CMA=0 earlier; I forgot to mention this in the original email.

- I re-compiled and executed OMB on QEMU for aarch64.  QEMU was slower than the Juno evaluation board, but there were no run-time issues on QEMU.  This probably means the problem (seg fault, etc.) is specific to the Juno board.

- Patch for aarch64 compilation: thanks for offering to take it.  Please review and accept the attached patch (SVN diff).
  Testing: done on QEMU for aarch64; the basic OSU micro-benchmark tests passed.

Thanks,
- Sreenidhi.

From: Jonathan Perkins [mailto:perkinjo at cse.ohio-state.edu] 
Sent: Saturday, 19 December, 2015 12:07 AM
To: Sreenidhi Bharathkar Ramesh; mvapich-discuss at cse.ohio-state.edu
Subject: Re: [mvapich-discuss] MVAPICH2 on aarch64

Thanks for the report.  Can you try disabling CMA by setting MV2_SMP_USE_CMA=0 when you run OMB?
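For example, the variable can be set directly on the command line (osu_bw is used here purely for illustration, matching the log below):

MV2_SMP_USE_CMA=0 mpirun -np 2 ./osu_bw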

We'll be happy to take in the provided patch as well.  Thanks again.
On Thu, Dec 17, 2015 at 11:34 PM Sreenidhi Bharathkar Ramesh <sreenira at broadcom.com> wrote:
Hello,

I am trying MVAPICH2 on aarch64, specifically on a Juno evaluation board, and am observing run-time errors.  Please note that this is a single node.

Also, please note that the same procedure works flawlessly on an x86 platform.

1. For compilation, I had to apply the following patch:

src/mpid/ch3/channels/common/include/mv2_clock.h
+#elif defined(__aarch64__)
+typedef unsigned long cycles_t;
+static inline cycles_t get_cycles()
+{
+    cycles_t ret;
+
+    asm volatile ("isb" : : : "memory");
+    asm volatile ("mrs %0, cntvct_el0" : "=r" (ret));
+
+    return ret;
+}
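For reference, a minimal standalone program can sanity-check this counter read on aarch64 (the file name and structure are illustrative only, not part of the patch).  It reads CNTFRQ_EL0, which reports the counter frequency in Hz, and samples CNTVCT_EL0 twice back to back:

/* check_cycles.c: aarch64-only sanity check for the get_cycles() patch */
#include <stdio.h>

typedef unsigned long cycles_t;

static inline cycles_t get_cycles(void)
{
    cycles_t ret;

    /* Serialize instruction execution before sampling the counter */
    asm volatile ("isb" : : : "memory");
    /* Read the ARMv8 virtual counter */
    asm volatile ("mrs %0, cntvct_el0" : "=r" (ret));

    return ret;
}

int main(void)
{
    unsigned long freq;

    /* CNTFRQ_EL0 holds the counter frequency in Hz */
    asm volatile ("mrs %0, cntfrq_el0" : "=r" (freq));

    cycles_t t0 = get_cycles();
    cycles_t t1 = get_cycles();

    printf("counter frequency: %lu Hz\n", freq);
    printf("back-to-back delta: %lu ticks\n", (unsigned long)(t1 - t0));
    return 0;
}

If the frequency prints as a plausible value and the delta is small and non-negative, the patched get_cycles() is behaving as expected.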

2. While executing the OSU benchmarks, one of the following errors was seen every time:

a> the test hangs
b> a segmentation fault occurs

3. Questions:

a> Has MVAPICH2 been tried on the aarch64 platform?  Am I missing any code delta, apart from the patch in point #1?
b> What could be the root cause of the error?

Please let me know.

Thanks,
- Sreenidhi.


----------

Error appendix:

sreenira at berlin:osu_benchmarks$ mpirun --version
HYDRA build details:
    Version:                                 3.1.4
    Release Date:                            Thu Apr  2 17:15:15 EDT 2015
    CC:                              gcc
    CXX:                             g++
    F77:                             gfortran
    F90:                             gfortran
    Configure options:                       '--disable-option-checking' '--prefix=/home/sreenira/install-mvapich2' '--with-ib-libpath=/home/sreenira/install-libibverbs/lib' '--with-ib-include=/home/sreenira/install-libibverbs/include' '--disable-mcast' '--cache-file=/dev/null' '--srcdir=.' 'CC=gcc' 'CFLAGS= -DNDEBUG -DNVALGRIND -O2' 'LDFLAGS=-L/home/sreenira/install-libibverbs/lib -L/lib -L/lib -L/lib -Wl,-rpath,/lib -L/lib -Wl,-rpath,/lib -L/home/sreenira/install-libibverbs/lib -L/lib -L/lib' 'LIBS=-libverbs -ldl -lrt -lm -lpthread ' 'CPPFLAGS=-I/home/sreenira/install-libibverbs/include -I/home/sreenira/hpcc/mvapich2-2.1/src/mpl/include -I/home/sreenira/hpcc/mvapich2-2.1/src/mpl/include -I/home/sreenira/hpcc/mvapich2-2.1/src/openpa/src -I/home/sreenira/hpcc/mvapich2-2.1/src/openpa/src -D_REENTRANT -I/home/sreenira/hpcc/mvapich2-2.1/src/mpi/romio/include -I/include -I/include -I/home/sreenira/install-libibverbs/include -I/include -I/include'
    Process Manager:                         pmi
    Launchers available:                     ssh rsh fork slurm ll lsf sge manual persist
    Topology libraries available:            hwloc
    Resource management kernels available:   user slurm ll lsf sge pbs cobalt
    Checkpointing libraries available:
    Demux engines available:                 poll select


sreenira at berlin:osu_benchmarks$ mpirun -np 2 ./osu_bw
# OSU MPI Bandwidth Test
# Size        Bandwidth (MB/s)
1                         0.39
2                         0.78
4                         1.58
8                         3.16
16                        6.34
32                       12.61
64                       24.92
128                      49.09
256                      95.43
512                     179.93
1024                    323.46
2048                    560.85
4096                    898.34
8192                   1299.09
[berlin:mpi_rank_0][error_sighandler] Caught error: Segmentation fault (signal 11)

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 673 RUNNING AT berlin
=   EXIT CODE: 11
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions

sreenira at berlin:osu_benchmarks$ mpirun -np 2 ./osu_bw
# OSU MPI Bandwidth Test
# Size        Bandwidth (MB/s)
1                         0.39
2                         0.78
4                         1.58
8                         3.16
16                        6.35
32                       12.63
64                       24.91
128                      49.05
256                      95.30
512                     179.89
1024                    328.76
2048                    569.84
4096                    911.26
8192                   1326.21
[berlin:mpi_rank_0][error_sighandler] Caught error: Segmentation fault (signal 11)
[cli_1]: aborting job:
Fatal error in PMPI_Waitall:
Other MPI error, error stack:
PMPI_Waitall(323).................: MPI_Waitall(count=64, req_array=0x822100, status_array=0xc330a0) failed
MPIR_Waitall_impl(166)............:
_MPIDI_CH3I_Progress(214).........:
MPIDI_CH3I_SMP_read_progress(1110):
MPIDI_CH3I_SMP_readv_rndv(4550)...: CMA: (MPIDI_CH3I_SMP_readv_rndv) process_vm_readv fail


===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 993 RUNNING AT berlin
=   EXIT CODE: 11
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions
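
The "process_vm_readv fail" line in the stack above points at the kernel's cross memory attach support, which the MVAPICH2 CMA path relies on.  A minimal sketch for testing that syscall in isolation on the board follows (an illustrative test program, not part of MVAPICH2 or OMB); it forks a child and has the parent read a buffer out of the child's address space:

/* cma_check.c: standalone process_vm_readv() check */
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/uio.h>      /* process_vm_readv() */
#include <sys/wait.h>

int main(void)
{
    static char src[64] = "hello from the child";
    char dst[64] = {0};
    pid_t pid = fork();

    if (pid == 0) {
        /* Child: keep src mapped while the parent reads it */
        sleep(2);
        _exit(0);
    }

    struct iovec local  = { .iov_base = dst, .iov_len = sizeof(dst) };
    struct iovec remote = { .iov_base = src, .iov_len = sizeof(src) };

    /* After fork(), src sits at the same virtual address in the child,
     * so the parent can read the child's copy through the syscall */
    ssize_t n = process_vm_readv(pid, &local, 1, &remote, 1, 0);
    if (n < 0)
        perror("process_vm_readv");
    else
        printf("read %zd bytes: \"%s\"\n", n, dst);

    waitpid(pid, NULL, 0);
    return n < 0 ? EXIT_FAILURE : EXIT_SUCCESS;
}

If this fails with ENOSYS or EPERM, the kernel on the board lacks (or restricts) cross memory attach, which would explain why disabling CMA via MV2_SMP_USE_CMA=0 is needed there.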



[Attachment: aarch64_get_cycles_patch.diff (application/octet-stream, 750 bytes)
URL: <http://mailman.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20151221/729ccbc7/attachment.obj>]

