[mvapich-discuss] MVAPICH2 on aarch64

Sreenidhi Bharathkar Ramesh sreenira at broadcom.com
Thu Dec 17 23:34:01 EST 2015


Hello,

I am trying MVAPICH2 on aarch64, specifically a Juno evaluation board, and am observing run-time errors.  Please note that this is a single node.

Also, please note that the same procedure works flawlessly on an x86 platform.

1. For compilation, I had to apply the following patch:

src/mpid/ch3/channels/common/include/mv2_clock.h
+#elif defined(__aarch64__)
+typedef unsigned long cycles_t;
+static inline cycles_t get_cycles()
+{
+    cycles_t ret;
+
+    /* Serialize instruction execution before sampling the counter. */
+    asm volatile ("isb" : : : "memory");
+    /* Read the aarch64 virtual counter-timer register. */
+    asm volatile ("mrs %0, cntvct_el0" : "=r" (ret));
+
+    return ret;
+}
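
For reference, the counter read can be sanity-checked outside MVAPICH2 with a small standalone program. This is only a sketch, assuming aarch64 Linux, where cntfrq_el0 reports the counter frequency in Hz:

#include <stdio.h>

typedef unsigned long cycles_t;

static inline cycles_t get_cycles(void)
{
    cycles_t ret;

    asm volatile ("isb" : : : "memory");              /* serialize execution  */
    asm volatile ("mrs %0, cntvct_el0" : "=r" (ret)); /* read virtual counter */

    return ret;
}

int main(void)
{
    cycles_t freq, start, end;
    volatile int i;

    /* cntfrq_el0 holds the counter frequency in ticks per second. */
    asm volatile ("mrs %0, cntfrq_el0" : "=r" (freq));

    start = get_cycles();
    for (i = 0; i < 1000000; i++)
        ;
    end = get_cycles();

    printf("loop took %lu ticks; counter frequency %lu Hz\n",
           end - start, freq);
    return 0;
}

A monotonically increasing counter and a plausible frequency (e.g. tens of MHz on Juno) would suggest the patch itself is sound.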

2. While executing the OSU benchmarks, one of the following errors was seen every time:

a> the test hangs
b> a segmentation fault is observed
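
For the segfault case, a per-rank backtrace can be captured by launching each rank under gdb in its own xterm, a common Hydra/MPICH debugging idiom (this assumes a working X display):

mpirun -np 2 xterm -e gdb ./osu_bw

Then 'run' inside each gdb and 'bt' once the fault is hit.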

3. Questions:

a> Has MVAPICH2 been tried on the aarch64 platform? Am I missing any code delta, apart from the one in point #1 above?
b> What could be the root cause of the error?

Please let me know.

Thanks,
- Sreenidhi.


----------

Error appendix:

sreenira@berlin:osu_benchmarks$ mpirun --version
HYDRA build details:
    Version:                                 3.1.4
    Release Date:                            Thu Apr  2 17:15:15 EDT 2015
    CC:                              gcc
    CXX:                             g++
    F77:                             gfortran
    F90:                             gfortran
    Configure options:                       '--disable-option-checking' '--prefix=/home/sreenira/install-mvapich2' '--with-ib-libpath=/home/sreenira/install-libibverbs/lib' '--with-ib-include=/home/sreenira/install-libibverbs/include' '--disable-mcast' '--cache-file=/dev/null' '--srcdir=.' 'CC=gcc' 'CFLAGS= -DNDEBUG -DNVALGRIND -O2' 'LDFLAGS=-L/home/sreenira/install-libibverbs/lib -L/lib -L/lib -L/lib -Wl,-rpath,/lib -L/lib -Wl,-rpath,/lib -L/home/sreenira/install-libibverbs/lib -L/lib -L/lib' 'LIBS=-libverbs -ldl -lrt -lm -lpthread ' 'CPPFLAGS=-I/home/sreenira/install-libibverbs/include -I/home/sreenira/hpcc/mvapich2-2.1/src/mpl/include -I/home/sreenira/hpcc/mvapich2-2.1/src/mpl/include -I/home/sreenira/hpcc/mvapich2-2.1/src/openpa/src -I/home/sreenira/hpcc/mvapich2-2.1/src/openpa/src -D_REENTRANT -I/home/sreenira/hpcc/mvapich2-2.1/src/mpi/romio/include -I/include -I/include -I/home/sreenira/install-libibverbs/include -I/include -I/include'
    Process Manager:                         pmi
    Launchers available:                     ssh rsh fork slurm ll lsf sge manual persist
    Topology libraries available:            hwloc
    Resource management kernels available:   user slurm ll lsf sge pbs cobalt
    Checkpointing libraries available:
    Demux engines available:                 poll select

sreenira@berlin:osu_benchmarks$ mpirun -np 2 ./osu_bw
# OSU MPI Bandwidth Test
# Size        Bandwidth (MB/s)
1                         0.39
2                         0.78
4                         1.58
8                         3.16
16                        6.34
32                       12.61
64                       24.92
128                      49.09
256                      95.43
512                     179.93
1024                    323.46
2048                    560.85
4096                    898.34
8192                   1299.09
[berlin:mpi_rank_0][error_sighandler] Caught error: Segmentation fault (signal 11)

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 673 RUNNING AT berlin
=   EXIT CODE: 11
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions

sreenira@berlin:osu_benchmarks$ mpirun -np 2 ./osu_bw
# OSU MPI Bandwidth Test
# Size        Bandwidth (MB/s)
1                         0.39
2                         0.78
4                         1.58
8                         3.16
16                        6.35
32                       12.63
64                       24.91
128                      49.05
256                      95.30
512                     179.89
1024                    328.76
2048                    569.84
4096                    911.26
8192                   1326.21
[berlin:mpi_rank_0][error_sighandler] Caught error: Segmentation fault (signal 11)
[cli_1]: aborting job:
Fatal error in PMPI_Waitall:
Other MPI error, error stack:
PMPI_Waitall(323).................: MPI_Waitall(count=64, req_array=0x822100, status_array=0xc330a0) failed
MPIR_Waitall_impl(166)............:
_MPIDI_CH3I_Progress(214).........:
MPIDI_CH3I_SMP_read_progress(1110):
MPIDI_CH3I_SMP_readv_rndv(4550)...: CMA: (MPIDI_CH3I_SMP_readv_rndv) process_vm_readv fail


===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 993 RUNNING AT berlin
=   EXIT CODE: 11
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions
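
The second failure points at process_vm_readv, i.e. CMA (Cross Memory Attach). The syscall can be exercised outside MPI with a minimal check like the sketch below; my assumptions are a Linux kernel >= 3.2 built with CONFIG_CROSS_MEMORY_ATTACH=y, glibc >= 2.15 for the wrapper, and a Yama ptrace_scope setting that lets a parent read its child:

#define _GNU_SOURCE
#include <stdio.h>
#include <unistd.h>
#include <sys/uio.h>
#include <sys/wait.h>

int main(void)
{
    static char buf[64] = "hello from the child";
    pid_t pid = fork();

    if (pid == 0) {     /* child: keep its copy of buf alive briefly */
        sleep(2);
        return 0;
    }

    /* Parent: read the child's copy of buf. After fork() the buffer
     * sits at the same virtual address in both processes. */
    char local[64] = {0};
    struct iovec liov = { local, sizeof local };
    struct iovec riov = { buf, sizeof buf };
    ssize_t n = process_vm_readv(pid, &liov, 1, &riov, 1, 0);

    if (n < 0)
        perror("process_vm_readv");
    else
        printf("read %zd bytes: \"%s\"\n", n, local);

    wait(NULL);
    return 0;
}

If this fails too, the kernel side of CMA is suspect; if it succeeds, the problem is more likely in MVAPICH2's rendezvous path, and running with CMA disabled (MV2_SMP_USE_CMA=0, per the MVAPICH2 user guide) may help isolate it.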
