[mvapich-discuss] 'Error getting event!' when combining MVAPICH2 (v 2.3.1) and DDN IME (v 1.2.2)

Subramoni, Hari subramoni.1 at osu.edu
Wed Jul 31 10:08:28 EDT 2019


Hi, All.

Setting MV2_USE_RDMA_CM=0 resolved the issue for the user. The next release will include a patch that handles this internally.
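
For example, with the srun launch quoted further below (a sketch only; exporting the variable in the job environment before srun is one way to pass it to the MPI processes, and the path comes from Judit's command):

$ export MV2_USE_RDMA_CM=0
$ /usr/bin/srun -N 2 --ntasks-per-node=1 ior -o ime:///gpfs-path/test -a MPIIO -b 8M -t 4K -i 3 -w -W -r -R -k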

Best,
Hari.

From: Subramoni, Hari <subramoni.1 at osu.edu>
Sent: Friday, July 19, 2019 12:16 PM
To: Judit Planas <judit.planas at epfl.ch>; mvapich-discuss at cse.ohio-state.edu <mvapich-discuss at mailman.cse.ohio-state.edu>
Cc: Jean-Thomas Acquaviva <jtacquaviva at ddn.com>; Subramoni, Hari <subramoni.1 at osu.edu>
Subject: RE: [mvapich-discuss] 'Error getting event!' when combining MVAPICH2 (v 2.3.1) and DDN IME (v 1.2.2)

Hi, Judit.

Sorry to hear that you are facing issues. MVAPICH2 2.3.1 includes the IME patch contributed by Sylvain Didelot at DDN.

Since we do not have access to a DDN IME system locally, I am not able to check this myself. Would it be possible to get access to your systems so that we can try things out?

Best,
Hari.

From: mvapich-discuss-bounces at cse.ohio-state.edu <mvapich-discuss-bounces at mailman.cse.ohio-state.edu> On Behalf Of Judit Planas
Sent: Friday, July 19, 2019 11:56 AM
To: mvapich-discuss at cse.ohio-state.edu <mvapich-discuss at mailman.cse.ohio-state.edu>
Cc: Jean-Thomas Acquaviva <jtacquaviva at ddn.com>
Subject: [mvapich-discuss] 'Error getting event!' when combining MVAPICH2 (v 2.3.1) and DDN IME (v 1.2.2)

Dear all,

I'm trying to run an IOR benchmark with MVAPICH2 (v 2.3.1) compiled and linked against the DDN IME libraries on our GPFS+IME cluster, but I get an error: an endless stream of 'Error getting event!' messages until my disk quota is exceeded or the job times out.


This is the IOR output I get:
$ /usr/bin/srun -N 2 --ntasks-per-node=1 ior -o ime:///gpfs-path/test -a MPIIO -b 8M -t 4K -i 3 -w -W -r -R -k
srun: error: OverSubscribe specified more than once, latest value used
IOR-3.3.0+dev: MPI Coordinated Test of Parallel I/O
Began               : Fri Jul 19 17:36:40 2019
Command line        : ior -o ime:///gpfs-path/test -a MPIIO -b 8M -t 4K -i 3 -w -W -r -R -k
Machine             : Linux
TestID              : 0
StartTime           : Fri Jul 19 17:36:40 2019

Options:
api                 : MPIIO
apiVersion          : (3.1)
test filename       : ime:///gpfs-path/test
access              : single-shared-file
type                : independent
segments            : 1
ordering in a file  : sequential
ordering inter file : no tasks offsets
tasks               : 2
clients per node    : 1
repetitions         : 3
xfersize            : 4096 bytes
blocksize           : 8 MiB
aggregate filesize  : 16 MiB

Results:

access    bw(MiB/s)  block(KiB) xfer(KiB)  open(s)    wr/rd(s)   close(s)   total(s)   iter
------    ---------  ---------- ---------  --------   --------   --------   --------   ----
write     78.92      8192       4.00       0.180681   0.006516   0.015874   0.202729   0
read      1276.27    8192       4.00       0.000382   0.011304   0.001179   0.012537   0
write     1062.50    8192       4.00       0.001177   0.005829   0.008381   0.015059   1
read      1263.84    8192       4.00       0.000399   0.011309   0.001280   0.012660   1
write     984.97     8192       4.00       0.000625   0.003328   0.012621   0.016244   2
read      1355.32    8192       4.00       0.000375   0.010890   0.000870   0.011805   2
Max Write: 1062.50 MiB/sec (1114.12 MB/sec)
Max Read:  1355.32 MiB/sec (1421.16 MB/sec)

Summary of all tests:
Operation   Max(MiB)   Min(MiB)  Mean(MiB)     StdDev   Max(OPs)   Min(OPs)  Mean(OPs)     StdDev    Mean(s) Test# #Tasks tPN reps fPP reord reordoff reordrand seed segcnt   blksiz    xsize aggs(MiB)   API RefNum
write        1062.50      78.92     708.80     446.51  272001.22   20204.36  181452.51  114307.26    0.07801     0      2   1    3   0     0        1         0    0      1  8388608     4096      16.0 MPIIO      0
read         1355.32    1263.84    1298.48      40.51  346962.92  323544.12  332410.78   10371.55    0.01233     0      2   1    3   0     0        1         0    0      1  8388608     4096      16.0 MPIIO      0
Error getting event!
Error getting event!
Error getting event!
[the message repeats indefinitely until I Ctrl-C the execution]
(The 'Finished' message from IOR was never printed)


If I run the exact same command without the 'ime://' prefix, IOR runs as expected:
$ /usr/bin/srun -N 2 --ntasks-per-node=1 ior -o /gpfs-path/test -a MPIIO -b 8M -t 4K -i 3 -w -W -r -R -k
--> execution OK!

Also, if I run on a single node (regardless of the number of tasks per node), IOR runs as expected:
$ /usr/bin/srun -N 1 --ntasks-per-node=16 ior -o ime:///gpfs-path/test -a MPIIO -b 8M -t 4K -i 3 -w -W -r -R -k
--> execution OK!

In addition, when using the older MVAPICH2 version shipped with the DDN IME package, IOR always works as expected (I tried many combinations).


Software versions and configurations:
DDN IME:
version 1.2.2-1573

MVAPICH:
MVAPICH2 2.3.1: http://mvapich.cse.ohio-state.edu/download/mvapich/mv2/mvapich2-2.3.1.tar.gz
./configure CC=gcc CXX=g++ FC=gfortran F77=gfortran CFLAGS=-pipe --enable-dc --with-file-system=ime+gpfs+nfs+ufs CFLAGS=-I/opt/ddn/ime/include/ LDFLAGS=-L/opt/ddn/ime/lib/ -lim_client --with-pmi=pmi2 --with-pm=slurm
Using GCC 6.4.0
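
(Note on the configure line above: CFLAGS appears twice, so one of the two assignments is likely ignored, and the unquoted LDFLAGS leaves -lim_client as a stray argument. A cleaned-up sketch of what was presumably intended, with the flags merged and the IME library passed via LIBS, would look like the following; this is an assumption, not the command actually used for this build:)

$ ./configure CC=gcc CXX=g++ FC=gfortran F77=gfortran \
      CFLAGS="-pipe -I/opt/ddn/ime/include/" \
      LDFLAGS="-L/opt/ddn/ime/lib/" LIBS="-lim_client" \
      --enable-dc --with-file-system=ime+gpfs+nfs+ufs \
      --with-pmi=pmi2 --with-pm=slurm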

IOR:
cloned repository from github.com:hpc/ior.git
commit 749a06dcbbef93833432f758e970e8927b81fa53 (Jun 29 2019)
No special flags to configure; I just set PATH and LD_LIBRARY_PATH to the MVAPICH2 installation (bin and lib directories, respectively), for example as shown below.
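
(The prefix /path/to/mvapich2-2.3.1 below is a placeholder rather than the actual installation path, and the ./bootstrap step is what a git checkout of IOR normally needs before configure:)

$ export PATH=/path/to/mvapich2-2.3.1/bin:$PATH
$ export LD_LIBRARY_PATH=/path/to/mvapich2-2.3.1/lib:$LD_LIBRARY_PATH
$ ./bootstrap && ./configure && make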


Am I missing something? Any help would be welcome.

Thanks in advance,
Judit