[mvapich-discuss] Datatypes Error
James Dinan
dinan at mcs.anl.gov
Tue Dec 14 19:51:53 EST 2010
Hi,
I've run into an error using MPI datatypes with MVAPICH-2 1.6rc1 on our
IB cluster. I've attached a test case that uses MPI datatypes to do a
one-sided gather via MPI_Get. The program builds two datatypes (origin
and target) to transfer a patch of a 2-D array into a local buffer. For
transfers smaller than ~16 kB (SUB_XDIM*SUB_YDIM*sizeof(double) < 16384)
the program works; for larger transfers it fails without printing any
error messages. With the attached settings (SUB_XDIM = 8, SUB_YDIM = 256),
the patch is 8 * 256 * 8 = 16384 bytes, i.e. right at that threshold.
I also tried using the datatype only for the remote data (that variant is
included as a comment in the test) and saw the same kind of failure. I
didn't see any errors running this test under the latest version of MPICH.
Thanks for your help,
~Jim.
-------------- next part --------------
/* One-Sided MPI 2-D Strided Get Test
 *
 * Author: James Dinan <dinan at mcs.anl.gov>
 * Date : December, 2010
 *
 * This code performs a strided get operation from a 2-D patch of a shared
 * array. The array has dimensions [X, Y] and the subarray has dimensions
 * [SUB_X, SUB_Y] and begins at index [0, 0]. The input and output buffers
 * are specified using an MPI indexed type.
 */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <mpi.h>

#define XDIM 8
#define YDIM 1024
#define SUB_XDIM 8
#define SUB_YDIM 256
int main(int argc, char **argv) {
  int i, j, rank, nranks, peer, bufsize, errors;
  double *win_buf, *loc_buf;
  MPI_Win buf_win;
  MPI_Aint idx_loc[SUB_YDIM];
  int idx_rem[SUB_YDIM];
  int blk_len[SUB_YDIM];
  MPI_Datatype loc_type, rem_type;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &nranks);
  bufsize = XDIM * YDIM * sizeof(double);
  MPI_Alloc_mem(bufsize, MPI_INFO_NULL, &win_buf);
  MPI_Alloc_mem(bufsize, MPI_INFO_NULL, &loc_buf);

  if (rank == 0)
    printf("MPI RMA Strided Get Test:\n");

  // Initialize the window buffer; the local buffer is initialized as well so
  // that the checks outside the transferred patch compare against known values
  for (i = 0; i < XDIM*YDIM; i++) {
    *(win_buf + i) = 1.0 + rank;
    *(loc_buf + i) = 1.0 + rank;
  }

  MPI_Win_create(win_buf, bufsize, 1, MPI_INFO_NULL, MPI_COMM_WORLD, &buf_win);

  peer = (rank+1) % nranks;
  // Build the origin and target datatypes describing the 2-D patch
  for (i = 0; i < SUB_YDIM; i++) {
    MPI_Get_address(&loc_buf[i*XDIM], &idx_loc[i]); /* not used by the type construction below */
    idx_rem[i] = i*XDIM;   // displacement of row i, in units of MPI_DOUBLE
    blk_len[i] = SUB_XDIM; // each row contributes SUB_XDIM contiguous doubles
  }

  MPI_Type_indexed(SUB_YDIM, blk_len, idx_rem, MPI_DOUBLE, &loc_type);
  MPI_Type_indexed(SUB_YDIM, blk_len, idx_rem, MPI_DOUBLE, &rem_type);
  MPI_Type_commit(&loc_type);
  MPI_Type_commit(&rem_type);
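
  /* Note added for comparison (not part of the original attachment): the same
   * [SUB_XDIM x SUB_YDIM] patch of the row-major [XDIM x YDIM] array could
   * equivalently be described with a subarray type, e.g.:
   *
   *   int sizes[2]    = { YDIM, XDIM };
   *   int subsizes[2] = { SUB_YDIM, SUB_XDIM };
   *   int starts[2]   = { 0, 0 };
   *   MPI_Type_create_subarray(2, sizes, subsizes, starts,
   *                            MPI_ORDER_C, MPI_DOUBLE, &rem_type);
   *   MPI_Type_commit(&rem_type);
   */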

  // Perform get operation
  MPI_Win_lock(MPI_LOCK_EXCLUSIVE, peer, 0, buf_win);
  MPI_Get(loc_buf, 1, loc_type, peer, 0, 1, rem_type, buf_win);
  // Use the datatype only on the remote side (must have SUB_XDIM == XDIM)
  // MPI_Get(loc_buf, SUB_XDIM*SUB_YDIM, MPI_DOUBLE, peer, 0, 1, rem_type, buf_win);
  MPI_Win_unlock(peer, buf_win);

  MPI_Type_free(&loc_type);
  MPI_Type_free(&rem_type);

  MPI_Barrier(MPI_COMM_WORLD);

  // Verify that the results are correct
  errors = 0;

  // Check the transferred patch against the peer's data
  for (i = 0; i < SUB_XDIM; i++) {
    for (j = 0; j < SUB_YDIM; j++) {
      const double actual   = *(loc_buf + i + j*XDIM);
      const double expected = (1.0 + peer);
      if (fabs(actual - expected) > 1e-10) {
        printf("%d: Data validation failed at [%d, %d] expected=%f actual=%f\n",
               rank, j, i, expected, actual);
        errors++;
        fflush(stdout);
      }
    }
  }

  // Check that the columns to the right of the patch were not modified
  for (i = SUB_XDIM; i < XDIM; i++) {
    for (j = 0; j < SUB_YDIM; j++) {
      const double actual   = *(loc_buf + i + j*XDIM);
      const double expected = 1.0 + rank;
      if (fabs(actual - expected) > 1e-10) {
        printf("%d: Data validation failed at [%d, %d] expected=%f actual=%f\n",
               rank, j, i, expected, actual);
        errors++;
        fflush(stdout);
      }
    }
  }

  // Check that the rows below the patch were not modified
  for (i = 0; i < XDIM; i++) {
    for (j = SUB_YDIM; j < YDIM; j++) {
      const double actual   = *(loc_buf + i + j*XDIM);
      const double expected = 1.0 + rank;
      if (fabs(actual - expected) > 1e-10) {
        printf("%d: Data validation failed at [%d, %d] expected=%f actual=%f\n",
               rank, j, i, expected, actual);
        errors++;
        fflush(stdout);
      }
    }
  }

  MPI_Win_free(&buf_win);
  MPI_Free_mem(win_buf);
  MPI_Free_mem(loc_buf);

  MPI_Finalize();

  if (errors == 0) {
    printf("%d: Success\n", rank);
    return 0;
  } else {
    printf("%d: Fail\n", rank);
    return 1;
  }
}
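
/* Build/run sketch (added for reference, not part of the original post; the
 * compiler wrapper, launcher, and file name below are assumptions about a
 * typical MPI installation):
 *
 *   mpicc strided_get.c -o strided_get
 *   mpiexec -n 2 ./strided_get
 */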