[mvapich-discuss] Noticed two issues in OMB-5.4

Akshay Venkatesh akshay.v.3.14 at gmail.com
Tue Dec 19 18:28:46 EST 2017


Hi,

I noticed a bug in the one-sided benchmarks with the use of
the allocate_memory_one_sided function. OMB 5.3 uses allocate_memory which
among other things does the following:

    if (mem_on_dev) {
        CHECK(allocate_device_buffer(sbuf, size));
        set_device_memory(*sbuf, 'a', size);
        CHECK(allocate_device_buffer(rbuf, size));

But allocate_memory_one_sided used in 5.4 instead does the following:

    if (mem_on_dev) {
        CHECK(allocate_device_buffer(sbuf));
        set_device_memory(*sbuf, 'a', size);
        CHECK(allocate_device_buffer(rbuf));
        set_device_memory(*rbuf, 'b', size);

which calls allocate_device_buffer without a buffer size and allocates
max message size each time.

int allocate_device_buffer (char ** buffer)
{
#ifdef _ENABLE_CUDA_
    cudaError_t cuerr = cudaSuccess;
#endif

    switch (options.accel) {
#ifdef _ENABLE_CUDA_
        case CUDA:
            cuerr = cudaMalloc((void **)buffer, options.max_message_size);

When the value of size passed to set_device_memory exceeds max_message_size,
the call to cudaMemset results in an error that isn't handled by the
benchmark. Hence the next call into CUDA (by the MPI library that
uses CUDA) returns an error. Checking whether CUDA calls return cudaSuccess
within OMB would be good for catching similar errors, but for the fix, it's
probably better to add the size argument back to the allocate_device_buffer call.
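
The two suggestions above (pass the size through, and check every return code) can be sketched as follows. This is a minimal illustration rather than the OMB code: dev_buffer, fake_dev_malloc, fake_dev_memset, allocate_device_buffer_sized, set_device_memory_checked and allocate_buffer_pair are all hypothetical names, and the fake_* helpers use host malloc/memset as stand-ins for cudaMalloc/cudaMemset so that the sketch runs without a GPU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical host-side stand-in for a device buffer; it records how many
 * bytes were actually allocated so that oversized writes can be rejected. */
typedef struct {
    void  *ptr;
    size_t bytes;
} dev_buffer;

/* Stand-in for cudaMalloc: returns 0 on success, nonzero on failure. */
static int fake_dev_malloc(dev_buffer *buf, size_t bytes)
{
    buf->ptr = malloc(bytes);
    buf->bytes = (buf->ptr != NULL) ? bytes : 0;
    return (buf->ptr != NULL) ? 0 : 1;
}

/* Stand-in for cudaMemset: fails when asked to write past the allocation. */
static int fake_dev_memset(dev_buffer *buf, int value, size_t bytes)
{
    if (buf->ptr == NULL || bytes > buf->bytes) {
        return 1;
    }
    memset(buf->ptr, value, bytes);
    return 0;
}

/* Allocate exactly `size` bytes instead of a fixed maximum. */
int allocate_device_buffer_sized(dev_buffer *buf, size_t size)
{
    assert(buf != NULL);
    return fake_dev_malloc(buf, size);
}

/* Fill the buffer and hand any failure back to the caller. */
int set_device_memory_checked(dev_buffer *buf, int value, size_t size)
{
    assert(buf != NULL);
    return fake_dev_memset(buf, value, size);
}

/* Allocate and initialize both buffers; on the first error, release
 * whatever was allocated and return nonzero. */
int allocate_buffer_pair(dev_buffer *sbuf, dev_buffer *rbuf, size_t size)
{
    dev_buffer empty = { NULL, 0 };
    *sbuf = empty;
    *rbuf = empty;

    if (allocate_device_buffer_sized(sbuf, size) == 0 &&
        set_device_memory_checked(sbuf, 'a', size) == 0 &&
        allocate_device_buffer_sized(rbuf, size) == 0 &&
        set_device_memory_checked(rbuf, 'b', size) == 0) {
        return 0;
    }

    free(sbuf->ptr);
    free(rbuf->ptr);
    *sbuf = empty;
    *rbuf = empty;
    return 1;
}
```

With this shape, a request to fill more bytes than were allocated comes back as a nonzero return code at the call site instead of surfacing later in an unrelated call.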

The other issue is a minor one. 5.4 makes use of a util/ subdirectory for
sources, but when I build in a non-root directory like omb-src/build instead
of omb-src, I get an error saying osu_util.h can't be found. The error
goes away if I build in omb-src. This is because omb-src/Makefile.am has
the following:

SUBDIRS =

if CUDA
    dist_pkglibexec_SCRIPTS = get_local_rank
endif

if MPI
    SUBDIRS += mpi
endif

if OSHM
    SUBDIRS += openshmem
endif

if UPC
    SUBDIRS += upc
endif

if UPCXX
    SUBDIRS += upcxx
endif

EXTRA_DIST = README CHANGES COPYRIGHT

As there isn't a util directory listed, headers aren't automatically searched
for under omb-src/util. It would be better to have the build system find the
paths for all necessary headers automatically.
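
One common Automake way to do this, sketched here under the assumption that the headers live in util/ at the top of the source tree, is to reference that directory through $(top_srcdir), which resolves to the source root in both in-tree and out-of-tree builds. Which Makefile.am files would need the line, and how OMB currently sets its include paths, are not shown above, so treat the placement as hypothetical:

```make
# Hypothetical addition to each Makefile.am that compiles benchmark sources.
# $(top_srcdir) points at omb-src even when building in omb-src/build.
AM_CPPFLAGS = -I$(top_srcdir)/util
```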

Please let me know your thoughts on this when possible.

Thanks

-- 
-Akshay
NVIDIA