[mvapich-discuss] Noticed two issues in OMB-5.4

Wed Dec 20 09:45:25 EST 2017

Thanks for the information, Ammar. Looking forward to the changes in the
next OMB release.

On Dec 20, 2017 4:51 AM, "Ammar Ahmad Awan" <ammar.ahmad.awan at gmail.com>
wrote:

Hi, Akshay.

Thanks for your report. The issue was detected and fixed during the
MVAPICH2-GDR 2.3a release process. In fact, the OMB version included
with MVAPICH2-GDR 2.3a has the fix. Please double-check and use that
version. The fix will be available in the next release of the OMB 5.4.x
series in the near future.

Regards,
Ammar

On Tue, Dec 19, 2017 at 6:28 PM, Akshay Venkatesh <akshay.v.3.14 at gmail.com>
wrote:

> Hi,
>
> I noticed a bug in the one-sided benchmarks in their use of
> the allocate_memory_one_sided function. OMB 5.3 uses allocate_memory, which
> among other things does the following:
>
>     if (mem_on_dev) {
>         CHECK(allocate_device_buffer(sbuf, size));
>         set_device_memory(*sbuf, 'a', size);
>         CHECK(allocate_device_buffer(rbuf, size));
>
> But allocate_memory_one_sided used in 5.4 instead does the following:
>
>     if (mem_on_dev) {
>         CHECK(allocate_device_buffer(sbuf));
>         set_device_memory(*sbuf, 'a', size);
>         CHECK(allocate_device_buffer(rbuf));
>         set_device_memory(*rbuf, 'b', size);
>
> which calls allocate_device_buffer without a buffer size, so it allocates
> options.max_message_size bytes each time:
>
> int allocate_device_buffer (char ** buffer)
> {
> #ifdef _ENABLE_CUDA_
>     cudaError_t cuerr = cudaSuccess;
> #endif
>
>     switch (options.accel) {
> #ifdef _ENABLE_CUDA_
>         case CUDA:
>             cuerr = cudaMalloc((void **)buffer, options.max_message_size);
>
> When the value of size passed to set_device_memory exceeds max_message_size,
> the call to cudaMemset results in an error that isn't handled by the
> benchmark. Hence the next call to CUDA (by the MPI library that
> uses CUDA) returns an error. Checking whether CUDA calls return cudaSuccess
> within OMB would help catch similar errors, but for the fix it's
> probably better to add the size argument back to the allocate_device_buffer call.
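
A minimal sketch of both suggestions above: size passed explicitly to
allocate_device_buffer, and every allocation/memset return code checked.
This is not the actual OMB fix. To keep it runnable without a GPU,
fake_malloc and fake_memset are hypothetical host-side stand-ins for
cudaMalloc and cudaMemset, and allocate_memory_one_sided_sketch is an
illustrative name, not an OMB function.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical host-side stand-ins for cudaMalloc/cudaMemset (assumption:
 * the sketch should run without a GPU). They record allocation sizes so
 * that an oversized memset fails instead of overrunning the buffer. */
#define MAX_ALLOCS 8
static struct { void *ptr; size_t bytes; } allocs[MAX_ALLOCS];
static int n_allocs = 0;

static int fake_malloc(void **ptr, size_t bytes)
{
    if (n_allocs == MAX_ALLOCS) return 1;
    *ptr = malloc(bytes);
    if (*ptr == NULL) return 1;
    allocs[n_allocs].ptr = *ptr;
    allocs[n_allocs].bytes = bytes;
    n_allocs++;
    return 0;
}

static int fake_memset(void *ptr, int value, size_t bytes)
{
    for (int i = 0; i < n_allocs; i++) {
        if (allocs[i].ptr == ptr) {
            if (bytes > allocs[i].bytes) return 1; /* would overrun */
            memset(ptr, value, bytes);
            return 0;
        }
    }
    return 1; /* pointer was never allocated here */
}

/* Size is an explicit argument again, as at the OMB 5.3 call sites. */
int allocate_device_buffer(char **buffer, size_t size)
{
    assert(buffer != NULL);
    return fake_malloc((void **)buffer, size);
}

/* Reports a failed memset to the caller instead of dropping the error. */
int set_device_memory(void *ptr, int data, size_t size)
{
    return fake_memset(ptr, data, size);
}

/* Allocate and initialize both buffers; returns 0 on success and nonzero
 * on the first failure (cleanup on failure is omitted for brevity). */
int allocate_memory_one_sided_sketch(char **sbuf, char **rbuf, size_t size)
{
    if (allocate_device_buffer(sbuf, size)) return 1;
    if (set_device_memory(*sbuf, 'a', size)) return 1;
    if (allocate_device_buffer(rbuf, size)) return 1;
    if (set_device_memory(*rbuf, 'b', size)) return 1;
    return 0;
}
```

With the size passed through, the buffer and the memset always agree, and
a mismatch shows up as a nonzero return at the call that caused it rather
than in a later, unrelated CUDA call.
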
>
> The other issue is a minor one. 5.4 makes use of a util/ subdirectory for
> sources, but when I build in a directory other than the source root, like
> omb-src/build instead of omb-src, I get an error saying osu_util.h can't be
> found. The error goes away if I build in omb-src. This is because
> omb-src/Makefile.am has the following:
>
> SUBDIRS =
>
> if CUDA
>     dist_pkglibexec_SCRIPTS = get_local_rank
> endif
>
> if MPI
>     SUBDIRS += mpi
> endif
>
> if OSHM
>     SUBDIRS += openshmem
> endif
>
> if UPC
>     SUBDIRS += upc
> endif
>
> if UPCXX
>     SUBDIRS += upcxx
> endif
>
> EXTRA_DIST = README CHANGES COPYRIGHT
>
> As there isn't a util entry here, headers aren't automatically searched
> under omb-src/util. It would be better to have the build system find the
> paths for all necessary headers automatically.
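
One common Automake idiom for this, sketched under assumptions: the
fragment below is not taken from the OMB sources, and which Makefile.am
it belongs in (for example one under mpi/) is a guess. It relies only on
the standard $(top_srcdir) variable, which expands to the source root
regardless of where configure was run.

```make
# Hypothetical fragment for a benchmark directory's Makefile.am
# (placement is an assumption, not the actual OMB layout).
# $(top_srcdir) points at the source root, so osu_util.h is found
# whether the build runs in omb-src or in omb-src/build.
AM_CPPFLAGS = -I$(top_srcdir)/util
```
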
>
> Please let me know your thoughts on this when possible.
>
> Thanks
>
> --
> -Akshay
> NVIDIA
>