[mvapich-discuss] Bug in Allreduce for user-defined ops

Jack Poulson poulson.jack at gmail.com
Sun Sep 28 01:13:42 EDT 2008


I believe I've run into a bug in the implementation of Allreduce for
user-defined functions in MVAPICH 1.0 and 1.0.1 (0.9.8 behaves
correctly).

In 0.9.8, when run on a power-of-two number of processes p, the
user-op is called log2(p) times, each time with the correct length. In
the new versions it appears to be called log2(p)+2 times, and the
first call passes in a count of zero (I found this by simply printing
the count from within the user-op). I've looked through the
intra_Allreduce routine in src/coll/intra_fns_new.c, but I don't see
why the user-op is called more than log2(p) times for power-of-two
process counts.
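
To illustrate where a count of log2(p) calls per process would come
from, here is a minimal sketch that simulates a recursive-doubling
exchange over p ranks in a single process. It assumes recursive
doubling, which is one common Allreduce scheme; it is not taken from
intra_Allreduce, and the names sum_op, simulate_recursive_doubling and
MAX_RANKS are hypothetical.

```c
#include <assert.h>

#define MAX_RANKS 64  /* arbitrary cap for this sketch */

/* Hypothetical user-op: adds `count` doubles from `in` into `inout`. */
static void sum_op(const double *in, double *inout, int count)
{
    for (int i = 0; i < count; ++i)
        inout[i] += in[i];
}

/* Simulates recursive doubling over p ranks (p a power of two), with one
   double per rank stored in values[0..p-1]. On return every entry holds
   the global sum. Returns the number of user-op calls made on rank 0. */
int simulate_recursive_doubling(int p, double *values)
{
    assert(p > 0 && p <= MAX_RANKS && (p & (p - 1)) == 0);

    double snapshot[MAX_RANKS];
    int calls_on_rank0 = 0;

    for (int mask = 1; mask < p; mask <<= 1) {
        /* Freeze the values each rank would send in this round. */
        for (int r = 0; r < p; ++r)
            snapshot[r] = values[r];

        /* Each rank combines its partner's data with its own: one op call. */
        for (int r = 0; r < p; ++r) {
            sum_op(&snapshot[r ^ mask], &values[r], 1);
            if (r == 0)
                ++calls_on_rank0;
        }
    }
    return calls_on_rank0;
}
```

Under this scheme every rank makes one user-op call per round and
there are log2(p) rounds, so for p = 8 the function returns 3.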

Should user-defined ops check that the length is nonzero? I've
attached a driver and output that demonstrate the problem. The zero
count breaks operations such as a custom pivoting operation in an LU
factorization, where an integer is tacked onto the end of a set of
doubles: a length of zero bytes leads the routine to conclude that it
is operating on a negative number of doubles. I've been working around
the problem with a custom Allreduce implementation that uses a
reduce-to-one followed by a bcast, but I would like to take advantage
of your team's multicore optimizations.
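
For reference, a minimal sketch of the kind of guard in question. It
is not the attached driver: the packed layout (n doubles followed by
one int), the pivot rule (keep the row whose leading entry has the
larger magnitude), and the names payload_bytes and pivot_op are all
assumptions made for illustration. In an MPI program the body would
sit inside a function with the MPI_User_function signature,
void fn(void *invec, void *inoutvec, int *len, MPI_Datatype *datatype),
with *len passed as len_bytes for a byte datatype.

```c
#include <assert.h>
#include <string.h>

/* Bytes occupied by the doubles in a buffer laid out as [double x n][int]
   (assumed layout). Negative when len_bytes is smaller than one int,
   which is what a zero-length call produces. */
int payload_bytes(int len_bytes)
{
    return len_bytes - (int)sizeof(int);
}

/* Hypothetical pivot op: keeps whichever packed row has the larger
   leading magnitude, copying the doubles and the trailing int together. */
void pivot_op(const void *in, void *inout, int len_bytes)
{
    if (len_bytes == 0)
        return;  /* guard: nothing to combine on an empty call */

    int payload = payload_bytes(len_bytes);
    assert(payload >= 0 && payload % (int)sizeof(double) == 0);
    if (payload == 0)
        return;  /* only the trailing int is present, no pivot candidate */

    double a, b;
    memcpy(&a, in, sizeof a);
    memcpy(&b, inout, sizeof b);

    double abs_a = a < 0 ? -a : a;
    double abs_b = b < 0 ? -b : b;
    if (abs_a > abs_b)
        memcpy(inout, in, (size_t)len_bytes);
}
```

Without the first check, a call with len_bytes == 0 reaches the size
computation with a negative payload, which is the failure described
above.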

Thank you,
Jack Poulson
-------------- next part --------------
A non-text attachment was scrubbed...
Name: user_op.c
Type: text/x-csrc
Size: 1938 bytes
Desc: not available
Url : http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20080928/2878e0d7/user_op-0001.bin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: user-op-0.9.8
Type: application/octet-stream
Size: 23200 bytes
Desc: not available
Url : http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20080928/2878e0d7/user-op-0.9-0001.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: user-op-1.0
Type: application/octet-stream
Size: 2709 bytes
Desc: not available
Url : http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20080928/2878e0d7/user-op-1-0001.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: user-op-1.0.1
Type: application/octet-stream
Size: 2692 bytes
Desc: not available
Url : http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20080928/2878e0d7/user-op-1.0-0001.obj

