[mvapich-discuss] MVAPICH 2 Progress Code improvement for RDMA_FAST_PATH
Sylvain Jeaugey
sylvain.jeaugey at bull.net
Fri Mar 23 10:27:04 EDT 2007
Hi all,
[ADAPTIVE_]RDMA_FAST_PATH is an optimization that provides low latency in
mvapich2. The issue is that latency increases as the total number of
processes grows. Eventually, when you launch a job with more than 32
processes, latency is worse than with the standard send/recv protocol.
The reason is simple. Unlike the send/recv protocol, which gets all its
receives from a single completion queue, the RDMA fast path has to poll
_every_ RDMA queue to find out which one has data to receive.
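To make the cost concrete, here is a minimal sketch of that scan. It is not
MVAPICH 2 code: rdma_slot and poll_all_slots are hypothetical names, and a
plain flag stands in for "this peer's RDMA buffer holds a message". The
point is only that the number of checks per poll grows with the number of
peers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one per-peer RDMA fast-path buffer:
 * the receiver detects arrival by checking a flag in memory. */
typedef struct {
    int has_data;   /* non-zero once the peer has written a message */
} rdma_slot;

/* Scan the peers' slots in order until one holds data.
 * Returns the peer index, or -1 if no slot holds data.
 * *checks is set to the number of slots inspected. */
static int poll_all_slots(const rdma_slot *slots, size_t npeers,
                          size_t *checks)
{
    assert(slots != NULL && checks != NULL);
    *checks = 0;
    for (size_t i = 0; i < npeers; i++) {
        (*checks)++;
        if (slots[i].has_data)
            return (int)i;
    }
    return -1;
}
```

With 64 peers and nothing pending, one call inspects all 64 slots, whereas
a single completion queue would be checked once regardless of job size.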
My first attempt to improve this was to poll only the VCs associated with
the requests passed to MPID_Progress. That didn't work well because,
unfortunately, well-written MPI applications are scarce, and calling
MPI_Wait on the wrong request resulted in a deadlock.
My second attempt is a lot better. The RDMA polling set is now restricted to:
* VCs on which we have posted receives waiting;
* VCs on which we have a rendezvous send in progress.
... and it seems to work well and quickly, since polling is almost always
directed to the right VC.
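The restricted polling set can be sketched roughly as follows. Again, this
is an illustration under assumptions, not the actual patch: poll_set,
poll_set_acquire, poll_set_release and poll_set_poll are hypothetical
names, MAX_VCS is an arbitrary bound, and a per-VC counter stands in for
"posted receives plus rendezvous sends in progress on this VC".

```c
#include <assert.h>
#include <stddef.h>

#define MAX_VCS 64  /* hypothetical upper bound for this sketch */

typedef struct {
    int has_data[MAX_VCS];  /* stand-in: RDMA buffer of this VC holds a message */
    int interest[MAX_VCS];  /* posted receives + rendezvous sends in progress */
    size_t nvcs;            /* number of VCs in use */
} poll_set;

/* Called when a receive is posted or a rendezvous send starts on vc. */
static void poll_set_acquire(poll_set *ps, size_t vc)
{
    assert(vc < ps->nvcs);
    ps->interest[vc]++;
}

/* Called when that receive or rendezvous send completes. */
static void poll_set_release(poll_set *ps, size_t vc)
{
    assert(vc < ps->nvcs && ps->interest[vc] > 0);
    ps->interest[vc]--;
}

/* Inspect only the RDMA buffers of VCs that have outstanding work.
 * Returns the first such VC with data, or -1 if none.
 * *checks is set to the number of RDMA buffers inspected. */
static int poll_set_poll(const poll_set *ps, size_t *checks)
{
    *checks = 0;
    for (size_t vc = 0; vc < ps->nvcs; vc++) {
        if (ps->interest[vc] == 0)
            continue;               /* not in the polling set: skip */
        (*checks)++;
        if (ps->has_data[vc])
            return (int)vc;
    }
    return -1;
}
```

For brevity the sketch still walks the counter array; a real implementation
would more likely keep the active VCs in a list so that the loop itself is
proportional to the size of the polling set.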
Does anyone already have a good (or better) solution for this? Am I totally
mistaken in my understanding of the MVAPICH 2 code? If not, I will
consider cleaning things up and proposing a patch against 0.9.8, or
should I wait until 0.9.9?
Thanks in advance for your opinions/comments/flames on that,
Sylvain