[mvapich-discuss] Odd behavior with memory registration / dreg / MVAPICH and MVAPICH2

LEI CHAI chai.15 at osu.edu
Fri Dec 21 18:29:37 EST 2007


Hi Eric,

Thanks for using mvapich/mvapich2. The problem you reported can be solved by using the PTMALLOC feature, which is supported by the gen2 device but not by vapi/vapi_multirail. Few features have been added to the vapi/vapi_multirail devices over the last few releases because not many people use them. Since you cannot move to gen2, we suggest that you disable LAZY_MEM_UNREGISTER for your tests.
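
For a small test program, a check along the following lines may help confirm whether a short receive stops on a page boundary, as described in observation (4) of the original report. This is only a sketch with no MPI calls in it. The helper names (fill_sentinel, first_mismatch, at_page_boundary) and the sentinel value are hypothetical and are not part of MVAPICH. It assumes the receive buffer is filled with the sentinel before the receive is posted and that the expected payload is known on the receiver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sentinel byte; pick a value the real payload does not
 * contain at the point where the transfer is cut short. */
#define SENTINEL 0xA5u

/* Fill the receive buffer with the sentinel before posting the receive,
 * so bytes the transfer never wrote remain recognizable afterwards. */
static void fill_sentinel(unsigned char *buf, size_t len)
{
    memset(buf, SENTINEL, len);
}

/* Return the offset of the first byte that differs from the expected
 * payload, or len if the whole buffer matches. */
static size_t first_mismatch(const unsigned char *got,
                             const unsigned char *expected, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (got[i] != expected[i])
            return i;
    }
    return len;
}

/* Return nonzero if the address buf + offset lies on a page boundary.
 * page_size would be 4096 for the 4k pages mentioned in the report. */
static int at_page_boundary(const void *buf, size_t offset, size_t page_size)
{
    assert(page_size > 0);
    return ((uintptr_t)buf + offset) % page_size == 0;
}
```

After MPI_Waitall returns, calling first_mismatch on the receive buffer and passing the result to at_page_boundary would show whether the data stops at a page boundary. Running the same check again after "while(dreg_evict())" and a re-transfer would show whether re-registration fixes it.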

Thanks,
Lei


----- Original Message -----
From: "Eric A. Borisch" <eborisch at ieee.org>
Date: Friday, December 21, 2007 10:23 am
Subject: [mvapich-discuss] Odd behavior with memory registration / dreg / MVAPICH and MVAPICH2

> I seem to be running into a memory registration issue.
> 
> Observations:
> 
> 1) During some transfers (MPI_Isend / MPI_Irecv / MPI_Waitall) into a
> local buffer on the root rank, I receive all of the data from any
> ranks that are running on the same machine, but only part (or none at
> all) of the data from ranks running on external machines. The transfer
> length is above the eager/rendezvous threshold.
> 2) Once the problem occurs, it is persistent. However, if I force
> MVAPICH to re-register by calling "while(dreg_evict())" at this point
> and then re-transfer, the correct data is received. (Same memory being
> transferred from / to.)
> 3) I've only witnessed problems occurring above the 4G (as returned by
> malloc()) memory range.
> 4) When I receive partial data from ranks, the data ends on a (4k)
> page bound. Data past this bound (which should have been updated) is
> unchanged during the transfer, yet both the sender and receiver report
> no errors. (This is very bad!)
> 5) Stepping through the code on both ends of the transfer shows the
> software agreeing on the (correct) length and location as far down as
> I can follow it.
> 6) Running against a compilation with no -DLAZY_MEM_UNREGISTER shows
> no issues. (Other than the expected performance hit.)
> 7) Occurs on both MVAPICH-1.0-beta (vapi_multirail) and
> mvapich2-1.0 (vapi)
> 8) The user code is also sending data out (from a different buffer)
> over ethernet to a remote gui from the root node.
> 
> I can't move to gen2 at this point -- we are using a vendor library
> for interfacing to another system, and this library uses VAPI.
> 
> uname -a output:
> Linux rt2.mayo.edu 2.6.9-42.0.2.ELsmp #1 SMP Wed Aug 23 13:38:27 BST
> 2006 x86_64 x86_64 x86_64 GNU/Linux
> 
> Intel SE7520JR2 motherboards. 4G physical ram on each node.
> 
> It appears (perhaps this is obvious) that the assumption that memory
> registered (by the dreg.c code) remains registered until explicitly
> unregistered (again, by the dreg.c code) is being violated in some
> way. This, however, is wading into uncharted (for me, at least) linux
> memory management waters. The user code is doing nothing to fiddle
> with registration in any explicit way (with the exception of the
> dreg_evict() call mentioned in (2)).
> 
> Please let me know what other information I can provide to resolve
> this. I'm still trying to put together a small test program to cause
> the problem, but have been unsuccessful so far.
> 
> Thanks,
> Eric
> -- 
> Eric A. Borisch
> eborisch at ieee.org
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
> 


