[mvapich-discuss] MPI and posix shared memory.

Ben Benjamin.M.Auer at nasa.gov
Fri Aug 30 15:02:22 EDT 2013


We have a code that uses POSIX shared memory on each node to reduce its 
memory footprint. As part of the larger shared-memory package in the 
code, we have been trying to add a set of node-broadcast routines that 
broadcast a piece of data from shared memory on one node to the shared 
memory on the other nodes. This code has not been working, and we have 
traced the failure to the actual call to MPI_Bcast. We also tried plain 
sends and receives, with no luck. The first time the broadcast is 
called the routine works correctly, but subsequent calls fail: the MPI 
call returns without error, but the results of the broadcast are just 
plain wrong.

However, the routine works correctly if, before calling MPI_Bcast, we 
allocate a local, non-shared buffer of the same size as the data on each 
process in the communicator, copy from shared memory to the local 
buffer, call MPI_Bcast on the local copy, and finally copy the local 
buffer back to shared memory. It seems the broadcast itself simply does 
not function correctly when the buffer lives in POSIX shared memory. I 
tried setting MV2_USE_SHARED_MEM=0 to turn off the shared-memory 
routines in MVAPICH itself, which did not fix the broadcasts.

Are there known issues with doing MPI communication on buffers that 
live in shared memory? Is it possible this is a bug? We are using 
MVAPICH 1.9. If this is a possible bug, I can try to come up with a 
reproducer.

-- 
Ben Auer, PhD   SSAI, Scientific Programmer/Analyst
NASA GSFC,  Global Modeling and Assimilation Office
Code 610.1, 8800 Greenbelt Rd, Greenbelt, MD  20771
Phone: 301-286-9176               Fax: 301-614-6246


