[mvapich-discuss] MPI and posix shared memory.
Ben
Benjamin.M.Auer at nasa.gov
Fri Aug 30 15:02:22 EDT 2013
We have a code that uses POSIX shared memory on each node to reduce the
memory footprint. As part of the larger shared-memory package in the
code, we have been trying to add a set of node broadcast routines that
broadcast a piece of data in shared memory on one node to the shared
memory on the other nodes. This code has not been working, and we have
traced the failure to the actual call to MPI_Bcast. We also tried plain
sends and receives, with no luck. It seems that the first time the
broadcast is called the routine works properly, but subsequent calls
fail: the MPI status returns without error, yet the results of the
broadcast are simply wrong.
The routine does work if, before calling MPI_Bcast, we allocate a
local, non-shared buffer of the same size as the data on each process
in the communicator, copy from the shared memory into the local buffer,
call MPI_Bcast on the local copy, and finally copy from the local
buffer back into the shared memory. It appears that the broadcast
simply does not function when the buffer itself is POSIX shared memory.
I tried setting MV2_USE_SHARED_MEM=0 to disable the shared-memory
routines in mvapich itself, but that did not fix the broadcasts.
Are there known issues with doing MPI communication on shared-memory
buffers? Could this be a bug? We are using mvapich 1.9. If it is a
possible bug, I can try to come up with a reproducer.
--
Ben Auer, PhD SSAI, Scientific Programmer/Analyst
NASA GSFC, Global Modeling and Assimilation Office
Code 610.1, 8800 Greenbelt Rd, Greenbelt, MD 20771
Phone: 301-286-9176 Fax: 301-614-6246