[mvapich-discuss] Shmem error
Katherine Holcomb
kah3f at eservices.virginia.edu
Thu Aug 21 17:06:00 EDT 2014
While preparing a new system with OFED, we found that a test code
fails under MVAPICH2 1.9 on ALLREDUCE with the following error:
[udc-ba38-4d:mpi_rank_0][mv2_shm_coll_init] shmem open failed for file:/dev/shm/slot_shmem-coll-kvs_236134_0-udc-ba38-4d-0-1614.tmp
[cli_2]: [cli_0]: aborting job:
Fatal error in PMPI_Reduce:
Other MPI error, error stack:
create_2level_comm(885): collective shmem allocation failed: No such file or directory
(one for each rank).
The same code with the same inputs works fine under OpenMPI. It also
works at a different site with MVAPICH2 1.9a2.
I am not even sure where to start to debug this.
--
Katherine Holcomb
UVACSE kholcomb at virginia.edu
112 Albert Small Building (434) 982-5948
University of Virginia Charlottesville, VA 22904