[mvapich-discuss] Shmem error

Katherine Holcomb kah3f at eservices.virginia.edu
Thu Aug 21 17:06:00 EDT 2014


While preparing a new system with OFED, we found that a test code 
fails under MVAPICH2 1.9 on ALLREDUCE with the following error:

[udc-ba38-4d:mpi_rank_0][mv2_shm_coll_init] shmem open failed for file:/dev/shm/slot_shmem-coll-kvs_236134_0-udc-ba38-4d-0-1614.tmp

[cli_2]: [cli_0]: aborting job:
Fatal error in PMPI_Reduce:
Other MPI error, error stack:
create_2level_comm(885): collective shmem allocation failed: No such file or directory

(one for each rank).

The same code with the same inputs works fine under OpenMPI.  It also 
works at a different site with MVAPICH2 1.9a2.

I am not even sure where to start to debug this.
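
One possible starting point, assuming the failure comes from the node's 
/dev/shm rather than from MVAPICH2 itself: "No such file or directory" 
is ENOENT, which is what an open would return if /dev/shm were missing 
or not mounted on the compute nodes. The sketch below is not MVAPICH2 
code. The function name probe_shm_dir and the scratch-file prefix are 
made up for illustration. It only tries to create, size, and map a 
small file in the directory, roughly the kind of operation a 
shared-memory collective setup needs, and reports the errno of the 
first step that fails.

```python
import errno
import mmap
import os
import tempfile


def probe_shm_dir(directory="/dev/shm", size=4096):
    """Try to create, size, and mmap a scratch file in `directory`.

    Returns 0 on success, or the errno of the first failing step.
    Hypothetical helper for diagnosis only; not part of MVAPICH2.
    """
    try:
        # Fails with ENOENT if the directory does not exist,
        # or EACCES if it is not writable by this user.
        fd, path = tempfile.mkstemp(
            prefix="shm_probe_", suffix=".tmp", dir=directory
        )
    except OSError as exc:
        return exc.errno
    try:
        # Give the file a size, then map it and write a few bytes.
        os.ftruncate(fd, size)
        with mmap.mmap(fd, size) as region:
            region[:5] = b"probe"
        return 0
    except OSError as exc:
        return exc.errno
    finally:
        # Always clean up the scratch file.
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    code = probe_shm_dir()
    if code == 0:
        print("/dev/shm: ok")
    else:
        name = errno.errorcode.get(code, str(code))
        print(f"/dev/shm: failed with {name}: {os.strerror(code)}")
```

Running this on one of the failing compute nodes, as the same user that 
launches the MPI job, would at least separate a node configuration 
problem from a library problem. If it succeeds, the cause lies 
elsewhere.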

-- 
Katherine Holcomb
UVACSE                       kholcomb at virginia.edu
112 Albert Small Building    (434) 982-5948
University of Virginia       Charlottesville, VA 22904


