[mvapich-discuss] smpi_init: error in opening shared memory file

Shaun Rowland rowland at cse.ohio-state.edu
Thu Sep 13 17:27:24 EDT 2007


Troy Telford wrote:
> I'm running into an issue with MVAPICH when using the OFED IB stack.  We've 
> been able to reproduce it on MVAPICH 0.9.7, 0.9.8, and 0.9.9.  (Linux x86_64)
> 
> When attempting to start an app as a non-root user, we get the following 
> error:
>  open: Permission denied    [2:n03] Abort: [2] smpi_init:error in opening 
> shared memory file </tmp/ib_shmem-15826-n03-502.tmp>: 29   
>   at line 817 in file mpid_smpi.c
> 
> (Repeated with obvious variation for each node in the run).
> 
> We can run the program as root without any issues.
> 
> naturally, /tmp is writable by the user, and there is no ib_shmem* files in 
> place; so I can't figure out why I'm getting a permission problem.  Any 
> ideas?

Just before the line it fails on, the code should be making this open() call:

     smpi.fd =
         open (shmem_file, O_RDWR | O_CREAT, S_IRWXU | S_IRWXG | S_IRWXO);

That is from MVAPICH 0.9.9. Can you check that /tmp is mode 1777 on all
machines (or at least on the ones that report the error, though I suspect
that is all nodes)? Can you also check that /tmp is not completely full?
Finally, can you try some simple C code that does the same type of open,
such as the following (written very quickly):

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

int
main(void)
{
         int fd;

         if ((fd = open ("/tmp/test.shmem-file", O_RDWR | O_CREAT,
                         S_IRWXU | S_IRWXG | S_IRWXO)) < 0) {
                 perror("open error");
                 exit(EXIT_FAILURE);
         }

         printf("Opened /tmp/test.shmem-file.\n");

         if (unlink("/tmp/test.shmem-file") < 0) {
                 perror("unlink error");
                 exit(EXIT_FAILURE);
         }

         printf("Unlinked /tmp/test.shmem-file.\n");
         exit(EXIT_SUCCESS);
}


It might even be good to copy the binary built from that code to another
machine and run it over ssh, to make sure it also works there, though I
can't think of a reason this would be a problem. If that code does not
work, then it has to be some kind of permissions issue, even though /tmp
seems fine here. If it does fail, you can run it under strace to easily
see which system call fails and with what error.
-- 
Shaun Rowland	rowland at cse.ohio-state.edu
http://www.cse.ohio-state.edu/~rowland/

