[mvapich-discuss] smpi_init: error in opening shared memory file
Shaun Rowland
rowland at cse.ohio-state.edu
Thu Sep 13 17:27:24 EDT 2007
Troy Telford wrote:
> I'm running into an issue with MVAPICH when using the OFED IB stack. We've
> been able to reproduce it on MVAPICH 0.9.7, 0.9.8, and 0.9.9. (Linux x86_64)
>
> When attempting to start an app as a non-root user, we get the following
> error:
> open: Permission denied [2:n03] Abort: [2] smpi_init:error in opening
> shared memory file </tmp/ib_shmem-15826-n03-502.tmp>: 29
> at line 817 in file mpid_smpi.c
>
> (Repeated with obvious variation for each node in the run).
>
> We can run the program as root without any issues.
>
> Naturally, /tmp is writable by the user, and there are no ib_shmem* files in
> place, so I can't figure out why I'm getting a permission problem. Any
> ideas?
Just before the line where it fails, the code makes this open() call:
    smpi.fd =
        open (shmem_file, O_RDWR | O_CREAT, S_IRWXU | S_IRWXG | S_IRWXO);
That is from MVAPICH 0.9.9. Can you check that /tmp is mode 1777 on all
machines (or at least on the ones that report the error, though I suspect
that is all of them)? Can you also check that /tmp is not full? And could
you try some simple C code that does the same kind of open(), such as this
(just written very quickly):
    #include <stdio.h>
    #include <stdlib.h>
    #include <fcntl.h>
    #include <sys/types.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int
    main(void)
    {
        int fd;

        /* Same flags and mode as the open() call in mpid_smpi.c. */
        if ((fd = open ("/tmp/test.shmem-file", O_RDWR | O_CREAT,
                        S_IRWXU | S_IRWXG | S_IRWXO)) < 0) {
            perror("open error");
            exit(EXIT_FAILURE);
        }
        printf("Opened /tmp/test.shmem-file.\n");
        close(fd);

        /* Remove the test file so repeated runs start clean. */
        if (unlink("/tmp/test.shmem-file") < 0) {
            perror("unlink error");
            exit(EXIT_FAILURE);
        }
        printf("Unlinked /tmp/test.shmem-file.\n");
        exit(EXIT_SUCCESS);
    }
It might also be worth running the compiled binary on another machine over
ssh, to make sure it works when launched that way, though I can't think of a
reason this would be a problem. If that code does not work, then it has to
be some kind of permissions issue, even though /tmp seems fine here. If it
does fail, you can run it under strace to see exactly what is going on.
--
Shaun Rowland rowland at cse.ohio-state.edu
http://www.cse.ohio-state.edu/~rowland/