[mvapich-discuss] How to keep gid status
Satoshi Isono
isono at cray.com
Tue Jun 23 03:59:52 EDT 2009
Dear Prof. Panda, Dr. Barth,
Thank you for your advice. I have edited mpirun_rsh.c directly so that
/usr/bin/sg is inserted in front of the executable while the order of the
options on the mpirun_rsh command line is preserved. The differences are
shown below. My MVAPICH2 version is mvapich2-1.2p1.
$ diff mpirun_rsh.c mpirun_rsh.c.org
67d66
< #include <grp.h>
260d258
< struct group *grpptr;
275,276d272
< int sg_index;
<
279d274
< grpptr = getgrgid(getgid());
417,456d411
< //isono
< for (i = aout_index; i < argc; i++) {
< if (strchr(argv[i], '=') == NULL) {
< sg_index = i;
< break;
< }
< }
< fprintf(stdout, "\n# INPUT PARAMETERS\n");
< fprintf(stdout, "%15s = %d\n", "argc", argc);
< fprintf(stdout, "%15s = %d\n", "option_index", option_index);
< fprintf(stdout, "%15s = %d\n", "aout_index", aout_index);
< fprintf(stdout, "%15s = %d\n", "sg_index", sg_index);
< for (i = 0; i < argc; i++) {
< fprintf(stdout, "%7sargv[%2d] = %s\n", " ", i, argv[i]);
< }
< char add_argv[argc+2][31];
< for (i = 0; i < argc; i++) {
< strcpy(add_argv[i], argv[i]);
< }
< for (i = 0; i < argc+2; i++) {
< if (i < sg_index) {
< argv[i]=add_argv[i];
< } else if (i == sg_index) {
< strcpy(argv[i], "/usr/bin/sg");
< i++;
< argv[i] = grpptr->gr_name;
< } else {
< argv[i]=add_argv[i-2];
< }
< }
< argc = argc + 2;
< fprintf(stdout, "\n# RUNNING PARAMETERS\n");
< fprintf(stdout, "%15s = %d\n", "argc", argc);
< fprintf(stdout, "%15s = %d\n", "option_index", option_index);
< fprintf(stdout, "%15s = %d\n", "aout_index", aout_index);
< fprintf(stdout, "%15s = %d\n", "sg_index", sg_index);
< for (i = 0; i < argc; i++) {
< fprintf(stdout, "%7sargv[%2d] = %s\n", " ", i, argv[i]);
< }
<
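The patch above rewrites argv in place, and that relies on undefined behavior in several places:

- `strcpy(argv[i], "/usr/bin/sg")` writes 12 bytes into the storage of whatever argument sits at sg_index. That is only 7 bytes for "./gid3".
- Each add_argv row holds at most 30 characters.
- The loop stores into argv[argc] and argv[argc + 1], although the runtime only provides argv[0] through argv[argc].
- sg_index stays uninitialized if no argument without '=' is found.

A bounds-safe alternative is to build a fresh pointer array instead. This is a minimal sketch, assuming the rest of mpirun_rsh only reads the first argc entries afterwards. insert_sg is a hypothetical helper, not part of MVAPICH2.

```c
#include <stdlib.h>

/*
 * Hypothetical helper (not part of MVAPICH2): return a new, NULL-terminated
 * argument vector with "/usr/bin/sg" and `group` inserted before argv[pos].
 * The original argv array and its strings are never written to; the new
 * array only borrows their pointers, so the caller frees the array itself
 * but not the strings. If pos is outside [0, argc), nothing is inserted.
 * On success *new_argc receives the new count; returns NULL if malloc fails.
 */
char **insert_sg(int argc, char **argv, int pos, char *group, int *new_argc)
{
    static char sg_path[] = "/usr/bin/sg";   /* own storage, never modified */
    /* argc original entries + 2 inserted words + 1 terminating NULL */
    char **out = malloc((size_t)(argc + 3) * sizeof *out);
    int j = 0;

    if (out == NULL)
        return NULL;
    for (int i = 0; i < argc; i++) {
        if (i == pos) {
            out[j++] = sg_path;
            out[j++] = group;
        }
        out[j++] = argv[i];
    }
    out[j] = NULL;
    *new_argc = j;
    return out;
}
```

In the patch, the add_argv loop could then become something like `char **v = insert_sg(argc, argv, sg_index, grpptr->gr_name, &n);`. When v is not NULL, use v and n in place of argv and argc. Before the call, check that getgrgid() did not return NULL and that sg_index was actually set.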
Here is the test result with the new mpirun_rsh:
$ mpirun_rsh -np 4 -hostfile hostfile MV2_NUM_HCAS=2 MV2_SM_SCHEDULING=ROUND_ROBIN ./gid-mv2-itl
# INPUT PARAMETERS
argc = 8
option_index = 3
aout_index = 5
sg_index = 7
argv[ 0] = mpirun_rsh
argv[ 1] = -np
argv[ 2] = 4
argv[ 3] = -hostfile
argv[ 4] = hostfile
argv[ 5] = MV2_NUM_HCAS=2
argv[ 6] = MV2_SM_SCHEDULING=ROUND_ROBIN
argv[ 7] = ./gid-mv2-itl
# RUNNING PARAMETERS
argc = 10
option_index = 3
aout_index = 5
sg_index = 7
argv[ 0] = mpirun_rsh
argv[ 1] = -np
argv[ 2] = 4
argv[ 3] = -hostfile
argv[ 4] = hostfile
argv[ 5] = MV2_NUM_HCAS=2
argv[ 6] = MV2_SM_SCHEDULING=ROUND_ROBIN
argv[ 7] = /usr/bin/sg
argv[ 8] = GAUSSIAN
argv[ 9] = ./gid-mv2-itl
However, there is a problem: when /usr/bin/sg is embedded in the
mpirun_rsh command line, how can we pass an input file to the program? The
same problem also occurs with your wrapper script.
Here is a simple example with test code:
1) mpirun_rsh -np 4 -hostfile hostfile ./gid3 ./data
The gid3 code is as follows:
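$ cat gid3.c
#include <stdio.h>
#include <stdlib.h>   /* system() */
#include <mpi.h>
#include <string.h>
#define MAX_DATA_SIZE 1000000
double a[MAX_DATA_SIZE];
int main(int argc,char *argv[])
{
    int rank,size,namelen;
    char name[MPI_MAX_PROCESSOR_NAME],comm[512];
    int i,ret,dsize;
    char str[80];
    FILE *fp;
    MPI_Init(&argc,&argv);
    MPI_Comm_rank(MPI_COMM_WORLD,&rank);
    MPI_Comm_size(MPI_COMM_WORLD,&size);
    MPI_Get_processor_name(name,&namelen);
    //printf("%4d/%-d: %s\n",rank,size,name);
    /* argv[1] names the input file; abort all ranks with a message if it is missing or unreadable */
    if(argc<2 || (fp=fopen(argv[1],"r"))==NULL){
        fprintf(stderr,"%s_%d: cannot open input file %s\n",name,rank,argc<2?"(none)":argv[1]);
        MPI_Abort(MPI_COMM_WORLD,1);
    }
    /* read one double per line, up to MAX_DATA_SIZE values */
    for(i=0;i<MAX_DATA_SIZE;i++){
        if(fgets(str,80,fp)==NULL) break;
        ret=sscanf(str,"%lf",&a[i]);
        fprintf(stdout,"%s_%d: data = %lf\n",name,rank,a[i]);
    }
    fclose(fp);
    dsize=i;
    fprintf(stdout,"n = %d\n",i);
    /* create an empty file whose group owner shows the GID this rank ran with */
    sprintf(comm,"touch testfile_%s_%d",name,rank);
    system(comm);
    MPI_Finalize();
    return 0;
}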
When I run this code with the new mpirun_rsh, it fails with the
following errors:
$ mpirun_rsh -np 4 -hostfile hostfile ./gid3 ./data
# INPUT PARAMETERS
argc = 7
option_index = 3
aout_index = 5
sg_index = 5
argv[ 0] = mpirun_rsh
argv[ 1] = -np
argv[ 2] = 4
argv[ 3] = -hostfile
argv[ 4] = hostfile
argv[ 5] = ./gid3
argv[ 6] = ./data
# RUNNING PARAMETERS
argc = 9
option_index = 3
aout_index = 5
sg_index = 5
argv[ 0] = mpirun_rsh
argv[ 1] = -np
argv[ 2] = 4
argv[ 3] = -hostfile
argv[ 4] = hostfile
argv[ 5] = /usr/bin/sg
argv[ 6] = GAUSSIAN
argv[ 7] = ./gid3
argv[ 8] = ./data
MPI process terminated unexpectedly
Exit code -5 signaled from com-0643
cleanupKilling remote processes...DONE
Signal 15 received.
In addition, I was able to run it in the two other ways shown below,
but both (2) and (3) require editing the user's source code.
2) mpirun_rsh -np 4 -hostfile hostfile ./gid4 < ./data
3) mpirun_rsh -np 4 -hostfile hostfile INPUT_FILENAME=data ./gid5
In (2), the input file is read from stdin. In (3), gid5.c calls
getenv("INPUT_FILENAME"), and I set this environment variable on the
mpirun_rsh command line.
Sorry for the long explanation. My question is how we can handle
case (1). Do you have any ideas for solving it? I do not think it is a good
idea to modify every user's code.
Please let me know if you have any advice.
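One possible cause of the failure in (1), not verified here:

- According to the sg(1) manual page, sg takes a single command argument (`sg [-] [group [-c] command]`) and runs it with /bin/sh. A multi-word command must therefore be quoted.
- mpirun_rsh appends the program's arguments separated by plain spaces (see the strcat loop at line 1607 in the original message quoted below). So sg sees ./gid3 and ./data as separate words.

The sketch below instead builds a single shell word for sg. It assumes the joined command line is parsed exactly once by a POSIX shell on the compute node. build_sg_command is a hypothetical helper, and the group name is assumed to contain no shell metacharacters.

```c
#include <stdlib.h>
#include <string.h>

/* Copy src into dst without its terminating NUL; return the bytes copied. */
static size_t put(char *dst, const char *src)
{
    size_t len = strlen(src);
    memcpy(dst, src, len);
    return len;
}

/*
 * Hypothetical helper (not part of MVAPICH2): build
 *     /usr/bin/sg GROUP -c '<args joined by spaces>'
 * The single quotes make the program plus its arguments ONE word for the
 * shell that parses this line. sg then runs that word with /bin/sh, which
 * splits it into the program and its arguments again. An embedded single
 * quote is emitted as '\'' (close quote, escaped quote, reopen quote).
 * Returns a malloc'd string that the caller frees, or NULL if malloc fails.
 */
char *build_sg_command(const char *group, int nargs, char *const args[])
{
    const char *head = "/usr/bin/sg ";
    const char *flag = " -c '";
    /* Worst case: every argument byte is a quote (4 output bytes), plus one
       separator per argument, the closing quote and the NUL. */
    size_t need = strlen(head) + strlen(group) + strlen(flag) + 2;
    for (int i = 0; i < nargs; i++)
        need += 4 * strlen(args[i]) + 1;

    char *cmd = malloc(need);
    if (cmd == NULL)
        return NULL;

    size_t n = 0;
    n += put(cmd + n, head);
    n += put(cmd + n, group);
    n += put(cmd + n, flag);
    for (int i = 0; i < nargs; i++) {
        if (i > 0)
            cmd[n++] = ' ';
        for (const char *s = args[i]; *s != '\0'; s++) {
            if (*s == '\'')
                n += put(cmd + n, "'\\''");
            else
                cmd[n++] = *s;
        }
    }
    cmd[n++] = '\'';
    cmd[n] = '\0';
    return cmd;
}
```

For case (1) this yields `/usr/bin/sg GAUSSIAN -c './gid3 ./data'`. If mpirun_rsh quotes or re-parses the command more than once on its way to the remote node, the quoting would have to be adjusted accordingly.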
Best regards,
Satoshi Isono
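Whatever launcher is used, it helps to confirm which group each rank actually runs under on the compute node. In the original message quoted below, a file created on com-0644 is owned by "cray" even after `newgrp GAUSSIAN`.

A likely reason is that newgrp only changes the group of the shell it starts (and that shell's children). A process launched through ssh instead gets its groups from a fresh login on the remote node, that is, the primary group from the account database.

The helper below reports the real and effective GID of the calling process. describe_groups is a hypothetical name. On typical Linux file systems, a new file takes the creator's effective GID as its group unless the directory has the set-group-ID bit set.

```c
#define _POSIX_C_SOURCE 200809L
#include <grp.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write "real gid N (name), effective gid M (name)" for the calling process
 * into buf. getgrgid() may reuse one static buffer, so each name is copied
 * out before the next lookup. GIDs with no group entry are shown as "?".
 * Returns snprintf's result: the length of the full text, or < 0 on error.
 */
int describe_groups(char *buf, size_t cap)
{
    gid_t rgid = getgid();
    gid_t egid = getegid();
    char rname[64] = "?";
    char ename[64] = "?";
    struct group *gr;

    if ((gr = getgrgid(rgid)) != NULL)
        snprintf(rname, sizeof rname, "%s", gr->gr_name);
    if ((gr = getgrgid(egid)) != NULL)
        snprintf(ename, sizeof ename, "%s", gr->gr_name);

    return snprintf(buf, cap, "real gid %lu (%s), effective gid %lu (%s)",
                    (unsigned long)rgid, rname, (unsigned long)egid, ename);
}
```

Call it in gid.c after MPI_Init and print the text with the rank and processor name. If the explanation above holds, the output should show GAUSSIAN as the effective group only when the rank is started through /usr/bin/sg.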
-----Original Message-----
From: Dhabaleswar Panda [mailto:panda at cse.ohio-state.edu]
Sent: Wednesday, June 17, 2009 4:35 AM
To: Satoshi Isono
Cc: mvapich-discuss at cse.ohio-state.edu; Bill Barth; Dhabaleswar Panda
Subject: Re: [mvapich-discuss] How to keep gid status
Hi,
Thanks for your note. I shared your question with Dr. Bill Barth from
TACC. Folks from TACC have been using MVAPICH with mpirun_rsh in their
production environment on Ranger for quite some time. I am including his
reply below. I hope his suggested approach will work for you. Let us
know.
I am cc'ing Dr. Barth on this e-mail as well. If you have any further
questions, the two of you can exchange information on this issue
directly.
Thanks,
DK
====================================================================
As you may recall, we have wrapper scripts that we use on Ranger and
Lonestar to hide the details of the mpirun_rsh command line from the
users. We call it 'ibrun'. It interacts with the scheduler (through the
environment) to generate the host list and establish the number of tasks
to start. I don't see why it would be hard to add a call to /usr/bin/sg
in there.
If the user would have invoked
mpirun_rsh -np 5 -hostfile hosts ./foo
he simply runs
ibrun ./foo
on Ranger or Lonestar. 'ibrun' is basically structured as:
#!/bin/bash
....find NP from the environment....
....find host list....
$MPICH_HOME/bin/mpirun_rsh -np $NP -hostfile $HOSTFILE "$@"
So it just takes the command line args of ibrun and passes them directly
to mpirun_rsh.
There's no reason it couldn't do
#!/bin/bash
....find NP....
....find host list....
GROUP_ID=`id -gn`
$MPICH_HOME/bin/mpirun_rsh -np $NP -hostfile $HOSTFILE /usr/bin/sg $GROUP_ID "$@"
It should be this straightforward.
Bill.
======
On Sun, 14 Jun 2009, Satoshi Isono wrote:
> Dear all,
>
> I would like to know how to preserve the gid when launching MPI
> processes. We know that adding the sg command to the mpirun_rsh command
> line works in this case. Could you please advise me? An example is
> shown below.
>
> Most of our users belong to multiple groups, and the accounting system
> is based on the group ID (GID). So every file created by a user must be
> owned by the appropriate group.
>
> The problem is that the GID state is not preserved. Here is an
> example; please read it in the numbered order.
>
> 1) The user logs in to a login node.
>
> $ id
> uid=1002(craysp) gid=1002(cray)
> groups=10(wheel),1002(cray),8001(GAUSSIAN)
>
> This shows that the default gid is 1002(cray); "cray" is the primary
> group.
>
> 2) The user switches to another group with the newgrp command.
>
> $ newgrp GAUSSIAN
> $ id
> uid=1002(craysp) gid=8001(GAUSSIAN)
> groups=10(wheel),1002(cray),8001(GAUSSIAN)
>
> Here the user wants to switch to another group, "GAUSSIAN", and the
> output confirms that the group has changed from cray to GAUSSIAN.
>
> 3) The user runs an MPI job with mpirun_rsh.
>
> This simple MPI code creates an output file.
>
> $ cat gid.c
> #include <stdio.h>
> #include <stdlib.h>
> #include <mpi.h>
> #include <string.h>
>
> int main(int argc,char *argv[])
> {
> int rank,size,namelen;
> char name[MPI_MAX_PROCESSOR_NAME],comm[512];
>
> MPI_Init(&argc,&argv);
>
> MPI_Comm_rank(MPI_COMM_WORLD,&rank);
> MPI_Comm_size(MPI_COMM_WORLD,&size);
> MPI_Get_processor_name(name,&namelen);
>
> sprintf(comm,"touch testfile_%s_%d",name,rank);
> system(comm);
>
> MPI_Finalize();
> return 0;
> }
>
> After running this code, I expect the output file to be owned by the
> "GAUSSIAN" group, but it is not. Below is the run script that calls
> mpirun_rsh.
>
> $ cat run_i.sh
> #!/bin/bash
> . /opt/Modules/init/bash
> module load pgi mvapich2/pgi
> mpirun_rsh -np 1 com-0644 ./gid-mv2
>
> 4) The user finds that the created file is not owned by the expected group.
>
> $ ls -l testfile_com-0644_0
> -rw-r--r-- 1 craysp cray 0 Jun 8 17:50 testfile_com-0644_0
>
> As you can see, the file is owned by "cray", not "GAUSSIAN". I think
> the cause is either the mpirun_rsh command or the SSH server
> configuration.
>
> 5) A way to solve it
>
> I think a better way is to insert the "sg" command just before a.out
> on the mpirun_rsh command line. Here is an example:
>
> $ grep mpirun_rsh run_i.sh
> mpirun_rsh -np 1 com-0644 /usr/bin/sg `id -gn` ./gid-mv2
>
> With sg placed just before a.out, it works as expected:
>
> $ ls -l testfile_com-0644_0
> -rw-r--r-- 1 craysp GAUSSIAN 0 Jun 8 18:33 testfile_com-0644_0
>
> 6) Request to you
>
> My first idea was to write a wrapper script around mpirun_rsh, but it is
> difficult for a wrapper to find where the executable appears on the
> command line, because users write mpirun_rsh lines in many different
> ways. For example:
>
> mpirun_rsh -np 2048 -hostfile hosts.txt ./a.out Inputfile | tee -a Outputfile
> mpirun_rsh -np 256 -hostfile hostlist ./a.out input >> log
> mpirun_rsh -np 8 -hostfile hostfile MV2_ENABLE_AFFINITY=0 MV2_NUM_HCAS=4 ./numarun_mv2.sh ./a.out
> ...
>
> Now look at line 1607 of mpirun_rsh.c:
>
> 1607 /* add the arguments */
> 1608 for (i = aout_index + 1; i < argc; i++) {
> 1609 strcat(command_name, " ");
> 1610 strcat(command_name, argv[i]);
> 1611 }
>
> Here is my edit:
>
> 1607 /* add the arguments */
> 1608 strcat(command_name, " /usr/bin/sg $(id -gn)");
> 1609 for (i = aout_index + 1; i < argc; i++) {
> 1610 strcat(command_name, " ");
> 1611 strcat(command_name, argv[i]);
> 1612 }
>
> I made the edit shown above and recompiled, but it had no effect.
> If you know another way to solve this problem, could you please tell me?
>
> Best regards,
> Satoshi Isono
>
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>
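About the line-1607 edit in the original message: because the loop starts at aout_index + 1, command_name most likely already ends with the executable when that code runs. The appended "/usr/bin/sg $(id -gn)" would then become arguments of gid-mv2 instead of the command that starts it.

There is a second issue if command_name is only interpreted by the shell on the compute node. In that case `$(id -gn)` would be expanded there, where the group is "cray" again. In run_i.sh, by contrast, the backquoted `id -gn` is presumably expanded on the node where newgrp was run.

The sketch below shows the intended ordering. It assumes the group name is resolved on the launch node beforehand; the later patch above does this with getgrgid(getgid()). build_launch_command is hypothetical, not mpirun_rsh's actual code. It uses snprintf so that a long command line is rejected rather than overflowing the buffer, as repeated strcat can.

```c
#include <stdio.h>

/*
 * Hypothetical sketch, not mpirun_rsh's code: write
 *     /usr/bin/sg GROUP <argv[aout_index]> <argv[aout_index + 1]> ...
 * into buf, so that sg is the program that runs and the MPI executable is
 * the command sg starts. GROUP is resolved by the caller on the launch node.
 * Returns 0 on success, -1 if the text does not fit in cap bytes.
 */
int build_launch_command(char *buf, size_t cap, const char *group,
                         int argc, char *const argv[], int aout_index)
{
    size_t used;
    int n = snprintf(buf, cap, "/usr/bin/sg %s", group);

    if (n < 0 || (size_t)n >= cap)
        return -1;
    used = (size_t)n;
    for (int i = aout_index; i < argc; i++) {
        n = snprintf(buf + used, cap - used, " %s", argv[i]);
        if (n < 0 || (size_t)n >= cap - used)
            return -1;
        used += (size_t)n;
    }
    return 0;
}
```

For run_i.sh this gives `/usr/bin/sg GAUSSIAN ./gid-mv2`. If the executable also takes arguments, sg(1) expects the whole multi-word command as one quoted argument, which this sketch does not produce.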