[mvapich-discuss] Questions on running mvapich on a cluster

Marcos Verissimo Alves Marcos.Verissimo at uclouvain.be
Mon Dec 3 00:56:50 EST 2007


Hi all,

I am new to the list, and perhaps my question has already been answered
somewhere else. Since I could not find it in the archives, however, I'll
put it to the more knowledgeable people on the list, because I am not the
system administrator.

We have a cluster with an InfiniBand interconnect, and I have managed to
compile mvapich2-0.9.8 successfully (quite easy!). I have also compiled
the ab initio calculation program SIESTA and made it run correctly in
parallel. However, I am sure that there must be a more intelligent (and
correct) way of executing a program in parallel. So here are my questions:

1) Supposing that mvapich were adequately installed by our sysadmin, what
exactly is the command to start the mpi daemon on the machines? Currently
my script uses the following command:

/usr/bin/rsh -n $machine "/home/pcpm/mverissi/my_mvapich_0.9.8/bin/mpdboot \
    -n 8 -f /home/pcpm/mverissi/mfile.$idm --ncpus=2 \
    --mpd=/home/pcpm/mverissi/my_mvapich_0.9.8/bin/mpd --rsh=rsh"

where mfile.$idm is a file listing the machines on which the mpd daemons
will be started. Thinking about it now, I guess it would be enough to issue
the mpdboot command directly, without the /usr/bin/rsh -n $machine part.
Is that correct?
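
Just to be explicit, this is roughly what I imagine the direct invocation
would look like, run once from the node where the job script executes. It
is only a sketch (the paths and the machine file are the ones from my
current setup, and I have not verified this):

    # as far as I understand, mpdboot itself contacts the hosts listed in
    # mfile.$idm via rsh (--rsh=rsh), so the outer rsh wrapper would not
    # be needed
    /home/pcpm/mverissi/my_mvapich_0.9.8/bin/mpdboot \
        -n 8 \
        -f /home/pcpm/mverissi/mfile.$idm \
        --ncpus=2 \
        --mpd=/home/pcpm/mverissi/my_mvapich_0.9.8/bin/mpd \
        --rsh=rsh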

2) If the MPI daemon is started by the sysadmin on the nodes, he will
probably start it as root. If he does so, will we "mortal" users still be
able to run our processes? In other words, can one run a calculation with
mvapich even if the owner of the mpd process is root?

The reason I ask is the following. As I guess is customary on many HPC
clusters, we have a home directory, available through NFS, in which we
keep our files, and the calculations are executed in such a way that the
(huge) data files are written to local disks with faster access and then
copied back to the user's home after the calculation ends.

When I run a calculation, the queue system creates a directory on the
local disk of each slave node, named something like /tmp/all.q.98764 .
However, the mpd console files are generally created in /tmp/ . I see that
my console files are named mpd2.console_mverissi (mverissi being my
username on the cluster), but I do not know whether this is only because I
started the MPI daemon myself. So I am not sure whether the
mpd2.console_xxxxxx files will be created as mpd2.console_root (if root
starts the mpd), or whether they will still appear as mpd2.console_xxxxxx
if root starts the mpd but user xxxxxx starts the MPI calculation. I hope
I am not being too confusing here...
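
If it helps to make the question concrete, this is the kind of check I
would run once a root-started mpd is in place (just a sketch, using the
same rsh call and $machine variable as in my script above; nothing
verified):

    # on a node where the root-started mpd is running, list the console
    # files in /tmp to see which username they carry
    /usr/bin/rsh -n $machine "ls -l /tmp/mpd2.console_*"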

3) The last question concerns the .mpd.conf and .mpdpasswd files. To make
the calculation run, the only way I could think of was to copy those files
into the temporary directory I mentioned in question 2. However, it would
be nicer if mvapich found those files in the users' home directories (even
when the calculation is run on a scratch disk) instead of the users having
to copy them into temporary directories. Is there a way of doing this?
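
For the record, what my job script does at the moment is essentially the
following (a sketch of my workaround; the scratch directory name is
whatever the queue system assigns, I am only using /tmp/all.q.98764 as an
example):

    # copy the mpd configuration/password files into the scratch
    # directory created by the queue system, so mpd can find them there
    cp $HOME/.mpd.conf  /tmp/all.q.98764/
    cp $HOME/.mpdpasswd /tmp/all.q.98764/
    # mpd refuses a configuration file that is readable by other users,
    # so make sure of the permissions
    chmod 600 /tmp/all.q.98764/.mpd.conf /tmp/all.q.98764/.mpdpasswd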

Sorry if the questions are too basic or, even worse, if they have already
been asked (and answered) before. It's just that our sysadmin has been
quite busy lately (as is usual with all sysadmins :D ) and I'd like to
gather this information to pass on to him. By the way, if this information
could be included in the user's guide, it would be extremely useful.

Thanks in advance,

Marcos

-- 
Dr. Marcos Verissimo Alves
Post-Doctoral Fellow
Unité de Physico-Chimie et de Physique des Matériaux (PCPM)
Université Catholique de Louvain
1 Place Croix du Sud, B-1348
Louvain-la-Neuve
Belgique

------

Gort, Klaatu barada nikto. Klaatu barada nikto. Klaatu barada nikto.





