[mvapich-discuss] FW: programming problems

Bainbridge, Brian bba at bgs.ac.uk
Mon Dec 20 07:24:11 EST 2010


Hi There,

I'm attempting to get some code running under MVAPICH2 on our cluster, but first I'm trying to get mpd running properly.

As far as I can tell I've got it running ok. mpdtrace produces:

[bba at node001 sge_test]$ mpdtrace
node001
node005
node012
node053
...
node025
node023
node024
node022

Showing all 75 nodes as being part of the mpd ring.

If I run an ordinary (non-MPI) program I get sensible results:

[bba at node001 sge_test]$ mpiexec -n 16 /bin/hostname
node048
node001
node047
node045
node042
node053
node011
node052
node050
node051
node049
node044
node013
node005
node012
node046

However, if I try to run an MPI program I get the following:

[bba at node001 sge_test]$ mpiexec -n 16 ./hello_f90
 Hello, world, I am             0  of            16  on
 node001
 Hello, world, I am             1  of            16  on  Hello, world, I am             2  of            16  on
 Hello, world, I am             5  of            16  on
 node001
 Hello, world, I am             3  of            16  on
 node001
 Hello, world, I am             4  of            16  on

 node001
 node001
 Hello, world, I am             6  of            16  on
 node001
 Hello, world, I am             7  of            16  on  Hello, world, I am             8  of            16  on
 node001
 Hello, world, I am            15  of            16  on
 node001

 node001
 Hello, world, I am             9  of            16  on
 node001
 node001
 Hello, world, I am            10  of            16  on
 node001
 Hello, world, I am            11  of            16  on
 node001
 Hello, world, I am            12  of            16  on
 node001
 Hello, world, I am            13  of            16  on
 node001
 Hello, world, I am            14  of            16  on
 node001

This is more or less what I'm after, except that all the processes are running on a single node instead of being spread over multiple nodes.

I don't understand why the MPI code isn't being distributed across nodes in the same way as /bin/hostname. The code just reads
the environment variable HOSTNAME and prints it out. If I launch the code from a different node, it prints that node's name
instead:

[bba at node002 sge_test]$ mpiexec -n 16 ./hello_f90
 Hello, world, I am             0  of            16  on  Hello, world, I am             2  of            16  on
 node002
 node002
...

I've waded through what documentation I could find, but it hasn't helped. As far as the documentation goes, if it works with
/bin/hostname it should work with an MPI program.

I'd be grateful for any help on this.

Brian

-- 
This message (and any attachments) is for the recipient only. NERC
is subject to the Freedom of Information Act 2000, and the contents
of this email and any reply you make may be disclosed by NERC unless
exempt from release under the Act. Any material supplied to
NERC may be stored in an electronic records management system.
