[mvapich-discuss] FW: programming problems

Dhabaleswar Panda panda at cse.ohio-state.edu
Mon Dec 20 09:24:37 EST 2010


Hi,

The use of MPD has been discouraged for a long time for both MPICH2 and
MVAPICH2 because it is not scalable, and its support has already been
deprecated for many interfaces.

Please use mpirun_rsh or mpiexec.hydra instead.

Section 5.2.1 of the MVAPICH2 user guide describes how to run applications
using mpirun_rsh:

http://mvapich.cse.ohio-state.edu/support/user_guide_mvapich2-1.6rc1.html#x1-200005.2.1
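
As a quick example (a sketch only; the host file name "hosts" is an
assumption, and hello_f90 stands in for your own binary), a 16-process
run could be launched as:

    mpirun_rsh -np 16 -hostfile hosts ./hello_f90

where "hosts" lists one compute node name per line. With the host file in
place, the processes are distributed across the listed nodes rather than
all landing on the launch node.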

Section 5.2.2 of the MVAPICH2 user guide describes how to run applications
using mpiexec.hydra:

http://mvapich.cse.ohio-state.edu/support/user_guide_mvapich2-1.6rc1.html#x1-210005.2.2
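
Similarly, with the Hydra launcher (again a sketch; "hosts" is an assumed
host file name and hello_f90 a placeholder binary):

    mpiexec.hydra -f hosts -n 16 ./hello_f90

Hydra reads the node list from the file given with -f and spreads the 16
processes across those nodes.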

Let us know if you experience any issues when using mpirun_rsh or
mpiexec.hydra.

Thanks,

DK

On Mon, 20 Dec 2010, Bainbridge, Brian wrote:

> Hi There,
>
> I'm attempting to get some code running with MVAPICH2 on our cluster, but initially I'm trying to get mpd to run properly.
>
> As far as I can tell, I've got it running OK. mpdtrace produces:
>
> [bba at node001 sge_test]$ mpdtrace
> node001
> node005
> node012
> node053
> ...
> node025
> node023
> node024
> node022
>
> Showing all 75 nodes as being part of the mpd ring.
>
> If I run a non-MPI program I get sensible results:
>
> [bba at node001 sge_test]$ mpiexec -n 16 /bin/hostname
> node048
> node001
> node047
> node045
> node042
> node053
> node011
> node052
> node050
> node051
> node049
> node044
> node013
> node005
> node012
> node046
>
> However, if I try to run an MPI program I get the following:
>
> [bba at node001 sge_test]$ mpiexec -n 16 ./hello_f90
>  Hello, world, I am             0  of            16  on
>  node001
>  Hello, world, I am             1  of            16  on  Hello, world, I am             2  of            16  on
>  Hello, world, I am             5  of            16  on
>  node001
>  Hello, world, I am             3  of            16  on
>  node001
>  Hello, world, I am             4  of            16  on
>
>  node001
>  node001
>  Hello, world, I am             6  of            16  on
>  node001
>  Hello, world, I am             7  of            16  on  Hello, world, I am             8  of            16  on
>  node001
>  Hello, world, I am            15  of            16  on
>  node001
>
>  node001
>  Hello, world, I am             9  of            16  on
>  node001
>  node001
>  Hello, world, I am            10  of            16  on
>  node001
>  Hello, world, I am            11  of            16  on
>  node001
>  Hello, world, I am            12  of            16  on
>  node001
>  Hello, world, I am            13  of            16  on
>  node001
>  Hello, world, I am            14  of            16  on
>  node001
>
> This is more or less what I'm after, except all the processes are running on one node instead of being spread over multiple nodes.
>
> I don't understand why the MPI code isn't running on different nodes in the same way as /bin/hostname does. The code is just
> reading the environment variable HOSTNAME and printing it out. If I start running the code from a different node, it prints that
> node out instead:
>
> [bba at node002 sge_test]$ mpiexec -n 16 ./hello_f90
>  Hello, world, I am             0  of            16  on  Hello, world, I am             2  of            16  on
>  node002
>  node002
> ...
>
> I've waded through the documentation I could find, but it hasn't helped. As far as the documentation goes, if it works with
> /bin/hostname it should work with MPI.
>
> I'd be grateful for any help on this.
>
> Brian
>


