[mvapich-discuss] FW: MVAPICH2 does not run...

Choudhury, Durga Durga.Choudhury at drs-ss.com
Tue Jan 31 13:59:02 EST 2006


Hi everyone

The following is an email I sent to the mvapich-help at cse.ohio-state.edu
mailing list yesterday and Professor Panda's reply to the same.
Unfortunately I am still not having any luck in getting this to run on
our cluster. If anyone has any suggestions (in particular, in getting
this to run on a MIPS64-Linux environment), I would greatly appreciate
it.

Commenting on Professor Panda's suggestions:

1. We have run (and currently run) Argonne's MPICH2 on our cluster and it
works fine. The reason we want to go with the OSU software is that we
want to take advantage of the uDAPL interface instead of the TCP/IP that
we presently use in Argonne's software. However, my original concern is
that the OSU software does not even work on Ethernet based TCP/IP
whereas Argonne's MPICH2 seems to work fine.

2. About possible network issues with name resolutions:

Like I said in the original email, the OSU software works fine on two
Linux PCs connected back-to-back via Ethernet (i.e. they are not on any
network). So I find it strange that our MIPS64 cluster (whose nodes ARE
connected to an outside network via front panel Ethernet) would complain
about name resolution issues.

By the same token, could anybody please update me as to why MPI requires
name resolution anyway (as opposed to working based purely on IP
addresses)? Is it a security issue where clusters are connected over
public internet and an attacker would spoof IP addresses (and a secured
DNS server would protect against it)? If so, our clusters are all
bundled in one VME chassis and we don't care about this security alert.
In that case, I could change the code to not do name lookup, and that
might gain some performance advantage as well.
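
The idea of skipping the lookup when an address is already known can be
sketched as follows. This is only an illustration of the approach, not
MPD's code: is_ip_literal and to_address are hypothetical names, and
whether mpd can actually be changed this way is not confirmed anywhere in
this thread.

```python
import ipaddress
import socket


def is_ip_literal(host):
    """Return True if host is already an IPv4 or IPv6 address string."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def to_address(host):
    """Return host unchanged if it is an IP literal, else resolve the name."""
    if is_ip_literal(host):
        return host  # no resolver call is made for a literal address
    # Hypothetical fallback: raises OSError if the name cannot be resolved.
    return socket.gethostbyname(host)
```

With a helper like this, a host list made up only of IP addresses would
never reach the resolver, while names such as "moo" would still need an
entry that the resolver can find.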

I am an MPI novice so sorry for any stupid questions. Your
help/suggestions are deeply appreciated.

Best regards
Durga

-----Original Message-----
From: Dhabaleswar Panda [mailto:panda at cse.ohio-state.edu] 
Sent: Tuesday, January 31, 2006 8:35 AM
To: Choudhury, Durga
Cc: mvapich-help at cse.ohio-state.edu
Subject: Re: MVAPICH2 does not run...

> Hi there
> 
> I recently (in fact, today) downloaded the MVAPICH2 source code and was
> trying to run it on a MIPS64-Linux architecture. I had some issues in
> trying to get it to compile (it did not recognize the platform), but
> finally I have managed to compile it. However, the daemon "mpd" does not
> run; it dies on "gethostname_ex", which seems to be some kind of a
> Python routine (and I have no knowledge of Python.)

Thanks for downloading and trying MVAPICH2 on MIPS64-Linux
architecture.  FYI, we have not tested MVAPICH2 on MIPS64 architecture
since we do not have access to any such systems.

However, I believe several people in the community are running MVAPICH
and MVAPICH2 on MIPS64 platforms. You may post a note to
mvapich-discuss about this.

> Do I need to do something special for it to run? Before I tried it on
> the "real" (i.e. MIPS64) platform, I tried it on a cluster of 2 PCs
> connected via Ethernet. The software compiled and ran flawlessly.

No. 

I believe the PCs are typical Pentiums and Xeons. Thus, there should not
be a problem on the PCs. The problem seems to be related to the MIPS64
architecture.

FYI, MVAPICH2 is based on MPICH2 from Argonne. The MPD start-up code
is also based on MPICH2's MPD code. Can you see whether you can run
MPICH2 on MIPS64? If MPICH2 runs, MVAPICH2 should run without
any problem.

For MPICH2-related problems, you may also consider posting to
Argonne's mpich-discuss list.

> My two Linux blades have local hostnames (i.e. given by the "hostname"
> command) "moo" and "goo" respectively. They do not have a fully
> qualified hostname. To debug the issue a little further, I opened a
> Python session on one of the boards and the session outputs were similar
> to the following:

This problem seems to be a network naming issue. Can you check
with your system administrator regarding this to make sure that the
systems and the networks are properly configured?

Thanks, 

DK

> Python> gethostname()
> 
> moo
> 
> Python> gethostname_ex(gethostname())
> 
> --- This threw an exception saying the name lookup failed---
> 
> Any help regarding this issue is greatly appreciated.
> 
> Best regards
> 
> Durga
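
For anyone trying to reproduce the session quoted above, it can be
approximated with the Python standard library. This is a minimal sketch,
assuming the failing call is socket.gethostbyname_ex (the socket module
has no function named "gethostname_ex", so this is a guess at what was
meant). try_resolve is a hypothetical helper and is not part of MPD.

```python
import socket


def try_resolve(name):
    """Return (True, (canonical, aliases, addresses)) or (False, error text)."""
    try:
        return True, socket.gethostbyname_ex(name)
    except OSError as exc:  # covers socket.gaierror and socket.herror
        return False, str(exc)


if __name__ == "__main__":
    local = socket.gethostname()  # "moo" on the first blade in the report
    ok, detail = try_resolve(local)
    if ok:
        print(local, "resolves to", detail)
    else:
        print(local, "does NOT resolve:", detail)
```

If the local name fails here while an IP literal succeeds, one common
remedy is an /etc/hosts entry for each node (here "moo" and "goo") on
every blade. Whether that alone is enough for mpd on MIPS64 has not been
tested in this thread.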


