[mvapich-discuss] Problems with hostname resolution and MPI_INIT()

Jonathan Perkins perkinjo at cse.ohio-state.edu
Tue Jan 13 11:08:13 EST 2009


Michael:
Hi, my comments are inline.

On Tue, Jan 13, 2009 at 09:29:33AM -0600, Mike Heinz wrote:
> I keep running into this problem at random, and each time it brings someone down for a couple of hours before we figure it out... again.
> 
> Basically, some distros add lines like this to their host file:
> 
> # Do not remove the following line, or various programs
> # that require network functionality will fail.
> 127.0.0.1      node01 localhost.localdomain localhost

My impression is that node01 should not be listed as an entry for the
localhost IP address.  I believe node01 should be listed with its unique
IP address on its subnet.
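
As a rough way to spot this kind of entry, here is a minimal Python sketch
that scans hosts-file text for names other than the usual localhost aliases
mapped to a loopback address.  The function name and the alias set are my
own assumptions for illustration, not anything from mvapich.

```python
import ipaddress

# Names that are expected on a loopback line (assumed list; adjust per distro).
LOCAL_NAMES = {
    "localhost", "localhost.localdomain",
    "localhost6", "localhost6.localdomain6",
    "ip6-localhost", "ip6-loopback",
}

def loopback_hostnames(hosts_text):
    """Return non-localhost names that the hosts text maps to a loopback address."""
    offenders = []
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        try:
            addr = ipaddress.ip_address(fields[0])
        except ValueError:
            continue                            # skip lines without a valid address
        if addr.is_loopback:                    # covers 127.0.0.0/8 and ::1
            offenders.extend(n for n in fields[1:] if n not in LOCAL_NAMES)
    return offenders

sample = """\
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1      node01 localhost.localdomain localhost
"""
print(loopback_hostnames(sample))  # prints ['node01']
```

To check a real node, pass it the contents of /etc/hosts.  An empty list
means no ordinary host name is attached to a loopback address; entries on
127.0.0.2 are flagged as well.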

> 
> The problem is that this causes "node01" to tell the other MPI ranks that its IP address is 127.0.0.1, which causes MPI jobs to hang in MPI_INIT().
> 
> I've seen a similar issue with distros that define 127.0.0.2.
> 
> So, I don't mind digging into the code and changing how mvapich does IP address resolution, but I can't help but think that this problem must happen all the time - before I start patching the code, is there a bug in my network configs that I should be fixing?

I think it is an issue with your network configs.  Which distro(s) do
you see this problem on?
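
If it helps narrow things down, a small Python sketch like the one below
could be run on each node to see what the node's own host name resolves to
through the system resolver.  I am assuming that this lookup resembles what
the other ranks end up being told; I have not checked it against the
mvapich code.  The helper names are made up for this example.

```python
import ipaddress
import socket

def loopback_addresses(addresses):
    """Return the subset of address strings that are loopback addresses."""
    return [a for a in addresses if ipaddress.ip_address(a).is_loopback]

def resolve_ipv4(name):
    """Resolve a host name to its IPv4 addresses using the system resolver."""
    infos = socket.getaddrinfo(name, None, socket.AF_INET, socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})

if __name__ == "__main__":
    name = socket.gethostname()
    try:
        addrs = resolve_ipv4(name)
    except socket.gaierror as exc:
        print(f"{name}: could not be resolved ({exc})")
    else:
        bad = loopback_addresses(addrs)
        if bad:
            print(f"{name}: resolves to loopback address(es) {bad}")
        else:
            print(f"{name}: resolves to {addrs}")
```

A node that reports a loopback address here is a likely candidate for the
hosts entry described above.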

> 
> 
> --
> Michael Heinz
> Principal Engineer, Qlogic Corporation
> King of Prussia, Pennsylvania



-- 
Jonathan Perkins
http://www.cse.ohio-state.edu/~perkinjo

