[mvapich-discuss] Problems with hostname resolution and MPI_INIT()
Jonathan Perkins
perkinjo at cse.ohio-state.edu
Tue Jan 13 11:08:13 EST 2009
Michael:
Hi, my comments are inline.
On Tue, Jan 13, 2009 at 09:29:33AM -0600, Mike Heinz wrote:
> I keep running into this problem at random, and each time it brings someone down for a couple of hours before we figure it out... again.
>
> Basically, some distros add lines like this to their host file:
>
> # Do not remove the following line, or various programs
> # that require network functionality will fail.
> 127.0.0.1 node01 localhost.localdomain localhost
My impression is that node01 should not be listed as an entry for the
localhost IP address. I believe node01 should be listed under its unique
IP address on its subnet.
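
As a rough way to spot this kind of entry, here is a minimal Python sketch that
scans hosts-file text and reports any name other than the usual localhost
aliases that is mapped to a loopback address. The function name
loopback_aliases and the LOCAL_NAMES set are my own illustrative choices, not
part of mvapich or of any distro tool, and the set of "expected" names is an
assumption you may need to adjust.

```python
import ipaddress

# Names we expect to see on a loopback line (an assumption for this sketch).
LOCAL_NAMES = {
    "localhost", "localhost.localdomain",
    "localhost6", "localhost6.localdomain6",
    "ip6-localhost", "ip6-loopback",
}

def loopback_aliases(hosts_text):
    """Return (address, name) pairs where a non-localhost name maps to loopback."""
    suspicious = []
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        try:
            addr = ipaddress.ip_address(fields[0])
        except ValueError:
            continue  # first field is not a plain IP address; skip the line
        if addr.is_loopback:
            for name in fields[1:]:
                if name not in LOCAL_NAMES:
                    suspicious.append((fields[0], name))
    return suspicious

# The entry quoted above, used as sample input.
sample = """\
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1 node01 localhost.localdomain localhost
"""
print(loopback_aliases(sample))  # [('127.0.0.1', 'node01')]
```

To check a real machine you could pass it open("/etc/hosts").read(). An empty
list means no node name is sitting on a loopback line.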
>
> The problem is that this causes "node01" to tell the other MPI ranks that its IP address is 127.0.0.1, which causes MPI jobs to hang in MPI_INIT().
>
> I've seen a similar issue with distros that define 127.0.0.2.
>
> So, I don't mind digging into the code and changing how mvapich does IP address resolution, but I can't help but think that this problem must happen all the time - before I start patching the code, is there a bug in my network configs that I should be fixing?
I think it is an issue with your network configs. Which distro(s) do
you see this problem on?
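
If it helps while checking the configs, the following Python sketch asks the
system resolver what a host name maps to and reports whether any returned IPv4
address is in the loopback range (127.0.0.0/8, which also covers 127.0.0.2).
This is only a diagnostic under my own assumptions; it is not how mvapich itself
resolves addresses, and resolves_to_loopback is a hypothetical helper name.

```python
import ipaddress
import socket

def resolves_to_loopback(hostname):
    """True if any IPv4 address the resolver returns for hostname is loopback."""
    # gethostbyname_ex returns (canonical_name, alias_list, ipv4_address_list).
    _, _, addresses = socket.gethostbyname_ex(hostname)
    return any(ipaddress.ip_address(a).is_loopback for a in addresses)

if __name__ == "__main__":
    name = socket.gethostname()
    try:
        if resolves_to_loopback(name):
            print(f"{name} resolves to a loopback address; check /etc/hosts")
        else:
            print(f"{name} does not resolve to a loopback address")
    except OSError as exc:
        # Raised when the name cannot be resolved at all.
        print(f"could not resolve {name}: {exc}")
```

Running this on each node before launching a job should show which hosts would
advertise a loopback address for their own name.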
>
>
> --
> Michael Heinz
> Principal Engineer, Qlogic Corporation
> King of Prussia, Pennsylvania
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
--
Jonathan Perkins
http://www.cse.ohio-state.edu/~perkinjo