[mvapich-discuss] Problems with hostname resolution and MPI_INIT()

Mike Heinz michael.heinz at qlogic.com
Tue Jan 13 10:29:33 EST 2009


I keep running into this problem at random, and each time it brings someone down for a couple of hours before we figure it out... again.

Basically, some distros add lines like this to their hosts file:

# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1      node01 localhost.localdomain localhost

The problem is that this makes "node01" tell the other MPI ranks that its IP address is 127.0.0.1, which causes MPI jobs to hang in MPI_INIT().

I've seen a similar issue with distros that define 127.0.0.2.
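
To make the failure mode concrete, here is a rough sketch of a check for hosts-file entries that map a real node name to a loopback address (anything in 127.0.0.0/8, so it covers both the 127.0.0.1 and 127.0.0.2 cases). The function name find_loopback_hostnames and the set of names treated as legitimate loopback names are my own assumptions for illustration; none of this is part of mvapich.

```python
import ipaddress

# Assumption: these are the only names that should legitimately map to loopback.
LOOPBACK_NAMES = {"localhost", "localhost.localdomain"}


def find_loopback_hostnames(hosts_text):
    """Return (address, name) pairs where a non-localhost name maps to 127.0.0.0/8.

    hosts_text is the full text of a hosts file. This is a hypothetical helper,
    not an mvapich API.
    """
    suspects = []
    for raw_line in hosts_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        try:
            addr = ipaddress.ip_address(fields[0])
        except ValueError:
            continue  # first field is not an IP address; skip the line
        # Only IPv4 loopback addresses (127.0.0.0/8) are of interest here.
        if addr.version != 4 or not addr.is_loopback:
            continue
        for name in fields[1:]:
            if name not in LOOPBACK_NAMES:
                suspects.append((str(addr), name))
    return suspects
```

Feeding it the contents of the hosts file on each node (for example, the text read from /etc/hosts) should list every hostname that would be advertised to other ranks as a loopback address. For the line quoted above it reports ("127.0.0.1", "node01").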

So, I don't mind digging into the code and changing how mvapich does IP address resolution, but I can't help thinking that this problem must happen all the time. Before I start patching the code, is there a bug in my network configs that I should be fixing?


--
Michael Heinz
Principal Engineer, Qlogic Corporation
King of Prussia, Pennsylvania