[mvapich-discuss] Re: benchmark osu_bws run failed, on mvapich2-2.0rc1: gethostbyname: Unknown server error

Lockwood, Glenn glock at sdsc.edu
Mon Mar 31 11:21:48 EDT 2014


Out of curiosity, does your hosts_mvapich literally contain a host with a wildcard character ("bb-nsi-ib04.*.com") or did you just star this out for privacy?

I don't think wildcards will work in mpich2/mvapich2 machine files.
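
One way to check this before launching is to try resolving every entry in the machine file. The sketch below is only an illustration: parse_hostfile and check_hosts are hypothetical helper names, not part of MVAPICH2. It assumes the "host[:count]" layout shown in hosts_mvapich and that lines starting with "#" are comments.

```python
import socket


def parse_hostfile(text):
    """Parse 'host' or 'host:count[:...]' lines into (host, count) pairs.

    Assumption: blank lines and lines starting with '#' are ignored.
    """
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(":")
        host = parts[0].strip()
        # A missing or empty count field is treated as one process.
        count = int(parts[1]) if len(parts) > 1 and parts[1].strip() else 1
        entries.append((host, count))
    return entries


def check_hosts(entries, resolver=socket.gethostbyname):
    """Map each host name to its IPv4 address, or to an 'error: ...' string."""
    results = {}
    for host, _count in entries:
        # A literal wildcard can never be looked up as a host name.
        if "*" in host or "?" in host:
            results[host] = "error: wildcard character in host name"
            continue
        try:
            results[host] = resolver(host)
        except OSError as exc:  # socket.gaierror and socket.herror derive from OSError
            results[host] = "error: %s" % exc
    return results
```

Calling check_hosts(parse_hostfile(open("hosts_mvapich").read())) on each node should show which names, if any, fail to resolve there.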

Glenn

--
Glenn K. Lockwood, Ph.D.
User Services Group
San Diego Supercomputer Center
glock at sdsc.edu / (858) 246-1075

On Mar 31, 2014, at 7:00 AM, Wang,Yanfei(SYS) <wangyanfei01 at baidu.com> wrote:

Hi Hari,

Setting “MV2_USE_RoCE” does not fix this issue. Since the HCA supports both the TCP/IP stack and the RDMA stack at the same time, I think MVAPICH may still select the TCP/IP stack by default even when RDMA-specific parameters are given. I had already added the MV2_USE_RoCE parameter before, and the error still occurs.

TEST:
[root at bb-nsi-ib04 pt2pt]# mpirun_rsh -np 2 --hostfile hosts_mvapich MV2_USE_RoCE=1 osu_latency
gethostbyname: Unknown server error
gethostbyname: Unknown server error
[bb-nsi-ib04.*.com:mpirun_rsh][child_handler] Error in init phase, aborting! (0/2 mpispawn connections)
[root at bb-nsi-ib04 pt2pt]#

HOSTS:
Each server in the cluster has the same /etc/hosts content. Currently we use just two physical nodes in the RoCE network, with the switch enabled.
E.g.:
[root at bb-nsi-ib04 pt2pt]# cat /etc/hosts
192.168.71.3 ib03
192.168.71.4 ib04

Topology:
IB03 40G HCA --- 40G switch --- IB04 40G HCA   (RoCE)
IB03 10G NIC --- 10G switch --- IB04 10G NIC   (10G management network)
The same network topology runs OpenMPI successfully.

Does the error come only from mpirun_rsh? It seems that no process is created at all.
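
One thing that might help narrow this down: the error prefix shows the node's full host name, while /etc/hosts lists only ib03 and ib04. Assuming the launcher looks up the launching node's own name (this is an assumption, not something confirmed in this thread), a small check like the following sketch shows whether that name resolves. resolve_local_name is a hypothetical helper name.

```python
import socket


def resolve_local_name(get_name=socket.gethostname,
                       resolver=socket.gethostbyname):
    """Return (name, ip, error) for this node's own host name.

    Exactly one of ip and error is None.
    """
    name = get_name()
    try:
        return name, resolver(name), None
    except OSError as exc:  # covers socket.gaierror and socket.herror
        return name, None, str(exc)
```

If the third field is not None on a node, adding that node's own host name to /etc/hosts (or to DNS) would be the first thing to try.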

BR

Thanks
Yanfei

From: hari.subramoni at gmail.com [mailto:hari.subramoni at gmail.com] On Behalf Of Hari Subramoni
Sent: March 31, 2014 21:44
To: Wang,Yanfei(SYS)
Cc: mvapich-discuss at cse.ohio-state.edu
Subject: Re: [mvapich-discuss] benchmark osu_bws run failed, on mvapich2-2.0rc1: gethostbyname: Unknown server error

Hello Yanfei,
When you are running MVAPICH2 in a RoCE environment, you need to set "MV2_USE_RoCE=1",
e.g.: mpirun_rsh -np 2 -hostfile <hostfile> MV2_USE_RoCE=1 prog
Please refer to the following section of our user guide for more details.

http://mvapich.cse.ohio-state.edu/support/user_guide_mvapich2-2.0rc1.html#x1-370005.2.7
Please let us know if this solves your issue.

Regards,
Hari.

On Mon, Mar 31, 2014 at 9:31 AM, Wang,Yanfei(SYS) <wangyanfei01 at baidu.com> wrote:
Hi,

I am new to MPI and am trying to verify the MVAPICH2 library on RoCE, using mvapich2-2.0rc1 on MLNX_OFED_LINUX-2.1-1.0.6-rhel6.3-x86_64.

Could you give me some tips for fixing the following issue?

Configuration:
[root at bb-nsi-ib04 pt2pt]# cat hosts_mvapich
ib03:1
ib04:1
[root at bb-nsi-ib04 pt2pt]# cat /etc/hosts
192.168.71.3 ib03
192.168.71.4 ib04

ERROR:
[root at bb-nsi-ib04 pt2pt]# mpirun_rsh -np 2 --hostfile hosts_mvapich osu_bw
gethostbyname: Unknown server error
[bb-nsi-ib04.*.com:mpirun_rsh][child_handler] Error in init phase, aborting! (0/2 mpispawn connections)
gethostbyname: Unknown server error
[root at bb-nsi-ib04 pt2pt]#

It could be caused by a wrong configuration. I have previously verified OpenMPI on the same platform with the same RoCE configuration and a similar host configuration.

Thanks.
-Yanfei

_______________________________________________
mvapich-discuss mailing list
mvapich-discuss at cse.ohio-state.edu
http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
