[mvapich-discuss] Reply: benchmark osu_bws run failed, on mvapich2-2.0rc1: gethostbyname: Unknown server error
Lockwood, Glenn
glock at sdsc.edu
Mon Mar 31 11:21:48 EDT 2014
Out of curiosity, does your hosts_mvapich literally contain a host with a wildcard character ("bb-nsi-ib04.*.com") or did you just star this out for privacy?
I don't think wildcards will work in mpich2/mvapich2 machine files.
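If it helps, here is a minimal Python sketch of that check: it scans a machine file for host names that contain a wildcard character. The set of characters treated as wildcards, the handling of `#` comments, and the function names are my own assumptions for illustration, not something taken from the MVAPICH2 sources.

```python
import re

# Characters treated as shell-style wildcards for this check (an assumption).
WILDCARD_CHARS = re.compile(r"[*?\[\]]")


def parse_hostfile(text):
    """Return (host, slots) pairs from 'host' or 'host:N' lines."""
    entries = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split(":")
        host = fields[0].strip()
        # Use the second field as a process count only if it is a plain number.
        has_count = len(fields) > 1 and fields[1].strip().isdigit()
        slots = int(fields[1]) if has_count else 1
        entries.append((host, slots))
    return entries


def find_wildcard_hosts(text):
    """List host names that contain a wildcard character."""
    return [host for host, _ in parse_hostfile(text) if WILDCARD_CHARS.search(host)]


if __name__ == "__main__":
    sample = "ib03:1\nbb-nsi-ib04.*.com:1\n"
    print(find_wildcard_hosts(sample))
```

An empty result only means no wildcard characters were found; it says nothing about whether the names actually resolve.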
Glenn
--
Glenn K. Lockwood, Ph.D.
User Services Group
San Diego Supercomputer Center
glock at sdsc.edu / (858) 246-1075
On Mar 31, 2014, at 7:00 AM, Wang,Yanfei(SYS) <wangyanfei01 at baidu.com> wrote:
Hi Hari,
Setting “MV2_USE_RoCE” does not fix this issue. Since the HCA supports both the TCP/IP stack and the RDMA stack simultaneously, I think MVAPICH can still select the TCP/IP stack by default even when RDMA-specific parameters are set. I had already added the MV2_USE_RoCE parameter before, and the error still occurs.
TEST:
[root at bb-nsi-ib04 pt2pt]# mpirun_rsh -np 2 --hostfile hosts_mvapich MV2_USE_RoCE=1 osu_latency
gethostbyname: Unknown server error
gethostbyname: Unknown server error
[bb-nsi-ib04.*.com:mpirun_rsh][child_handler] Error in init phase, aborting! (0/2 mpispawn connections)
[root at bb-nsi-ib04 pt2pt]#
HOSTS:
Here, each server in the cluster has the same /etc/hosts content. Currently, we use just two physical nodes in the RoCE network, connected through a switch.
Eg:
[root at bb-nsi-ib04 pt2pt]# cat /etc/hosts
192.168.71.3 ib03
192.168.71.4 ib04
Topology:
IB03 40G HCA --- 40G switch --- IB04 40G HCA, (ROCE)
IB03 10G NIC --- 10G switch --- IB04 10G NIC, (10G management network)
The same network topology runs OpenMPI successfully.
Does the error come from mpirun_rsh alone? It seems that no process is created at all.
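Since the message is printed by gethostbyname, one thing that may be worth checking (this is an assumption, not a confirmed diagnosis) is whether every name involved resolves on each node. The sketch below uses Python's standard `socket.gethostbyname`; the helper name `check_resolution` and its `resolver` parameter are hypothetical and exist only so the lookup can be swapped out.

```python
import socket


def check_resolution(names, resolver=socket.gethostbyname):
    """Map each name to its resolved address, or to None if the lookup fails."""
    results = {}
    for name in names:
        try:
            results[name] = resolver(name)
        except OSError:  # socket.gaierror and socket.herror are subclasses
            results[name] = None
    return results
```

Calling `check_resolution(["ib03", "ib04", socket.gethostname()])` on both nodes would show whether the hostfile names and the node's own host name resolve; a `None` for the local host name would be one possible place to look next.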
BR
Thanks
Yanfei
From: hari.subramoni at gmail.com [mailto:hari.subramoni at gmail.com] On Behalf Of Hari Subramoni
Sent: March 31, 2014 21:44
To: Wang,Yanfei(SYS)
Cc: mvapich-discuss at cse.ohio-state.edu
Subject: Re: [mvapich-discuss] benchmark osu_bws run failed, on mvapich2-2.0rc1: gethostbyname: Unknown server error
Hello Yanfei,
When you are running MVAPICH2 in a RoCE environment, you need to set "MV2_USE_RoCE=1"
eg: mpirun_rsh -np 2 -hostfile <hostfile> MV2_USE_RoCE=1 prog
Please refer to the following section of our userguide for more details.
http://mvapich.cse.ohio-state.edu/support/user_guide_mvapich2-2.0rc1.html#x1-370005.2.7
Please let us know if this solves your issue.
Regards,
Hari.
On Mon, Mar 31, 2014 at 9:31 AM, Wang,Yanfei(SYS) <wangyanfei01 at baidu.com> wrote:
Hi,
I am new to MPI and am trying to do some verification of the MVAPICH2 library on RoCE, using mvapich2-2.0rc1 on MLNX_OFED_LINUX-2.1-1.0.6-rhel6.3-x86_64.
Could you give me some tips to fix the following issue?
Configuration:
[root at bb-nsi-ib04 pt2pt]# cat hosts_mvapich
ib03:1
ib04:1
[root at bb-nsi-ib04 pt2pt]# cat /etc/hosts
192.168.71.3 ib03
192.168.71.4 ib04
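As a quick consistency check between the two files above, a minimal Python sketch could list any hostfile name that has no /etc/hosts entry. The function names are hypothetical, and the sketch assumes the usual hosts layout (address followed by one or more names) and a `host[:count]` hostfile layout.

```python
def parse_etc_hosts(text):
    """Return {name: address} for every name and alias in hosts-format text."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            mapping.setdefault(name, address)  # first entry for a name wins
    return mapping


def missing_from_hosts(hostfile_text, hosts_text):
    """List hostfile names that have no entry in the hosts-format text."""
    known = parse_etc_hosts(hosts_text)
    missing = []
    for raw in hostfile_text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        host = line.split(":", 1)[0].strip()  # strip the ':count' suffix
        if host not in known:
            missing.append(host)
    return missing
```

With the two listings above as input this returns an empty list, since ib03 and ib04 both appear. It does not cover names that are not in the hostfile, such as the node's own host name.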
ERROR:
[root at bb-nsi-ib04 pt2pt]# mpirun_rsh -np 2 --hostfile hosts_mvapich osu_bw
gethostbyname: Unknown server error
[bb-nsi-ib04.*.com:mpirun_rsh][child_handler] Error in init phase, aborting! (0/2 mpispawn connections)
gethostbyname: Unknown server error
[root at bb-nsi-ib04 pt2pt]#
It could be caused by a wrong configuration. Previously, on the same platform, I verified OpenMPI with the same RoCE configuration and a similar host configuration.
Thanks.
-Yanfei
_______________________________________________
mvapich-discuss mailing list
mvapich-discuss at cse.ohio-state.edu
http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss