[mvapich-discuss] Re: Re: Re: benchmark osu_bws run failed, on mvapich2-2.0rc1: gethostbyname: Unknown server error

Wang,Yanfei(SYS) wangyanfei01 at baidu.com
Mon Mar 31 11:28:59 EDT 2014


Hi,
It seems the failure has moved further along, and the old error is gone. Are there any online materials about this? I would like to consult them as well and try to fix this issue myself.

Earlier, iptables was blocking the connections for mpirun_rsh; those rules have now been removed.

[root at bb-nsi-ib04 pt2pt]# mpiexec -n 2 -f hosts_mvapich ./osu_bw
[cli_1]: aborting job:
Fatal error in MPI_Init:
Other MPI error


===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   EXIT CODE: 1
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
[cli_0]: aborting job:
Fatal error in MPI_Init:
Other MPI error


===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   EXIT CODE: 1
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
[root at bb-nsi-ib04 pt2pt]# mpirun_rsh -np 2 --hostfile hosts_mvapich  ./osu_latency
gethostbyname: Unknown server error
[bb-nsi-ib04.#com:mpirun_rsh][child_handler] Error in init phase, aborting! (0/2 mpispawn connections)
gethostbyname: Unknown server error
[root at bb-nsi-ib04 pt2pt]# 
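mpirun_rsh still stops before any mpispawn connects. The `gethostbyname: Unknown server error` line matches the format of glibc's `herror()`, and in glibc "Unknown server error" is the text for `NO_RECOVERY`, meaning a host name lookup failed outright. A minimal diagnostic sketch, assuming Python 3 on each node: it resolves every host in `hosts_mvapich` plus the local hostname through the system resolver. The helper names and the script name `check_hosts.py` are hypothetical (not part of MVAPICH2), and checking the local hostname is an assumption, prompted by the shell showing the FQDN `bb-nsi-ib04.*.com` while `/etc/hosts` only lists `ib03` and `ib04`.

```python
import socket
import sys


def parse_hostfile(lines):
    """Parse 'host' or 'host:count' lines like those in hosts_mvapich.

    Blank lines and '#' comments are skipped. This handles only the simple
    host:count form used in this thread, not every hostfile syntax.
    """
    entries = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        host, _, count = line.partition(":")
        entries.append((host.strip(), int(count) if count else 1))
    return entries


def check_resolution(names):
    """Map each name to its IPv4 address, or to an error string on failure."""
    results = {}
    for name in names:
        try:
            results[name] = socket.gethostbyname(name)
        except OSError as exc:  # socket.gaierror and socket.herror subclass OSError
            results[name] = f"FAILED: {exc}"
    return results


def report(path):
    with open(path) as fh:
        hosts = [host for host, _ in parse_hostfile(fh)]
    # Assumption: the launcher may also look up this machine's own name.
    names = hosts + [socket.gethostname()]
    for name, result in check_resolution(names).items():
        print(f"{name:30} {result}")


if __name__ == "__main__" and len(sys.argv) > 1:
    report(sys.argv[1])
```

Run it as `python3 check_hosts.py hosts_mvapich` on both ib03 and ib04; any `FAILED` line points at a name that the resolver on that node cannot look up. Python reports failures as `socket.gaierror` text rather than the `herror()` wording, so the messages will not match mpirun_rsh's output exactly.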


Thanks
Yanfei

-----Original Message-----
From: mvapich-discuss [mailto:mvapich-discuss-bounces at cse.ohio-state.edu] On Behalf Of Wang,Yanfei(SYS)
Sent: March 31, 2014 23:06
To: Jonathan Perkins
Cc: mvapich-discuss
Subject: [mvapich-discuss] Re: Re: benchmark osu_bws run failed, on mvapich2-2.0rc1: gethostbyname: Unknown server error

Hi,

Results:

The mpiexec run fails.
1. mpiexec
[root at bb-nsi-ib04 pt2pt]# mpiexec -n 2 -f hosts_mvapich osu_bw
[proxy:0:1 at bb-nsi-ib04.*com] HYDU_create_process (./utils/launch/launch.c:75): execvp error on file osu_bw (No such file or directory)

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   EXIT CODE: 255
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
[proxy:0:0 at bb-nsi-ib03*.com] HYDU_create_process (./utils/launch/launch.c:75): execvp error on file osu_bw (No such file or directory)

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   EXIT CODE: 255
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================

2. mpirun_rsh with RoCE parameter
[root at bb-nsi-ib04 pt2pt]# mpirun_rsh -np 2 --hostfile hosts_mvapich MV2_USE_RoCE=1 osu_latency
gethostbyname: Unknown server error
[bb-nsi-ib04.*com:mpirun_rsh][child_handler] Error in init phase, aborting! (0/2 mpispawn connections)
gethostbyname: Unknown server error
[root at bb-nsi-ib04 pt2pt]# 

3. mpirun_rsh
[root at bb-nsi-ib04 pt2pt]# mpirun_rsh -np 2 --hostfile hosts_mvapich  osu_latency
gethostbyname: Unknown server error
[bb-nsi-ib04.*com:mpirun_rsh][child_handler] Error in init phase, aborting! (0/2 mpispawn connections)
gethostbyname: Unknown server error
[root at bb-nsi-ib04 pt2pt]#
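In result 1, Hydra reports `execvp error on file osu_bw (No such file or directory)`. `execvp()` searches the directories in `PATH` only when the program name contains no `/`, and the current directory is normally not in `PATH`, so the bare name `osu_bw` is not found even though the binary sits in `pt2pt`; the later run with `./osu_bw` gets past this step. A minimal Python sketch of that lookup rule, for illustration only (the function name `execvp_lookup` is hypothetical and this is not glibc's implementation):

```python
import os


def execvp_lookup(program, path_env=None):
    """Return the file execvp() would try for `program`, or None if none exists.

    A name containing '/' is used as-is; otherwise each PATH directory is
    tried in order, and an empty PATH entry means the current directory.
    """
    if "/" in program:
        ok = os.path.isfile(program) and os.access(program, os.X_OK)
        return program if ok else None
    if path_env is None:
        # Fall back to Python's default search path when PATH is unset.
        path_env = os.environ.get("PATH", os.defpath)
    for directory in path_env.split(os.pathsep):
        candidate = os.path.join(directory or ".", program)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None
```

For example, `execvp_lookup("osu_bw")` returns `None` from inside `pt2pt` unless that directory is on `PATH`, while `execvp_lookup("./osu_bw")` finds the binary directly.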

BR 

Thanks
Yanfei 

-----Original Message-----
From: Jonathan Perkins [mailto:perkinjo at cse.ohio-state.edu]
Sent: March 31, 2014 22:53
To: Wang,Yanfei(SYS)
Cc: Jonathan Perkins; mvapich-discuss
Subject: Re: Re: [mvapich-discuss] benchmark osu_bws run failed, on mvapich2-2.0rc1: gethostbyname: Unknown server error

Before debugging further, I would like to know whether the following works for you...

mpiexec -n 2 -f hosts_mvapich osu_bw


On Mon, Mar 31, 2014 at 10:12 AM, Wang,Yanfei(SYS) <wangyanfei01 at baidu.com> wrote:
> Hi,
>
> Each node in the cluster has the same /etc/hosts, which looks like this:
>
> [root at bb-nsi-ib04 pt2pt]# cat /etc/hosts
> 192.168.71.3 ib03
> 192.168.71.4 ib04
>
> Currently, only 2 nodes are available in the RoCE cluster: IB03 and IB04.
>
> BR
>
> Thanks
> Yanfei
>
> From: Jonathan Perkins [mailto:perkinjo at cse.ohio-state.edu]
> Sent: March 31, 2014 21:40
> To: Wang,Yanfei(SYS)
> Cc: mvapich-discuss
> Subject: Re: [mvapich-discuss] benchmark osu_bws run failed, on mvapich2-2.0rc1: gethostbyname: Unknown server error
>
>
>
> Can you share the contents of the /etc/hosts file from each machine 
> including the machine that you launch from?
>
> On Mar 31, 2014 9:33 AM, "Wang,Yanfei(SYS)" <wangyanfei01 at baidu.com> wrote:
>
> Hi,
>
>
>
> I am new to MPI and am trying to verify the MVAPICH2 library over RoCE,
> using mvapich2-2.0rc1 on MLNX_OFED_LINUX-2.1-1.0.6-rhel6.3-x86_64.
>
>
>
> Could you give me some tips on fixing the following issue?
>
>
>
> Configuration:
>
> [root at bb-nsi-ib04 pt2pt]# cat hosts_mvapich
> ib03:1
> ib04:1
> [root at bb-nsi-ib04 pt2pt]# cat /etc/hosts
> 192.168.71.3 ib03
> 192.168.71.4 ib04
>
> ERROR:
>
> [root at bb-nsi-ib04 pt2pt]# mpirun_rsh -np 2 --hostfile hosts_mvapich osu_bw
> gethostbyname: Unknown server error
> [bb-nsi-ib04.*.com:mpirun_rsh][child_handler] Error in init phase, aborting! (0/2 mpispawn connections)
> gethostbyname: Unknown server error
> [root at bb-nsi-ib04 pt2pt]#
>
>
>
> It could be caused by a wrong configuration. I previously verified
> OpenMPI on the same platform with the same RoCE configuration and a
> similar host configuration.
>
>
>
> Thanks.
>
> -Yanfei
>
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss



--
Jonathan Perkins
http://www.cse.ohio-state.edu/~perkinjo

