[mvapich-discuss] Error 'End of file reached on hostfile'

Paul van der Mark paulvdm at scs.fsu.edu
Thu Mar 6 09:39:02 EST 2008


We had the same problem with mvapich2 and Torque (an open-source
derivative of PBS). We used a simple Perl script that rewrites the
PBS_NODEFILE into a format suitable for mpirun.
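
A minimal sketch of such a rewrite, in Python rather than Perl (the
original script is not shown in this thread). It assumes the target
format is one "host:count" line per node, as MPD-style host files use;
the format your launcher expects may differ. The function names
collapse_nodefile and write_hostfile are hypothetical.

```python
def collapse_nodefile(lines):
    """Collapse repeated hostnames into 'host:count' entries.

    Assumption: the launcher accepts one 'host:count' line per node.
    Hosts are emitted in the order they first appear in the input.
    """
    counts = {}  # dicts keep insertion order in Python 3.7+
    for line in lines:
        host = line.strip()
        if not host:
            continue  # skip blank lines
        counts[host] = counts.get(host, 0) + 1
    return ["%s:%d" % (host, n) for host, n in counts.items()]


def write_hostfile(nodefile_path, out_path):
    """Read a PBS-style node file and write the collapsed host file."""
    with open(nodefile_path) as src:
        entries = collapse_nodefile(src)
    with open(out_path, "w") as dst:
        for entry in entries:
            dst.write(entry + "\n")
    return entries
```

In a job script this could be called as
write_hostfile(os.environ["PBS_NODEFILE"], "hosts.txt") after importing
os, with the resulting file passed to mpirun. For the four-line
machinefile quoted below, the entries would be wn152:2 and wn153:2.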

For mvapich (0.9.9) we use Peter Wyckoff's mpiexec, since it has support
for resource managers like PBS and TORQUE and doesn't rely on sshd to
start/stop the processes.

Paul van der Mark

On Wed, 2008-03-05 at 22:46 +0100, Pawel Dziekonski wrote:
> hi,
> 
> i'm using mvapich-0.9.9-1458 (the one that comes with OFED 1.2.5.5) and it
> emits the error 'End of file reached on hostfile at 2 of 4 hostnames' when
> the machinefile contains the same hostname more than once. this happens
> only for some applications, like CPMD or Amber; basic tests and HPL
> work ok. the machinefile is generated by the PBS Pro 8 queueing system and
> looks very simple, eg:
> 
> wn152
> wn152
> wn153
> wn153
> 
> when a job is enqueued with a hard requirement for 4 cpus on 4 different
> nodes (nodes=4:ppn=1), then the generated machinefile looks like:
> 
> wn152
> wn153
> wn154
> wn155
> 
> and jobs run perfectly well.
> 
> is it a bug or feature? ;)
> any way to avoid this?
> 
> thanks in advance, P
> -- 
> Pawel Dziekonski <pawel.dziekonski at wcss.pl>
> Wroclaw Centre for Networking & Supercomputing, HPC Department
> Politechnika Wr., pl. Grunwaldzki 9, bud. D2/101, 50-377 Wroclaw, POLAND
> phone: +48 71 3202043, fax: +48 71 3225797, http://www.wcss.wroc.pl
