[mvapich-discuss] viacheck.c error?

Matthew Koop koop at cse.ohio-state.edu
Thu Mar 15 23:52:39 EDT 2007


> Has anyone seen this before?  I'm puzzled.  If it is a connectivity
> issue, how do I know, and why would cpi and the other very simple
> programs run?  Non-ib batch programs using mpich run successfully.

I'm not completely sure why the other programs are not showing the
problem. Perhaps the osu_benchmarks stress the network more than cpi,
which does very little communication.

> We use ssh for everything and have disabled rsh.  The average user
> cannot ssh to a compute node from the head node, but, once on a
> compute node, he/she can ssh to other compute nodes without supplying
> passwords. For mpich, we use mpiexec and this works well for us, but
> we'd really like to use the infiniband.  :( Please point me in the
> right direction if this is not a mvapich related problem.  I
> administer this cluster, so asking my sysadmin is not an option. ;)

This does not look like an MVAPICH issue, so testing the lower-level
InfiniBand layer directly should help rule out setup and connectivity
problems.

It'd be helpful if you could try running 'ibv_rc_pingpong', which ships
with OFED. That will show us whether the link between at least two nodes
is working properly.

On host_a (server, waits for a connection):
ibv_rc_pingpong

On host_b (client, connects to host_a):
ibv_rc_pingpong host_a
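
If it is easier to drive both sides from one shell, the two steps above
could be wrapped in a small script along these lines. This is only a
sketch: the function name ib_pingpong_check, the DRY_RUN switch and the
2-second delay are placeholders, not part of OFED, and it assumes it is
started from a node that can ssh to both hosts without a password (a
compute node in your setup). The client is given the name of the node
running the server.

```shell
#!/bin/sh
# Sketch only: ib_pingpong_check, DRY_RUN and the 2-second delay are
# illustrative assumptions, not something shipped with OFED.

ib_pingpong_check() {
    server=$1   # node that waits for the connection
    client=$2   # node that connects to the server

    # Server side: no host argument, so it listens for a client.
    server_cmd="ssh -n $server ibv_rc_pingpong"
    # Client side: pass the name of the node running the server.
    client_cmd="ssh -n $client ibv_rc_pingpong $server"

    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Only print what would be run.
        echo "$server_cmd"
        echo "$client_cmd"
        return 0
    fi

    $server_cmd &
    server_pid=$!
    sleep 2     # give the server time to start listening

    if ! $client_cmd; then
        # Client failed: stop the local ssh that started the server.
        kill "$server_pid" 2>/dev/null
        return 1
    fi

    # Return the server's exit status.
    wait "$server_pid"
}
```

For example, "ib_pingpong_check host_a host_b" runs the test, and setting
DRY_RUN=1 first only prints the two ssh commands.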

Just a couple of other questions:
- What OS is being used?
- What HCA type is being used? (Mellanox or PathScale; PCI-Express or
PCI-X)
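
One possible way to collect the answers on a compute node is sketched
below. The helper name ib_report and the grep patterns are assumptions
made for illustration; ibv_devinfo comes with the same OFED utilities as
ibv_rc_pingpong, and lspci may not be installed everywhere, so both are
checked before use.

```shell
#!/bin/sh
# Sketch only: ib_report and the grep patterns are illustrative
# assumptions; adjust them to the hardware actually installed.

ib_report() {
    echo "== OS =="
    uname -srm
    # Distribution release files, if any exist on this system.
    cat /etc/*release 2>/dev/null

    echo "== HCA =="
    if command -v ibv_devinfo >/dev/null 2>&1; then
        # Lists each HCA with its id, firmware version and port state.
        ibv_devinfo
    else
        echo "ibv_devinfo not found"
    fi

    echo "== PCI =="
    if command -v lspci >/dev/null 2>&1; then
        # The matching line shows the vendor and the bus the HCA sits on.
        lspci | grep -i -e infiniband -e mellanox -e pathscale \
            || echo "no matching PCI device listed"
    else
        echo "lspci not found"
    fi
}
```

Running "ib_report" on one of the affected nodes and pasting the output
into a reply would cover both questions.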

Thanks,
Matt


