[mvapich-discuss] viacheck.c error?
Matthew Koop
koop at cse.ohio-state.edu
Thu Mar 15 23:52:39 EDT 2007
> Has anyone seen this before? I'm puzzled. If it is a connectivity
> issue, how do I know, and why would cpi and the other very simple
> programs run? Non-ib batch programs using mpich run successfully.
I'm not completely sure why the other programs are not showing the
problem -- perhaps the osu_benchmarks stress the network more than cpi,
which does very little communication.
> We use ssh for everything and have disabled rsh. The average user
> cannot ssh to a compute node from the head node, but, once on a
> compute node, he/she can ssh to other compute nodes without supplying
> passwords. For mpich, we use mpiexec and this works well for us, but
> we'd really like to use the infiniband. :( Please point me in the
> right direction if this is not a mvapich related problem. I
> administer this cluster, so asking my sysadmin is not an option. ;)
This does not seem like an MVAPICH issue, so perhaps testing the
lower-level InfiniBand layer directly will be beneficial to rule out setup
and connectivity issues.
It'd be helpful if you could try running 'ibv_rc_pingpong', which ships
with OFED. That will show us whether the network between at least two
nodes is working properly.
On host_a (server side, no arguments):
    ibv_rc_pingpong
On host_b (client side, pointing at host_a):
    ibv_rc_pingpong host_a
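If it is easier to drive both sides from one shell, the two-node test can
be wrapped roughly as follows. This is only a sketch: run_pingpong is a
hypothetical helper name, not something shipped with OFED or MVAPICH, and
it assumes you run it from a compute node with passwordless ssh to both
hosts and ibv_rc_pingpong on their PATH. The 2-second pause is an
arbitrary guess at how long the server side needs to start listening.

```shell
# Hypothetical helper (not part of OFED or MVAPICH): run the two-node
# ibv_rc_pingpong check from a single shell on a compute node.
# Assumes passwordless ssh to both hosts and ibv_rc_pingpong on their PATH.
run_pingpong() {
    server=$1
    client=$2

    # Server side: with no hostname argument, ibv_rc_pingpong waits for a peer.
    ssh "$server" ibv_rc_pingpong &
    server_pid=$!

    # Crude pause so the server is listening before the client connects.
    sleep 2

    # Client side: the argument is the host running the server.
    ssh "$client" ibv_rc_pingpong "$server"
    client_status=$?

    # If the client failed, the server may never get a peer; stop the local
    # ssh so the wait below does not block. The remote process may linger.
    if [ "$client_status" -ne 0 ]; then
        kill "$server_pid" 2>/dev/null
    fi

    # Collect the exit status of the server-side ssh as well.
    wait "$server_pid"
    server_status=$?

    if [ "$client_status" -eq 0 ] && [ "$server_status" -eq 0 ]; then
        echo "pingpong between $server and $client: OK"
    else
        echo "pingpong between $server and $client: FAILED" >&2
        return 1
    fi
}
```

Usage would be something like `run_pingpong host_a host_b`, with the real
node names substituted. Any output printed by ibv_rc_pingpong itself is
passed through unchanged, so please include it when you reply.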
Just a couple of other questions:
- What is the OS being used?
- What HCA type is being used? (Mellanox or PathScale, PCI-Express or
PCI-X)
Thanks,
Matt