[mvapich-discuss] Running more than 72 tasks with mvapich 0.9.5
Otheus
otheus.shelling at uibk.ac.at
Mon May 8 15:32:39 EDT 2006
Greetings,
I think I found my answer at:
https://docs.mellanox.com/dm/ibgold/docs/Troubleshooting.txt
Problem: Running MPI on a big cluster (>200 nodes) fails.
Suggestion:
Try to increase the VAPI driver timeout parameter, VIADEV_DEFAULT_TIME_OUT,
for the MPI stack. To achieve this, use the '-paramfile filename' option with
mpirun_rsh. For example, you can run:
/usr/local/ibgd/mpi/osu/gcc/mvapich-0.9.5/bin/mpirun_rsh -np 2 \
    -paramfile ./perfparams -hostfile /root/cluster \
    /usr/local/ibgd/mpi/osu/gcc/tests/PMB2.2.1/PMB-MPI1
where the file perfparams includes the following line:
VIADEV_DEFAULT_TIME_OUT = 12
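For anyone scripting this, here is a minimal sketch of creating such a parameter file from the shell. The helper name write_paramfile is hypothetical (my own, not part of MVAPICH); the file name, the parameter name, and the "NAME = value" line format are taken from the troubleshooting text above.

```shell
#!/bin/sh
# Hypothetical helper: write a one-line MVAPICH parameter file.
#   $1 = path of the parameter file to create (overwritten if present)
#   $2 = value for VIADEV_DEFAULT_TIME_OUT
write_paramfile() {
    printf 'VIADEV_DEFAULT_TIME_OUT = %s\n' "$2" > "$1"
}

write_paramfile ./perfparams 12
cat ./perfparams

# The file is then handed to mpirun_rsh with -paramfile, as in the
# command quoted above:
#   mpirun_rsh -np 2 -paramfile ./perfparams -hostfile /root/cluster <program>
```

Keeping the value as an argument makes it easy to try several timeouts without editing the file by hand.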
In my case, I had to set VIADEV_DEFAULT_TIME_OUT to 31. Values larger than
this resulted in a different error.
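A possible explanation for the ceiling of 31, offered as an assumption rather than anything the troubleshooting text states: if this parameter is passed through as the InfiniBand Local ACK Timeout of the queue pair, that field is 5 bits wide (0 to 31) and encodes a timeout of 4.096 microseconds times 2^value. The sketch below does that arithmetic; ack_timeout_ns is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: timeout in nanoseconds for a given exponent,
# computed as 4.096 us * 2^value (4096 ns shifted left by value).
# ASSUMPTION: VIADEV_DEFAULT_TIME_OUT is the 5-bit InfiniBand Local ACK
# Timeout exponent. 0 has a special meaning there and is excluded here.
# Assumes the shell does 64-bit integer arithmetic.
ack_timeout_ns() {
    if [ "$1" -lt 1 ] || [ "$1" -gt 31 ]; then
        echo "value must be in 1..31" >&2
        return 1
    fi
    echo $(( 4096 << $1 ))
}

for v in 12 31; do
    ns=$(ack_timeout_ns "$v")
    echo "VIADEV_DEFAULT_TIME_OUT = $v -> $ns ns ($(( ns / 1000000 )) ms)"
done
```

Under that assumption, 12 works out to roughly 16.8 milliseconds and 31 to roughly 8796 seconds (about 2.4 hours), so 31 is effectively "wait as long as it takes".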