[mvapich-discuss] Running more than 72 tasks with mvapich 0.9.5

Otheus otheus.shelling at uibk.ac.at
Mon May 8 15:32:39 EDT 2006


Greetings,

I think I found my answer at: 
https://docs.mellanox.com/dm/ibgold/docs/Troubleshooting.txt

    Problem: Running MPI on a big cluster (>200 nodes) fails.

    Suggestion:
        Try to increase the VAPI driver timeout parameter,
        VIADEV_DEFAULT_TIME_OUT, for the MPI stack. To achieve this, use the
        '-paramfile filename' option with mpirun_rsh. For example, you can run:

        /usr/local/ibgd/mpi/osu/gcc/mvapich-0.9.5/bin/mpirun_rsh -np 2 -paramfile ./perfparams -hostfile /root/cluster /usr/local/ibgd/mpi/osu/gcc/tests/PMB2.2.1/PMB-MPI1

        where the file perfparams includes the following line:

        VIADEV_DEFAULT_TIME_OUT = 12

In my case, I had to set VIADEV_DEFAULT_TIME_OUT to 31. Values larger
than this resulted in a different error.
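
The steps above can be sketched as a small shell script. This is only a
sketch under stated assumptions: the install paths, the hostfile, and the
file name perfparams are copied from the quoted Mellanox example and will
likely differ on other installations, and 31 is simply the value reported
above. The check for an executable mpirun_rsh is an addition for
illustration, so that the script only writes the parameter file on a
machine without that install.

```shell
#!/bin/sh
# Write the parameter file read by mpirun_rsh via -paramfile.
# The file name follows the quoted example; 31 is the value reported above.
PARAMFILE=./perfparams
printf 'VIADEV_DEFAULT_TIME_OUT = %s\n' 31 > "$PARAMFILE"

# Paths copied from the quoted example (assumptions; adjust for your site).
MPIRUN=/usr/local/ibgd/mpi/osu/gcc/mvapich-0.9.5/bin/mpirun_rsh
BENCH=/usr/local/ibgd/mpi/osu/gcc/tests/PMB2.2.1/PMB-MPI1
HOSTFILE=/root/cluster

# Launch only if mpirun_rsh exists at the assumed path.
if [ -x "$MPIRUN" ]; then
    "$MPIRUN" -np 2 -paramfile "$PARAMFILE" -hostfile "$HOSTFILE" "$BENCH"
else
    echo "mpirun_rsh not found at $MPIRUN; wrote $PARAMFILE only" >&2
fi
```

For a larger run, raise the -np value and make sure the hostfile lists
enough nodes.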

