[mvapich-discuss] mvapich 1.1 mpirun_rsh problems on process count > 350

Jonathan Perkins perkinjo at cse.ohio-state.edu
Tue Mar 29 09:15:41 EDT 2011


Johnny:
Hello, I don't think we've seen this type of error before.  The
on-demand threshold variable will not affect the behavior that you're
seeing.  I'll need to check whether any variables can change the
pmgr collective behavior and get back to you.  Can you tell us how many
processes are running on each node (e.g., 16 per node)?
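
If it is useful, here is a minimal sketch for working out the per-node
process count from a hostfile.  It assumes the hostfile lists one
hostname per line, repeated once for each process to be placed on that
node, and that blank lines and lines starting with '#' can be ignored.
The function name is hypothetical and is not part of mvapich.

```python
from collections import Counter


def count_procs_per_node(hostfile_lines):
    """Count how many times each hostname appears in a hostfile.

    Assumption: one hostname per line, repeated once per process.
    Blank lines and '#' comment lines are skipped (also an assumption).
    """
    hosts = []
    for line in hostfile_lines:
        name = line.strip()
        if not name or name.startswith("#"):
            continue
        hosts.append(name)
    return Counter(hosts)
```

To use it, pass an open file, for example
`count_procs_per_node(open("hosts"))`, where `hosts` stands in for
whatever file you give to mpirun_rsh, and print the resulting counts.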

If you are able to, I would suggest updating your installation of
mvapich to mvapich-1.2rc1 or mvapich2-1.6.  It's possible that the
problem you're facing has been resolved in one of our releases since
mvapich-1.1.

On Mon, Mar 28, 2011 at 5:35 AM, Johnny Devaprasad
<johnnydevaprasad at gmail.com> wrote:
> Hi all,
> mpirun_rsh has problems when launching jobs with np > 350.
> The resulting error is as follows:
> PMGR_COLLECTIVE ERROR: unexpected value: received 1, expecting 7 @ file pmgr_collective_mpispawn.c:144
> Any suggestions about how to fix this would be greatly appreciated.
> I tried increasing the limit with VIADEV_ON_DEMAND_THRESHOLD, but that
> does not seem to help. I could be tuning the wrong variable.
> The IB stack in use is the default Red Hat 5 InfiniBand stack.
> I do not have information about the IB adapters, but if that is
> contributing to this error, then please do let me know.
> Thank you in advance.
> Regards,
> Johnny
>
>
>
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>
>



-- 
Jonathan Perkins
http://www.cse.ohio-state.edu/~perkinjo

