[mvapich-discuss] mvapich 1.1 mpirun_rsh problems on process count > 350

Jonathan Perkins perkinjo at cse.ohio-state.edu
Thu Mar 31 12:35:13 EDT 2011


Johnny:
Thanks for the info.  We suggest using mvapich2 but we'll take a look
at the mvapich issue.
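
In the meantime, if it helps to pin down exactly where launches start to fail, here is a minimal sketch (not part of MVAPICH) that bisects for the largest working process count. It assumes failures are monotonic: once a given count fails, every larger count fails too. The hostfile name `hosts` and the test binary `./hello` are hypothetical placeholders, and only the standard `-np` and `-hostfile` options of mpirun_rsh are used.

```python
import subprocess
from typing import Callable


def largest_working_np(works: Callable[[int], bool], lo: int, hi: int) -> int:
    """Return the largest np in [lo, hi] for which works(np) is True.

    Assumes works(lo) is True and that failures are monotonic
    (once a count fails, every larger count fails too).
    """
    if not works(lo):
        raise ValueError(f"launch already fails at np={lo}")
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the loop always makes progress
        if works(mid):
            lo = mid
        else:
            hi = mid - 1
    return lo


def launch_ok(np: int, hostfile: str = "hosts", binary: str = "./hello") -> bool:
    """Hypothetical probe: launch a trivial MPI program and report success.

    The hostfile and binary names are placeholders, not from this thread.
    """
    cmd = ["mpirun_rsh", "-np", str(np), "-hostfile", hostfile, binary]
    try:
        result = subprocess.run(cmd, capture_output=True, timeout=300)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0
```

Calling `largest_working_np(launch_ok, 1, 1000)` would then take about ten launches to locate the limit. Running it once with 48 slots per node and once with 8 would show whether the limit follows the total process count or the node count.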

On Thu, Mar 31, 2011 at 5:37 AM, Johnny Devaprasad
<johnnydevaprasad at gmail.com> wrote:
>
> Hi Jonathan,
> These are Magny-Cours CPUs, with 48 cores on each node. Hence I have set
> the number of slots to 48.
> But that might be irrelevant, because even if I set the slots to 8, for
> example, I can run jobs across multiple nodes, but not with more than
> 350 processes in total.
> - I have tried mvapich-1.2rc1 and it has the same problem.
> - There are no issues when using mvapich2-1.6; it works fine. I have
>   tried up to 1000 cores per job.
> Regards,
> Johnny
>
> On Tue, Mar 29, 2011 at 3:15 PM, Jonathan Perkins
> <perkinjo at cse.ohio-state.edu> wrote:
>>
>> Johnny:
>> Hello, I don't think we've seen this type of error previously.  The
>> on-demand variable will not have an effect on the behavior that you're
>> seeing.  I'll need to check whether any variables can change the
>> pmgr collective behavior and get back to you. Can you tell us how many
>> processes there are on each node (e.g., 16 per node)?
>>
>> If you are able to, I would suggest updating your installation of
>> mvapich to mvapich-1.2rc1 or mvapich2-1.6.  It's possible that the
>> problem you're facing has been resolved in one of our releases since
>> mvapich-1.1.
>>
>> On Mon, Mar 28, 2011 at 5:35 AM, Johnny Devaprasad
>> <johnnydevaprasad at gmail.com> wrote:
>> > Hi all,
>> > mpirun_rsh has problems when launching jobs with np > 350.
>> > The resulting error is as follows:
>> > PMGR_COLLECTIVE ERROR: unexpected value: received 1, expecting 7 @ file pmgr_collective_mpispawn.c:144
>> > Any suggestions about how to fix this would be greatly appreciated.
>> > I tried increasing the limits via VIADEV_ON_DEMAND_THRESHOLD, but it
>> > does not seem to help. I could be tuning the wrong variable.
>> > The IB stack in use is the default Red Hat 5 InfiniBand stack.
>> > I do not have information about the IB adapters, but if that is
>> > contributing to this error, then please do let me know.
>> > Thank you in advance.
>> > Regards,
>> > Johnny
>>
>>
>>
>> --
>> Jonathan Perkins
>> http://www.cse.ohio-state.edu/~perkinjo
>
>



-- 
Jonathan Perkins
http://www.cse.ohio-state.edu/~perkinjo
