[mvapich-discuss] Re: mvapich2-1.6 problems with np ~ >= 1000

Jonathan Perkins perkinjo at cse.ohio-state.edu
Tue Apr 5 11:32:34 EDT 2011


Does osu_bcast run successfully?
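For example, something along these lines will exercise a collective across the
same set of nodes (the osu_bcast path below is just a placeholder; point it at
wherever the OSU benchmarks were built or installed for your MVAPICH2):

mpirun_rsh -np 2000 -hostfile /home/jd/working/simple/mvapich2/machinefile_large /path/to/osu_benchmarks/osu_bcast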

On Tue, Apr 5, 2011 at 9:54 AM, Johnny Devaprasad
<johnnydevaprasad at gmail.com> wrote:
> Hi Jonathan,
> The version of mvapich2 is 1.6 (this is in the subject line, so I did not
> include it in the description). I downloaded this a couple of weeks back.
> I have 112 nodes (48 cores each - Magny-Cours CPUs).
> On the command line I specify -np 2000 (which is approximately 42 nodes).
> My machine file has more entries than that.
>
> Infiniband information:
> -------------------------------
> [root at node112 ~]# lspci | grep Infi
> 03:00.0 InfiniBand: Mellanox Technologies MT26428 [ConnectX VPI PCIe 2.0
> 5GT/s - IB QDR / 10GigE] (rev b0)
> ibstat
> CA 'mlx4_0'
> CA type: MT26428
> Number of ports: 1
> Firmware version: 2.7.626
> Hardware version: b0
>
> Regards,
> Johnny
> On Tue, Apr 5, 2011 at 12:01 PM, Johnny Devaprasad
> <johnnydevaprasad at gmail.com> wrote:
>>
>> Hi all,
>> I am running a simple MPI program (it only calls MPI_Get_processor_name).
>> This sometimes works, but most of the time it does not...
>> mpirun_rsh -np 2000 -hostfile
>> /home/jd/working/simple/mvapich2/machinefile_large
>> /home/jd/working/simple/mvapich2/mvapich2_pgi
>> Exit code -5 signaled from node015
>> MPI process (rank: 315) terminated unexpectedly on node027
>> MPI process (rank: 214) terminated unexpectedly on node014
>> handle_mt_peer: fail to read...: Success
>> handle_mt_peer: fail to read...: Success
>> handle_mt_peer: fail to read...: Success
>> handle_mt_peer: fail to read...: Success
>> handle_mt_peer: fail to read...: Success
>> handle_mt_peer: fail to read...: Success
>> handle_mt_peer: fail to read...: Success
>> handle_mt_peer: fail to read...: Success
>> handle_mt_peer: fail to read...: Success
>> handle_mt_peer: fail to read...: Success
>> mpirun_rsh -np 1000 -hostfile
>> /home/jd/working/simple/mvapich2/machinefile_large
>> /home/jd/working/simple/mvapich2/mvapich2_pgi
>> MPI process (rank: 435) terminated unexpectedly on node044
>> Exit code -5 signaled from node041
>> MPI process (rank: 777) terminated unexpectedly on node048
>> handle_mt_peer: fail to read...: Success
>> handle_mt_peer: fail to read...: Success
>> handle_mt_peer: fail to read...: Success
>> handle_mt_peer: fail to read...: Success
>> handle_mt_peer: fail to read...: Success
>>
>> Regards,
>> Johnny
>
>
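For reference, a minimal test of the kind described above (a program that does
little more than call MPI_Get_processor_name) might look like the sketch below.
This is only an illustrative assumption of what such a test looks like, not the
original poster's code; it can be built with MVAPICH2's mpicc wrapper.

  /* Minimal sketch: print the processor name for each rank. */
  #include <stdio.h>
  #include <mpi.h>

  int main(int argc, char **argv)
  {
      char name[MPI_MAX_PROCESSOR_NAME];
      int rank, len;

      MPI_Init(&argc, &argv);                 /* start MPI */
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's rank */
      MPI_Get_processor_name(name, &len);     /* hostname of this rank */
      printf("rank %d runs on %s\n", rank, name);
      MPI_Finalize();                         /* shut MPI down */
      return 0;
  }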



-- 
Jonathan Perkins
http://www.cse.ohio-state.edu/~perkinjo

