[mvapich-discuss] program hanged using mvapich with large number of processes

Dhabaleswar Panda panda at cse.ohio-state.edu
Fri Jan 22 16:52:04 EST 2010


Can you tell us which MVAPICH2 version you are using? Also, can you tell us
which IB adapter type is used in your system?
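
A minimal sketch of one way to collect both details, assuming the usual
utilities are installed: `mpiname` (shipped with recent MVAPICH2 releases)
and `ibv_devinfo` (from the libibverbs utilities). The `report` helper is a
hypothetical wrapper added here for illustration; it skips any tool that is
not in PATH rather than failing.

```shell
#!/bin/sh
# Print a labelled section for each diagnostic command, or a note if the
# command is not installed. "report" is an illustrative helper, not part
# of MVAPICH2 or OFED.
report() {
    # $1 = label, remaining arguments = command to run
    label=$1; shift
    if command -v "$1" >/dev/null 2>&1; then
        echo "== $label =="
        "$@"
    else
        echo "== $label: $1 not found in PATH =="
    fi
}

# Assumption: mpiname is available in this MVAPICH2 installation.
report "MVAPICH2 version" mpiname -a
# Assumption: ibv_devinfo is installed; it lists the HCA model and firmware.
report "IB adapter" ibv_devinfo
```

Running this on one of the compute nodes (for example node32) and pasting
the output into a reply would cover both questions.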

Thanks,

DK

On Fri, 22 Jan 2010, Weimin Wang wrote:

> Hello, list,
>
> I have run into a strange problem with mvapich2. When I run the cpi example
> with a small number of processes, it works:
>
> wmwang@node32:~/test> mpirun_rsh -ssh -np 2 -hostfile ./ma ./cpi
> Process 0 on node32
> Process 1 on node32
> pi is approximately 3.1416009869231241, Error is 0.0000083333333309
> wall clock time = 0.000174
>
> wmwang@node32:~/test> mpirun_rsh -ssh -np 10 -hostfile ./ma ./cpi
> Process 8 on node33
> pi is approximately 3.1416009869231249, Error is 0.0000083333333318
> wall clock time = 0.000127
> Process 1 on node32
> Process 3 on node32
> Process 0 on node32
> Process 4 on node32
> Process 2 on node32
> Process 6 on node32
> Process 5 on node32
> Process 7 on node32
> Process 9 on node33
>
> However, when I run cpi with a large number of processes, the program hangs
> with no output:
>
> wmwang@node32:~/test> mpirun_rsh -ssh -np 18 -hostfile ./ma ./cpi
>
> The top command on node32 shows:
>
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> 14507 wmwang    15   0 60336  50m  676 S   56  0.2   0:03.86 mpispawn
>
> The system I am using is:
>
> wmwang@node33:~> uname -a
> Linux node33 2.6.16.60-0.42.4_lustre.1.8.1.1-smp #1 SMP Fri Aug 14 08:33:26
> MDT 2009 x86_64 x86_64 x86_64 GNU/Linux
>
> The compiler is PGI v10.0.
>
> Could you please give me any hints about this problem?
>
