[mvapich-discuss] program hangs using mvapich with a large number of processes

Weimin Wang wmwang at gmail.com
Fri Jan 22 10:24:16 EST 2010


Hello, list,

I have run into a strange problem with mvapich2. When I run the cpi example
with a small number of processes, it works fine:

wmwang@node32:~/test> mpirun_rsh -ssh -np 2 -hostfile ./ma ./cpi
Process 0 on node32
Process 1 on node32
pi is approximately 3.1416009869231241, Error is 0.0000083333333309
wall clock time = 0.000174
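
For reference, the error printed above (about 8.33e-6) is what a midpoint-rule integration of 4/(1+x^2) over [0, 1] gives with 100 intervals. The sketch below is a serial stand-in under two assumptions: that this cpi is the stock example with n = 100, and that the intervals are dealt out to ranks cyclically. Neither is stated in this message. The names `partial_sum` and `estimate_pi` are hypothetical, and the plain `sum` only imitates a sum reduction across ranks.

```python
import math


def partial_sum(rank, nprocs, n):
    """Midpoint-rule partial sum for one rank (cyclic split of n intervals)."""
    h = 1.0 / n
    total = 0.0
    # Rank r handles intervals r+1, r+1+nprocs, r+1+2*nprocs, ...
    for i in range(rank + 1, n + 1, nprocs):
        x = h * (i - 0.5)
        total += 4.0 / (1.0 + x * x)
    return h * total


def estimate_pi(nprocs, n=100):
    """Add up the per-rank partial sums (stand-in for a sum reduction)."""
    return sum(partial_sum(rank, nprocs, n) for rank in range(nprocs))


if __name__ == "__main__":
    for nprocs in (2, 10, 18):
        pi = estimate_pi(nprocs)
        err = abs(pi - math.pi)
        print(f"np={nprocs}: pi is approximately {pi:.16f}, Error is {err:.16f}")
```

If those assumptions hold, the numerical work per rank is tiny and the result does not depend on the process count beyond rounding in the last digits, so a hang at a larger -np is unlikely to come from the computation itself.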

wmwang@node32:~/test> mpirun_rsh -ssh -np 10 -hostfile ./ma ./cpi
Process 8 on node33
pi is approximately 3.1416009869231249, Error is 0.0000083333333318
wall clock time = 0.000127
Process 1 on node32
Process 3 on node32
Process 0 on node32
Process 4 on node32
Process 2 on node32
Process 6 on node32
Process 5 on node32
Process 7 on node32
Process 9 on node33

However, when I run cpi with a larger number of processes, the program hangs
with no output:

wmwang@node32:~/test> mpirun_rsh -ssh -np 18 -hostfile ./ma ./cpi

The top command on node32 shows:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
14507 wmwang    15   0 60336  50m  676 S   56  0.2   0:03.86 mpispawn

The system I am using is:

wmwang@node33:~> uname -a
Linux node33 2.6.16.60-0.42.4_lustre.1.8.1.1-smp #1 SMP Fri Aug 14 08:33:26 MDT 2009 x86_64 x86_64 x86_64 GNU/Linux

The compiler is PGI v10.0.

Could you please give me any hints about this problem?