[mvapich-discuss] Problem with more MPI jobs on the same node

Emir Imamagic eimamagi at srce.hr
Sat Aug 29 14:48:40 EDT 2009


Dhabaleswar Panda wrote:
> What is the output of top and mpstat when you run a 16-process LU job on
> the same 16-cores (0-15)?

Command:
  mpirun_rsh -ssh -np 16 -hostfile ./machines VIADEV_USE_AFFINITY=0 ./lu.C.16

TOP:
top - 20:45:42 up 56 days, 15:18,  2 users,  load average: 8.55, 5.76, 4.46
Tasks: 484 total,  17 running, 467 sleeping,   0 stopped,   0 zombie
Cpu(s): 15.2%us,  1.4%sy,  0.0%ni, 83.4%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  66072240k total,  9708912k used, 56363328k free,   336556k buffers
Swap:  7999992k total,        0k used,  7999992k free,  7728032k cached

   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
32508 eimamagi  25   0  140m  79m  19m R 99.1  0.1   0:42.09 lu.C.16
32509 eimamagi  25   0  140m  63m 4176 R 99.1  0.1   0:42.10 lu.C.16
32510 eimamagi  25   0  140m  63m 3792 R 99.1  0.1   0:42.08 lu.C.16
32511 eimamagi  25   0  140m  63m 3332 R 99.1  0.1   0:42.09 lu.C.16
32512 eimamagi  25   0  140m  63m 4228 R 99.1  0.1   0:42.11 lu.C.16
32513 eimamagi  25   0  140m  64m 5148 R 99.1  0.1   0:42.11 lu.C.16
32514 eimamagi  25   0  140m  64m 4772 R 99.1  0.1   0:42.11 lu.C.16
32515 eimamagi  25   0  140m  63m 4232 R 99.1  0.1   0:42.11 lu.C.16
32516 eimamagi  25   0  140m  63m 4052 R 99.1  0.1   0:42.11 lu.C.16
32517 eimamagi  25   0  140m  64m 4716 R 99.1  0.1   0:42.10 lu.C.16
32518 eimamagi  25   0  140m  63m 4544 R 99.1  0.1   0:42.10 lu.C.16
32519 eimamagi  25   0  140m  63m 4060 R 99.1  0.1   0:42.11 lu.C.16
32520 eimamagi  25   0  140m  62m 3892 R 99.1  0.1   0:42.10 lu.C.16
32521 eimamagi  25   0  140m  63m 4428 R 99.1  0.1   0:42.11 lu.C.16
32522 eimamagi  25   0  140m  63m 4428 R 99.1  0.1   0:42.11 lu.C.16
32523 eimamagi  25   0  140m  62m 3392 R 99.1  0.1   0:42.11 lu.C.16

MPSTAT:
20:45:23     CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle    intr/s
20:45:25     all   50.02    0.00    0.03    0.00    0.00    0.00    0.00   49.95   1005.00
20:45:25       0  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00   1005.00
20:45:25       1  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00      0.00
20:45:25       2  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00      0.00
20:45:25       3  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00      0.00
20:45:25       4  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00      0.00
20:45:25       5  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00      0.00
20:45:25       6  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00      0.00
20:45:25       7  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00      0.00
20:45:25       8  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00      0.00
20:45:25       9  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00      0.00
20:45:25      10  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00      0.00
20:45:25      11  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00      0.00
20:45:25      12  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00      0.00
20:45:25      13  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00      0.00
20:45:25      14  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00      0.00
20:45:25      15  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00      0.00
20:45:25      16    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00      0.00
20:45:25      17    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00      0.00
20:45:25      18    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00      0.00
20:45:25      19    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00      0.00
20:45:25      20    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00      0.00
20:45:25      21    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00      0.00
20:45:25      22    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00      0.00
20:45:25      23    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00      0.00
20:45:25      24    0.50    0.00    0.50    0.00    0.00    0.00    0.00   99.00      0.00
20:45:25      25    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00      0.00
20:45:25      26    0.50    0.00    0.50    0.00    0.00    0.00    0.00   99.00      0.00
20:45:25      27    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00      0.00
20:45:25      28    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00      0.00
20:45:25      29    0.00    0.00    0.50    0.00    0.00    0.00    0.00   99.50      0.00
20:45:25      30    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00      0.00
20:45:25      31    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00      0.00
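
For anyone who wants to pull the busy CPUs out of a listing like this
without reading it by eye, here is a minimal Python sketch. It assumes
mpstat rows with a 24-hour timestamp in the first column, the CPU id in
the second and %user in the third, as in the output above. The function
name, the 50% threshold and the shortened sample are illustrative and not
part of any tool.

```python
def busy_cpus(mpstat_text, threshold=50.0):
    """Return the ids of CPUs whose %user is at or above `threshold`.

    Assumes rows of the form: <time> <cpu id> <%user> ...
    The header row and the "all" summary row are skipped because their
    second field is not a number.
    """
    busy = []
    for line in mpstat_text.splitlines():
        fields = line.split()
        if len(fields) < 3 or not fields[1].isdigit():
            continue
        if float(fields[2]) >= threshold:
            busy.append(int(fields[1]))
    return busy


# A few rows copied from the listing above (CPUs 0, 15, 16 and 24).
SAMPLE = """\
20:45:23     CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle    intr/s
20:45:25     all   50.02    0.00    0.03    0.00    0.00    0.00    0.00   49.95   1005.00
20:45:25       0  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00   1005.00
20:45:25      15  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00      0.00
20:45:25      16    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00      0.00
20:45:25      24    0.50    0.00    0.50    0.00    0.00    0.00    0.00   99.00      0.00
"""

print(busy_cpus(SAMPLE))
```

Fed the full listing instead of the sample, this should report CPUs 0
through 15 only.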



For comparison, here is the output when I run two instances of lu.C.16
at the same time. Only the first 16 CPUs (0-15) are used, no matter how
many jobs I start: each of the 32 processes gets about 50% of a CPU,
while CPUs 16-31 stay idle.

TOP:
top - 20:47:06 up 56 days, 15:19,  3 users,  load average: 16.74, 8.87, 5.66
Tasks: 509 total,  33 running, 476 sleeping,   0 stopped,   0 zombie
Cpu(s): 50.0%us,  0.1%sy,  0.0%ni, 49.9%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  66072240k total, 10744044k used, 55328196k free,   336564k buffers
Swap:  7999992k total,        0k used,  7999992k free,  7769652k cached

   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
32671 eimamagi  25   0  140m  62m 3464 R 50.5  0.1   0:05.25 lu.C.16
32673 eimamagi  25   0  140m  63m 3996 R 50.5  0.1   0:05.26 lu.C.16
32510 eimamagi  25   0  140m  63m 3892 R 50.2  0.1   2:01.07 lu.C.16
32511 eimamagi  25   0  140m  63m 3380 R 50.2  0.1   2:01.13 lu.C.16
32512 eimamagi  25   0  140m  64m 4228 R 50.2  0.1   2:01.15 lu.C.16
32513 eimamagi  25   0  140m  64m 5148 R 50.2  0.1   2:01.15 lu.C.16
32514 eimamagi  25   0  140m  64m 4860 R 50.2  0.1   2:01.15 lu.C.16
32516 eimamagi  25   0  141m  63m 4084 R 50.2  0.1   2:01.13 lu.C.16
32519 eimamagi  25   0  140m  63m 4152 R 50.2  0.1   2:01.14 lu.C.16
32521 eimamagi  25   0  140m  63m 4468 R 50.2  0.1   2:01.14 lu.C.16
32523 eimamagi  25   0  140m  62m 3756 R 50.2  0.1   2:01.13 lu.C.16
32659 eimamagi  25   0  140m  79m  19m R 50.2  0.1   0:05.25 lu.C.16
32660 eimamagi  25   0  140m  63m 4160 R 50.2  0.1   0:05.26 lu.C.16
32662 eimamagi  25   0  140m  63m 3280 R 50.2  0.1   0:05.27 lu.C.16
32664 eimamagi  25   0  141m  64m 5140 R 50.2  0.1   0:05.27 lu.C.16
32665 eimamagi  25   0  140m  64m 4876 R 50.2  0.1   0:05.27 lu.C.16
32666 eimamagi  25   0  140m  64m 4348 R 50.2  0.1   0:05.27 lu.C.16
32668 eimamagi  25   0  140m  64m 4688 R 50.2  0.1   0:05.26 lu.C.16
32669 eimamagi  25   0  140m  63m 4416 R 50.2  0.1   0:05.26 lu.C.16
32672 eimamagi  25   0  140m  63m 4396 R 50.2  0.1   0:05.26 lu.C.16
32508 eimamagi  25   0  140m  79m  19m R 49.8  0.1   2:01.14 lu.C.16
32509 eimamagi  25   0  140m  63m 4176 R 49.8  0.1   2:01.13 lu.C.16
32515 eimamagi  25   0  140m  64m 4404 R 49.8  0.1   2:01.14 lu.C.16
32517 eimamagi  25   0  141m  64m 4716 R 49.8  0.1   2:01.14 lu.C.16
32518 eimamagi  25   0  140m  64m 4660 R 49.8  0.1   2:01.13 lu.C.16
32520 eimamagi  25   0  140m  63m 3960 R 49.8  0.1   2:01.15 lu.C.16
32522 eimamagi  25   0  140m  63m 4484 R 49.8  0.1   2:01.14 lu.C.16
32661 eimamagi  25   0  140m  63m 3776 R 49.8  0.1   0:05.27 lu.C.16
32663 eimamagi  25   0  140m  64m 4216 R 49.8  0.1   0:05.26 lu.C.16
32667 eimamagi  25   0  140m  63m 3896 R 49.8  0.1   0:05.27 lu.C.16
32670 eimamagi  25   0  140m  63m 3956 R 49.8  0.1   0:05.26 lu.C.16
32674 eimamagi  25   0  140m  62m 3408 R 49.8  0.1   0:05.27 lu.C.16

MPSTAT:
20:47:35     CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle    intr/s
20:47:37     all   50.00    0.02    0.08    0.00    0.00    0.00    0.00   49.91   1004.50
20:47:37       0  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00   1004.50
20:47:37       1  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00      0.00
20:47:37       2  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00      0.00
20:47:37       3  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00      0.00
20:47:37       4  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00      0.00
20:47:37       5  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00      0.00
20:47:37       6  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00      0.00
20:47:37       7  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00      0.00
20:47:37       8  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00      0.00
20:47:37       9  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00      0.00
20:47:37      10  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00      0.00
20:47:37      11  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00      0.00
20:47:37      12  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00      0.00
20:47:37      13  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00      0.00
20:47:37      14  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00      0.00
20:47:37      15  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00      0.00
20:47:37      16    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00      0.00
20:47:37      17    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00      0.00
20:47:37      18    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00      0.00
20:47:37      19    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00      0.00
20:47:37      20    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00      0.00
20:47:37      21    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00      0.00
20:47:37      22    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00      0.00
20:47:37      23    0.00    0.00    0.50    0.00    0.00    0.00    0.00   99.50      0.00
20:47:37      24    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00      0.00
20:47:37      25    0.00    0.00    0.50    0.00    0.00    0.00    0.00   99.50      0.00
20:47:37      26    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00      0.00
20:47:37      27    0.00    0.00    0.50    0.00    0.00    0.00    0.00   99.50      0.00
20:47:37      28    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00      0.00
20:47:37      29    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00      0.00
20:47:37      30    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00      0.00
20:47:37      31    0.00    0.00    0.50    0.00    0.00    0.00    0.00   99.50      0.00
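
mpstat shows which CPUs are busy, but not whether the processes are
actually restricted to them. One possible cross-check is to read each
process's affinity mask from the kernel. The Python sketch below assumes
Linux and Python 3.3 or later (os.sched_getaffinity is only available
there); the helper name allowed_cpus is made up for illustration. The
PIDs are passed on the command line, for example the lu.C.16 PIDs from
the top listing.

```python
import os
import sys


def allowed_cpus(pid=0):
    """Return the sorted list of CPUs the kernel lets `pid` run on.

    pid=0 means the calling process. Linux-only.
    """
    return sorted(os.sched_getaffinity(pid))


if __name__ == "__main__":
    # Usage (hypothetical script name): python affinity.py 32508 32509
    for arg in sys.argv[1:]:
        print(arg, allowed_cpus(int(arg)))
```

A process reporting a single CPU is pinned to it. A process reporting
all of 0-31 is not pinned, so its placement would be the scheduler's
choice.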


> You also indicated in your original e-mail that a single node has 32
> cores. I am assuming that it has eight sockets of four cores each. Are
> these Opterons or any other processor type?

Quad-Core AMD Opteron(tm) Processor 8384.

Thanks,
emir