[mvapich-discuss] Problem with more MPI jobs on the same node

Dhabaleswar Panda panda at cse.ohio-state.edu
Mon Aug 31 21:22:03 EDT 2009


I am very glad to hear that your problem is solved with the latest mvapich 1.1
nightly build from the mvapich web site.

I am posting this reply to the mvapich-discuss list for the benefit of other users.

We do not test the mvapich versions included in RHEL releases; Red Hat
does this testing. I am cc'ing Doug Ledford on this note.

Doug, could you or your team members take a look at this issue? It looks
like the affinity-related functionality is not working correctly in the
mvapich 1.1 RPM included in RHEL 5.3. There is a discussion thread on this
issue on the mvapich-discuss list.

Thanks,

DK

On Tue, 1 Sep 2009, Emir Imamagic wrote:

> Dhabaleswar Panda wrote:
> > Thanks for the update. Would it be possible for you to download the mvapich
> > 1.1.0 `branch' version from our web site and let us know whether it
> > exhibits the same behavior? This will help us determine whether the
> > problem is specific to the version in the SRPM.
>
> Very good, this solved the problem. I downloaded the latest nightly
> build (mvapich-1.1-2009-08-30.tar.gz), rebuilt the RPM and voilà, all my
> CPUs are fully utilized (see the top output below).
>
> It is probably a good idea for you to test the version distributed with
> the latest RHEL 5.3 and try to reproduce the error. I guess RHEL users
> would appreciate knowing that they are getting a problematic version.
>
> Cheers,
> emir
>
> top - 02:46:42 up 58 days, 21:19,  2 users,  load average: 30.02, 13.39, 5.11
> Tasks: 509 total,  33 running, 476 sleeping,   0 stopped,   0 zombie
> Cpu(s): 99.9%us,  0.1%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Mem:  66072240k total, 10799684k used, 55272556k free,   337052k buffers
> Swap:  7999992k total,        0k used,  7999992k free,  7831652k cached
>
>   P   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>   0 18716 eimamagi  25   0  138m  79m  19m R 99.9  0.1   2:42.29 lu.C.16
>  26 18721 eimamagi  25   0  138m  65m 5312 R 99.9  0.1   2:42.40 lu.C.16
>  24 18723 eimamagi  25   0  138m  64m 4512 R 99.9  0.1   2:42.36 lu.C.16
>   2 18724 eimamagi  25   0  138m  64m 4744 R 99.9  0.1   2:42.40 lu.C.16
>  16 18726 eimamagi  25   0  138m  64m 5252 R 99.9  0.1   2:42.35 lu.C.16
>   1 18728 eimamagi  25   0  138m  63m 4164 R 99.9  0.1   2:42.42 lu.C.16
>  23 18729 eimamagi  25   0  138m  63m 4708 R 99.9  0.1   2:42.38 lu.C.16
>   6 18730 eimamagi  25   0  138m  63m 4696 R 99.9  0.1   2:42.41 lu.C.16
>  17 18753 eimamagi  25   0  138m  63m 3548 R 99.9  0.1   2:41.14 lu.C.16
>  19 18761 eimamagi  25   0  138m  64m 4700 R 99.9  0.1   2:41.41 lu.C.16
>  20 18762 eimamagi  25   0  138m  63m 4112 R 99.9  0.1   2:41.35 lu.C.16
>  15 18763 eimamagi  25   0  138m  63m 4632 R 99.9  0.1   2:41.43 lu.C.16
>  27 18764 eimamagi  25   0  138m  63m 4696 R 99.9  0.1   2:41.42 lu.C.16
>   3 18717 eimamagi  25   0  138m  64m 4560 R 99.6  0.1   2:42.41 lu.C.16
>  12 18718 eimamagi  25   0  138m  64m 4324 R 99.6  0.1   2:42.34 lu.C.16
>  21 18719 eimamagi  25   0  138m  63m 3652 R 99.6  0.1   2:42.39 lu.C.16
>   4 18720 eimamagi  25   0  138m  64m 4640 R 99.6  0.1   2:42.39 lu.C.16
>  18 18722 eimamagi  25   0  138m  64m 5152 R 99.6  0.1   2:42.40 lu.C.16
>  13 18727 eimamagi  25   0  138m  64m 4692 R 99.6  0.1   2:42.41 lu.C.16
>  28 18731 eimamagi  25   0  138m  63m 4136 R 99.6  0.1   2:42.35 lu.C.16
>  11 18750 eimamagi  25   0  138m  79m  19m R 99.6  0.1   2:41.40 lu.C.16
>   7 18751 eimamagi  25   0  138m  64m 4824 R 99.6  0.1   2:41.18 lu.C.16
>  22 18752 eimamagi  25   0  138m  64m 4532 R 99.6  0.1   2:41.41 lu.C.16
>  30 18754 eimamagi  25   0  138m  64m 4760 R 99.6  0.1   2:41.41 lu.C.16
>   5 18755 eimamagi  25   0  138m  65m 5312 R 99.6  0.1   2:41.42 lu.C.16
>   8 18757 eimamagi  25   0  138m  64m 4536 R 99.6  0.1   2:41.34 lu.C.16
>  14 18758 eimamagi  25   0  138m  64m 4676 R 99.6  0.1   2:41.41 lu.C.16
>  25 18759 eimamagi  25   0  138m  64m 5296 R 99.6  0.1   2:41.39 lu.C.16
>  10 18765 eimamagi  25   0  138m  62m 3784 R 99.6  0.1   2:41.27 lu.C.16
>  29 18725 eimamagi  25   0  138m  64m 5284 R 99.3  0.1   2:41.56 lu.C.16
>   9 18756 eimamagi  25   0  138m  65m 5288 R 99.3  0.1   2:40.30 lu.C.16
>  31 18760 eimamagi  25   0  138m  64m 5260 R 97.0  0.1   2:36.96 lu.C.16