[mvapich-discuss] Problem with more MPI jobs on the same node
Dhabaleswar Panda
panda at cse.ohio-state.edu
Mon Aug 31 21:22:03 EDT 2009
Very glad to know that the latest mvapich 1.1 nightly build from the
mvapich web site solved your problem.
I am posting this reply to the mvapich-discuss list for other users.
We do not test the mvapich versions included in RHEL releases; Red Hat
does that testing. I am cc'ing this note to Doug Ledford.
Doug - Could you or your team members take a look at this issue? It looks
like CPU affinity is not working correctly with the mvapich 1.1 RPM
included in RHEL 5.3. There is a thread of discussion on this issue on
the mvapich-discuss list.
Thanks,
DK
On Tue, 1 Sep 2009, Emir Imamagic wrote:
> Dhabaleswar Panda wrote:
> > Thanks for the update. Would it be possible for you to download the
> > mvapich 1.1.0 `branch' version from our web site and let us know whether
> > it exhibits the same behavior? This will help us isolate whether the
> > problem is specific to the version in the SRPM.
>
> Very good, this solved the problem. I downloaded the latest nightly
> build (mvapich-1.1-2009-08-30.tar.gz), rebuilt the RPM and voila, all my
> CPUs are completely utilized (see top output below).
>
> It is probably a good idea for you to test the version distributed with
> the latest RHEL 5.3 and try to reproduce the error. I guess RHEL users
> would be happy to know that they're getting a problematic version.
>
> Cheers,
> emir
>
> top - 02:46:42 up 58 days, 21:19, 2 users, load average: 30.02, 13.39, 5.11
> Tasks: 509 total, 33 running, 476 sleeping, 0 stopped, 0 zombie
> Cpu(s): 99.9%us, 0.1%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Mem: 66072240k total, 10799684k used, 55272556k free, 337052k buffers
> Swap: 7999992k total, 0k used, 7999992k free, 7831652k cached
>
> P PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 0 18716 eimamagi 25 0 138m 79m 19m R 99.9 0.1 2:42.29 lu.C.16
> 26 18721 eimamagi 25 0 138m 65m 5312 R 99.9 0.1 2:42.40 lu.C.16
> 24 18723 eimamagi 25 0 138m 64m 4512 R 99.9 0.1 2:42.36 lu.C.16
> 2 18724 eimamagi 25 0 138m 64m 4744 R 99.9 0.1 2:42.40 lu.C.16
> 16 18726 eimamagi 25 0 138m 64m 5252 R 99.9 0.1 2:42.35 lu.C.16
> 1 18728 eimamagi 25 0 138m 63m 4164 R 99.9 0.1 2:42.42 lu.C.16
> 23 18729 eimamagi 25 0 138m 63m 4708 R 99.9 0.1 2:42.38 lu.C.16
> 6 18730 eimamagi 25 0 138m 63m 4696 R 99.9 0.1 2:42.41 lu.C.16
> 17 18753 eimamagi 25 0 138m 63m 3548 R 99.9 0.1 2:41.14 lu.C.16
> 19 18761 eimamagi 25 0 138m 64m 4700 R 99.9 0.1 2:41.41 lu.C.16
> 20 18762 eimamagi 25 0 138m 63m 4112 R 99.9 0.1 2:41.35 lu.C.16
> 15 18763 eimamagi 25 0 138m 63m 4632 R 99.9 0.1 2:41.43 lu.C.16
> 27 18764 eimamagi 25 0 138m 63m 4696 R 99.9 0.1 2:41.42 lu.C.16
> 3 18717 eimamagi 25 0 138m 64m 4560 R 99.6 0.1 2:42.41 lu.C.16
> 12 18718 eimamagi 25 0 138m 64m 4324 R 99.6 0.1 2:42.34 lu.C.16
> 21 18719 eimamagi 25 0 138m 63m 3652 R 99.6 0.1 2:42.39 lu.C.16
> 4 18720 eimamagi 25 0 138m 64m 4640 R 99.6 0.1 2:42.39 lu.C.16
> 18 18722 eimamagi 25 0 138m 64m 5152 R 99.6 0.1 2:42.40 lu.C.16
> 13 18727 eimamagi 25 0 138m 64m 4692 R 99.6 0.1 2:42.41 lu.C.16
> 28 18731 eimamagi 25 0 138m 63m 4136 R 99.6 0.1 2:42.35 lu.C.16
> 11 18750 eimamagi 25 0 138m 79m 19m R 99.6 0.1 2:41.40 lu.C.16
> 7 18751 eimamagi 25 0 138m 64m 4824 R 99.6 0.1 2:41.18 lu.C.16
> 22 18752 eimamagi 25 0 138m 64m 4532 R 99.6 0.1 2:41.41 lu.C.16
> 30 18754 eimamagi 25 0 138m 64m 4760 R 99.6 0.1 2:41.41 lu.C.16
> 5 18755 eimamagi 25 0 138m 65m 5312 R 99.6 0.1 2:41.42 lu.C.16
> 8 18757 eimamagi 25 0 138m 64m 4536 R 99.6 0.1 2:41.34 lu.C.16
> 14 18758 eimamagi 25 0 138m 64m 4676 R 99.6 0.1 2:41.41 lu.C.16
> 25 18759 eimamagi 25 0 138m 64m 5296 R 99.6 0.1 2:41.39 lu.C.16
> 10 18765 eimamagi 25 0 138m 62m 3784 R 99.6 0.1 2:41.27 lu.C.16
> 29 18725 eimamagi 25 0 138m 64m 5284 R 99.3 0.1 2:41.56 lu.C.16
> 9 18756 eimamagi 25 0 138m 65m 5288 R 99.3 0.1 2:40.30 lu.C.16
> 31 18760 eimamagi 25 0 138m 64m 5260 R 97.0 0.1 2:36.96 lu.C.16
>
>
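For readers who want to check a similar top listing on their own nodes, here is a minimal sketch, not part of the original exchange. It assumes top was run with the P (last used CPU) column shown first, as in the output above, and that the rows are available as plain text lines. The helper names `cpus_by_process` and `shared_cpus` are hypothetical. In the listing above, the 32 lu.C.16 processes appear on CPUs 0 through 31, one process per CPU.

```python
from collections import Counter


def cpus_by_process(top_rows):
    """Map each PID to the CPU shown in top's P (last used CPU) column."""
    placement = {}
    for row in top_rows:
        fields = row.split()
        # Data rows start with two integers (P and PID); skip headers and blanks.
        if len(fields) < 2 or not (fields[0].isdigit() and fields[1].isdigit()):
            continue
        cpu, pid = int(fields[0]), int(fields[1])
        placement[pid] = cpu
    return placement


def shared_cpus(placement):
    """Return {cpu: process_count} for every CPU used by more than one process."""
    counts = Counter(placement.values())
    return {cpu: n for cpu, n in counts.items() if n > 1}


# A few rows copied from the top output above.
sample = [
    "P PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND",
    "0 18716 eimamagi 25 0 138m 79m 19m R 99.9 0.1 2:42.29 lu.C.16",
    "26 18721 eimamagi 25 0 138m 65m 5312 R 99.9 0.1 2:42.40 lu.C.16",
    "24 18723 eimamagi 25 0 138m 64m 4512 R 99.9 0.1 2:42.36 lu.C.16",
]

placement = cpus_by_process(sample)
print(shared_cpus(placement))  # an empty dict means no CPU is shared
```

A non-empty result would suggest several processes landing on the same CPU. Since P only reports the last used CPU at the moment of sampling, this is a quick sanity check rather than proof of the affinity settings.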