[mvapich-discuss] MVAPICH does not share MPIR_proctable[] common string space

John DelSignore John.DelSignore at roguewave.com
Mon Apr 29 10:53:06 EDT 2013


Hi,

MVAPICH does not share MPIR_proctable[] common string space. I thought it was worth mentioning because it can certainly affect tool scalability.

The MPIR spec (see section 9.2 on page 16 here: <http://www.mpi-forum.org/docs/mpir-specification-10-11-2010.pdf>) says: "The MPI implementation should share the host and executable name character strings across multiple process descriptor entries whenever possible. For example, if all of the MPI processes are executing “/path/a.out”, then the executable name field in each process descriptor should point to the same null-terminated character string. Sharing the strings enhances the tools scalability by allowing it to cache data from the starter process and avoid reading redundant character strings."
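To make the recommendation concrete, here is a minimal sketch of how a starter process could share strings while filling in the table. The MPIR_PROCDESC layout is the one given in the MPIR spec; the intern_pool type and the intern() and fill_procdesc() helpers are hypothetical names invented for this illustration and are not taken from MVAPICH or any other MPI implementation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Process descriptor layout as given in the MPIR spec. */
typedef struct {
    char *host_name;
    char *executable_name;
    int   pid;
} MPIR_PROCDESC;

/* Hypothetical pool of unique strings (illustration only). */
typedef struct {
    char  **strings;
    size_t  count;
    size_t  capacity;
} intern_pool;

/* Return the pooled copy of s, adding one if it is not there yet.
 * Returns NULL if memory allocation fails. */
static char *intern(intern_pool *pool, const char *s)
{
    assert(pool != NULL && s != NULL);

    /* Linear scan: fine for a sketch, a hash table would scale better. */
    for (size_t i = 0; i < pool->count; i++)
        if (strcmp(pool->strings[i], s) == 0)
            return pool->strings[i];

    if (pool->count == pool->capacity) {
        size_t new_cap = pool->capacity ? pool->capacity * 2 : 8;
        char **grown = realloc(pool->strings, new_cap * sizeof *grown);
        if (grown == NULL)
            return NULL;
        pool->strings = grown;
        pool->capacity = new_cap;
    }

    size_t len = strlen(s) + 1;
    char *copy = malloc(len);
    if (copy == NULL)
        return NULL;
    memcpy(copy, s, len);
    pool->strings[pool->count++] = copy;
    return copy;
}

/* Fill one table entry using shared strings. Returns 0 on success. */
static int fill_procdesc(MPIR_PROCDESC *d, intern_pool *pool,
                         const char *host, const char *exe, int pid)
{
    d->host_name = intern(pool, host);
    d->executable_name = intern(pool, exe);
    d->pid = pid;
    return (d->host_name != NULL && d->executable_name != NULL) ? 0 : -1;
}
```

With this approach, entries that name the same host or executable end up holding identical pointers, so a tool has to read each distinct string from the starter process only once.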

I ran a small job on LLNL's sierra under TotalView with MPIR_proctable debug logging turned on, and it showed the following output (I snipped out four of the entries to save space):

[00:00:03.938480] mpir_proctable_t::create: extracting hostname/execname/pids for 8 processes
[00:00:03.938523] mpir_proctable_t::create: MPIR_proctable[0]: host_name(0x2aaab8000dc8)="sierra528", executable_name(0x04352d98)="/g/g0/jdelsign/tw249_tests.sierra/IRS/run.2013-04-29-06:33:33.1x12x0L2P8.testing.nut.sierra1620/../IRS/bld/codes_debug/irs", pid=26009
[00:00:03.938615] mpir_proctable_t::create: MPIR_proctable[1]: host_name(0x2aaab8000de8)="sierra528", executable_name(0x04352ae8)="/g/g0/jdelsign/tw249_tests.sierra/IRS/run.2013-04-29-06:33:33.1x12x0L2P8.testing.nut.sierra1620/../IRS/bld/codes_debug/irs", pid=26010
[...snip...]
[00:00:03.938880] mpir_proctable_t::create: MPIR_proctable[6]: host_name(0x2aaab80009b8)="sierra528", executable_name(0x0411b668)="/g/g0/jdelsign/tw249_tests.sierra/IRS/run.2013-04-29-06:33:33.1x12x0L2P8.testing.nut.sierra1620/../IRS/bld/codes_debug/irs", pid=26015
[00:00:03.938899] mpir_proctable_t::create: MPIR_proctable[7]: host_name(0x2aaab80009d8)="sierra528", executable_name(0x0411b6f8)="/g/g0/jdelsign/tw249_tests.sierra/IRS/run.2013-04-29-06:33:33.1x12x0L2P8.testing.nut.sierra1620/../IRS/bld/codes_debug/irs", pid=26016

The hex numbers in parens after "host_name" and "executable_name" show the *pointers* to the strings; the string values follow. Notice that *none* of the pointers are equal, even though the strings are equal. So, it appears to me that MVAPICH is not following the advice given in the MPIR spec.
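To illustrate why the pointer values matter for scalability, here is a rough sketch of a tool-side cache keyed by the address of the string in the starter process. It is not TotalView's actual implementation. All names here (string_cache, cached_read, read_string_fn, fake_read) are hypothetical, and fake_read merely stands in for whatever mechanism a debugger uses to read target memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback that reads a string from the starter process. */
typedef const char *(*read_string_fn)(uint64_t remote_addr);

#define CACHE_SLOTS 64

/* Hypothetical cache mapping a remote address to the string read from it. */
typedef struct {
    uint64_t    addr[CACHE_SLOTS];
    const char *value[CACHE_SLOTS];
    size_t      used;
    size_t      remote_reads;   /* number of expensive reads performed */
} string_cache;

/* Return the string at addr, reading it remotely only on a cache miss. */
static const char *cached_read(string_cache *c, uint64_t addr,
                               read_string_fn rd)
{
    assert(c != NULL && rd != NULL);

    for (size_t i = 0; i < c->used; i++)
        if (c->addr[i] == addr)
            return c->value[i];          /* hit: no remote read needed */

    const char *s = rd(addr);            /* miss: expensive remote read */
    c->remote_reads++;
    if (c->used < CACHE_SLOTS) {
        c->addr[c->used] = addr;
        c->value[c->used] = s;
        c->used++;
    }
    return s;
}

/* Stand-in reader for the sketch: every address yields the same host name. */
static const char *fake_read(uint64_t remote_addr)
{
    (void)remote_addr;
    return "sierra528";
}
```

When every entry carries a distinct pointer, as in the log above, each lookup misses and the number of remote reads grows with the number of processes. With shared pointers, it grows only with the number of distinct strings.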

Note that MPICH logged this issue as: https://trac.mpich.org/projects/mpich/ticket/1821

Cheers, John D.

