[mvapich-discuss] Profiling timer expired

Sayantan Sur surs at cse.ohio-state.edu
Wed May 11 10:36:39 EDT 2011


Hi Jim,

Thanks for reporting this issue. May I ask exactly which version of
MVAPICH2 you have been using? I see that you launched your job using
SLURM. Did you try launching with mpirun_rsh or hydra? If so, did you
see the same problem with those launchers? Knowing that would help us
narrow down the problem. You can 'salloc' the nodes and then use
mpirun_rsh directly.
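
For reference, the mechanism Jim describes below (a SIGPROF handler
driven by setitimer(ITIMER_PROF,...)) can be sketched in C roughly as
follows. This is only an illustrative sketch, assuming Linux with glibc;
it is not Open|SpeedShop or MVAPICH2 code, and the names start_sampling,
stop_sampling, spin_until_samples and SIGPROF_DEMO are hypothetical. The
default action for SIGPROF is to terminate the process, and glibc
describes the signal as "Profiling timer expired", so that text is what
gets reported when the timer fires in a process that has no handler
installed for it.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/time.h>

/* Samples taken so far; sig_atomic_t is safe to update from a handler. */
volatile sig_atomic_t sample_count = 0;

/* SIGPROF handler: a real profiler would record the program counter here. */
void on_sigprof(int signo)
{
    (void)signo;
    sample_count++;
}

/* Install the handler first, then arm ITIMER_PROF so that it expires every
 * interval_usec microseconds of CPU time consumed by the process.
 * Returns 0 on success, -1 on error. */
int start_sampling(long interval_usec)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigprof;
    sa.sa_flags = SA_RESTART;          /* restart interrupted system calls */
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGPROF, &sa, NULL) != 0)
        return -1;

    struct itimerval tv;
    tv.it_interval.tv_sec = 0;
    tv.it_interval.tv_usec = interval_usec;
    tv.it_value = tv.it_interval;      /* first expiry after one interval */
    return setitimer(ITIMER_PROF, &tv, NULL);
}

/* Disarm the timer before restoring the default (terminating) action. */
int stop_sampling(void)
{
    struct itimerval off;
    memset(&off, 0, sizeof off);
    if (setitimer(ITIMER_PROF, &off, NULL) != 0)
        return -1;
    return signal(SIGPROF, SIG_DFL) == SIG_ERR ? -1 : 0;
}

/* Burn CPU time until at least n samples have been delivered. */
void spin_until_samples(int n)
{
    while (sample_count < n)
        ;                              /* volatile read keeps the loop live */
}

#ifdef SIGPROF_DEMO   /* build with -DSIGPROF_DEMO for a standalone program */
int main(void)
{
    printf("SIGPROF is described as: %s\n", strsignal(SIGPROF));
    if (start_sampling(10000) != 0) {
        perror("start_sampling");
        return 1;
    }
    spin_until_samples(5);
    stop_sampling();
    printf("took %d samples\n", (int)sample_count);
    return 0;
}
#endif
```

The ordering matters in this sketch: the handler is installed before the
timer is armed and the timer is disarmed before the handler is removed,
so SIGPROF never arrives while the default action is in effect.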

Thanks for your help.

On Wed, May 11, 2011 at 9:09 AM, Jim Galarowicz <jeg at krellinst.org> wrote:
>
> Hi,
>
> I'm a developer on the Open|SpeedShop performance tool project
> (www.openspeedshop.org).  Recently (possibly tied to recent versions of
> MVAPICH2) I've started seeing the error message:
>      srun: error: rs265: tasks 0-1: Profiling timer expired
> when running any Open|SpeedShop experiment that uses the SIGPROF signal in
> conjunction with setitimer(ITIMER_PROF,...) to periodically interrupt the
> application to take program counter and call tree samples and to read PAPI
> event counters.   I'm trying to figure out whether MVAPICH2 also uses these
> mechanisms and the two are in conflict.
>
> Does MVAPICH2 use these timer mechanisms?    If not, could SLURM be using
> them?
>
> Thanks,
> Jim G
>
>
> Here is an example of what I'm seeing when running our pcsamp (program
> counter) experiment:
>
>
> osspcsamp "srun -n 8 --ntasks-per-node 8 /usr/bin/numa_wrapper -ppn 8
> ./application-redsky-opt ./test1.in"
> [openss]: pcsamp experiment using the pcsamp experiment default sampling
> rate: "100".
> [openss]: Using OPENSS_PREFIX installed in
> /projects/OSS/openspeedshop-2.0.1_beta2
> [openss]: Setting up offline raw data directory in ./offline-oss
> [openss]: Running offline pcsamp experiment using the command:
> "/apps/slurm/wrapper/srun -n 8 --ntasks-per-node 8 /usr/bin/numa_wrapper
> -ppn 8 /projects/OSS/openspeedshop-2.0.1_beta2/bin/ossrun -c pcsamp
> "./application-redsky-opt ./test1.in" "
>
> srun: error: rs265: tasks 0-1: Profiling timer expired
> srun: First task exited 30s ago
> srun: tasks 2-7: running
> srun: tasks 0-1: exited abnormally
> srun: Terminating job step 4249909.15
> slurmd[rs265]: *** STEP 4249909.15 KILLED AT 2011-05-10T22:22:42 WITH SIGNAL
> 9 ***
>
> [openss]: Converting raw data from ./offline-oss into temp file X.0.openss
>
> Processing raw data for application
> Processing processes and threads ...
> Processing performance data ...
> Processing functions and statements ...
>
> [openss]: Restoring and displaying default view for:
>    /home/jgalaro/demos/application/application-redsky-opt-pcsamp-10.openss
> [openss]: The restored experiment identifier is:  -x 1
> No performance measurements were made for the experiment.
>
>
>
>



-- 
Sayantan Sur

Research Scientist
Department of Computer Science
http://www.cse.ohio-state.edu/~surs


