[mvapich-discuss] Profiling timer expired

Jonathan Perkins perkinjo at cse.ohio-state.edu
Wed May 11 11:44:51 EDT 2011


The password prompt comes from ssh, which mpirun_rsh uses behind the scenes.

Take a look at `man ssh-keygen' to see how to create ssh keys.

I suggest taking the following steps:
    ssh-keygen -t dsa
    <press enter at each prompt>
    cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

This should set up passwordless ssh for you.  You can test it by
simply trying to ssh between the nodes that you have allocated with
salloc.
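
As a rough sketch of that test (not part of MVAPICH2), a small shell
function could loop over the hostfile and try a non-interactive login to
each node. The function name check_passwordless_ssh and the SSH_CMD
override variable are made up for illustration. BatchMode=yes and
ConnectTimeout are standard OpenSSH client options; BatchMode makes ssh
fail rather than prompt for a password.

```shell
#!/bin/sh
# Sketch: report which hosts in a hostfile accept key-based ssh logins.
# SSH_CMD is a hypothetical hook so the check can be dry-run with a stub.
check_passwordless_ssh() {
    hostfile=$1
    failed=0
    while IFS= read -r host; do
        # Skip blank lines in the hostfile.
        [ -n "$host" ] || continue
        # BatchMode=yes: fail instead of asking for a password.
        # </dev/null keeps ssh from swallowing the rest of the hostfile.
        if ${SSH_CMD:-ssh} -o BatchMode=yes -o ConnectTimeout=5 \
                "$host" true </dev/null >/dev/null 2>&1; then
            echo "ok   $host"
        else
            echo "FAIL $host"
            failed=1
        fi
    done < "$hostfile"
    # Non-zero return if any host refused a non-interactive login.
    return "$failed"
}
```

Inside the salloc allocation this would be run as
`check_passwordless_ssh hostfile`. Any host reported as FAIL is likely to
prompt for a password when mpirun_rsh tries to reach it.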

On Wed, May 11, 2011 at 11:23 AM, Jim Galarowicz <jeg at krellinst.org> wrote:
>
> Hi Sayantan,
>
> I'm using mpi/mvapich2-1.4.1_oobpr_intel-11.1-f064-c064 at Sandia and a
> similar version at LLNL and seeing the same messages about the profiling
> timer.
>
> I'm having trouble getting mpirun_rsh to work.   I must be doing something
> wrong.  I haven't run in this mode before.
> This is the command I'm using:
>
>  /apps/x86_64/mpi/mvapich2/intel-11.1-f064-c064/mvapich2-1.4.1_oobpr/bin/mpirun_rsh
>  -np 32  -hostfile hostfile ./aleph-redsky-opt ./test1.in
> where `cat hostfile' shows:
> rs264.sandia.gov
> rs265.sandia.gov
> rs266.sandia.gov
> rs267.sandia.gov
>
> I was getting asked for my password for each node and when I entered it, it
> was rejected.
>
> Now I'm just seeing the cursor returned when I enter the mpirun_rsh command?
>
>
> I used salloc to alloc 4 nodes at 8 procs each.
>
> Sorry for not knowing how to run mpirun_rsh... Any suggestions?  I
> haven't used hydra either.
>
> Thanks,
> Jim G
>
>
>
> On 05/11/2011 09:36 AM, Sayantan Sur wrote:
>>
>> Hi Jim,
>>
>> Thanks for reporting this issue. May I ask exactly which version of
>> MVAPICH2 you have been using? I see that you launched your job using
>> SLURM. Did you try launching with mpirun_rsh or hydra? If so, did you
>> see the same problem? Knowing that might help us narrow it down. You
>> can 'salloc' the nodes and then use mpirun_rsh directly.
>>
>> Thanks for your help.
>>
>> On Wed, May 11, 2011 at 9:09 AM, Jim Galarowicz <jeg at krellinst.org> wrote:
>>>
>>> Hi,
>>>
>>> I'm a developer on the Open|SpeedShop performance tool project
>>> (www.openspeedshop.org).  I've recently (maybe tied to recent versions of
>>> mvapich2) started seeing the error message:
>>>      srun: error: rs265: tasks 0-1: Profiling timer expired
>>> when running any Open|SpeedShop experiments that use the SIGPROF signal
>>> in conjunction with setitimer(ITIMER_PROF,...) to periodically interrupt
>>> the application to take program counter, call tree samples, and read
>>> papi event counters.   I'm trying to figure out if mvapich2 is also
>>> using these mechanisms and we are in conflict.
>>>
>>> Does mvapich2 use these timer mechanisms?    Otherwise, maybe slurm is
>>> using the timer mechanisms?
>>>
>>> Thanks,
>>> Jim G
>>>
>>>
>>> Here is an example of what I'm seeing when running our pcsamp (program
>>> counter experiment):
>>>
>>>
>>> osspcsamp "srun -n 8 --ntasks-per-node 8 /usr/bin/numa_wrapper -ppn 8
>>> ./application-redsky-opt ./test1.in"
>>> [openss]: pcsamp experiment using the pcsamp experiment default sampling
>>> rate: "100".
>>> [openss]: Using OPENSS_PREFIX installed in
>>> /projects/OSS/openspeedshop-2.0.1_beta2
>>> [openss]: Setting up offline raw data directory in ./offline-oss
>>> [openss]: Running offline pcsamp experiment using the command:
>>> "/apps/slurm/wrapper/srun -n 8 --ntasks-per-node 8 /usr/bin/numa_wrapper
>>> -ppn 8 /projects/OSS/openspeedshop-2.0.1_beta2/bin/ossrun -c pcsamp
>>> "./application-redsky-opt ./test1.in""
>>>
>>> srun: error: rs265: tasks 0-1: Profiling timer expired
>>> srun: First task exited 30s ago
>>> srun: tasks 2-7: running
>>> srun: tasks 0-1: exited abnormally
>>> srun: Terminating job step 4249909.15
>>> slurmd[rs265]: *** STEP 4249909.15 KILLED AT 2011-05-10T22:22:42 WITH SIGNAL 9 ***
>>>
>>> [openss]: Converting raw data from ./offline-oss into temp file X.0.openss
>>>
>>> Processing raw data for application
>>> Processing processes and threads ...
>>> Processing performance data ...
>>> Processing functions and statements ...
>>>
>>> [openss]: Restoring and displaying default view for:
>>>
>>>  /home/jgalaro/demos/application/application-redsky-opt-pcsamp-10.openss
>>> [openss]: The restored experiment identifier is:  -x 1
>>> No performance measurements were made for the experiment.
>>>
>>>
>>> _______________________________________________
>>> mvapich-discuss mailing list
>>> mvapich-discuss at cse.ohio-state.edu
>>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>>>
>>>
>>
>>



-- 
Jonathan Perkins
http://www.cse.ohio-state.edu/~perkinjo


