[OOD-users] Looking for help applications failing to launch

Gabriel Stenger gabeks at umich.edu
Thu Jul 26 15:34:24 EDT 2018


Eric,

Sorry for taking a while to respond.

After some rethinking and trying a few other things, I have a better
understanding of the problem, and I believe it is something specific to
the xstartup process that OnDemand uses.

The problem affects multiple programs and does not happen when I start
my own vncserver and connect to it to run the program. I copied the OOD
vncserver command and used it as a model, and it seems that the
`-noxstartup` flag triggers the problem.

The default command

`vncserver -log "vnc.log" -rfbauth "vnc.passwd" -nohttpd -noxstartup -geometry 1344x840 -idletimeout 0 2>&1`

cannot be used on its own from the command line, because the desktop,
window manager, and terminal are never started. When OOD uses that
command, how are those commands passed to the running Xvnc?

Where is OOD's equivalent of the xstartup script kept? Perhaps
comparing it to our own working xstartup would reveal the cause. Is
there a way to use it with a command-line startup?
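With `-noxstartup`, vncserver starts Xvnc but launches nothing inside it, so whatever provides the desktop has to be started separately with `DISPLAY` pointing at the new server. A minimal Python sketch of that pattern, assuming the vncserver output contains a line such as `Desktop 'TurboVNC: node01:1 (gabeks)' started on display node01:1`; the helper names and the `xfce4-session` command are hypothetical stand-ins, not OOD's actual scripts:

```python
import os
import re
import subprocess
from typing import List

# Assumed TurboVNC output format, e.g.
#   "Desktop 'TurboVNC: node01:1 (gabeks)' started on display node01:1"
DISPLAY_RE = re.compile(r"started on display (\S+):(\d+)")


def parse_display(vnc_output: str) -> str:
    """Return the X display (for example ':1') reported by vncserver."""
    match = DISPLAY_RE.search(vnc_output)
    if match is None:
        raise ValueError("no 'started on display' line in vncserver output")
    return ":" + match.group(2)


def launch_session(display: str, command: List[str]) -> subprocess.Popen:
    """Start a session command (desktop, window manager, terminal) on display."""
    env = dict(os.environ, DISPLAY=display)  # copy of the environment plus DISPLAY
    return subprocess.Popen(command, env=env)


if __name__ == "__main__":
    sample = "Desktop 'TurboVNC: node01:1 (gabeks)' started on display node01:1\n"
    print(parse_display(sample))  # :1
    # Hypothetical next step, run after vncserver returns:
    # launch_session(parse_display(vnc_output), ["xfce4-session"])
```

Any environment differences between a manual session and an OnDemand one would be inherited by the launched command through `os.environ`, which is one reason the environment is worth comparing directly.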

Thanks for any pointers you have,
Gabe


On Tue, Jul 24, 2018 at 2:01 PM, Franz, Eric <efranz at osc.edu> wrote:

> Gabe,
>
> Have you had any success in debugging the problem you are experiencing?
>
> I asked internally about similar errors with launching MATLAB and this was
> the response I got:
>
> >We had something that might be related.
> >The issue was that MATLAB confused the system libstdc++.so with its
> >built-in libstdc++.so. MATLAB ships its own libstdc++.so under
> >{matlab}/sys/os/glnxa64.
> >
> >The accepted answer at https://www.mathworks.com/matlabcentral/answers/329796-issue-with-libstdc-so-6 was useful.
> >
> >But this is a MATLAB-specific issue.
>
> So that may or may not be helpful.
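One quick check along these lines is which directories on `LD_LIBRARY_PATH` provide a `libstdc++.so.6`, in search order, run once in a manual session and once inside an OnDemand session. A minimal Python sketch, assuming the library file is named `libstdc++.so.6`; it only inspects `LD_LIBRARY_PATH` and ignores rpath/runpath, the ld.so cache, and the default system directories, so treat its output as a hint (running `ldd` on the binary in question is more authoritative):

```python
import os
from typing import List, Optional


def find_libstdcxx(ld_library_path: Optional[str] = None,
                   name: str = "libstdc++.so.6") -> List[str]:
    """Return every LD_LIBRARY_PATH entry that contains `name`, in search order."""
    if ld_library_path is None:
        ld_library_path = os.environ.get("LD_LIBRARY_PATH", "")
    hits = []
    for directory in ld_library_path.split(":"):
        if not directory:
            continue  # skip empty entries for simplicity
        candidate = os.path.join(directory, name)
        if os.path.exists(candidate):
            hits.append(candidate)
    return hits


if __name__ == "__main__":
    for path in find_libstdcxx():
        print(path)
```

If the first hit differs between the two sessions, that is a strong lead toward the library mismatch described above.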
>
> Thanks,
> Eric
>
> ---
> Eric Franz, Senior Web & Interface App Engineer
> Ohio Supercomputer Center
> An Ohio Technology Consortium (OH-TECH) Member
> 1224 Kinnear Road
> Columbus, OH 43212
> email: efranz at osc.edu
>
> On 7/18/18, 4:31 PM, "Franz, Eric" <efranz at osc.edu> wrote:
>
>     Gabe Stenger,
>
>     When the per-user nginx process is started via sudo, the user’s
> environment is wiped. So perhaps, when you manually submit the job that
> you use to start TurboVNC, something in your environment from the login
> shell is affecting the way the job is submitted. That said, I remember
> the solution being to use the login shell when submitting the jobs, so I
> thought we were handling this case.
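One way to test this is to dump the environment in both cases (for example `env > env_manual.txt` in a login shell, and the same command near the top of the job script writing `env_ood.txt`) and compare the two. A minimal Python sketch, assuming each file holds one `NAME=value` pair per line; multi-line values such as exported bash functions are not handled, and the file names are hypothetical:

```python
import sys
from typing import Dict, Iterable, Optional, Tuple


def parse_env(lines: Iterable[str]) -> Dict[str, str]:
    """Parse `env` output into a dict, ignoring lines without '='."""
    env = {}
    for line in lines:
        name, sep, value = line.rstrip("\n").partition("=")
        if sep:
            env[name] = value
    return env


def diff_env(a: Dict[str, str],
             b: Dict[str, str]) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Map each variable that differs to (value_in_a, value_in_b); None = unset."""
    return {k: (a.get(k), b.get(k))
            for k in sorted(set(a) | set(b))
            if a.get(k) != b.get(k)}


if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as fa, open(sys.argv[2]) as fb:
        changes = diff_env(parse_env(fa), parse_env(fb))
    for name, (manual, ood) in changes.items():
        print(f"{name}:\n  manual: {manual}\n  ood:    {ood}")
```

Run it as `python envdiff.py env_manual.txt env_ood.txt`; variables such as `LD_LIBRARY_PATH` are natural first suspects.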
>
>     Below is a summary of the debug information available. With it, I hope
> you will be able to determine what OnDemand is doing differently from when
> you run things manually. Of course, after determining that, the next
> question is how to change your OnDemand configuration to fix it. There
> are many options, and we can address that when you get there.
>
>     FWIW the vncserver command that is executed when we run this at OSC is:
>
>         vncserver -log "vnc.log" -rfbauth "vnc.passwd" -nohttpd -noxstartup -geometry 1344x840 -idletimeout 0 2>&1
>
>     which is different from
>
>         vncserver :99 -depth 24 -geometry 1400x1024 -name TestAccount -NeverShared -localhost
>
>     I can also forward this internally to see if anyone else here has seen
> something like this. Let me know if this is helpful or what questions you
> have.
>
>     Thanks,
>     Eric
>
>
>     Summary of debug info:
>
>     As the user who submitted the interactive app session, you can see
> details of the job submission by looking at the job directory. The path
> will look something like this:
>
>         ~/ondemand/data/sys/dashboard/batch_connect/sys/bc_desktop/owens/output/d7b6bc22-de96-437c-93cf-d5c1c75d786b
>
>     A very long path, explained:
>
>     * ~/ondemand/data/sys/dashboard – the namespaced dataroot of the
> dashboard app, which was used to submit the interactive app
>     * batch_connect/sys/bc_desktop/owens – the namespaced dataroot of the
> interactive app that the dashboard manages
>     * output/d7b6bc22-de96-437c-93cf-d5c1c75d786b – the actual job
> directory.
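The same breakdown can be done programmatically, which helps when scanning many job directories. A minimal Python sketch, assuming the layout above, where the interactive app's namespace starts at a `batch_connect` component and the job directory sits under `output/`; the function name and dictionary keys are hypothetical:

```python
from pathlib import PurePosixPath
from typing import Dict


def split_job_dir(path: str) -> Dict[str, str]:
    """Split a batch_connect job directory path into its three namespaced parts."""
    parts = PurePosixPath(path).parts
    bc = parts.index("batch_connect")  # start of the interactive app's dataroot
    out = parts.index("output", bc)    # start of the per-job directory
    return {
        "dashboard_dataroot": str(PurePosixPath(*parts[:bc])),
        "app_dataroot": str(PurePosixPath(*parts[bc:out])),
        "job_dir": str(PurePosixPath(*parts[out:])),
    }
```

A path without `batch_connect` or `output` raises `ValueError`, which is a reasonable signal that it is not a batch_connect job directory.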
>
>     In there you will find a file job_script_contents.sh, which contains a
> copy of the job script string submitted using sbatch, qsub, bsub, etc.,
> depending on the resource manager you are using, and a file
> job_script_options.json, which contains a JSON representation of the Hash
> used as arguments to the adapter class that submits the job. The adapter
> ends up calling execve, and the actual command is logged to the user's
> per-user nginx log file /var/log/nginx/USER/error.log. For example, my
> username is efranz, so my log file is /var/log/nginx/efranz/error.log, and
> it has a line like this:
>
>     App 21770 stdout: [2018-07-18 15:57:12 -0400 ]  INFO "execve = [{\"PBS_DEFAULT\"=>\"owens-batch.ten.osc.edu\", \"LD_LIBRARY_PATH\"=>\"/opt/torque/lib64:/opt/rh/v8314/root/usr/lib64:/opt/rh/nodejs010/root/usr/lib64:/opt/rh/rh-ruby22/root/usr/lib64:/opt/rh/rh-passenger40/root/usr/lib64\"}, \"/opt/torque/bin/qsub\", \"-d\", \"/users/PZS0562/efranz/ondemand/data/sys/dashboard/batch_connect/sys/bc_desktop/owens/output/d7b6bc22-de96-437c-93cf-d5c1c75d786b\", \"-N\", \"ondemand/sys/dashboard/sys/bc_desktop/owens\", \"-S\", \"/bin/bash\", \"-o\", \"/users/PZS0562/efranz/ondemand/data/sys/dashboard/batch_connect/sys/bc_desktop/owens/output/d7b6bc22-de96-437c-93cf-d5c1c75d786b/output.log\", \"-j\", \"oe\", \"-A\", \"PZS0002\", \"-l\", \"walltime=01:00:00\", \"-l\", \"nodes=1:ppn=28\", \"/tmp/qsub.20180718-21770-1o0mgjv\"]"
>
>     That shows the system command that was run:
>
>         PBS_DEFAULT=owens-batch.ten.osc.edu \
>         LD_LIBRARY_PATH=/opt/torque/lib64:/opt/rh/v8314/root/usr/lib64:/opt/rh/nodejs010/root/usr/lib64:/opt/rh/rh-ruby22/root/usr/lib64:/opt/rh/rh-passenger40/root/usr/lib64 \
>         /opt/torque/bin/qsub \
>             -d /users/PZS0562/efranz/ondemand/data/sys/dashboard/batch_connect/sys/bc_desktop/owens/output/d7b6bc22-de96-437c-93cf-d5c1c75d786b \
>             -N "ondemand/sys/dashboard/sys/bc_desktop/owens" -S /bin/bash … … …
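Turning a logged execve entry back into a copy-pasteable command can be scripted. A minimal Python sketch, assuming the entry has exactly the shape of the log line above (a Ruby-inspected array whose first element is the environment hash, with quotes escaped as `\"`) and that no value contains `=>` or other escape sequences; it is a log-reading convenience, not a general Ruby-literal parser, and the function names are hypothetical:

```python
import json
import shlex
from typing import Dict, List, Tuple


def parse_execve(log_line: str) -> Tuple[Dict[str, str], List[str]]:
    """Extract (env, argv) from an 'execve = [...]' log entry."""
    start = log_line.index("execve = ") + len("execve = ")
    end = log_line.rindex("]") + 1                      # closing bracket of the array
    literal = log_line[start:end].replace('\\"', '"')   # unescape \" -> "
    literal = literal.replace("=>", ":")                # Ruby hash syntax -> JSON
    env, *argv = json.loads(literal)
    return env, argv


def to_shell(env: Dict[str, str], argv: List[str]) -> str:
    """Render env assignments plus argv as one shell command line."""
    assignments = [f"{k}={shlex.quote(v)}" for k, v in env.items()]
    return " ".join(assignments + [shlex.quote(a) for a in argv])
```

Feeding it the full log line above should yield the same qsub command shown, with the temporary script path as the last argument.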
>
>     The Torque adapter actually writes the script to a file in /tmp and
> submits that file (/tmp/qsub.20180718-21770-1o0mgjv above), but the other
> adapters are smarter and write the contents of the script directly to the
> STDIN of sbatch or bsub.
>
>     The job directory also contains the other scripts the main job script
> executes, including a script.sh and a desktops directory with a script for
> each supported desktop type (gnome.sh, mate.sh, and xfce.sh).
>
>     With all of this, you should hopefully have all the information you
> need to at least determine what OnDemand is doing differently:
>
>     * the command run to submit the job
>     * the job script contents
>     * the scripts the job script executes
>
>
>     From: OOD-users <ood-users-bounces+efranz=osc.edu at lists.osc.edu> on
> behalf of Gabriel Stenger <gabeks at umich.edu>
>     Reply-To: User support mailing list for Open OnDemand <
> ood-users at lists.osc.edu>
>     Date: Tuesday, July 17, 2018 at 10:51 AM
>     To: "ood-users at lists.osc.edu" <ood-users at lists.osc.edu>
>     Subject: [OOD-users] Looking for help applications failing to launch
>
>     We are trying to get Open OnDemand (OOD) set up and ran into what
>     appears to be a problem with the VNC configuration that OOD uses.
>
>     We have a test cluster set up.  If we start TurboVNC manually using
>     the following command
>
>         vncserver :99 -depth 24 -geometry 1400x1024 -name TestAccount \
>             -NeverShared -localhost
>
>     the VNC server seems to start fine: we can connect to it using RealVNC
>     Viewer via an SSH tunnel to both our login node and a compute node,
>     and we can run both MATLAB R2017b and Firefox without issue.
>
>     If we start a remote interactive desktop job using OOD and connect to
>     it using the OOD noVNC viewer, the desktop comes up fine and 'vanilla'
>     X applications seem to run OK, but both MATLAB and Firefox segfault at
>     startup. MATLAB segfaults even when started with no graphical
>     component, i.e.,
>
>     $ matlab -nojvm -nodisplay
>
>     The last few lines of the trace are
>
>     [ 23] 0x00002b41730f3c06    bin/glnxa64/libmwmcr.so+00486406
>     [ 24] 0x00002b415fa7fe25    /lib64/libpthread.so.0+00032293
>     [ 25] 0x00002b415fd8c34d    /lib64/libc.so.6+01016653 clone+00000109
>     [ 26] 0x0000000000000000    <unknown-module>+00000000
>
>     Has anyone else had this problem and know what causes it?
>
>     Because we can start TurboVNC manually and run these applications,
>     the cause seems to be something in the environment created by the OOD
>     session manager. I would have suspected the X/Xvnc setup alone, until
>     we saw that it also segfaults when the JVM and display are suppressed.
>
>     Most of what I found about segfaults with MATLAB has to do with a
>     mismatched libstdc++.so.6 version, but since MATLAB runs fine outside
>     of OOD, I think that is not the problem.
>
>     Any suggestions for how we might trace the cause?
>
>     Please let us know if there is additional debugging information we can
> provide.
>
>     Thanks,
>     -- Gabe Stenger
>
>
>
>
> _______________________________________________
> OOD-users mailing list
> OOD-users at lists.osc.edu
> https://lists.osu.edu/mailman/listinfo/ood-users
>

