[OOD-users] Looking for help applications failing to launch

Franz, Eric efranz at osc.edu
Mon Jul 30 10:16:09 EDT 2018


Gabe,

Just to let you know I’m looking into it and will respond early this week.

Thanks,
Eric

---
Eric Franz, Senior Web & Interface App Engineer
Ohio Supercomputer Center
An Ohio Technology Consortium (OH-TECH) Member
1224 Kinnear Road
Columbus, OH 43212
email: efranz at osc.edu

From: OOD-users <ood-users-bounces+efranz=osc.edu at lists.osc.edu> on behalf of Gabriel Stenger <gabeks at umich.edu>
Reply-To: User support mailing list for Open OnDemand <ood-users at lists.osc.edu>
Date: Thursday, July 26, 2018 at 3:34 PM
To: User support mailing list for Open OnDemand <ood-users at lists.osc.edu>
Subject: Re: [OOD-users] Looking for help applications failing to launch

Eric,

Sorry for taking a while to respond.

After some rethinking and trying a few other things, I have a better
understanding of the problem, and I believe it lies specifically in
the xstartup process that OnDemand performs.

The problem I am having affects multiple programs and doesn't happen
when I start my own vncserver and connect to run the program. I have
copied the OOD xvncserver command and used it as a model, and it seems
that using the `-noxstartup`  flag induces the problem.

The default command

`vncserver -log "vnc.log" -rfbauth "vnc.passwd" -nohttpd -noxstartup
-geometry 1344x840 -idletimeout 0 2>&1`

can't be run directly from the command line, as the desktop, window
manager, and terminal are never started.  When OOD uses that command,
how are those commands passed to the running Xvnc?

Where is OOD's equivalent to the xstartup script kept?  Perhaps
comparing that to our own, working xstartup will reveal the cause?  Is
there a way to use that with the command line startup?
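
A minimal sketch of one way to get a usable session from the
`-noxstartup` command above. It assumes (this is not confirmed anywhere
in this thread) that vncserver prints a line of the form
"Desktop '...' started on display host:N", and that a program started
afterwards with DISPLAY set to that display appears in the session. The
function name and the xterm example are hypothetical.

```shell
# Read vncserver's output on stdin and print the display number.
# ASSUMPTION: the output contains a line starting with "Desktop" that
# ends in "host:N"; the text after the last colon is taken as N.
display_from_vncserver_output() {
    awk -F: '/^Desktop/ {print $NF}'
}

# Possible usage (not run here, requires a VNC server installation):
#   out=$(vncserver -log "vnc.log" -rfbauth "vnc.passwd" -nohttpd \
#         -noxstartup -geometry 1344x840 -idletimeout 0 2>&1)
#   DISPLAY=":$(printf '%s\n' "$out" | display_from_vncserver_output)"
#   export DISPLAY
#   xterm &    # hypothetical: start any window manager or desktop here
```

With no xstartup script, nothing is launched inside the X session
automatically, so whatever starts the desktop has to run as a separate
step after the server is up.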

Thanks for any pointers you have,
Gabe


On Tue, Jul 24, 2018 at 2:01 PM, Franz, Eric <efranz at osc.edu> wrote:
Gabe,

Have you had any success in debugging the problem you are experiencing?

I asked internally about similar errors with launching MATLAB and this was the response I got:

>We had something that might be related.
>The issue was that MATLAB got confused between the system libstdc++.so and its built-in libstdc++.so.
>MATLAB comes with its own libstdc++.so, under {matlab}/sys/os/glnxa64.
>
>The accepted answer of https://www.mathworks.com/matlabcentral/answers/329796-issue-with-libstdc-so-6 was useful.
>
>But, this is a matlab specific issue

So that may or may not be helpful.
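
One way to check for that kind of mix-up is to list which directories on
the library search path carry a copy of the library. This is only a
sketch: the function name is hypothetical, and it looks only at the
colon-separated list you pass in (for example LD_LIBRARY_PATH), not at
the system default directories.

```shell
# Print each directory in a colon-separated list ($2) that contains a
# file named $1, in list order.
find_on_path() {
    name=$1
    old_ifs=$IFS
    IFS=:
    set -f                      # do not glob-expand directory names
    for dir in $2; do
        if [ -n "$dir" ] && [ -e "$dir/$name" ]; then
            echo "$dir"
        fi
    done
    set +f
    IFS=$old_ifs
}

# Possible usage, run once in a manual session and once in an OOD session:
#   find_on_path libstdc++.so.6 "$LD_LIBRARY_PATH"
```

If the two sessions print different lists, or the same directories in a
different order, that difference is worth a closer look.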

Thanks,
Eric

---
Eric Franz, Senior Web & Interface App Engineer
Ohio Supercomputer Center
An Ohio Technology Consortium (OH-TECH) Member
1224 Kinnear Road
Columbus, OH 43212
email: efranz at osc.edu

On 7/18/18, 4:31 PM, "Franz, Eric" <efranz at osc.edu> wrote:

    Gabe Stenger,

    When the per-user nginx process is started via sudo, the user’s environment is wiped. So perhaps when you manually submit the job that starts TurboVNC, something in your login-shell environment affects how the job is submitted. That said, I thought I remembered that the solution was to specify the login shell when submitting jobs, so I thought we were handling this case.

    Below is a summary of the debug information available. With it I hope you will be able to determine what OnDemand is doing differently from when you run things manually. Of course, after determining that, the next question is how do you change your OnDemand configuration to fix it. There are many options and we can address that when you get there.
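
Since the environment is the main suspect, a quick comparison of the two
environments may help narrow things down. A minimal sketch, assuming you
have saved the output of `env` once in a manual session and once inside
an OOD session; the file names and the function name are hypothetical.

```shell
# Print the names of variables that are set in only one of two `env`
# dumps, or that have different values in the two dumps.
env_diff() {
    # A line present in both dumps appears twice after merging, so
    # `uniq -u` keeps only the lines that differ between them.
    sort "$1" "$2" | uniq -u | cut -d= -f1 | sort -u
}

# Possible usage:
#   env > manual.env        # in the manually started VNC session
#   env > ood.env           # in a terminal inside the OOD desktop
#   env_diff manual.env ood.env
```

Values that contain newlines will confuse this line-based comparison, so
treat the result as a starting point rather than a complete answer.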

    FWIW the vncserver command that is executed when we run this at OSC is:

        vncserver -log "vnc.log" -rfbauth "vnc.passwd" -nohttpd -noxstartup -geometry 1344x840 -idletimeout 0  2>&1

    which is different from

        vncserver :99 -depth 24 -geometry 1400x1024 -name TestAccount -NeverShared -localhost

    I can also forward this internally to see if anyone else here has seen something like this. Let me know if this is helpful or what questions you have.

    Thanks,
    Eric


    Summary of debug info:

    As the user who submitted the interactive app session you can see details on the job submission by looking at the job directory. The path will look something like this:

        ~/ondemand/data/sys/dashboard/batch_connect/sys/bc_desktop/owens/output/d7b6bc22-de96-437c-93cf-d5c1c75d786b

    A very long path, explained:

    * ~/ondemand/data/sys/dashboard – the namespaced dataroot of the dashboard app, which was used to submit the interactive app
    * batch_connect/sys/bc_desktop/owens – the namespaced dataroot of the interactive app that the dashboard manages
    * output/d7b6bc22-de96-437c-93cf-d5c1c75d786b – the actual job directory.

    In there you will find a file job_script_contents.sh, which contains a copy of the job script string submitted using sbatch, qsub, bsub, etc., depending on the resource manager you are using. job_script_options.json contains a JSON representation of the Hash used as arguments to the adapter class that submits the job. Submission ends up using execve, and the actual command is logged to the user's per-user nginx log file, /var/log/nginx/USER/error.log. For example, my username is efranz, so my log file is /var/log/nginx/efranz/error.log, and it has a line like this:

    App 21770 stdout: [2018-07-18 15:57:12 -0400 ]  INFO "execve = [{\"PBS_DEFAULT\"=>\"owens-batch.ten.osc.edu\", \"LD_LIBRARY_PATH\"=>\"/opt/torque/lib64:/opt/rh/v8314/root/usr/lib64:/opt/rh/nodejs010/root/usr/lib64:/opt/rh/rh-ruby22/root/usr/lib64:/opt/rh/rh-passenger40/root/usr/lib64\"}, \"/opt/torque/bin/qsub\", \"-d\", \"/users/PZS0562/efranz/ondemand/data/sys/dashboard/batch_connect/sys/bc_desktop/owens/output/d7b6bc22-de96-437c-93cf-d5c1c75d786b\", \"-N\", \"ondemand/sys/dashboard/sys/bc_desktop/owens\", \"-S\", \"/bin/bash\", \"-o\", \"/users/PZS0562/efranz/ondemand/data/sys/dashboard/batch_connect/sys/bc_desktop/owens/output/d7b6bc22-de96-437c-93cf-d5c1c75d786b/output.log\", \"-j\", \"oe\", \"-A\", \"PZS0002\", \"-l\", \"walltime=01:00:00\", \"-l\", \"nodes=1:ppn=28\", \"/tmp/qsub.20180718-21770-1o0mgjv\"]"

    That shows the system command that was run:

    PBS_DEFAULT=owens-batch.ten.osc.edu LD_LIBRARY_PATH=/opt/torque/lib64:/opt/rh/v8314/root/usr/lib64:/opt/rh/nodejs010/root/usr/lib64:/opt/rh/rh-ruby22/root/usr/lib64:/opt/rh/rh-passenger40/root/usr/lib64 /opt/torque/bin/qsub -d /users/PZS0562/efranz/ondemand/data/sys/dashboard/batch_connect/sys/bc_desktop/owens/output/d7b6bc22-de96-437c-93cf-d5c1c75d786b -N "ondemand/sys/dashboard/sys/bc_desktop/owens" -S /bin/bash … … …
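
To pull that line out of a busy log, something like the following may
help. It is a sketch that assumes the submission is logged on a single
line containing the text "execve = ", as in the example above; the
function name is hypothetical.

```shell
# Print the most recent "execve = " entry from a log file, with the
# escaped quotes (\") turned back into plain quotes for readability.
last_execve() {
    grep 'execve = ' "$1" | tail -n 1 | sed 's/\\"/"/g'
}

# Possible usage (USER is your own username):
#   last_execve /var/log/nginx/USER/error.log
```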

    The Torque adapter actually writes the script to a file in tmp and submits that (/tmp/qsub.20180718-21770-1o0mgjv above) but the other adapters are smarter and just write the contents of the script directly to STDIN of sbatch or bsub.

    The job directory also contains the other scripts the main job script executes, including a script.sh and a desktops directory with a script for each supported desktop type (gnome.sh, mate.sh, and xfce.sh).

    So with all of this you should hopefully have access to all the information you need to at least determine what OnDemand is doing differently:

    * the command run to submit the job
    * the job script contents
    * the scripts the job script executes
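
The files above can be located with a short helper. This is a sketch
only: the function names are hypothetical, and it assumes the newest
subdirectory of the output directory belongs to the session you want to
inspect.

```shell
# Print the most recently modified subdirectory of $1 (with a trailing /).
latest_session_dir() {
    ls -1td "$1"/*/ 2>/dev/null | head -n 1
}

# Print the debug files named in this thread that exist in the newest
# session directory under $1.
show_session_files() {
    dir=$(latest_session_dir "$1")
    if [ -z "$dir" ]; then
        echo "no session directories under $1" >&2
        return 1
    fi
    for f in job_script_contents.sh job_script_options.json script.sh desktops; do
        if [ -e "$dir$f" ]; then
            echo "$dir$f"
        fi
    done
    return 0
}

# Possible usage, with the output directory from the example path above:
#   show_session_files ~/ondemand/data/sys/dashboard/batch_connect/sys/bc_desktop/owens/output
```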

    ---
    Eric Franz, Senior Web & Interface App Engineer
    Ohio Supercomputer Center
    An Ohio Technology Consortium (OH-TECH) Member
    1224 Kinnear Road
    Columbus, OH 43212
    email: efranz at osc.edu

    From: OOD-users <ood-users-bounces+efranz=osc.edu at lists.osc.edu> on behalf of Gabriel Stenger <gabeks at umich.edu>
    Reply-To: User support mailing list for Open OnDemand <ood-users at lists.osc.edu>
    Date: Tuesday, July 17, 2018 at 10:51 AM
    To: "ood-users at lists.osc.edu" <ood-users at lists.osc.edu>
    Subject: [OOD-users] Looking for help applications failing to launch

    We are trying to get Open OnDemand (OOD) set up and ran into what
    appears to be a problem with the VNC configuration that OOD uses.

    We have a test cluster set up.  If we start TurboVNC manually using
    the following command

        vncserver :99 -depth 24 -geometry 1400x1024 -name TestAccount \
            -NeverShared -localhost

    the VNC server seems to start fine, we can connect to it using RealVNC
    Viewer via an ssh tunnel to both our login node and a compute node, and
    we can run both MATLAB R2017b and Firefox without issue.

    If we start a remote interactive desktop job using OOD and connect to
    it using the OOD NoVNC viewer, we get a desktop fine, 'vanilla' X
    applications seem to run OK, but both MATLAB and Firefox segfault at
    startup.  In the case of MATLAB, it also segfaults even when starting
    with no graphical component, i.e.,

    $ matlab -nojvm -nodisplay

    The last few lines of the trace are

    [ 23] 0x00002b41730f3c06
    bin/glnxa64/libmwmcr.so+00486406
    [ 24] 0x00002b415fa7fe25
    /lib64/libpthread.so.0+00032293
    [ 25] 0x00002b415fd8c34d
    /lib64/libc.so.6+01016653 clone+00000109
    [ 26] 0x0000000000000000
    <unknown-module>+00000000

    Has anyone else had this problem and know what causes it?

    Because we can start TurboVNC manually and run it, that seems to point
    to something in the environment being created by the OOD session
    manager being the cause.  I might have thought it was solely in the
    X/Xvnc setup until we saw that it also segfaults when the jvm and
    display are suppressed.

    Most of what I found about seg faults with MATLAB has to do with a
    mismatched libstdc++.so.6 version, but since MATLAB runs fine outside
    of OOD, I think that is not the problem.

    Suggestions for how we might be able to trace the cause?

    Please let us know if there is additional debugging information we can provide.

    Thanks,    -- Gabe Stenger




_______________________________________________
OOD-users mailing list
OOD-users at lists.osc.edu
https://lists.osu.edu/mailman/listinfo/ood-users
