[OOD-users] no environment set for HPC desktop -- job fails

Michael Coleman mcolema5 at uoregon.edu
Fri Dec 7 23:24:48 EST 2018


If you read the man page very, very carefully, I think the new behavior actually matches what it always said.  The old behavior didn't actually match.  But I definitely agree that it's a pretty significant change.

Michael Coleman (mcolema5 at uoregon.edu), Computational Scientist
Research Advanced Computing Services
6235 University of Oregon
Eugene, OR 97403


________________________________________
From: John-Paul Robinson <jprorama at gmail.com>
Sent: Friday, December 7, 2018 17:19
To: Michael Coleman
Cc: User support mailing list for Open OnDemand
Subject: Re: [OOD-users] no environment set for HPC desktop -- job fails

I’m also scratching my head because documentation (even for the 18 release) still says the default of export is to propagate all environment variables.  :/

> On Dec 7, 2018, at 6:59 PM, John-Paul Robinson <jprorama at gmail.com> wrote:
>
> It's a pretty alarming change to have happened in three ticks of the minor release number.
>
> Any insights from others on this?
>
>> On Dec 7, 2018, at 6:37 PM, Michael Coleman <mcolema5 at uoregon.edu> wrote:
>>
>> Hi John-Paul,
>>
>> As you say, I believe the key event was the transition in SLURM versions.  They apparently changed how environment variables are exported from the submitting environment to the job environment (the behavior --export controls).  There was much wailing among our users here when their sbatch scripts broke as a result.  Generally, the "fix" was simply for users who were already using the --export flag to add the "ALL" keyword to that list, which seemed to restore the old behavior.
>>
>> Ultimately, OOD is calling 'sbatch' to create jobs, and this change affects the environment those jobs see.  At least in our environment, the --export=ALL flag seems to cure OOD issues.  There are probably other ways to change things, but this seemed the simplest.
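A minimal Python sketch of the workaround described above, assuming the sbatch options are available as an argument list before submission. It prepends `ALL` to any existing `--export=` value (unless that value already contains `ALL` or an explicit `NONE`) and adds `--export=ALL` when no export option is present. The function name `with_export_all` is hypothetical, only the `--export=VALUE` spelling is handled, and the sketch is not taken from OOD's own code.

```python
from typing import List

EXPORT_PREFIX = "--export="


def with_export_all(sbatch_args: List[str]) -> List[str]:
    """Return a copy of sbatch_args whose --export option includes ALL.

    Hypothetical helper: handles only the '--export=VALUE' form, and leaves
    an explicit NONE untouched because that was presumably deliberate.
    """
    result = []
    found = False
    for arg in sbatch_args:
        if arg.startswith(EXPORT_PREFIX):
            found = True
            value = arg[len(EXPORT_PREFIX):]
            tokens = value.split(",")
            if "ALL" not in tokens and "NONE" not in tokens:
                # Keep the explicitly listed variables, but also propagate
                # the whole submitting environment.
                arg = EXPORT_PREFIX + "ALL," + value
        result.append(arg)
    if not found:
        # Options must precede the script name, so put the flag first.
        result.insert(0, EXPORT_PREFIX + "ALL")
    return result
```

For example, `subprocess.run(["sbatch", *with_export_all(["--export=HOME,PATH", "job.sh"])])` would submit with `--export=ALL,HOME,PATH`. Making `ALL` explicit even when no `--export` was given costs nothing and avoids depending on whatever the installed Slurm version treats as the default.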
>>
>> Good luck,
>> Mike
>>
>>
>> -----Original Message-----
>> From: John-Paul Robinson <jprorama at gmail.com>
>> Sent: Friday, December 7, 2018 03:56 PM
>> To: Michael Coleman <mcolema5 at uoregon.edu>; User support mailing list for Open OnDemand <ood-users at lists.osc.edu>
>> Subject: Re: [OOD-users] no environment set for HPC desktop -- job fails
>>
>> Mike,
>>
>> Thanks for pointing us to this issue.
>>
>> This does appear to be similar to what's happening in our dev
>> environment.  (Note our still-working prod environment is Bright CM with
>> slurm 17.02.2).
>>
>> The odd thing with our dev environment (built on OpenHPC) is that it was
>> working in October and only started failing in builds over the past
>> month.  This appears to coincide with the OpenHPC 1.3.5 to 1.3.6 update
>> (going from slurm 17.11.7 to 17.11.10).
>>
>> We've had some success in restoring the original working configuration
>> in one of our test stacks by reverting to the OpenHPC 1.3.5 release.
>>
>> What's odd is this implies the problem is not with OOD but in the
>> OpenHPC system env.  As far as we can determine, our OOD remains
>> identical.  We are setting up dev in vagrant with ansible provisioning
>> the openhpc + ood cluster based on the CRI_XSEDE work extended to add an
>> OOD node via vagrant + ansible (not as a warewulf provision).
>>
>> https://github.com/jprorama/CRI_XCBC
>>
>> I've read through the github issue below but haven't teased out all the
>> details.
>>
>> Is there an obvious transition point where this export behavior could be
>> impacted by the underlying system versions OOD is running on?
>>
>> We'll contribute insights on the github issue as we find them.
>>
>> Thanks,
>>
>> John-Paul
>>
>>
>>> On 12/5/18 5:15 PM, Michael Coleman wrote:
>>> Hi John-Paul,
>>>
>>> We worked through something similar.  You might find some useful hints on this ticket.
>>>
>>>    https://github.com/OSC/ood_core/issues/109
>>>
>>> Cheers,
>>> Mike
>>>
>>>
>>>
>>>
>>> -----Original Message-----
>>> From: OOD-users <ood-users-bounces+mcolema5=uoregon.edu at lists.osc.edu> On Behalf Of John-Paul Robinson via OOD-users
>>> Sent: Wednesday, December 5, 2018 02:49 PM
>>> To: ood-users at lists.osc.edu
>>> Subject: [OOD-users] no environment set for HPC desktop -- job fails
>>>
>>> In our dev environment (slurm with ohpc) we have started to see this
>>> error when trying to launch interactive desktops:
>>>
>>> /tmp/slurmd/job00079/slurm_script: line 3: module: command not found
>>> Setting VNC password...
>>> Error: no HOME environment variable
>>> Starting VNC server...
>>> vncserver: The HOME environment variable is not set.
>>> vncserver: The HOME environment variable is not set.
>>> vncserver: The HOME environment variable is not set.
>>> vncserver: The HOME environment variable is
>>>
>>>
>>> As we understand it, the PUN nginx worker submits the batch job that
>>> starts the desktop.
>>>
>>> The problem seems to be that the environment for the job is empty, hence
>>> no module function or HOME env or anything else.   We checked the env of
>>> the user's nginx worker under /proc and it is completely empty.   Because
>>> our job env is inherited from the caller (the nginx worker in this case)
>>> the attempt to run the module command and vncserver commands naturally fail.
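The /proc check described above can be scripted. This is a minimal Python sketch, assuming a Linux host and permission to read the target process (typically the same user, or root). `/proc/<pid>/environ` holds the environment the process was started with, stored as NUL-separated `KEY=VALUE` records. The helper names and the default list of required variables are hypothetical.

```python
from typing import Dict, Iterable, List


def parse_environ(raw: bytes) -> Dict[str, str]:
    """Parse the NUL-separated KEY=VALUE records of /proc/<pid>/environ."""
    env: Dict[str, str] = {}
    for record in raw.split(b"\0"):
        if not record:
            continue  # the trailing NUL produces an empty final record
        key, sep, value = record.partition(b"=")
        if sep:  # skip malformed records that have no '='
            env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


def read_proc_environ(pid: int) -> Dict[str, str]:
    """Read another process's startup environment (Linux only)."""
    with open(f"/proc/{pid}/environ", "rb") as fh:
        return parse_environ(fh.read())


def missing_vars(env: Dict[str, str],
                 required: Iterable[str] = ("HOME", "PATH", "USER")) -> List[str]:
    """List the required variables that are absent from env."""
    return [name for name in required if name not in env]
```

Running `missing_vars(read_proc_environ(pid))` against the PID of a user's nginx worker would, in the situation described here, report every required variable as missing.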
>>>
>>> When we launch an interactive terminal, it runs just fine, but I'm
>>> guessing that's because the interactive session actually reads the
>>> normal shell startup and builds its environment, even if it happened to
>>> be missing in the proxy.
>>>
>>> Do you have any pointers on what could cause this situation?   We
>>> noticed it after we started adding additional interactive apps but don't
>>> have a clear time point.  It was working fine originally and still
>>> functions fine in our prod env (without any of the additional
>>> interactive apps).
>>>
>>> Thanks,
>>>
>>> John-Paul
>>>
>>> _______________________________________________
>>> OOD-users mailing list
>>> OOD-users at lists.osc.edu
>>> https://lists.osu.edu/mailman/listinfo/ood-users
>>

