[OOD-users] no environment set for HPC desktop -- job fails
John-Paul Robinson
jprorama at gmail.com
Fri Dec 7 18:56:20 EST 2018
Mike,
Thanks for pointing us to this issue.
This does appear to be similar to what's happening in our dev
environment. (Note our still-working prod environment is Bright CM with
slurm 17.02.2).
The odd thing with our dev environment (built on OpenHPC) is that it was
working in October and only started failing in builds over the past
month. This appears to coincide with the OpenHPC 1.3.5 to 1.3.6 update
(going from slurm 17.11.7 to 17.11.10).
We've had some success in restoring the original working configuration
in one of our test stacks by reverting to the OpenHPC 1.3.5 release.
What's odd is that this implies the problem is not with OOD but in the
OpenHPC system env. As far as we can determine, our OOD install remains
identical. We are setting up dev in Vagrant, with Ansible provisioning
the OpenHPC + OOD cluster, based on the CRI_XSEDE work extended to add
an OOD node via Vagrant + Ansible (not as a Warewulf provision).
https://github.com/jprorama/CRI_XCBC
I've read through the github issue below but haven't teased out all the
details.
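For what it's worth, the empty-environment failure quoted below is easy
to reproduce without Slurm at all. This is just a sketch of the symptom:
run a command with a deliberately empty environment, the way the job
inherits the nginx worker's empty env:

```shell
# Run a shell with an empty environment (like the PUN worker's) and show
# that HOME is unset and the module function is unavailable.
env -i /bin/sh -c 'echo "HOME=${HOME:-unset}"; command -v module || echo "module: command not found"'
```

Comparing that against `cat /proc/<worker-pid>/environ | tr "\0" "\n"`
on the nginx worker is how we confirmed the worker's env is empty.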
Is there an obvious transition point where this export behavior could be
impacted by the underlying system versions OOD is running on?
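One thing we've been using to narrow this down: since sbatch's --export
option controls which of the submitter's environment variables reach the
job (per the Slurm docs, --export=NONE asks slurmd to build a fresh
login-style environment instead of inheriting the caller's), a small
probe script submitted with different --export values shows exactly what
the job sees. A hedged sketch (the script and file names are mine, not
anything OOD ships):

```shell
#!/bin/bash
#SBATCH --job-name=env-probe
#SBATCH --output=env-probe.out
# Diagnostic job script: dump what environment the batch job actually sees.
# Submit as: sbatch --export=NONE env-probe.sh  vs  sbatch --export=ALL env-probe.sh
echo "HOME=${HOME:-<unset>}"
command -v module >/dev/null 2>&1 && echo "module: available" || echo "module: not found"
env | sort
```

Diffing the two env-probe.out files should show whether the empty env is
coming from inheritance or from how slurmd rebuilds the environment.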
We'll contribute insights on the github issue as we find them.
Thanks,
John-Paul
On 12/5/18 5:15 PM, Michael Coleman wrote:
> Hi John-Paul,
>
> We worked through something similar. You might find some useful hints on this ticket.
>
> https://github.com/OSC/ood_core/issues/109
>
> Cheers,
> Mike
>
>
>
>
> -----Original Message-----
> From: OOD-users <ood-users-bounces+mcolema5=uoregon.edu at lists.osc.edu> On Behalf Of John-Paul Robinson via OOD-users
> Sent: Wednesday, December 5, 2018 02:49 PM
> To: ood-users at lists.osc.edu
> Subject: [OOD-users] no environment set for HPC desktop -- job fails
>
> In our dev environment (slurm with ohpc) we have started to see this
> error when trying to launch interactive desktops:
>
> /tmp/slurmd/job00079/slurm_script: line 3: module: command not found
> Setting VNC password...
> Error: no HOME environment variable
> Starting VNC server...
> vncserver: The HOME environment variable is not set.
> vncserver: The HOME environment variable is not set.
> vncserver: The HOME environment variable is not set.
> vncserver: The HOME environment variable is
>
>
> As we understand it, the PUN nginx worker submits the batch job that
> starts the desktop session.
>
> The problem seems to be that the environment for the job is empty, hence
> no module function, no HOME env, nor anything else. We checked the env of
> the user's nginx worker under /proc and it is completely empty. Because
> our job env is inherited from the caller (the nginx worker in this case),
> the module and vncserver commands naturally fail.
>
> When we launch an interactive terminal, it runs just fine, but I'm
> guessing that's because the interactive session actually reads the
> normal shell startup files and builds its own environment, even if the
> environment was missing in the proxy.
>
> Do you have any pointers on what could cause this situation? We
> noticed it after we started adding additional interactive apps but don't
> have a clear time point. It was working fine originally and still
> functions fine in our prod env (without any of the additional
> interactive apps).
>
> Thanks,
>
> John-Paul
>
> _______________________________________________
> OOD-users mailing list
> OOD-users at lists.osc.edu
> https://lists.osu.edu/mailman/listinfo/ood-users