[OOD-users] LSF configuration and debugging

Nicklas, Jeremy jnicklas at osc.edu
Tue Jun 20 09:11:40 EDT 2017


Hi Ping,

It's great to hear that you are trying out Open OnDemand, and I'm glad to see you made it this far into the installation.

The YAML config you posted in the email looks fine. Here are a few things you can try:

- Confirm that the YAML config you posted is world-readable (mode 644) at `/etc/ood/config/clusters.d/cluster1.yml` on the OnDemand host machine. For example, at our center we have:

   ```
   ┌─[jnicklas@ondemand][~]
   └─▪ ls -al /etc/ood/config/clusters.d/
   total 24
   drwxr-xr-x 2 root root 4096 Jun  2 11:00 .
   drwxr-xr-x 3 root root 4096 Feb 21 15:17 ..
   -rw-r--r-- 1 root root 2576 Jun  2 11:00 oakley.yml
   -rw-r--r-- 1 root root 2610 Jun  2 11:00 owens.yml
   -rw-r--r-- 1 root root  861 Jun  2 11:00 quick.yml
   -rw-r--r-- 1 root root 2787 Jun  2 11:00 ruby.yml
   ```

- After you modify the cluster config file, any running apps need to be restarted for the changes to take effect. The simplest and most reliable way to do that is to choose Help => Restart Web Server from the Dashboard. (Warning: this will kill any active Shell App sessions, so please save your work if you have one open.)

- If the above still doesn't work, you can view the logs of your running apps on the OnDemand host machine at:

   /var/log/nginx/<username>/error.log

   where <username> is the local system user the apps run as. Look for any errors or warnings in the logs and paste them here if they don't make sense.
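
If it helps, the permission check and the log check above can be scripted. The following is only a rough Python sketch and is not part of OnDemand: the function names `unreadable_configs` and `suspicious_lines` are made up for illustration, and matching on the words "error" and "warn" is an assumption about what the log lines look like. It checks the world-readable bit rather than an exact 644 mode.

```python
import os
import stat
from pathlib import Path


def unreadable_configs(config_dir):
    """Return names of *.yml files in config_dir that are NOT world readable."""
    not_readable = []
    for path in sorted(Path(config_dir).glob("*.yml")):
        # S_IROTH is the "read by others" permission bit (the last 4 in 644).
        if not path.stat().st_mode & stat.S_IROTH:
            not_readable.append(path.name)
    return not_readable


def suspicious_lines(log_path, keywords=("error", "warn")):
    """Return log lines that mention any keyword, ignoring case."""
    hits = []
    with open(log_path, errors="replace") as log:
        for line in log:
            lowered = line.lower()
            if any(word in lowered for word in keywords):
                hits.append(line.rstrip("\n"))
    return hits


if __name__ == "__main__":
    import sys

    # Directory taken from the message above.
    config_dir = "/etc/ood/config/clusters.d"
    for name in unreadable_configs(config_dir):
        print(f"not world readable: {os.path.join(config_dir, name)}")

    # The username is passed on the command line (an assumption of this sketch).
    if len(sys.argv) > 1:
        log_path = f"/var/log/nginx/{sys.argv[1]}/error.log"
        if os.path.exists(log_path):
            for line in suspicious_lines(log_path):
                print(line)
        else:
            print(f"no log found at {log_path}")
```

Run it on the OnDemand host with the username as the only argument. Reading another user's log may require elevated privileges.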

Also, we noticed you are using LSF 9.1. When we wrote the LSF adapter we only had access to LSF 8.3, so some issues may arise on your installation after we resolve the cluster config issue you are currently seeing. We'd be greatly interested in providing better LSF 9+ support if you'd be willing to provide us access to your cluster, but we can discuss this in more detail after we debug what you are currently seeing.

Jeremy Nicklas
Web and Interface App Engineer
Ohio Supercomputer Center (OSC) <https://osc.edu/>
A member of the Ohio Technology Consortium <https://oh-tech.org/>
1224 Kinnear Road, Columbus, Ohio 43212
Office: (614) 292-6739 • Mobile: (614) 316-6428 • Fax: (614) 292-7168
jnicklas at osc.edu

Learn more about OSC at https://osc.edu
________________________________
From: OOD-users [ood-users-bounces+jnicklas=osc.edu at lists.osc.edu] on behalf of Ping Luo [luop0812 at gmail.com]
Sent: Monday, June 19, 2017 11:05 PM
To: ood-users at lists.osc.edu
Subject: [OOD-users] LSF configuration and debugging

I installed OOD with LSF (our LSF version is 9.1). My cluster configuration is:

# /etc/ood/config/clusters.d/cluster1.yml
---
v2:
 metadata:
   title: "Cluster 1"
 login:
   host: "mycluster.login"
job:
  adapter:   "lsf"
  bindir:    "/software/lsf/9.1/linux2.6-glibc2.3-x86_64/bin"
  libdir:    "/software/lsf/9.1/linux2.6-glibc2.3-x86_64/lib"
  serverdir: "/software/lsf/9.1/linux2.6-glibc2.3-x86_64/etc"
  envdir:    "/software/lsf/conf"



The activejobs app doesn't list any running jobs, and myjobs complains "There are no configured hosts that allow you to submit jobs."

I can definitely submit jobs from the host where OOD is running, and the shell app works without any problem. There must be something wrong with the LSF configuration, but I didn't see any error messages in the PUN log file. Please help me debug the issue.

Thanks,

Ping