[OOD-users] NSF CyberTraining Solicitation (NSF 19-524)

John-Paul Robinson jprorama at gmail.com
Thu Jan 3 13:50:48 EST 2019


Hi Alan,

Happy New Year!  December was a busy month here but gave me time to 
reflect on an idea which I think has value to the community and 
potential for funding.

I believe you and your colleagues met some of the UAB team at SC18. As 
you know, UAB is very excited about Open OnDemand.  We are also very 
thankful for the work OSC has put into building this tool and sharing it 
broadly.  It is a missing piece for modernized HPC services that mimic 
the user experience across the cloud: a comprehensive workbench native 
to the browser.

I have experience with many "HPC portal" efforts, going all the way back 
to GridSphere.  None of them have managed to capture the fluidity of use 
that is already present in OOD or to offer enough functionality to the 
long tail of new users.

We've been working with OOD since the summer and are preparing an open 
beta for the coming semester.  Like many campus research computing 
teams, we cover many bases.  Having a tool that we can reliably test, 
deploy and enhance is an important part of increasing the impact of our 
limited resources.  When we see an opportunity for improvement based on 
user feedback, we would ideally like to enhance the tool, test it, 
deploy it and get feedback as efficiently as possible without any 
worries that our small change leads to bigger problems.

Traditional IT and HPC operations often introduce hurdles to 
responsiveness.  Typically, a science gateway will go through some 
testing and integration work during the initial deploy. Once deployed to 
an edge device on the cluster, however, it becomes very difficult to 
improve.  Upgrades are often delayed because of uncertainty around 
changes potentially breaking "production" cluster operations.  
Maintaining integration with the local cluster on a node that is not 
part of that cluster is also challenging because it falls outside 
typical cluster management frameworks. Very few resources are available for 
building or maintaining modern pipelines that can incrementally and 
reliably introduce improvements to web facing science gateways.

Our ideal HPC environment would have a modern DevOps pipeline at its 
core.  It would be capable of continuous integration and deployment of 
software stacks to the cluster.   Unfortunately, while readily available 
tools exist to simplify some parts of HPC operations (OpenHPC today and 
ROCKS before it), the tools generally don't provide mechanisms to easily 
maintain a local fork. They do support local modification but in a way 
that pushes those modifications outside the scope of the tools 
themselves. This is very different from the experience of cheap forks, 
branches and pull requests we have grown accustomed to when maintaining 
the sources that feed into those projects.  If such tooling is desired, it's 
left to a local site to build and maintain.  For many sites, this 
usually remains on a perpetual wish list because the overhead is simply 
too high.

Locally, we have been working to build a DevOps framework in the context 
of our OOD deployment.  We want OOD to be a living piece of software 
that grows with the demands and interests of our community.  We also 
want to effectively contribute changes upstream as well as benefit from 
improvements across the broader community. Among other things, this 
requires a clear understanding of the current state of our deployed 
software and the changes being introduced through development.

In our initial steps toward this goal, we have built a dev cluster 
environment based on VirtualBox, Vagrant, OpenHPC, Open OnDemand, and 
the Ansible framework for OHPC from XSEDE.  At present, we have a 
collection of Ansible playbooks that can deploy an OOD-powered dev 
cluster (master, one compute, and an OOD node) at the proverbial "push 
of a button".  [vagrant config 
<https://gitlab.rc.uab.edu/jpr/ohpc_vagrant>, ohpc+ood ansible 
<https://github.com/jprorama/CRI_XCBC>]
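
As a rough illustration of what the "push of a button" amounts to, here 
is a minimal Python sketch of the bring-up order: boot each VM with 
Vagrant, then apply the Ansible playbook.  The node names, inventory 
path and playbook name are hypothetical placeholders, not taken from the 
repositories linked above; only the `vagrant up` and `ansible-playbook 
-i` command forms are standard.

```python
# Illustrative sketch only: the node names and file names below are
# hypothetical and are not taken from the linked repositories.
from typing import List

# Assumed names for the three VMs: master, one compute node, OOD node.
NODES = ["ohpc", "compute", "ood"]


def bringup_commands(nodes: List[str], inventory: str, playbook: str) -> List[List[str]]:
    """Return the commands to run, in order: boot each VM, then apply the playbook."""
    commands = [["vagrant", "up", node] for node in nodes]
    commands.append(["ansible-playbook", "-i", inventory, playbook])
    return commands


if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch is safe to try.
    for cmd in bringup_commands(NODES, "inventory/hosts", "site.yaml"):
        print(" ".join(cmd))
```

Printing the commands rather than executing them keeps the sketch 
harmless; a real driver would hand each list to subprocess.run.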

Much work remains to turn this into the framework needed to achieve true 
CI/CD.  Nonetheless, we have already benefited from this by having a 
predictable install on our production cluster, testing enhancements for 
additional interactive applications and discovering changes that will 
break the environment.  The OHPC update to 1.3.6 included Slurm 17.11.10 
which broke OOD interactive desktop startup in our dev environment.  We 
were able to discover, isolate and control the fix (moving to 
slurm-17.11.11), avoiding what could have been a very painful "live" 
experience had the update been deployed naively to the production 
cluster.  All too often, a naive deployment is the choice forced upon 
small developer teams with limited time, hardware and DevOps 
infrastructure.
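
One small piece of such a pipeline could be a gate that refuses to 
promote a package version already seen to break the dev cluster.  The 
Python sketch below is only an assumption about how that check might 
look; the function names and the known-bad list are hypothetical, and 
the one entry in the list is the Slurm release from the example above.

```python
# Illustrative sketch only: a promotion gate keyed on versions that
# failed in the dev cluster.  All names here are hypothetical.
from typing import Set, Tuple

Version = Tuple[int, ...]


def parse_version(text: str) -> Version:
    """Turn a dotted version such as '17.11.10' into (17, 11, 10)."""
    return tuple(int(part) for part in text.strip().split("."))


# Versions observed to break OOD interactive desktop startup in dev.
KNOWN_BAD_SLURM: Set[Version] = {parse_version("17.11.10")}


def safe_to_promote(candidate: str, known_bad: Set[Version] = KNOWN_BAD_SLURM) -> bool:
    """Return False when the candidate version is on the known-bad list."""
    return parse_version(candidate) not in known_bad


if __name__ == "__main__":
    for version in ("17.11.10", "17.11.11"):
        verdict = "promote" if safe_to_promote(version) else "hold"
        print(version, verdict)
```

A check like this only records what testing has already found; the dev 
cluster is still what discovers the breakage in the first place.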

I believe it would be a great addition to the OOD ecosystem to have a 
ready-to-use dev cluster based on OHPC bundled with OOD that could be 
used by any team to try out OOD.  After all, a cluster is the first step 
needed to explore OOD.  This should be part of a larger whole: a 
community DevOps CI/CD infrastructure to provide a continuous build of 
OOD for dev/test that could also act as an adoptable DevOps tool chain 
for campus HPC operations to deploy their own instances of OOD with site 
modifications.  This would work along the lines of the missing 
capabilities I highlighted above.

This framework would help local teams deploy and enhance OOD for local 
operations while offering a smooth coupling with the upstream 
project.    There are public models for this type of fabric in projects 
like Android 
<https://source.android.com/compatibility/vts/automated-test-infra>, 
OpenStack <https://docs.openstack.org/infra/manual/developers.html>, and 
Wikimedia <https://www.mediawiki.org/wiki/Developer_account>. Naturally, 
the model for OOD would start more modestly and could build on existing 
mechanisms in GitHub or GitLab.

I believe a proposal could be crafted under this NSF solicitation to 
build this fabric for the community of sites using OOD.  The pilot 
program seems like the right target.  What are your thoughts on the 
above?  Is this something that could be shaped into a collaboration with 
you?  I realize the deadline is a month out but I think there is still 
time to create a workable proposal.

Looking forward to your feedback,

John-Paul

HPC Architect
Research Computing
UAB IT
205-975-0124

On 12/3/18 9:14 PM, Chalker, Alan via OOD-users wrote:
>
> Open OnDemand Community:
>
> Just wanted to bring to your attention this new NSF funding 
> opportunity related to CyberTraining.  OSC is not in a position to 
> lead a proposal, but if anyone is thinking of submitting a proposal 
> that involves Open OnDemand, we’d be happy to collaborate on it.
>
> ---------------------------
>
> Alan Chalker, Ph.D.
>
> alanc at osc.edu <mailto:alanc at osc.edu>
>
> 614-247-8672
>
>     *Training-based Workforce Development for Advanced
>     Cyberinfrastructure (CyberTraining) – NSF 19-524*
>
>     This program seeks to prepare, nurture, and grow the national
>     scientific /research/ workforce for /creating, utilizing, and
>     supporting/ advanced cyberinfrastructure (CI) to enable and
>     potentially transform fundamental science and engineering research
>     and contribute to the Nation's overall economic competitiveness
>     and security. The goals of this solicitation are to (i) ensure
>     broad adoption of CI tools, methods, and resources by the research
>     community in order to catalyze major research advances and to
>     enhance researchers’ abilities to lead the development of new CI;
>     and (ii) integrate core literacy and discipline-appropriate
>     advanced skills in advanced CI as well as computational and
>     data-driven science and engineering into the Nation’s educational
>     curriculum/instructional material fabric spanning undergraduate
>     and graduate courses for advancing fundamental research.
>
>     This solicitation calls for innovative, scalable training,
>     education, and curriculum/instructional materials—targeting one or
>     both of the solicitation goals—to address the emerging needs and
>     unresolved bottlenecks in scientific and engineering research
>     workforce development, from the postsecondary level to active
>     researchers. The funded activities, spanning targeted,
>     multidisciplinary communities, will lead to transformative changes
>     in the state of research workforce preparedness for advanced
>     CI-enabled research in the short- and long-terms.
>
>     All projects are expected to clearly articulate how they address
>     important community needs and will provide resources that will be
>     widely available to and usable by the research community.
>     Prospective principal investigators (PIs) are strongly encouraged
>     to contact the Cognizant Program Officers in CISE/OAC and in the
>     participating directorate/division relevant to the proposal to
>     ascertain whether the focus and budget of their proposed
>     activities are appropriate for this solicitation. Such
>     consultations should be completed at least one month in advance of
>     the submission deadline.
>
>     The revisions are as follows:
>
>       * Three project classes have been defined: /Pilot,
>         Implementation (Small or Medium), /and/ Large-scale Project
>         Conceptualization./
>       * The two solicitation goals have been clarified, and
>         /Pilot/ and /Implementation/ projects may target one or both
>         of the solicitation goals. /Large-scale Project
>         Conceptualization/ projects must address both goals.
>       * Separate submission tracks for /Cyberinfrastructure
>         Contributors, Users, and Professionals/ have been eliminated.
>         However, there remains a focus on these scientific
>         communities, and projects should target one or more of these
>         communities.
>       * The limit on number of proposals per PI or co-PI has been
>         updated to indicate an individual may serve as PI or co-PI on
>         only one /Pilot/ or /Implementation/ proposal to the
>         CyberTraining program per competition. The /Large-scale
>         Project Conceptualization/ projects are not included in this
>         limit.
>       * The programmatic areas of interest have been updated with the
>         current priorities of the participating directorates and
>         divisions, with one additional directorate participating: the
>         Directorate for Social, Behavioral and Economic Sciences (SBE).
>       * The list of additional solicitation specific review criteria
>         has been updated. Proposals should address a subset of these
>         criteria according to the project class and one or both chosen
>         goal(s) of the solicitation.
>
>     There are three project classes as defined below:
>
>       * /Pilot/ Projects: up to $300,000 total budget with durations
>         up to two years;
>       * /Implementation/ Projects: /Small/ (with total budgets of up
>         to $500,000) or /Medium/ (with total budgets of up to
>         $1,000,000) for durations of up to four years; and
>       * /Large-scale Project Conceptualization/ Projects: up to
>         $500,000 total budgets with durations up to 2 years.
>
>     *Full Proposal Deadline*: February 6, 2019
>
>     Solicitation Website:
>     https://www.nsf.gov/pubs/2019/nsf19524/nsf19524.htm?WT.mc_id=USNSF_179
>
>     Contact: sprasad at nsf.gov <mailto:sprasad at nsf.gov>
>
>
> _______________________________________________
> OOD-users mailing list
> OOD-users at lists.osc.edu
> https://lists.osu.edu/mailman/listinfo/ood-users
