Overview

Led by the Digital Preservation Services team at Yale University Library, and with support from OpenSLX, DataCurrent, PortalMedia, Educopia, and the Software Preservation Network, the EaaSI program of work is focused on the development of technology and services to expand and scale the capabilities of the Emulation-as-a-Service software.

This work includes:

  • The establishment of a community of partner institutions using the EaaSI software platform;
  • Implementation of a resource sharing functionality, the EaaSI Network, which enables distribution and retrieval of emulation environments and software installation media between users;
  • Improvement of description and discovery capabilities, including use of and contribution to data in the Wikidata body of knowledge (via the WikiDP portal);
  • Prototyping of various modules and services for management of end-user access (including the Open Source Software Sandbox).

As part of our efforts to increase the available resources, thousands of computing environments are being configured and shared to the EaaSI Network service. Configuration workers are busy cataloging software (e.g., names, developers, system requirements, file format support) and installing operating systems and applications in emulated computing environments.

The EaaSI program of work is sponsored by the Alfred P. Sloan Foundation and the Andrew W. Mellon Foundation.

Deliverables

Distribute

Distributed Management

To make sure our service lasts, we’re pushing forward a decentralized network of EaaSI nodes and environments. That includes:

  • At least 10 partner institutions, each controlling their own installation of the EaaSI platform
  • Thousands of pre-configured emulation environments and software installation media, seeded by Yale and shareable between nodes
  • A collaborative approach to service-building that ensures expert feedback and community investment

Scale Up

Deploying and using emulators should be streamlined. EaaSI offers:

  • Improvements to the Emulation-as-a-Service platform, originally developed by the bwFLA program and now maintained by OpenSLX
  • A fresh, intuitive user interface and experience for EaaSI, designed by PortalMedia
  • Enhanced performance and security through integration with user authentication services

Discover

EaaSI draws on existing metadata schemas and standards initiatives that address software and software dependencies (CodeMeta, PREMIS, Wikidata, and more) to power all components of our service. That means:

  • A defined metadata profile for description of software and computing environments
  • Comprehensible, open, machine-readable records and documentation
  • Incorporation of services and schema developed by Wikidata for Digital Preservation
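
As a rough illustration of what an open, machine-readable software record could look like, the sketch below builds a small JSON-LD description using CodeMeta/schema.org property names. It is not the EaaSI metadata profile: the choice of fields, the `build_software_record` helper, and the sample values are all assumptions made for this example.

```python
import json

# JSON-LD context published by the CodeMeta project (version 2.0).
CODEMETA_CONTEXT = "https://doi.org/10.5063/schema/codemeta-2.0"


def build_software_record(name, version, operating_systems, requirements):
    """Return a dict describing one software title (hypothetical helper).

    The keys are CodeMeta/schema.org terms; which terms the actual EaaSI
    profile uses is not stated here, so this selection is an assumption.
    """
    return {
        "@context": CODEMETA_CONTEXT,
        "@type": "SoftwareApplication",
        "name": name,
        "version": version,
        "operatingSystem": operating_systems,
        "softwareRequirements": requirements,
    }


# Sample values are illustrative only.
record = build_software_record(
    name="GIMP",
    version="2.6",
    operating_systems=["Linux"],
    requirements=["GTK+ 2"],
)

# Serialize to JSON so the record can be read by other tools.
serialized = json.dumps(record, indent=2, sort_keys=True)
print(serialized)
```

Keeping the record as plain JSON-LD means it can be validated, indexed, or mapped to other vocabularies such as Wikidata properties without any EaaSI-specific tooling.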

Access

To expand the community of users who benefit from emulation, we’re also developing four extensions of the EaaSI platform to address identified patrons and communities:

  • CD-ROM Sharing
    Shared emulation for obsolete digital media in circulating library collections
  • Virtual Reading Room
    Enhanced access to born-digital materials in archives and special collections
  • Scientific Software Portal
    Incorporation of emulation into workflows for Research Data Management (RDM)
  • Universal Virtual Interactor (UVI)
    An API to enable automated retrieval of and interaction with EaaSI resources

Staff

EaaSI Student Staff

We recognize and appreciate the contributions of all our student staff. Every student staff member is critical to EaaSI program success.

Current Student Staff:

  • Zoe Sinclair (’22)
  • Justin Cong (’20)
  • Leo Lizbinski (M.P.H. ’20)
  • Kevin Zheng (M.M. ’18, M.P.H. ’20)

Former Student Staff:

  • Justin Aubin (’21)
  • Mac Schmidt (’20)
  • Idris Sylvester (’20)
  • Eric Timperman (M.M. ’19)
  • Matt Tu (’22)
  • Kohei Yamaguchi (M.M.A. ’19)
  • Paul Han (’20)
  • Alexa Murray (’19)
  • Vibhor Nayak (M.A.M. ’18)
  • Nico Taylor (’22)

Node Hosts

Node host institutions maintain and make use of a locally installed instance of the Emulation-as-a-Service platform that is connected to other nodes in the EaaSI network. Connection to the network includes access to software and emulated software environments shared by other member institutions, including Yale University Library’s seed collection. In these early days of the service, we anticipate nodes will use their instance primarily for testing and development of workflows. As the members of the network become more familiar with the service and our infrastructure improvements are implemented, we hope members of the network will contribute software and environments to the network, participate in ongoing improvements to the EaaS system, and deploy local access platforms that use the emulation service.

Together, EaaSI node hosts comprise a cohort, or an intentional learning community. The EaaSI node host cohort represents several distinct organizational types, collecting concentrations and designated user communities. Working with these various organizations, the EaaSI project staff will identify common pain points, gaps and potential areas of improvement that will shape the design of the EaaSI network and infrastructure.

From left to right: Don Brower (Node Lead, Digital Library Infrastructure Lead), Natalie Meyers (E-Research Librarian), Mark Suhovecky (Digital Library Data Curation Developer), & Rick Johnson (Co-Director of Digital Initiatives and Scholarship). Photo by Matt Cashore/University of Notre Dame

University of Notre Dame

As scientific research is increasingly born digital, more researchers are using computational resources for simulations and data analysis. Participating in EaaSI is a way we can act on cloud advantages for emulation processes in digital preservation. Our most compelling use cases for emulation center on keeping software running in order to continue providing access to underlying data for our campus researchers and the broader community. Through participation in the EaaSI network, we seek to improve software preservation for our researchers and expand our knowledge of how emulation fits in the context of digital scholarship. Approaching software preservation, emulation, and its infrastructure as a community allows us to move from the current landscape of idiosyncratic solutions to a commons-centric approach that ensures greater interoperability with existing tools and platforms and their host institutions. The community can “divide and conquer” particular emulation environments: an emulation can be implemented once and used across the consortium.

Read Notre Dame Statement of Importance

From left to right: Lauren Work (Digital Preservation Librarian), Robert “Chip” German (Node Lead, Scholarly Communication/Program Director) & Jeremy Bartczak (Metadata Librarian)

University of Virginia

The University of Virginia Library is an enthusiastic participant, through the Software Preservation Network, in Yale’s Emulation-as-a-Service Infrastructure (EaaSI) project. We are strongly motivated by the urgent need to preserve software in ways that make our preservation of digital scholarly and cultural materials more accessible and meaningful.

As preservation professionals (whether or not it is a primary assignment for each of us), we are fond of a commonly expressed perspective: preservation without access is pointless. The preservation of relevant software adds many layers of complexity to the notion of digital preservation while promising critical additional value in the result.

If we lose access to the software component-mix that an object requires to be something other than an unintelligible sequence of bits, the object becomes useless.

Read UVA Statement of Importance

From left to right: Ryan Johnson (Metadata Librarian), Sibyl Schafer (Node Lead, Digital Preservation Analyst), Ron Stanonik (Systems Administrator), Tim Marconi (Library IT Operations Manager) & David Minor (Director of Research Data Curation)

University of California – San Diego

The UCSD Library’s Research Data Curation program has been working with UCSD researchers for over half a decade to curate and preserve the research data produced by our campus. Our large and quickly growing collection of research data contains long-tail research data file formats as well as software created or used to generate scientific findings. As software environments morph over time, it becomes necessary to capture the original computing environment which produced the data. Therefore, emulation becomes an essential tool in not only accessing data generated in certain environments but also reproducing scientific results years after the original studies have been completed. Approaching software preservation and emulation as a collective action problem provides UCSD with the opportunity to learn from other organizations that are grappling with these same issues, and can help us identify how we can ensure that the data we’re curating is accessible and reproducible in the future.

Read UCSD Statement of Importance

Stanford University

Stanford University has large collections of historic software which are currently inaccessible to our researchers. We are also actively collecting software that has been developed by our faculty for classroom use and by authors of digital publications for the Stanford University Press. We are excited by the potential value offered by emulation to expand access and use of our software collections. A distributed community approach to software preservation is advantageous as it increases the use of this resource type, allows us to engage with software creators early in the software development lifecycle, and shares local expertise in emulation across multiple academic institutions.

As we embark on our work as a node host, we look forward to learning how our partner institutions select, validate, and publicize their work. Authentication and rights management across institutional boundaries is one area of interest. We are also keen to learn how we can leverage the sharing of software libraries and configuration data.

Read Stanford Statement of Importance

From left to right: Eric Kaltman (Node Lead, CLIR Fellow for Data Curation in the Sciences), Jessica Benner (Computer Science Liaison Librarian), Emily Davis (Project Archivist) & Huajin Wang (Computer Science Liaison Librarian)

Carnegie Mellon University

Most modern activities are dependent in some way or another on a dense network of software and its related technical dependencies. In order to ensure continued access to human knowledge production it is imperative that software preservation become a significant and embedded societal practice. Without a concerted, collaborative effort to maintain, archive, and reveal software data and development practices we run the risk of losing a significant amount of humanity’s history. One significant means of software preservation is emulation, but an emulation solution, at scale, requires more resources than any single institution could reasonably support. The EaaSI project, in gathering use cases from institutions with different collections and collecting priorities, provides an expanding supportive network not possible through monolithic approaches.

Read CMU Statement of Importance

Advisors

Advisors to the project are individuals and/or organizations that want to contribute to the development of the network and EaaSI services, but require an alternative to supporting local infrastructure or contributing via a hosted service. Advisors may participate in hosted sandbox testing and will participate in requirements and use case gathering, provide feedback on documentation and resources, and contribute to other related project activities to be identified.

The EaaSI team understands that a commitment to hosting infrastructure (and providing the corresponding staff resources) may not be possible at this time. We value the participation of any interested individuals and institutions who would prefer to remain engaged in the project work without assuming the node host responsibilities.

About the EaaSI Sandbox

Powered by the Emulation-as-a-Service (EaaS) system, the Open Source Software Sandbox provides free, public access to emulated computer environments featuring operating systems and software from over twenty years of open source development. We’re offering this version of the EaaSI service to show how it makes access to emulation possible at the click of a button. You can also review the metadata records we are sharing via the Wikidata knowledge base to learn more about the open source software inside and to verify its accuracy. We invite you to poke around these fascinating legacy programs and to learn more about the capabilities of the EaaS system as you do.

Launch Sandbox

What can I do in the sandbox?

You are encouraged to investigate any and all of the example emulation environments in the Sandbox. Feel free to open any application, create new files, edit sample files, change operating system settings, etc. The Sandbox is exactly that: an open space to interact with software.

Since the Sandbox is provided for demonstration only, you won’t be able to save any changes you’ve made to an environment or export the files you interact with. We’ve also restricted access to the internet, so you won’t be able to surf the web or download anything to the emulation environments. Emulation sessions are limited to 30 minutes, so do not attempt anything too time-consuming.

To get started, navigate to the Environments page, find an environment you want to see, and select “Run Environment” from the Choose Action menu. For more details on interacting with various environments in the Sandbox, check out the user guide here or click Help.

What's in the sandbox?

We’ve loaded the Sandbox with notable examples of open source operating systems and software applications to illustrate the breadth and diversity of open source development over the years.

Landmark Open Source Distributions

How does the look and feel change from version to version? What unexpected developments are noticeable? Are there changes in the software applications packaged with the OS? How do these systems compare to commercial platforms like Windows and macOS?

  • Every major Ubuntu release from 2004-2010
  • Red Hat Linux as it transitioned to Fedora Core
  • Mandrake Linux
  • Yellow Dog Linux
  • Slackware

Alternatives to Major Software

How does the look and feel differ between the open source and commercial platforms? Are there differences in format support? What additional functionality does the open source application include? Does it have limitations?

  • OpenOffice.org (vs. Microsoft Office)
  • GIMP (vs. Photoshop)
  • SciLab, FreeMat, GNU Octave (vs. MATLAB)
  • Scribus (vs. InDesign)
  • FreeCAD, QCad (vs. AutoCAD)

A Comparison of Interfaces for the R Programming Language

How do the features of each GUI differ? Are there any differences in the data outputs of each application? Which GUI is easiest to use? Are there tradeoffs for more advanced functionality?

  • RStudio
  • RKWard
  • R Commander

A Comparison of Open Source Educational Software

How does the curriculum of the projects differ? Are there any similarities in the games included? How do these differ from the commercial education software you’ve seen?

  • The KDE Education Project
  • Edubuntu

User and Code Documentation

Our technical documentation guides users and systems administrators through navigation and maintenance of EaaSI’s implementation of the Emulation-as-a-Service platform.

System requirements and the EaaSI User Handbook will be hosted on GitLab to allow for seamless integration with our development team’s workflows and easy access for Node Hosts to bug tracking. All publicly available repositories and code can be seen at EaaSI’s GitLab project page (https://gitlab.com/eaasi).

More conversational, forum-style technical discussion and troubleshooting among the EaaSI network can also be viewed on the EaaSI Tech Talk Google Group (https://groups.google.com/forum/#!forum/eaasi-tech-talk)!

Webinars

Emulators and Configuration Workflows

Summary: This webinar, the last in the EaaSI Webinar Series, is entitled “Emulators and Configuration Workflows” and addresses questions such as: What is your previous experience in using emulation software? Were there any emulation programs already incorporated into your institutional workflows? …

EaaSI Metadata Model and Wikidata

This webinar, entitled “EaaSI Metadata Model and Wikidata,” addresses questions such as: How does your organization create, manage or discover metadata about software and related resources now? What would you like to be able to do with software metadata? Do …

Why EaaSI? System Overview

Summary: This webinar, entitled “Why EaaSI? System Overview,” addresses questions such as: What problem is EaaSI trying to solve? What are EaaSI’s core functions and underlying design concepts? Why are we approaching it this way? How is EaaSI different from EaaS? …

Legal and Institutional Policy Frameworks

Summary: This webinar, entitled “Legal and Institutional Policy Frameworks,” addresses questions such as: When you don’t collect software as an object, why do you do it? What role(s) does/do software play in your organizations and their long-term collection and preservation …

Templates

In-Person Meeting Exercises

EaaSI Node Leads convened in November 2018 to review the EaaSI beta implementation process, discuss service design for emulation in local organizational contexts, prioritize project activities and review the Participation Agreement drafted in partnership with the Harvard Law School Cyberlaw …

Software Collections Inventory

EaaSI Software Collections Inventory is what is sometimes referred to as a Random (or spot) inventory. Random (or spot) inventories are extremely limited in scope. They are primarily used to verify the location of a representative sampling of objects. They …

Scenarios for (Re)Use & Access

EaaSI Scenarios for Use & Access asks each Node Host to brainstorm scenarios for use and access they believe will drive the adoption of EaaSI; identify users whose use cases they believe may correspond with the scenarios for use and …

Statement of Importance for Software Preservation

Each Node Host Team was asked to provide 250 words or so that addressed the following questions: Why is software preservation important? What are the advantages to community infrastructure or approaching software preservation and emulation as a collective action problem? What …

Semi-Structured Interviews

EaaSI Scenarios for Use & Access asks each Node Host to brainstorm scenarios for use and access they believe will drive the adoption of EaaSI; identify users whose use cases they believe may correspond with the scenarios for use and …

Soup to Nuts: Simulating Software

The purpose of the Soup to Nuts exercise is to revisit your Software & Collection Inventory and Scenarios for Use & Access to determine if those software cases/examples are still the cases/examples that you want to configure and test in …

Prioritizing Software

The purpose of Prioritizing Software is to expose Node Hosts to National Software Reference Library software categories and their contribution to the EaaSI seed library; to provide an overview of the EaaSI software configuration workflow; and to connect the software …

Prioritizing Features

The purpose of Prioritizing Features is to encourage Node Hosts to articulate and prioritize possible features for each feature category of the EaaSI system, including Search/Discovery, Resource Import, Environment Configuration, Metadata/Description, User Management, User Interface, Network Capabilities, Data Management, Access …

Envisioning Local Services

The purpose of the Envisioning Local Services exercise is to help your organization consider the specifics of the local service environment and create a list of short, medium and long-term goals that will help to bring software preservation and emulation …

Baseline Cost Measurement

The purpose of the baseline cost calculator is to capture as much granular information as possible about the costs of software preservation in different organizational contexts, including start-up costs. We understand that every organization will vary in terms of …

Software Preservation Statement of Importance

The purpose of the Software Preservation Statement of Importance is to facilitate consensus building within an organization on the topic of software preservation and emulation. The goal of the Statement of Importance is to articulate a clear understanding of software preservation …

Presentations

Towards a Universal Virtual Interactor (UVI) for Digital Objects

Presentation given at iPRES 2019  

Participatory Design for Software Preservation and Emulation Services

Presentation given at the International Conference on Digital Preservation (iPRES) 2018

Complex Data Sets, Software Preservation, and Emulation: A distributed approach to long-term care

Presentation given at Research Data Archiving & Preservation (RDAP) Summit 2018

Investigating Emulation as a Service for Reproducible Research at Yale

Presentation given at the 2020 Librarians Building Momentum for Reproducible Research virtual conference.    

An Introduction to Emulation

Presentation at the NYU Institute for Fine Arts’ workshop “Digital Preservation: Caring for Digital Art Objects.”    

Software Curation: Intersection of Policy and Practice

Panel presentation at Maintainers III in Washington, D.C. in October 2019 featuring perspectives on the materials that have to be maintained around the software in order to make the software meaningful including: research data curation librarian, service provider, legal expert, …

Participatory Design for Long-term Access: User Research, Software Preservation, and Emulation

EaaSI presentation given at Open Repositories in Hamburg, Germany in June 2019

Software Curation: An Ecosystem of Users, Tools, and Services

Panel presentation at PASIG in Mexico City, Mexico in February 2019 featuring perspectives from different niches in the software curation ecosystem including: software sustainability trainer, software preservation and emulation infrastructure providers, a disciplinary liaison, and a legal expert.

Scaling Software Preservation & Emulation Services

EaaSI presentation at Digital Library Forum in Las Vegas, Nevada in October 2018.

Participatory Design for Software Preservation and Emulation Services

EaaSI presentation given at International Conference on Digital Preservation (iPRES) in Boston, Massachusetts in September 2018

Complex Data Sets, Software Preservation and Emulation: A distributed approach to long-term care

EaaSI presentation at Research Data Archiving & Preservation Summit in Chicago, Illinois in March 2018