Post written by: Seth Anderson, EaaSI Program Manager
Spreading the Word

Multithreading Software Preservation and Emulation

August began with the day-long Multithreading Software Preservation and Emulation workshop at the Austin Public Library during the 2019 SAA Conference. The event included introductions to the Fostering Communities of Practice cohort projects and the EaaSI program, illustrating how the organizations involved implement software preservation and emulation services. Exercises provided attendees an opportunity to consider common software preservation challenges related to documentation, identifying use cases, and developing action plans for their own institutions. Stacks of Post-Its were used.

Attendees left with next steps to advocate for software preservation and emulation at their institutions, and we look forward to following up in the coming months. The workshop confirmed the value of the EaaSI program’s efforts and the need for scalable emulation services. Thanks to all of our co-facilitators for their hard work!

Webinars, Webinars, Webinars

The EaaSI webinar series continued in August and September with sessions focused on institutional policy and legal frameworks and a look at the EaaSI metadata model. If you weren’t able to attend, don’t worry: recordings and transcripts of the webinars will soon be posted on the SPN website. Be sure to register for our final webinar of the series in October, focused on emulators and configuration workflows.

You can also check out our recent OCLC Works in Progress presentation, available here, for an intro to the EaaSI project and a demonstration of the EaaSI Network in action.

iPres

The arrival of iPres is an annual milestone for our team, as we present the results of our hard work to the international community and learn about related efforts in digital preservation that inform future efforts. This year was no different. And while we wish all of the team could have gone to Amsterdam, we still had a large and active presence at the conference.

Primary Investigator Euan Cochrane (Yale University Library) and Lead Developer Klaus Rechert (OpenSLX) presented a paper on the Universal Virtual Interactor, demonstrating our recent advancements in the automation of emulation selection and configuration. Our current working prototype is able to analyze and identify file formats, locate compatible software in an emulation environment, and automatically render a submitted file upon startup. This prototype is only the beginning as we continue to improve the details and accuracy of our analysis and work to integrate the service with other systems, including EaaSI.

Semantic Architect Kat Thornton (Data Current) and Kenneth Seals-Nutt presented Getting Data Out of Wikidata (check out the paper here), a summary of the various methods by which their Wikidata for Digital Preservation portal uses and represents digital preservation data (e.g., software and file format metadata) from the Wikidata knowledge base.

EaaSI also co-facilitated a hackathon with the BitCurator project during the conference. Participants worked on various projects focused on documenting software applications and emulators and coding new functionality for emulation of forensic disk images. Keep an eye out for a blog post from the man who made the whole thing possible, Ethan Gates, summarizing the outcomes of everyone’s hard work.

If you weren’t able to snag one of our nifty branded EaaSI flash drives at iPres, no need to worry! There will be more to hand out in October at Maintainers III, DLF/NDSA, and the BitCurator Users Forum.

The Universal Virtual Interactor

About that UVI service mentioned above…

Euan has previously covered our detailed plans for the UVI in a blog post for the Digital Preservation Coalition. Implementation of the service began in August with the goal of having a prototype ready in time for iPres. At this stage, the service is limited to analyzing one file at a time to limit overall complexity. Matches between identified formats and compatible software are established via PRONOM identifiers associated with the application. Our ability to expand the analysis and the scale of environments available to the UVI is contingent on ongoing data collection to determine the format support of the many applications we plan to add to the EaaSI Network.
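
To make the matching step concrete, here is a minimal Python sketch of looking up compatible software by PRONOM identifier (PUID). It is an illustration under stated assumptions, not the UVI implementation: the Environment record, the find_compatible function, and the sample environments and their PUIDs are all hypothetical and chosen only for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Environment:
    """Hypothetical record: one emulated environment with one installed application."""
    name: str
    application: str
    # PRONOM identifiers (PUIDs) the application is known to open.
    puids: set[str] = field(default_factory=set)


def find_compatible(puid: str, environments: list[Environment]) -> list[Environment]:
    """Return every environment whose application is associated with the PUID."""
    return [env for env in environments if puid in env.puids]


# Illustrative sample data only; not taken from the EaaSI Network.
ENVIRONMENTS = [
    Environment("Windows 98 + Word 97", "Microsoft Word 97", {"fmt/40"}),
    Environment("Windows XP + Acrobat Reader 5", "Adobe Acrobat Reader 5", {"fmt/18"}),
]

if __name__ == "__main__":
    for env in find_compatible("fmt/40", ENVIRONMENTS):
        print(env.name)
```

A lookup like this only works as well as the data behind it, which is why the format support of each application has to be collected before an environment can be matched.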

Klaus has also done incredible work to develop scripts that enable the operating system of an emulated computing environment to open files upon startup. This was no easy task, as explained by Klaus:

“The UVI is designed to automate as many steps as possible in order to simplify usage of old computer systems and software. This requires an automation logic customized for the object to be rendered and the target emulated software environment. In order to reduce maintenance of emulated software environments as well as keeping them generic and reusable for other usage scenarios, the automation logic is built on generic infrastructure embedded into emulated environments and an object / target system specific part, generated during rendering request. For instance, the generic framework inside UVI Windows environments, is scanning auxiliary drives (floppies, secondary disks or CDROMs) for UVI specific content. If not present, the environment will function normally. In case of a UVI rendering request, additional information is available, e.g. a customized /autorun.inf/ to autostart the selected software application and to load the provided rendering object.”
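
As a rough illustration of the two halves Klaus describes, the Python sketch below generates a request-specific autorun.inf and checks whether a mounted auxiliary drive carries one. This is a sketch under assumptions, not the UVI code: the function names are hypothetical, the presence of autorun.inf is assumed to be the marker for UVI content, and the real framework runs inside the emulated Windows environment rather than in Python. The only format detail relied on is that autorun.inf is an INI-style file whose [autorun] section can name a command with the open key.

```python
import tempfile
from pathlib import Path

AUTORUN_NAME = "autorun.inf"


def build_autorun(application: str, object_name: str) -> str:
    """Build autorun.inf text that starts the application with the object as its argument."""
    # INI syntax with CRLF line endings, as expected by old Windows systems.
    return f"[autorun]\r\nopen={application} {object_name}\r\n"


def write_uvi_media(root: Path, application: str, object_name: str) -> Path:
    """Request-specific part: place the generated autorun.inf on the auxiliary drive."""
    target = root / AUTORUN_NAME
    target.write_bytes(build_autorun(application, object_name).encode("ascii"))
    return target


def has_uvi_content(root: Path) -> bool:
    """Generic part: report whether a drive carries the (assumed) UVI marker file."""
    return (root / AUTORUN_NAME).is_file()


def demo() -> tuple[bool, bool]:
    """Compare an empty drive with one prepared for a rendering request."""
    with tempfile.TemporaryDirectory() as plain, tempfile.TemporaryDirectory() as uvi:
        write_uvi_media(Path(uvi), "WINWORD.EXE", "REPORT.DOC")
        return has_uvi_content(Path(plain)), has_uvi_content(Path(uvi))


if __name__ == "__main__":
    print(demo())
```

Keeping the check generic and the autorun.inf request-specific mirrors the design goal in the quote: the same environment behaves normally when no UVI media is attached.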

We will have more documentation (in layman’s terms) and demonstrations of the UVI in the future. So stay tuned.

The Metadata Model

I’ve mentioned documentation and data gathering multiple times in this update, which should indicate the importance of metadata to the success of the EaaSI program. The more work we do on new functionality or redesigns of the user interface, the more we find we need to capture and structure detailed information about the computing environments and software in EaaSI. This month, I’ve worked to complete another round of updates to the EaaSI metadata model, fine-tuning the various properties to be collected or presented. In particular, we’ve taken a closer look at how to support complex computing environments and operating system settings. As Ethan recently commented, “Computers are too complicated,” and so is describing them.

Our goal is to define a model that can be presented in forms or serialized in exports for the capture and exchange of information essential to discovery, creation, and automation of emulation. This includes such properties as the system requirements of software, the association of file formats and software applications in an operating system, or the partition configuration of a computing environment’s disk image (e.g., volume, boot order, etc.). All of these data points are related within a computing environment and mapping them out has been one of the larger undertakings of our project.
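
For readers who think in code, the sketch below shows how related properties like these might be grouped and serialized for exchange. It is only an illustration: the class and field names are hypothetical and are not taken from the EaaSI data dictionary, and JSON is assumed as the export format purely for the example.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Partition:
    """Hypothetical description of one partition in an environment's disk image."""
    volume: str
    boot_order: int


@dataclass
class FormatAssociation:
    """Hypothetical link between a file format and the application that opens it."""
    puid: str
    application: str


@dataclass
class ComputingEnvironment:
    """Hypothetical top-level record tying the related data points together."""
    name: str
    operating_system: str
    min_ram_mb: int  # stands in for a software system requirement
    partitions: list[Partition] = field(default_factory=list)
    format_associations: list[FormatAssociation] = field(default_factory=list)


def export_environment(env: ComputingEnvironment) -> str:
    """Serialize an environment, including nested records, to JSON."""
    return json.dumps(asdict(env), indent=2, sort_keys=True)


EXAMPLE = ComputingEnvironment(
    name="Windows 98 + Word 97",
    operating_system="Windows 98",
    min_ram_mb=16,
    partitions=[Partition(volume="C:", boot_order=1)],
    format_associations=[FormatAssociation(puid="fmt/40", application="Microsoft Word 97")],
)

if __name__ == "__main__":
    print(export_environment(EXAMPLE))
```

Even this toy version shows why the mapping is a large undertaking: each nested record points at another kind of thing (a disk, an application, a format) that needs its own description.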

A public copy of the model’s data dictionary is available here, and we welcome feedback from the community. Please take a look and let us know what you think.