Greetings from Claire Fox, EaaSI Summer Intern
My name is Claire Fox, and I’m a graduate student in the Moving Image Archiving and Preservation program at New York University. I’ve been spending my summer at Yale University Libraries Digital Preservation Services, steeping myself in the EaaSI body of work and aiming to provide a little extra research into infrastructure preservation.
One of the perks of being an intern at EaaSI is that I get a sneak peek at digital artifacts that have been — or hopefully will be! — newly revived within an EaaSI configured software environment.
A few examples:
I’ve had the privilege of seeing a 1990s AutoCAD rendering of a library floor plan, which looked like a neon stick figure until we zoomed in to see clusters of tiny labels, measurements, and instructions.
There was a week when Laurie Anderson’s alluring voice would periodically fill the hallway as my colleague Ethan Gates configured an environment for her 1995 HyperCard video game, Puppet Motel.
An ongoing attempt at developing an emulation use case with a 1986 text-based choose-your-own-adventure video game has yielded a series of disk images that occasionally display license headers or — less helpfully, more enjoyably — lists of types of food.
Before my internship at Yale, I assumed that EaaSI was all about access to legacy software-dependent objects. And it is! But the EaaSI program of work also creates digital objects that warrant preservation infrastructure of their own. This is where my internship begins.
Configured Software Environments
The TL;DR here is that EaaSI’s preservation objects are the user-configured software environments created and saved on the EaaSI software platform. We’ve been calling these objects Environment Objects. Any Environment Object of any size is made up of a set of disk images: the first one will always be a Base Environment, and we refer to subsequent layers as Derivative Environments. When Derivatives are saved on the EaaSI software platform, they register any changes made to the Base Environment or preceding Derivative in blocks.
This accounts for the efficiency of the EaaSI infrastructure: rather than saving a new consolidated environment every time a new software product is added to the software environment and saved, EaaSI only saves the changes made to the previous “layer,” which makes for a much smaller file. Think of it this way: say I’m working with a Windows 95 Base Environment, and I install WordPerfect 7 and hit save. In the EaaSI infrastructure, I’ll have two files to work with: the Base Environment, which might be 5 GB, and the blocks that changed in the Base Environment when I installed WordPerfect, which might be 500 MB. If I saved these changes as two standalone virtual machines in a virtualization tool like VirtualBox, I’d also have two files, but they’d be the 5 GB Base Environment, and the 5 GB Base Environment again with the addition of WordPerfect. That’s an extra 5 GB to save a 500 MB change.
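To make the arithmetic concrete, here is a minimal sketch that compares the two storage strategies using the illustrative sizes from the example above. The function names and the decimal megabyte figures are my own assumptions for illustration; nothing here is EaaSI code.

```python
# Illustrative sizes from the example above, in megabytes (1 GB taken as 1,000 MB).
BASE_MB = 5_000    # Windows 95 Base Environment
DELTA_MB = 500     # blocks changed by installing WordPerfect 7


def layered_total(base_mb, delta_mbs):
    """Storage used when each Derivative holds only its changed blocks."""
    return base_mb + sum(delta_mbs)


def standalone_total(base_mb, delta_mbs):
    """Storage used when every saved state is a full, consolidated image."""
    total = base_mb
    current = base_mb
    for delta in delta_mbs:
        current += delta   # each new image contains everything before it
        total += current
    return total


layered = layered_total(BASE_MB, [DELTA_MB])
standalone = standalone_total(BASE_MB, [DELTA_MB])
print(f"layered: {layered} MB, standalone: {standalone} MB, "
      f"extra: {standalone - layered} MB")
```

With one Derivative, the standalone approach costs an extra 5,000 MB, which is the Base Environment stored a second time. Each further Derivative widens the gap, because every standalone image repeats everything beneath it.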
Defining the Preservation Package
While de-duplicating data is a guiding principle of the EaaSI architecture, it presents a major challenge for long-term preservation: how do we maintain the relationships among all of those Base Environments and Derivatives once they’ve left the EaaSI platform? Answering this question takes two steps: defining the components we’d need to maintain those relationships, and creating a workflow that would allow those components to be exported from EaaSI and ingested into an archival repository. The goal is a backup of all EaaSI Environment Objects.
When everything is still within the EaaSI software platform, the platform can maintain the relationships itself. But once we think about creating a backup copy of the environment – in particular, one we’d want to ingest into an archival repository, like Preservica – we need to gather the appropriate contextual information that will maintain those relationships. That means multiple types of breadcrumbs, including:
Embedded metadata within the disk images that maintains the structure of an environment
Descriptive metadata schemas that articulate system requirements, dependencies, and parent-child relationships
Screenshots that give visual clues for how an environment is supposed to look
At baseline, if I want to retrieve and run my Derivative environment with WordPerfect 7, embedded metadata will tell me that I also need to retrieve the Windows 95 Base Environment to do so, and a descriptive metadata XML document will confirm that relationship.
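The retrieval logic described above can be sketched as a walk up a chain of parent references. This is only an illustration: the record layout, the `parent` field, and the environment identifiers are hypothetical stand-ins, not the EaaSI metadata schema or anything Preservica defines.

```python
# Hypothetical metadata records: each environment names its parent, or None for a Base.
RECORDS = {
    "win95-base": {"parent": None},
    "win95-wordperfect7": {"parent": "win95-base"},
}


def resolve_chain(env_id, records):
    """Return every environment needed to run env_id, ordered Base first."""
    chain = []
    seen = set()
    current = env_id
    while current is not None:
        if current in seen:
            raise ValueError(f"circular parent reference at {current!r}")
        if current not in records:
            raise KeyError(f"missing metadata record for {current!r}")
        seen.add(current)
        chain.append(current)
        current = records[current]["parent"]
    chain.reverse()  # we collected Derivative first; callers want Base first
    return chain


print(resolve_chain("win95-wordperfect7", RECORDS))
```

A missing record raises an error instead of being skipped. A Derivative whose parent cannot be found is exactly the broken relationship a preservation package is meant to prevent, so the sketch makes that failure visible.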
With all these fun details in mind, my internship started out with the goal of writing a workflow that would facilitate the export of configured software environments from the EaaSI software platform, and direct their ingest as SIPs into Preservica. Over time, and through hours of research, many Google Docs, and conversations in the genre of “hmm, I’ve never thought of that before,” the aim shifted from a workflow to a workplan, in which I hope to point out the gaps in the infrastructure that we need to close before an action-oriented workflow can be created.