Software Preservation Network Member Profiles

A regular series on the SPN blog, member profiles highlight the full spectrum of software preservation work underway at member institutions.

Post written by: Vicky Steeves

Tell us about the software preservation program at your organization.

We have a few projects around the Libraries that look at software preservation in various forms, but no formal software preservation program as a part of core digital archiving services. Three software preservation projects are currently underway:

1. Saving Data Journalism: an IMLS-funded project to create an emulation-based web archiving prototype to capture complex interactive websites.

2. Enhancing Services to Preserve New Forms of Scholarship: an Andrew W. Mellon Foundation-funded project to look at preserving new forms of scholarship.

3. Investigating & Archiving the Scholarly Git Experience: an Alfred P. Sloan Foundation-funded project to write an archival specification for Git, a version control system for software, and its data format.

What has your organization accomplished recently that you're proud of, big or small?

We recently launched the code of an emulation-based web archiving tool, built as an extension to ReproZip, that is currently in testing and development. This represents the completion of our 2018-2019 IMLS planning grant. The tool was built to specifically capture dynamic data journalism websites. We announced the launch at the annual conference for the National Institute for Computer-Assisted Reporting (NICAR), and received a lot of initial positive feedback from that community, which has been energizing.

We have just received an award to explore the means by which academic publishers can better preserve new forms of scholarship. We will be working with NYU Press, the University of Minnesota Press, Michigan Publishing, CLOCKSS, Portico, and several other publishing, preservation, and emulation services. Over the next 16 months, we plan to test different formats and develop guidelines so that publishers can advise authors at the beginning of a project about the long-term “preservability” of different approaches.

Member Profile: New York University. Vicky Steeves, Librarian for Research Data Management and Preservability.

Tell us about a challenge that your organization is facing in its software preservation work or that the field is facing as a whole.

The biggest challenge to the field of software preservation is copyright/legal concerns. In the United States, software is considered a ‘literary work,’ so copyright protects not only its literal elements (e.g. source code) but also its non-literal elements (e.g. code sequence, unique application of utilitarian methods).

While libraries specifically have an exception to copyright law that “…allows them (libraries) to reproduce and distribute copyrighted works under certain specific conditions,” this exception has not been tested for creating an archival copy of copyrighted software. Most GLAMs (galleries, libraries, archives, and museums) err on the side of caution, so fair use for archiving software remains largely untested in the courts. The legal infrastructure being tested as part of software preservation work has the potential to be transformative for GLAMs that want to archive critical software for their users (e.g. the first version of Photoshop, to see artwork in its original environment) but are scared of the legal consequences of doing so.

Name a software title or library that is particularly significant for one or more of your user constituencies, and tell us why that title or library is particularly significant.

Given that we support a global campus with lots of software titles that are particularly significant for large, different swaths of that community, this is a super hard question!

In a research context, so much of the software that scholars rely on is locked down, which makes it impossible for many to access it and reproduce studies’ findings. Hopefully a lot of the work of SPN will help us navigate these tricky legal waters to allow scholars to access and re-use others’ work.

Why is collective action the best approach to software preservation?

It’s the best approach for most things! 

In all seriousness, collective action is a clear path forward for the copyright concerns and legal implications of software preservation. The legal, social, and technical infrastructure work of SPN is best done together with real action and power (in voting, etc.) amongst our “collective.”