Formal comments submitted by the Software Preservation Network to the Office of Science and Technology Policy in response to a Request for Information regarding “Public Access to Peer-Reviewed Scholarly Publications, Data and Code Resulting From Federally Funded Research.”
Thank you for this opportunity to comment on the future of public access to the results of federally funded research. We write to emphasize the importance of providing, wherever possible, for immediate, free, and openly licensed access to software code developed as part of federally funded research. As complex research projects increasingly rely on custom-made research software tools, full access to research results can only be assured when relevant software code is part of the publicly accessible and reusable outputs of any software-dependent research project.
The Software Preservation Network (SPN) is a coordinated, distributed effort to ensure long-term access to software through community engagement, infrastructure support, and knowledge generation. SPN believes that software should be curated and preserved both because it is a dependency for accessing existing digital data and because it has intrinsic cultural value due to its mediating role in our lives. The core of SPN’s constituency consists of 20 institutional members, drawn from universities, museums, and research institutions, that are committed to the belief that software is critical information infrastructure. In addition to the financial support of its institutional members, SPN has led projects funded by the Alfred P. Sloan Foundation, the Andrew W. Mellon Foundation, and the Institute of Museum and Library Services.
Lack of Access to Software Code Limits Effective Communication of Research Results
Software is as integral to the full understanding and dissemination of research as a paper, monograph, or dataset. Communication of research results serves at least three core purposes, and each of these requires access to software code. Code is essential to enable validation and reproducibility of findings, to support collaboration and reuse, and to provide the means to share software and data with future researchers.
Reliable research results must be valid and reproducible. To fully validate and reproduce the results of a research project, independent researchers need access to the key inputs and tools involved in conducting the original research. This increasingly means that independent researchers need access not only to the research data and to detailed information about methodology, but also to the software code used to derive results from data. Access to code enables independent assessment of the code itself as well as confirmation that analysis of the data using the relevant code produces the published results. When relevant code is either unavailable or unusable (due to licensing restrictions), independent validation and reproduction are difficult, if not impossible.
Effective communication of research results should facilitate collaboration and reuse. Discussions of reuse of scholarly research results often focus on data, but code is also an important reusable element of research. Code can be reused for its original purpose, or repurposed, modified, and adapted to serve a new research purpose. When code is unavailable, or its reuse is clouded due to restrictive or unclear license terms, downstream collaboration and reuse suffer.
A core purpose of research communication is to fully convey research to future scholars. Leaving aside validation, reproduction, reuse, and collaboration, simply understanding research results often requires access to software code. Research communication that doesn’t include code is not full communication.
Broad, Clear Access and Licensing Requirements for Code Will Ensure Federally Funded Research Has the Greatest Possible Impact
To ensure adequate access and reuse rights for code resulting from federally funded research, agencies should consider taking the following steps:
- Require immediate, full access to software code resulting from federally funded research, alongside data and peer-reviewed journal articles.
- Require that software be released under simple, clear license terms that permit reuse and adaptation, not just read access.
- Where exceptions are necessary (e.g., due to privacy or security concerns), the justification for withholding public access should be published and a process should exist for researchers to challenge the withholding of data, or to request private access where possible.
- Metadata about research outputs—including software code, data, and publications—should be available in machine-actionable formats at the time of publication. Regardless of the license chosen for the outputs themselves, metadata should be dedicated to the public domain via a Creative Commons Public Domain Dedication (CC0), to ensure it is free of all copyright restrictions.
- Access to these materials should be provided either via a digital repository maintained by a federal agency or via any repository meeting appropriate criteria to ensure high quality.
- Compliance with these policies should be closely monitored and enforced, and should be a condition of receiving federal funding.
Thank you for your interest in these important issues, and for considering our views.