Why we need to rethink risk in digital data preservation
January 6, 2022
By Libby Plummer
Prof Rebecca D Frank of the Einstein Center Digital Future explains the importance of research data continuity and how it can be achieved.
As a society, we are generating an ever-expanding amount of data. From historical collections and government archives to vital scientific research, vast amounts of data are stored in digital repositories across the globe. Digital preservation is an increasingly important issue. Without preservation planning, we risk losing important data if disaster strikes or, in more common scenarios, through rapid technological change or an organization going out of business.
“Whether it’s data that was ‘born digital’ or digitized surrogates of physical objects, the information itself may seem small, but what's really important to remember is that even a very small file can represent an enormous amount of money and time and resources,” said Prof Frank in a recent webinar for Elsevier employees. “That’s why it's important to think about not just taking care of this data but really trying to understand how to ensure its survival long-term.”
It is essential that we have trust in the organizations that act as custodians of crucial research data, which is why they undergo testing as part of a rigorous auditing process to receive certification. Repository succession plans are developed in order to ensure data continuity, even if the primary storage method fails in some way, such as an organization closing down.
But are we doing enough to understand and mitigate the risks involved in digital preservation? Rebecca thinks we still have some way to go.
Research reveals areas in need of improvement
“In digital preservation, we’ve done an excellent job of thinking about risks in technical, economic, and organizational terms,” she said. “It’s time to add social to that list.”
At the Einstein Center Digital Future, where she is sponsored by Elsevier, Rebecca conducts research in open data, digital curation and preservation, and data reuse, with a focus on how social and ethical barriers influence the long-term preservation of digital information. She is also Junior Professor for Information Management at the Berlin School of Library and Information Science at Humboldt University of Berlin and an associate with the Humboldt-Elsevier Advanced Data and Text Centre (HEADT Centre), which focuses on research integrity and education.
Rebecca’s current research examines the social construction of risk in the context of digital repository audit and certification. Her interest in risk and digital preservation began with a previous study in disaster planning. As Rebecca explained in the webinar:
I found that repositories that were trying to become certified as trustworthy were more likely to have documented articulated disaster plans because, of course, that documentation is what's necessary for certification. That study also really highlighted for me the fact that digital preservation and trustworthy repository certification are all about risk.
Rebecca’s next round of research focused on one of the three most prominent data preservation certifications in use today: the ISO 16363 certification, usually referred to as TRAC (Trustworthy Repositories Audit & Certification). Her research found that stakeholders in this process tended to rely on a classical approach to risk, focusing on the probability of an event multiplied by the magnitude of its consequences. However, this view of risk assumes first that we can identify threats, and second, that people will behave predictably in response to risk information. As Rebecca explained:
This approach doesn't get us all the way, so this is where my research comes in. We know that people are not perfectly rational actors and that social factors influence the way they receive, understand and respond to all kinds of information, including information about risks. It's really how people behave in response to information that matters for ensuring the longevity of digital information. And I argue in this research, social factors influence the ways in which actors involved in repository certification and digital preservation understand the context of risk.
With these factors in mind, Rebecca developed a theoretical model for the social construction of risk in digital preservation. With a focus on the TRAC certification system, Rebecca then set out to investigate how auditors and repository managers conceptualize risk in the context of an audit, what the differences and similarities are between how they understand risk, and how it's communicated. She also looked at the extent to which the factors in her model influence risk perception.
She found that auditors and standard developers on the one hand, and repository staff members on the other, did not share the same understanding of risk. For example, her research found that while repositories provided credible succession plans, their staff did not necessarily believe that these plans were evidence of risk being fully addressed. Participants also argued that these plans may not be enforceable, since they would only be enacted if their own organization failed.
This research led Rebecca to her current project, which focuses on the CoreTrustSeal certification system, a combination of two legacy certification systems (Data Seal of Approval and World Data System) that are more global in scope than TRAC. Under the CoreTrustSeal system, auditors and repository staff are no longer separate and distinct groups. Rather, the group of auditors consists of representatives from certified repositories.
“I'm really interested to see what happens when the two groups who really disagreed in my previous study are no longer in separate, distinct groups,” Rebecca said. “The goal of this research, much like the previous research, is to understand how stakeholders in this process construct their understanding of risk.”
Rebecca is now conducting interviews and collecting documents for further analysis. Her previous research found that standard developers and auditors believed that repositories demonstrate trustworthiness, while repository staff viewed the process as more performative. Preliminary findings from the current research suggest that this is not the case, so the next step will be to dig into why that is. But what does this mean in the wider context of digital preservation?
“It means that we've identified risks, we've documented how we'll respond to them — but we haven't really taken into account the fact that people aren't going to behave the same way,” Rebecca said.
This research suggests that we need to improve the way we approach digital preservation and how we decide who’s capable of taking care of such important data, she explained. In particular, we need to look more closely at the factors that influence risk and in turn the reliability of data repositories and their certification.
While Rebecca is inevitably very critical of certification systems in her work, she believes that they do make repositories more trustworthy even if the processes currently in place aren’t perfect. “My belief is that by going through the process of certification, repositories are better for it,” she said.
This viewpoint on the importance of certification is echoed by Elsevier colleagues who work in partnership with several data preservation organizations, including CLOCKSS.
Gwen Evans, VP of Global Library Relations at Elsevier and a member of the CLOCKSS advisory board, said:
Elsevier is deeply committed to the permanent availability and preservation of the scholarly record so that it endures for the benefit of future researchers. We work in tandem with librarians and other publishers through trusted partners such as Portico, CLOCKSS, DANS (Data Archiving and Networked Services) and the National Library of the Netherlands to deposit our content. This takes ongoing resources and commitment behind the scenes by all parties to ensure that content is protected and is available no matter how the technology of access changes going forward.
It’s clear that digital repositories and certification are vital for data preservation and that we need to continually re-examine how we define the related risk. So what happens next?
After completing her current research, Rebecca plans to look more closely at disparities that may exist within the data preservation community:
What I'm hoping to do long term is to think about how risk is constructed among repositories that want to demonstrate trustworthiness but can't go through a formal audit process. All of these processes are time-consuming and costly, and the repositories that are certified exist largely at well-funded institutions in the Global North. I'm really interested to think about how this all plays out in organizations with fewer resources.