Permanent Objects, Disposable Systems

Abrams, Stephen
Cruse, Patricia
Kunze, John
The California Digital Library (CDL) preservation program is re-envisioning its curation infrastructure as a set of loosely coupled, distributed micro-services. Many monolithic systems support a range of preservation activities but also require the user and the hosting institution to buy in to a particular system culture. The result is an institution that becomes, say, a DSpace, Fedora, or LOCKSS "shop", with a specific worldview and a set of object flows and structures that will eventually need to be abandoned when it comes time to transition to the next system. Experience shows that these transitions are unavoidable, despite claims that once an object is in the system, it will be safe forever. In view of this, it is safer and more cost-effective to acknowledge from the outset the inevitably transient nature of systems and to plan on managing, rather than resisting, change. The disruption caused by change can be mitigated by basing curation services on simple, universal structures and protocols (e.g., filesystems, HTTP) and on micro-services that operate on them. We promote a "mix and match" approach in which appropriate content- and context-specific curation workflows can be nimbly constructed by combining the necessary functions drawn from a granular set of independent micro-services. Micro-services, whether deployed in isolation or in combination, are especially suited to use upstream, close to content creators, who normally don't want to think about preservation, especially if it's costly; compared to buying into an entire curation culture, it is easy to adopt a small, inexpensive tool that requires very little commitment. We see digital curation as an ongoing process of enrichment at all stages in the lifecycle of a digital object. Because the early developmental stages are so critical to an object's health and longevity, it is desirable to push curation "best practices" as far upstream towards the object creators as possible.
If preservation is considered only when objects are close to retirement, it is often too late to correct the structural and semantic deficiencies that can impair object usability. The later the intervention, the more expensive the correction process, and it is always difficult to fund interventions for "has been" objects. Early-stage curation, in contrast, challenges traditional practices. Traditionally, preservation actions are based on end-stage processing, where objects are deposited "as is" and kept out of harm's way by limiting access (i.e., dark archives). While some systems are designed to be dark or "dim", with limited access and little regard for versioning or object enrichment, enrichment and access are now seen as necessary curation actions, that is, interventions for the sake of preservation. In particular, the darkness of an entire collection can change in the blink of an eye, for example, as the result of a court ruling or access rights purchase; turning the lights on for a collection should be as simple as throwing a switch, and should not require transferring the collection from a "preservation repository" to an "access repository". Effective curation services must be flexible and easily configurable in order to respond appropriately to the wide diversity of content and content uses. To be most effective, curation practices should not only be pushed upstream but also pushed out to many different contexts. The micro-services approach promotes the idea that curation is an outcome, not a place. Curation actions should be applied to content where it most usefully exists for the convenience of its creators or users.
For example, high-value digital assets in access repositories, or even on scholars' desktops, would certainly benefit from such things as persistent identification or regular audits to discover and repair bit-level damage. These functions are usually available only in the context of a "preservation system", but they can now be applied easily to content where it most usefully resides, without requiring transfer to a central location.
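As an illustration of how small such a function can be, the following is a minimal sketch of a bit-level audit applied directly to an ordinary directory tree. It is not CDL's implementation: the function names (`sha256_of`, `build_manifest`, `audit`), the choice of SHA-256, and the in-memory manifest format are all assumptions made for the example. It only detects damage; repair would additionally require a known-good copy of each file.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its digest.

    Hypothetical manifest format: a plain dict, assumed for this sketch.
    """
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def audit(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare the tree under root against a previously recorded manifest."""
    current = build_manifest(root)
    return {
        # Recorded earlier but no longer present.
        "missing": sorted(set(manifest) - set(current)),
        # Present in both, but the content digest differs.
        "changed": sorted(
            name for name in manifest
            if name in current and current[name] != manifest[name]
        ),
        # Present now but never recorded.
        "unexpected": sorted(set(current) - set(manifest)),
    }
```

In use, a manifest would be recorded once with `build_manifest`, stored somewhere other than the audited directory, and passed to `audit` on a schedule. Because the sketch depends only on the filesystem, it can run wherever the content already lives.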