My digital preservation Christmas wish list

All I want for Christmas is a digital archive.

By paparutzi on Flickr CC BY 2.0
Since I started at the Borthwick Institute for Archives I have been keen to adopt a digital preservation solution. Up until this point, exploratory work on the digital archive has been overtaken by other priorities, perhaps the most important of these being an audit of digital data held at the Borthwick and an audit of research data management practices across the University. The outcome is clear to me – we hold a lot of data and if we are to manage this data effectively over time, a digital archiving system is required.

In a talk at the SPRUCE end of project workshop a couple of weeks ago both Ed Fay and Chris Fryer spoke about the importance of the language that we use when we talk about digital archiving. This is a known problem for the digital preservation community and one I have myself come up against on a number of different levels.

In an institution relatively new to digital preservation, the term ‘digital archiving’ can mean a variety of different things. On the most basic IT level it implies static storage: a conceptual box we can put data in, a place where we put data when we have finished using it, a place where data will be stored but no longer maintained.

Those of us who work in digital preservation have a different understanding of digital archiving. We see digital archiving as the continuous active management of our digital assets, the curation of data over its whole life cycle, the systems that ensure data remains not only preserved, but fit for reuse over the long term. Digital archiving is more than just storage and needs to encompass activities as described within the Open Archival Information System reference model such as preservation planning and data management. Storage should be seen as just one part of a digital preservation solution.

To this end, and to inform discussions about what digital preservation really is, I pulled together a list of requirements against which any digital preservation system or software should be assessed. This became my wish list for a digital preservation system. I do not really expect to have a system such as this unwrapped and ready to go on Christmas morning this year, but maybe some time in the future!

In order to create this list of requirements I looked at the OAIS reference model and the main functional entities within this model. The list below is structured around these entities. 

I also bravely revisited ISO 16363: Audit and Certification of Trustworthy Digital Repositories. This is the key (and most rigorous) certification route for organisations that would like to become Trusted Digital Repositories. It goes into great detail about some of the activities which should be taking place within a digital archive, and many of these are processes which would be most effectively carried out by an automated system built into the software or system on which the digital archive runs.

This list of requirements I have come up with has a slightly different emphasis from other lists of this nature due to the omission of the OAIS entity for Access. 

Access should be a key part of any digital archive. What is the point of preserving information if we are not going to allow others to access it at some point down the line? However, at York we already have an established system for providing access to digital data in the shape of York Digital Library. Any digital preservation system we adopt would need to build on and work alongside this existing repository, not replace it.

Functional requirements for access have also been well articulated by colleagues at Leeds University as part of their RoaDMaP project and I was keen not to duplicate effort here.

As well as helping to articulate what I actually mean when I talk about my hypothetical ‘digital archive’, one of the purposes of this is to provide a grid for comparing the functionality of different digital preservation systems and software.

Thanks to Julie Allinson and Chris Fryer for providing comment thus far. Chris's excellent case study for the SPRUCE project helped inform this exercise.

My requirements are listed below. Feedback is most welcome.


The digital archive will enable us to record/store administrative information relating to the Submission Information Package (information and correspondence relating to receipt of the SIP)
The digital archive will include a means for recording decisions regarding selection/retention/disposal of material from the Submission Information Package
The digital archive will be able to identify and characterise data objects (where appropriate tools exist)
The digital archive will be able to validate files (where appropriate tools exist)
The digital archive will support automated extraction of metadata from files
The digital archive will incorporate virus checking as part of the ingest process
The digital archive will be able to record the presence and location of related physical material

The digital archive will generate persistent, unique internal identifiers
The digital archive will ensure that preservation description information (PDI) is persistently associated with the relevant content information. The relationship between a file and its metadata/documentation should be permanent
The digital archive will support the PREMIS metadata schema and use it to store preservation metadata
The digital archive will enable us to describe data at different levels of granularity – for example metadata could be attached to a collection, a group of files or an individual file
The digital archive will accurately record and maintain relationships between different representations of a file (for example, from submitted originals to dissemination and preservation versions and subsequent migrations)
The digital archive must store technical metadata extracted from files (for example that created as part of the ingest process)

The digital archive will allow preservation plans (such as file migration or refreshment) to be enacted on individual files or groups of files
Automated checking of significant properties of files will be carried out post-migration to ensure they are preserved (where tools exist)
The digital archive will record actions, migrations and administrative processes that occur whilst the digital objects are contained within the digital archive
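
As a rough illustration of what recording actions and migrations might involve, here is a minimal sketch of an append-only event log. The class and field names (`PreservationEvent`, `AuditLog`, `event_type` and so on) are my own assumptions for the example; they are loosely inspired by the idea of events in PREMIS but do not follow the PREMIS schema, and a real system would store these records persistently rather than in memory.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class PreservationEvent:
    """One action carried out on an object held in the archive (hypothetical model)."""
    object_id: str
    event_type: str   # e.g. "ingest", "migration", "fixity check"
    outcome: str      # e.g. "success", "failure"
    detail: str = ""
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


class AuditLog:
    """Append-only list of events; nothing is ever edited or removed."""

    def __init__(self) -> None:
        self._events: list[PreservationEvent] = []

    def record(self, object_id: str, event_type: str,
               outcome: str, detail: str = "") -> PreservationEvent:
        event = PreservationEvent(object_id, event_type, outcome, detail)
        self._events.append(event)
        return event

    def history(self, object_id: str) -> list[PreservationEvent]:
        """Return every event recorded for one object, in the order it happened."""
        return [e for e in self._events if e.object_id == object_id]
```

The point of the sketch is simply that every action leaves a dated, identifiable record attached to the object it affected, so the history of a file can be reconstructed later.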

The digital archive will allow for disposal of data where appropriate. A record must be kept of this data and when disposal occurred
The digital archive will have reporting capabilities so statistics can be gathered on numbers of files, types of files etc.

The digital archive will actively monitor the integrity of digital objects with the use of checksums
Where problems of data loss or corruption occur, the digital archive will have a reporting/notification system to prompt appropriate action
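
To make the checksum requirement a little more concrete, here is a minimal sketch of a fixity check using Python's standard `hashlib` module. It assumes a manifest mapping relative file names to SHA-256 checksums recorded at ingest; the function names and the manifest format are assumptions made for the example, not a description of any particular system.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute a SHA-256 checksum, reading in chunks so large files fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_fixity(manifest: dict[str, str], root: Path) -> list[str]:
    """Compare stored checksums with the files on disk and list any problems."""
    problems = []
    for relative_name, expected in sorted(manifest.items()):
        target = root / relative_name
        if not target.is_file():
            problems.append(f"MISSING {relative_name}")
        elif sha256_of(target) != expected:
            problems.append(f"CHANGED {relative_name}")
    return problems
```

An empty list means every file still matches its recorded checksum. Anything else is what the reporting/notification system would need to pass on to a person so that appropriate action can be taken.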

The digital archive will be able to connect to, and support a range of storage systems

The digital archive will be compliant with the Open Archival Information System (OAIS) reference model
The digital archive will integrate with the access system/repository
The digital archive will have APIs or other services for integrating with other systems
The digital archive will be able to incorporate new digital preservation tools (for migration, file validation, characterisation etc.) as they become available
The digital archive will include functionality for extracting and exporting the data and associated metadata in standards-compliant formats
The software or system chosen for the digital archive will be supported and technical help will be available
The software or system chosen for the digital archive will be under active development
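
One way to use a list like this as a comparison grid is to mark each requirement as met or unmet for every candidate system and then count the results. The sketch below does this for a handful of shortened requirements. The system names ("System A", "System B") and the true/false values are invented purely for illustration and do not describe any real product.

```python
# Shortened labels standing in for a few of the requirements above.
REQUIREMENTS = [
    "virus checking on ingest",
    "PREMIS preservation metadata",
    "checksum-based integrity monitoring",
    "APIs for integration",
]

# Hypothetical assessments; a missing entry is treated as "not met".
ASSESSMENTS = {
    "System A": {
        "virus checking on ingest": True,
        "PREMIS preservation metadata": True,
        "checksum-based integrity monitoring": False,
    },
    "System B": {
        "virus checking on ingest": True,
        "PREMIS preservation metadata": False,
        "checksum-based integrity monitoring": True,
        "APIs for integration": True,
    },
}


def unmet_requirements(requirements, assessment):
    """List the requirements a system does not meet, in the original order."""
    return [r for r in requirements if not assessment.get(r, False)]


def rank_systems(requirements, assessments):
    """Return (system, number of requirements met), best first, ties by name."""
    scores = []
    for name, assessment in assessments.items():
        met = len(requirements) - len(unmet_requirements(requirements, assessment))
        scores.append((name, met))
    return sorted(scores, key=lambda pair: (-pair[1], pair[0]))
```

A simple count treats every requirement as equally important, which is unlikely to be true in practice. The list of unmet requirements for each system is probably more useful in discussion than the score itself.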

Jenny Mitcham, Digital Archivist


  1. A wonderful resource! You have helped me to get a head-start in pin-pointing exactly what we require for our hypothetical digital archive - something I shall be talking to developers about on Monday.


