Showing posts with label SPRUCE. Show all posts

Wednesday, 11 December 2013

My digital preservation Christmas wish list

All I want for Christmas is a digital archive.

By paparutzi on Flickr CC BY 2.0
Since I started at the Borthwick Institute for Archives I have been keen to adopt a digital preservation solution. Up until this point, exploratory work on the digital archive has been overtaken by other priorities, perhaps the most important of these being an audit of digital data held at the Borthwick and an audit of research data management practices across the University. The outcome is clear to me – we hold a lot of data and if we are to manage this data effectively over time, a digital archiving system is required.

In a talk at the SPRUCE end of project workshop a couple of weeks ago both Ed Fay and Chris Fryer spoke about the importance of the language that we use when we talk about digital archiving. This is a known problem for the digital preservation community and one I have myself come up against on a number of different levels.

In an institution relatively new to digital preservation the term ‘digital archiving’ can mean a variety of different things. On the most basic IT level it implies static storage: a conceptual box we can put data in, a place where we put data when we have finished using it, a place where data will be stored but no longer maintained.

Those of us who work in digital preservation have a different understanding of digital archiving. We see digital archiving as the continuous active management of our digital assets, the curation of data over its whole life cycle, the systems that ensure data remains not only preserved, but fit for reuse over the long term. Digital archiving is more than just storage and needs to encompass activities as described within the Open Archival Information System reference model such as preservation planning and data management. Storage should be seen as just one part of a digital preservation solution.

To this end, and to inform discussions about what digital preservation really is, I pulled together a list of digital preservation requirements which any digital preservation system or software should be assessed against. This became my wish list for a digital preservation system. I do not really expect to have a system such as this unwrapped and ready to go on Christmas morning this year but maybe some time in the future!

In order to create this list of requirements I looked at the OAIS reference model and the main functional entities within this model. The list below is structured around these entities. 

I also bravely revisited ISO 16363: Audit and Certification of Trustworthy Digital Repositories. This is the key (and most rigorous) certification route for those organisations that would like to become Trusted Digital Repositories. It goes into great detail about some of the activities which should be taking place within a digital archive, and many of these are processes which would be most effectively carried out by an automated system built into the software or system on which the digital archive runs.

This list of requirements I have come up with has a slightly different emphasis from other lists of this nature due to the omission of the OAIS entity for Access. 

Access should be a key part of any digital archive. What is the point of preserving information if we are not going to allow others to access it at some point down the line? However, at York we already have an established system for providing access to digital data in the shape of York Digital Library. Any digital preservation system we adopt would need to build on and work alongside this existing repository, not replace it.

Functional requirements for access have also been well articulated by colleagues at Leeds University as part of their RoaDMaP project and I was keen not to duplicate effort here.

As well as helping to articulate what I actually mean when I talk about my hypothetical ‘digital archive’, one of the purposes of this is to provide a grid for comparing the functionality of different digital preservation systems and software.
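A comparison grid of this kind can be kept very simple. The sketch below is only an illustration of the idea: the requirement IDs come from the list in this post, but the two candidate systems and all of their scores are hypothetical placeholders, not assessments of any real product.

```python
# Requirement IDs are taken from the list in this post; the candidate
# systems and their scores are hypothetical placeholders.
REQUIREMENTS = ["I3", "I4", "DM3", "AS1", "G3"]

# True = requirement met, False = not met, None = not yet assessed.
grid = {
    "System A": {"I3": True, "I4": True, "DM3": False, "AS1": True, "G3": None},
    "System B": {"I3": True, "I4": False, "DM3": True, "AS1": True, "G3": True},
}


def summarise(scores):
    """Count met, unmet and unassessed requirements for one system."""
    met = sum(1 for r in REQUIREMENTS if scores.get(r) is True)
    unmet = sum(1 for r in REQUIREMENTS if scores.get(r) is False)
    unassessed = len(REQUIREMENTS) - met - unmet
    return {"met": met, "unmet": unmet, "unassessed": unassessed}


def gaps(scores):
    """List the requirement IDs a system does not (yet) satisfy."""
    return [r for r in REQUIREMENTS if scores.get(r) is not True]


for name, scores in grid.items():
    print(name, summarise(scores), "gaps:", gaps(scores))
```

Keeping "not yet assessed" separate from "not met" seems worthwhile here, since an unanswered question to a supplier is not the same thing as a missing feature.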


Thanks to Julie Allinson and Chris Fryer for providing comment thus far. Chris's excellent case study for the SPRUCE project helped inform this exercise.

My requirements are listed below. Feedback is most welcome.


# | Requirement

INGEST
I1 | The digital archive will enable us to record/store administrative information relating to the Submission Information Package (information and correspondence relating to receipt of the SIP)
I2 | The digital archive will include a means for recording decisions regarding selection/retention/disposal of material from the Submission Information Package
I3 | The digital archive will be able to identify and characterise data objects (where appropriate tools exist)
I4 | The digital archive will be able to validate files (where appropriate tools exist)
I5 | The digital archive will support automated extraction of metadata from files
I6 | The digital archive will incorporate virus checking as part of the ingest process
I7 | The digital archive will be able to record the presence and location of related physical material



DATA MANAGEMENT
DM1 | The digital archive will generate persistent, unique internal identifiers
DM2 | The digital archive will ensure that preservation description information (PDI) is persistently associated with the relevant content information. The relationship between a file and its metadata/documentation should be permanent
DM3 | The digital archive will support the PREMIS metadata schema and use it to store preservation metadata
DM4 | The digital archive will enable us to describe data at different levels of granularity – for example metadata could be attached to a collection, a group of files or an individual file
DM5 | The digital archive will accurately record and maintain relationships between different representations of a file (for example, from submitted originals to dissemination and preservation versions and subsequent migrations)
DM6 | The digital archive must store technical metadata extracted from files (for example that created as part of the ingest process)



PRESERVATION PLANNING
PP1 | The digital archive will allow preservation plans (such as file migration or refreshment) to be enacted on individual or groups of files.
PP2 | Automated checking of significant properties of files will be carried out post-migration to ensure they are preserved (where tools exist).
PP3 | The digital archive will record actions, migrations and administrative processes that occur whilst the digital objects are contained within the digital archive
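
To make PP3 a little more concrete, here is a minimal sketch of an append-only event log. It is loosely in the spirit of the events recorded in preservation metadata, but the class names, field names and example values are my own simplifications for illustration and do not follow the PREMIS schema or any particular system.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Event:
    """One action carried out on a digital object (illustrative fields only)."""
    object_id: str
    event_type: str
    outcome: str
    detail: str = ""
    # Timestamp is set automatically, in UTC, when the event is created.
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


class EventLog:
    """Append-only record of events; nothing is ever edited or removed."""

    def __init__(self):
        self._events = []

    def record(self, object_id, event_type, outcome, detail=""):
        event = Event(object_id, event_type, outcome, detail)
        self._events.append(event)
        return event

    def history(self, object_id):
        """Return all events for one object, in the order they occurred."""
        return [e for e in self._events if e.object_id == object_id]


# Hypothetical usage: the identifier and event descriptions are made up.
log = EventLog()
log.record("obj-001", "ingestion", "success")
log.record("obj-001", "migration", "success", "created preservation version")
log.record("obj-002", "ingestion", "success")

for event in log.history("obj-001"):
    print(event.timestamp, event.event_type, event.outcome, event.detail)
```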



ADMINISTRATION
A1 | The digital archive will allow for disposal of data where appropriate. A record must be kept of this data and when disposal occurred
A2 | The digital archive will have reporting capabilities so statistics can be gathered on numbers of files, types of files etc.



ARCHIVAL STORAGE
AS1 | The digital archive will actively monitor the integrity of digital objects with the use of checksums
AS2 | Where problems of data loss or corruption occur, the digital archive will have a reporting/notification system to prompt appropriate action
AS3 | The digital archive will be able to connect to, and support, a range of storage systems
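
As a rough illustration of AS1 and AS2, the sketch below builds a manifest of checksums for a directory and later re-checks the files against it, reporting anything missing or changed. It uses only the Python standard library; the choice of SHA-256, the function names and the demo file are assumptions made for the example, not a description of how any particular system does this.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its checksum."""
    root = Path(root)
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def check_fixity(root, manifest):
    """Return (path, problem) pairs for files that are missing or altered."""
    root = Path(root)
    problems = []
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file():
            problems.append((rel_path, "missing"))
        elif sha256_of(target) != expected:
            problems.append((rel_path, "checksum mismatch"))
    return problems


# Demo in a temporary directory: check once, simulate corruption, check again.
with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp)
    (store / "report.txt").write_bytes(b"original content")
    manifest = build_manifest(store)
    clean_result = check_fixity(store, manifest)
    (store / "report.txt").write_bytes(b"corrupted content")
    damaged_result = check_fixity(store, manifest)

print(clean_result)
print(damaged_result)
```

In a real archive the list of problems would feed whatever notification mechanism is in place (AS2), and the manifest itself would need to be stored somewhere it cannot be silently altered.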



GENERAL
G1 | The digital archive will be compliant with the Open Archival Information System (OAIS) reference model
G2 | The digital archive will integrate with the access system/repository
G3 | The digital archive will have APIs or other services for integrating with other systems
G4 | The digital archive will be able to incorporate new digital preservation tools (for migration, file validation, characterisation etc) as they become available
G5 | The digital archive will include functionality for extracting and exporting the data and associated metadata in standards compliant formats
G6 | The software or system chosen for the digital archive will be supported and technical help will be available
G7 | The software or system chosen for the digital archive will be under active development



Jenny Mitcham, Digital Archivist

Tuesday, 26 November 2013

Fund it, Solve it, Keep it – a personal perspective on SPRUCE

Yesterday I attended the SPRUCE end of project event at the fabulous new Library of Birmingham. The SPRUCE project was lauded by Neil Grindley as one of the best digital preservation projects that JISC has funded and it is easy to see why. Over the 2 years it has run, SPRUCE has done a great deal for the digital preservation community. Bringing people together to come up with solutions for some of our digital preservation problems is one of the most important of its achievements. The SPRUCE project is perhaps best known for its mash-up events* but should also be praised for its involvement and leadership in other community-based digital preservation initiatives such as the recently launched tool registry COPTR (more about this in a future blog post).
Library of Birmingham by KellyNicholls27 on Flickr

SPRUCE can’t fix all the problems of the digital preservation community but what it has done very effectively is what William Kilbride describes as “productive small scale problem solving”. 

This event was a good opportunity to learn more about some of the tools and resources that have come out of the SPRUCE project. 

I was interested to hear Toni Sant of the Malta Music Memory Project describing their tool for extracting data from audio CDs that was made available last week. I have not had a chance to investigate this in any detail yet but think this could be exactly what we need in order to move us forward from our audit of audio formats at the Borthwick Institute earlier this year to a methodology for ensuring their long term preservation in line with the proposed 15 year digitisation strategy as described last month. Obviously this deals only with audio CDs so its scope is limited, but given that audio CDs are a high priority for digital preservation this is an important development.

Another interesting tool described by Eleonora Nicchiarelli at Nottingham University allowed them to put XMP metadata into the headers of TIF images produced by their digitisation team. This avoids the separation of the images from the contextual information that is so important in making sense of them.

It was also good to hear Ray Moore from the Archaeology Data Service talk about the ReACT tool (Resource Audit and Comparison), the proposal for which I wrote in my last few weeks at the ADS. It is a simple tool written in VBA, with a friendly Excel GUI, capable of automatically checking for the presence of related files in different directories. Originally created for those situations where you want to ensure a dissemination or preservation version of a file is present for each of your archival originals, it could have many use cases in alternative scenarios. As Ray articulated, “simple solutions are sometimes the best solutions”. Thanks are due to Ray and Andrew Amato of LSE for seeing that project through.
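
The underlying idea of checking that every archival original has a counterpart elsewhere is easy to sketch. The example below is not ReACT and does not reproduce its logic: it is written in Python rather than VBA, the function names are made up, and it assumes that related files share a file name and differ only in their extension.

```python
import tempfile
from pathlib import Path


def stems(directory):
    """Return the set of file names (without extensions) in a directory."""
    return {p.stem for p in Path(directory).iterdir() if p.is_file()}


def missing_counterparts(originals_dir, derived_dir):
    """List originals that have no file of the same name in derived_dir."""
    return sorted(stems(originals_dir) - stems(derived_dir))


# Demo with made-up file names in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    originals = Path(tmp) / "originals"
    preservation = Path(tmp) / "preservation"
    originals.mkdir()
    preservation.mkdir()
    for name in ("plan_01.dwg", "photo_02.tif", "notes_03.doc"):
        (originals / name).touch()
    for name in ("plan_01.dxf", "photo_02.tif"):
        (preservation / name).touch()
    missing = missing_counterparts(originals, preservation)

print(missing)  # originals with no preservation version
```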

Chris Fryer of Northumberland Estates described some great work he has done (along with Ed Pinsent of ULCC) on defining digital preservation requirements and assessing a number of solutions against these requirements. He has produced a set of resources that could be widely re-used by others going through a similar process.

When I attended the first SPRUCE mash-up in Glasgow early last year participants did a bit of work on defining the business case for digital preservation in the context of their own organisations and roles. At the time this seemed barely relevant to me, working as I was within an organisation for which digital preservation was its very reason for being and for which the business case had already been well defined using the Keeping Research Data Safe model. Since Glasgow I have moved to a different job within the University of York so it was useful yesterday to have a reminder of this work from Ed Fay, who was able to summarise some of the key tools and techniques and highlight why a business case is so important in order to get senior buy-in for digital preservation. This is something I need to go back to and review. The recently published Digital Preservation Business Case Toolkit should be a great resource to help me with this.

The need to have a well prepared elevator pitch to persuade senior managers that more resources should be put into digital preservation has also become more real for me. The one I wrote at the time in Glasgow was a good start but perhaps needs to be a little bit less tongue in cheek!


* As an ex-archaeologist I see SPRUCE mash-ups as being the digital preservation equivalent of Channel 4's Time Team but without the TV cameras, and with Paul Wheatley ably taking on the role of Tony Robinson. Instead of 3 days to excavate an archaeological site we have 3 days to solve a selection of digital preservation problems and issues.



Jenny Mitcham, Digital Archivist

The sustainability of a digital preservation blog...

So this is a topic pretty close to home for me. Oh the irony of spending much of the last couple of months fretting about the future prese...