Today we have published our third and final Filling the Digital Preservation Gap report.
The report can be accessed from Figshare: https://dx.doi.org/10.6084/m9.figshare.4040787
This report details work the team at the Universities of York and Hull have been carrying out over the last six months (from March to September 2016) during phase 3 of the project.
The first section of the report focuses on our implementation work. It describes how each institution has established a proof of concept implementation of Archivematica integrated with other systems used for research data management. As well as describing how these implementations work it also discusses future priorities and lessons learned.
The second section of the report looks in more detail at the file format problem for research data. It discusses DROID profiling work that has been carried out over the course of the project (both for research data and other data types) and signature development to increase the number of research data signatures in the PRONOM registry. In recognition of the fact that this is an issue that can only be solved as a community, it also includes recommendations for a variety of different stakeholder groups.
The final section of the report details the outreach work that we have carried out over the course of this final phase of the project. It has been a real pleasure to have been given an opportunity to speak about our work at so many different events and to such a variety of different groups over the last few months!
The last of this run of events in our calendars is the final Jisc Research Data Spring showcase in Birmingham tomorrow (20th October). I hope to see you there!
Wednesday, 19 October 2016
Tuesday, 11 October 2016
|A lovely view of the mountains from Bern|
This year the conference was held in the Swiss town of Bern. A great place to be based for the week - fantastic public transport, some lovely little restaurants and cellar bars, miles of shopping arcades, bizarre statues and non-stop sunshine!
There was so much content over the course of the 4 days that it is impossible to cover it all in one blog post. Instead I offer up a selection of highlights and takeaway thoughts.
Jeremy York from the University of Michigan gave an excellent paper about ‘The Stewardship Gap’. An interesting project with the aim of understanding the gap between valuable digital data and long term curation. Jeremy reported on the results of a series of interviews with researchers at his institution where they were asked about the value of the data they created and their plans for longer term curation. A theme throughout the paper was around data value and how we assess this. Most researchers interviewed felt that their data did have long term value (and were able to articulate the reasons why). Most of the respondents expressed an intention to preserve the data for the longer term but did not have any concrete plans as to how they would achieve this. It was not yet clear to the project whether an intention to preserve actually leads to deposit with a repository or not. Work on this project is ongoing and I’ll look forward to finding out more when it is available.
|Bern at night|
Firstly a poster from the University of Illinois at Urbana-Champaign entitled Should We Keep Everything Forever?: Determining Long-Term Value of Research Data.
The poster discussed an issue that we have also been grappling with recently as part of Filling the Digital Preservation Gap, that of the value of research data. It proposed an approach to assessing the value of content within the Illinois Data Bank using automated methods and measurable criteria. Recognising that a human eye is also important in assessing value, it would highlight those datasets that appear to have a low value which can then be assessed in a more manual fashion. This pragmatic two-stage approach will ensure that data thought to be of low value can be discarded after 5 years but that time intensive manual checking of datasets is kept to a minimum. This is a useful model that I would like to hear more about once they get it fully established. There was a lot of buzz around this poster and I wasn’t surprised to see it shortlisted for the best poster award.
Another excellent poster (and worthy winner of the best poster award) was To Act or Not to Act - Handling File Format Identification Issues in Practice. This poster from ETH Zurich described how the institution handles file identification and validation errors within their digital archive and showed some worked examples of the types of problems they encountered. This kind of practical sharing of the nuts and bolts of digital preservation is really good to see, and very much in line with the recommendations we are making as part of Filling the Digital Preservation Gap. As well as finding internal solutions to these problems I hope that ETH Zurich are also passing feedback to the tool providers to ensure that the tools work more effectively and efficiently for other users. It is this feedback loop that is so important in helping the discipline as a whole progress.
|OAIS panel session in full swing|
It was really useful to hear about the discussions about OAIS that have been held internationally and of course interesting to note the common themes recurring throughout – for example around the desire for a pre-ingest step within the model, the need to firm up the reference model to accommodate changes to AIPs that may occur through re-ingest, and around the need for openness with regard to audit and certification standards.
This session was a great example of an international collaboration to help shape the standards that we rely so much on. I do hope that the feedback from our community is given full consideration in the revised OAIS Reference Model.
|Me and Steve presenting our paper|
(image from @shirapeltzman)
One of the most interesting sessions of the conference for me was one that was devoted to ingest tools and methods. At a conference such as this, I'm always drawn to the sessions that focus on practical tools and first hand experiences of doing things rather than the more theoretical strands so this one was an obvious choice for me. First we had Bruno Ferreira from KEEP SOLUTIONS talking about the Database Preservation Toolkit (more about this toolkit later). Then "Exploring Friedrich Kittler's Digital Legacy on Different Levels: Tools to Equip the Future Archivist" by Jurgen Enge and Heinz Werner Kramski from the University of Art and Design in Basel.
It was fascinating to see how they have handled the preservation of a large, diverse and complex digital legacy and overcome some of the challenges and hurdles that this has thrown at them. The speakers also made the point that the hardware itself is important evidence in its physical form, showing for instance how regularly Friedrich Kittler clearly used the reset button on his PC!
|Conference delegates relaxing on the terrace|
Brent West from the University of Illinois followed on with another interesting presentation on Processing Capstone Email using Predictive Coding. This talk focused on the problems of making appraisal decisions and sensitivity assessments for email and how a technology assisted review could help, enabling the software to learn from human decisions that are made and allowing human effort to be reduced and targeted. Again, I felt that this sort of work could be really useful to me in the future if I am faced with the task of e-mail preservation at scale.
|A very expensive gin and tonic!|
On Wednesday afternoon I took part in a workshop called OSS4PRES 2.0: Building Bridges and Filling Gaps – essentially a follow on from a workshop called Using Open-Source Tools to Fulfill Digital Preservation Requirements that I blogged about from iPRES last year. This was one of those workshops where we were actually expected to work (always a bit of a shock after lunch!) and the participants split into groups to address 3 different areas. One group was looking at gaps in the open source digital preservation tool set that we should be looking to fill (either by enhancing existing tools or with the development of new tools). Another group was working on drawing up a set of guidelines for providers of open source tools. The group I was in was thinking about the creation of a community space for sharing digital preservation workflows. This is something that I think could turn into a really valuable resource for practitioners who want to see how others have implemented tools. All the groups came out with lots of ideas and an action plan by the end of the afternoon and work in these areas is scheduled to continue outside of the workshop. Great to see a workshop that is more than just a talking shop but that will lead to some more concrete results.
|My laptop working hard at the Database |
|iPRES2016 and Swisse Toy 2016 meet outside the venue|
On the last afternoon of the conference it was good to be able to pop into the Swiss Toy Fair which was being held at the same venue as the conference. A great opportunity to buy some presents for the family before heading back to the UK.