iPRES workshop report: Using Open-Source Tools to Fulfill Digital Preservation Requirements

As promised by the conference hosts, it was definitely Autumn in Chapel Hill!
Last week I was lucky enough to be at the iPRES conference.

iPRES is the international conference on digital preservation and is exactly the sort of conference I should be at (though somehow I have managed to miss the last 4 years). The conference was generally a fantastic opportunity to meet other people doing digital preservation and share experiences. Regardless of international borders, we are all facing very similar problems and grappling with the same issues.

Breakfast as provided at Friday's workshop
iPRES 2015 was held in Chapel Hill, North Carolina. Jetlag aside (I gave up in the end and decided to maintain a more European concept of time) it was a really valuable experience. The large quantities of cakes, pastries and bagels also helped - hats off to the conference hosts for this!

One of the most useful sessions for me was Friday's workshop on ‘Using Open-Source Tools to Fulfill Digital Preservation Requirements’. This workshop was billed as a space to talk about open-source software and share experiences about implementing open-source solutions. As well as listening to a really interesting set of talks from others, it also gave me a valuable opportunity to talk about the Jisc “Filling the Digital Preservation Gap” project to an international audience.

Archivematica featured very heavily in the scheduled talks. Other tools such as ArchivesSpace, Islandora and BitCurator (and BitCurator Access) were also discussed, so it was good to learn more about them.

Of particular interest was an announcement from Sam Meister of the Educopia Institute about a project proposal called OSSArcFlow. This project will attempt to help institutions combine open source tools in order to meet their institutional needs. It will look at issues such as how systems can be combined and how integration and hand-offs (such as transfer of metadata) can be successfully established. They will be working directly with 11 partner institutions but the lessons learned (including workflow models, guidance and training) will be available to other interested partners. This project sounds really valuable and of relevance to the work we are currently doing in our "Filling the Digital Preservation Gap" project.

The workshop was held in the Sonja Haynes Stone Center for Black Culture and History
Some highlights and takeaway thoughts from the contributed talks:
  • Some great ongoing work with Archivematica was described by Andrew Berger of the Computer History Museum in California. He mentioned that the largest file he has ingested so far is 320GB and that he has also successfully ingested 17,000 files in one go. The material he is working with spans 40 years and includes lots of unidentified files. Having used Archivematica for real for 6 months now, he feels he understands what each microservice is doing and has had some success with troubleshooting problems.
  • Ben Fino-Radin from the Museum of Modern Art reported that they have ingested 20TB in total using Archivematica, the largest file being 580GB. He anticipates that soon they will be attempting to ingest larger files than this. He uses Archivematica with high levels of automation. The only time he logs in to the Archivematica dashboard is to change a policy - he doesn't watch the ingest process and see the microservices running. From my perspective this is great to know, as this high level of automation is something we are keen to establish at York for our institutional research data workflows.
  • Bonnie Gordon from the Rockefeller Archive Center talked about their work integrating Archivematica with ArchivesSpace. This integration was designed to pass rights and technical metadata from Archivematica to ArchivesSpace through automated processes.
  • Cal Lee from the University of North Carolina talked to us about BitCurator - now this is a tool I would really like to get playing with. I'm holding back until project work calms down, but I could see that it would be useful to use BitCurator as an initial step before data is ingested into Archivematica.
  • Mark Leggott from the University of Prince Edward Island talked about Islandora and also put out a general plea to everyone to find a way to support or contribute to an open source project. This is an idea I very much support! Although open source tools are freely available for anyone to use, this doesn't mean that we should just use them and give nothing back. Even if a contribution cannot be made technically or financially, it could just be done through advocacy and publicity.
  • Me talking about "Filling the Digital Preservation Gap" - can I be one of my own highlights or is that bad form?
  • Courtney Mumma spoke on behalf of Artefactual Systems and gave us a step by step walk through of how to create a new Format Policy Rule in Archivematica. This was useful to see as it is not something I have ever attempted. Good to note also that instructions are available here.
  • Mike Shallcross and Max Eckard from the Bentley Historical Library at the University of Michigan talked about their Mellon-funded project to integrate Archivematica and ArchivesSpace in an end-to-end workflow that also includes the deposit of content into a DSpace repository. This project should be of great interest to any institution that is using Archivematica, due to the enhancements that are being made to the interface. A new appraisal and arrangement tab will enable digital curators to see in a more interactive and visual way which file types are represented within the archive, tag files to aid arrangement and view a variety of reports. This project is a good example of open source tools working alongside each other, all fulfilling very specific functions.
  • Kari Smith from MIT Libraries is using BitCurator alongside Archivematica for ingest and described some of the challenges of establishing the right workflows and levels of automation. Here's hoping some of the work of the proposed OSSArcFlow project will help with these sorts of issues.
  • Nathan Tallman of the University of Cincinnati Libraries is working with Fedora and Hydra along with other systems and is actively exploring Archivematica. He raised some interesting issues and questions about scalability of systems, how many copies of the data we need to keep (and the importance of getting this right), whether we should reprocess whole AIPs just because of a small metadata change and how we make sensible and pragmatic appraisal decisions. He reminded us all of how complicated and expensive this all is and how making the wrong decisions can impact in a big way on an organisation's budget.
I had to leave the workshop early to catch a flight home, but before I left was able to participate in an interesting breakout discussion about the greatest opportunities and challenges of using open source tools for digital curation and the gaps that we see in the current workflows.

Goodbye iPRES and I very much hope to be back next year!

Jenny Mitcham, Digital Archivist

