Digital Archiving at the University of York: UK Archivematica group at Lancaster

Earlier this week UK Archivematica users descended on the University of Lancaster for our 5th user group meeting. As always it was a packed agenda, with lots of members keen to talk to the group and share their project plans or their experiences of working with Archivematica. Here are some edited highlights of the day. Also well worth a read is a blog post about the day from our host, which is better than mine because it contains pictures and links!

Rachel MacGregor and Adrian Albin-Clark from the University of Lancaster kicked off the meeting with an update on recent work to set up Archivematica for the preservation of research data. Adrian has been working on two Ruby gems to handle two specific parts of the workflow. The puree gem gets metadata out of the PURE CRIS system in a format that is easy to work with (we are big fans of this gem at York, having used it in our phase 3 implementation work for Filling the Digital Preservation Gap). The other gem solves a different problem: getting the deposited research data and associated data packaged up in a format that is suitable for Archivematica to ingest. Again, this is something we may be able to utilise in our own workflows.

Jasmin Boehmer, a student from Aberystwyth University, presented some of the findings from the work she has been doing for her dissertation. She has been testing how metadata can be added to a Submission Information Package (SIP) for inclusion within an Archival Information Package (AIP) and has been looking at a range of different scenarios. It was interesting to hear her findings, which are particularly useful for those of us who haven’t managed to carry out systematic testing ourselves. She concluded that if you want to store descriptive metadata at a per file level within Archivematica you should submit this via a csv file as part of your SIP. If you only use the Archivematica interface itself for adding metadata, you can only do this on a per SIP basis rather than at file level. It was also interesting to see that if you include rights metadata within your file level csv file, this will not be stored within the PREMIS section of the XML as you might expect. This means the csv route does not solve a problem we raised during our phase 1 project work for Filling the Digital Preservation Gap: ingesting a SIP with different rights recorded per file.
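To make the csv approach a little more concrete, here is a minimal sketch in Python of building a per file metadata table. It assumes the convention, as I understand it from the Archivematica documentation, of a metadata.csv file with a filename column followed by Dublin Core columns such as dc.title. The file names, values and the build_metadata_csv helper are all invented for illustration, so do check the documentation for your own version before relying on this layout.

```python
import csv
import io

# Hypothetical per file descriptive metadata. File names and values are
# invented for illustration and are not taken from any real deposit.
FILE_METADATA = [
    {
        "filename": "objects/survey_results.csv",
        "dc.title": "Survey results",
        "dc.creator": "Example Researcher",
    },
    {
        "filename": "objects/readme.txt",
        "dc.title": "Dataset readme",
        "dc.creator": "Example Researcher",
    },
]


def build_metadata_csv(rows):
    """Return CSV text with one row per file and the 'filename' column first.

    Assumption: the remaining columns are Dublin Core style headings
    (dc.title, dc.creator, ...), written here in alphabetical order.
    """
    other_columns = sorted({key for row in rows for key in row if key != "filename"})
    fieldnames = ["filename"] + other_columns
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        # Columns missing from a row are written as empty cells.
        writer.writerow(row)
    return buffer.getvalue()


if __name__ == "__main__":
    print(build_metadata_csv(FILE_METADATA))
```

The resulting text would then be saved alongside the files being submitted. Note that, in line with Jasmin's findings, adding rights columns to a file like this should not be expected to populate the PREMIS rights section.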

Jake Henry from the National Library of Wales discussed some work newly underway to build on the work of the ARCW digital preservation group. The project will enable members of ARCW to use Archivematica without having to install their own local version, using pydio as a means of storing files before transfer. As part of this project they are now looking at a variety of systems that they would like Archivematica to integrate with. They are hoping to work on an integration with CALM. There was some interest in this development around the room and I expect there would be many other institutions who would be keen to see this work carried out.

Kirsty Lee from the University of Edinburgh reported on her recent trip to the States to attend the inaugural ArchivematiCamp with her colleague Robin Taylor. It sounded like a great event with some really interesting sessions and discussions, particularly regarding workflows (recognising that there are many different ways you can use Archivematica) as well as some nice social events. We are looking forward to seeing an ArchivematiCamp in the UK next year!

Julie and I presented on some of the implementation work we have been doing over the last few months as we complete phase 3 of Filling the Digital Preservation Gap. Julie talked about what we were trying to achieve with our proof of concept implementation and then showed a screencast of the application itself. We discussed the challenges we faced and the things that worked well during phase 3 before I summarised our plans for the future.

I went on to introduce the file formats problem (which I have previously touched on in other blog posts) before taking the opportunity to pick people’s brains on a number of discussion points. I wanted to understand workflows around unidentified files (not just for research data). I was interested to know three things really:

At what point would you pick up on unidentified file formats in a deposit - prior to using Archivematica or during the transfer stage within Archivematica?
What action would you take to resolve this situation (if any)?
Would you continue to ingest material into the archive whilst waiting for a solution, or keep files in the backlog until the correct identification could be made?

Answers from the floor suggested that one institution would always stop and carry out further investigations before ingesting the material and creating an Archival Information Package (AIP) but that most others would continue processing the data. With limited staff resource for curating research data in particular, it is possible that institutions will favour a fully automated workflow such as the one we have established in our proof of concept implementation, and regular interventions around file format identification may not be practical. Perhaps we need to consider how we can intervene in a sustainable and manageable way, rather than looking at each deposit of data separately. One of the new features in Archivematica is the AIP re-ingest which will allow you to pull AIPs back from storage so that tools (such as file identification) can be re-run - this was thought to be a good solution.

John Kaye from Jisc updated us on the Research Data Shared Service project. Archivematica is one of the products selected by Jisc to fulfill the preservation element of the Shared Service and John reported on the developments and enhancements to Archivematica that are proposed as part of this project. It is likely that these developments will be incorporated into the main code base and thus be available to all Archivematica users in the future. The growth in interest in Archivematica within the research data community in the UK is only likely to continue as a result of this project.

Heather Roberts from the Royal Northern College of Music described where her institution is with digital preservation and asked for advice on how to get started with Archivematica. Attendees were keen to share their thoughts (many of which were not specific to Archivematica itself but would be things to consider whatever solution was being implemented) and Heather went away with some ideas and some further contacts to follow up with.

To round off the meeting we had an update and Q&A session with Sarah Romkey from Artefactual Systems (who is always cheerful no matter what time you get her out of bed for a transatlantic Skype call).

Some of the attendees even managed to find the recommended ice cream shop before heading back home!

We look forward to meeting at the University of Edinburgh next time.

Jenny Mitcham, Digital Archivist

Friday, 16 September 2016
