Digital Archiving at the University of York: UK Archivematica meeting at Westminster School

Yesterday the UK Archivematica user group meeting was held in the historic location of Westminster School in central London.

A pretty impressive location for a meeting!
(credit: Elizabeth Wells)

In the morning, once fuelled with tea, coffee and biscuits, we set about discussing our infrastructures and workflows. It was great to hear from a range of institutions about how Archivematica fits into the bigger picture for them. One point that lots of attendees made was that progress can be slow. Many of us were slightly frustrated that we weren't making faster progress in establishing our preservation infrastructures, but I think it was a comfort to know that we were not alone in this!

I kicked things off by showing a couple of diagrams of our proposed and developing workflows at the University of York. The first illustrated our infrastructure for preserving and providing access to research data, and the second showed our hypothetical workflow for born digital content that comes to the Borthwick Institute.

Now that our AtoM upgrade is complete and Archivematica 1.7 has been released, I am hoping that colleagues can set up a test instance of AtoM talking to Archivematica for me to start experimenting with. In a parallel strand, I am encouraging colleagues to consider and document access requirements for digital content. This will be invaluable when thinking about what sort of experience we are trying to offer our users. We have yet to decide whether AtoM and Archivematica will meet our needs on their own or whether additional functionality is needed through an integration with Fedora and Samvera (the software on which our digital library runs)... but that decision will come once we better understand what we are trying to achieve and what the solutions offer.

Elizabeth Wells from Westminster School talked about the different types of digital content that she would like Archivematica to handle and different workflows that may be required depending on whether it is born digital or digitised content, whether a hybrid or fully digital archive and whether it has been catalogued or not. She is using Archivematica alongside AtoM and considers that her primary problems are not technical but revolve around metadata and cataloguing. We had some interesting discussion around how we would provide access to digital content through AtoM if the archive hadn't been catalogued.

Anna McNally from the University of Westminster reminded us that information about how they are using Archivematica is already well described in a webinar that is now available on YouTube: Work in Progress: reflections on our first year of digital preservation. They are using the PERPETUA service from Arkivum, and they use an automated upload folder in NextCloud to move digital content into Archivematica. They are in the process of migrating from CALM to AtoM to provide access to their digital content. One of the key selling points of AtoM for them is its support for different languages and character sets.

Chris Grygiel from the University of Leeds showed us some infrastructure diagrams and explained that this is still very much a work in progress. Alongside Archivematica, he is using BitCurator to help appraise the content and EPrints and EMU for access.

Rachel MacGregor from Lancaster University updated us on work with Archivematica at Lancaster. They have been investigating both Archivematica and Preservica as part of the Jisc Research Data Shared Service pilot. The system that they use has to be integrated in some way with PURE for research data management.

After lunch in the dining hall (yes, it did feel a bit like being back at school), Rachel MacGregor (shouting to be heard over the sound of the bells at Westminster) kicked off the afternoon with a presentation about DMAonline. This tool, originally created as part of the Jisc Research Data Spring project, is under further development as part of the Jisc Research Data Shared Service pilot.

It provides reporting functionality for a range of systems in use for research data management including Archivematica. Archivematica itself does not come with advanced reporting functionality - it is focused on the primary task of creating an archival information package (AIP).

The tool (once in production) could be used by anyone regardless of whether they are part of the Jisc Shared Service or not. Rachel also stressed that it is modular - though it can gather data from a whole range of systems, it could also work just with Archivematica if that is the only system you are interested in reporting on.
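To give a flavour of the kind of Archivematica-only reporting involved, here is a minimal Python sketch. It is my own illustration, not part of DMAonline. It pulls package records from an Archivematica Storage Service and summarises the AIPs. It assumes the Storage Service REST endpoint `/api/v2/file/` with `ApiKey username:key` authentication, and paginated JSON responses (an `objects` list plus a `meta.next` link) whose records carry `package_type`, `size` and `status` fields. The function names `fetch_all_packages` and `summarise_aips` are hypothetical, so check the endpoint and field names against your own installation.

```python
import json
import urllib.request
from collections import Counter
from urllib.parse import urljoin


def fetch_all_packages(base_url, username, api_key):
    """Collect package records from every page of the Storage Service API.

    ASSUMPTION: /api/v2/file/ endpoint, ApiKey auth and a meta.next link
    for pagination; verify against your Storage Service version.
    """
    url = base_url.rstrip("/") + "/api/v2/file/"
    headers = {"Authorization": f"ApiKey {username}:{api_key}"}
    packages = []
    while url:
        request = urllib.request.Request(url, headers=headers)
        with urllib.request.urlopen(request) as response:
            page = json.load(response)
        packages.extend(page.get("objects", []))
        next_page = page.get("meta", {}).get("next")
        # meta.next is usually a path such as /api/v2/file/?offset=20
        url = urljoin(url, next_page) if next_page else None
    return packages


def summarise_aips(packages):
    """Count AIPs, total their size in bytes and tally them by status."""
    aips = [p for p in packages if p.get("package_type") == "AIP"]
    return {
        "aip_count": len(aips),
        "total_bytes": sum(p.get("size") or 0 for p in aips),
        "by_status": dict(Counter(p.get("status", "UNKNOWN") for p in aips)),
    }
```

Keeping the fetching and the summarising separate means the summary can be tested on saved JSON, and a call such as `summarise_aips(fetch_all_packages("http://localhost:8000", "admin", "KEY"))` (placeholder host, user and key) would produce a simple AIP report.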

An important part of developing a tool like this is to ensure that communication is clear - if you don’t adequately communicate to the developers what you want it to do, you won’t get what you want. With that in mind, Rachel has been working collaboratively to establish clear reporting requirements for preservation. She talked us through these requirements and asked for feedback. They are also available online for people to comment on:

Go to jira.dmao.org and click on create an account to sign up
Then go to: https://confluence.dmao.org/display/DMAO/DMAonline
To see all the preservation requirements you can click on Feature requests and choose the option Preservation features

Sean Rippington from the University of St Andrews talked us through some testing he has carried out, looking at how files in SharePoint could be handled by Archivematica. St Andrews are one of the pilot organisations for the Jisc Research Data Shared Service, and they are also interested in the preservation of their corporate records. There doesn’t seem to be much information out there about how SharePoint and Archivematica might work together, so it was really useful to hear about Sean’s work.

He showed us the contents of a sample SharePoint export file (a .cmp file). It consisted of various office documents (the documents that had been put into SharePoint) and other metadata files. The office documents themselves had lost much of their original metadata: they had been renamed with a consecutive number and given a .DAT file extension, and the date last modified had changed to the date of export from SharePoint. However, all was not lost. A manifest file was included in the export and contained lots of valuable metadata, including the last modified date, the filename, the file extension and the names of the people who created the file and last modified it.

Sean tried putting the .cmp file through Archivematica to see what would happen. He found that Archivematica correctly identified the MS Office files (regardless of the change of file extension), but the correct (original) metadata was not associated with the files; it remained in the separate manifest file. This has the potential to confuse future users of the digital archive: the metadata gives useful context to the files, and if it is hidden in a separate manifest file it may never be discovered.
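Archivematica can recognise files despite a .DAT extension because format identification tools (Archivematica can use Siegfried or FIDO) match byte signatures from the PRONOM registry against a file's content rather than trusting its name. The Python sketch below is a much-simplified, hypothetical illustration of that idea for Office files only, and is not how Archivematica itself is implemented. Legacy .doc, .xls and .ppt files start with the OLE2 compound file signature, while .docx, .xlsx and .pptx files are ZIP packages whose internal folder layout reveals the document type. The function name `sniff_office_file` is my own.

```python
import zipfile
from pathlib import Path

OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # legacy .doc/.xls/.ppt
ZIP_MAGIC = b"PK\x03\x04"                          # container used by OOXML

# Top-level folder inside an OOXML package -> kind of document.
OOXML_PARTS = {
    "word/": "OOXML word-processing document (.docx)",
    "xl/": "OOXML spreadsheet (.xlsx)",
    "ppt/": "OOXML presentation (.pptx)",
}


def sniff_office_file(path):
    """Guess an Office format from the file's bytes, ignoring its extension."""
    path = Path(path)
    with path.open("rb") as handle:
        head = handle.read(len(OLE2_MAGIC))
    if head.startswith(OLE2_MAGIC):
        return "OLE2 compound file (legacy Office, e.g. .doc/.xls/.ppt)"
    if head.startswith(ZIP_MAGIC) and zipfile.is_zipfile(str(path)):
        with zipfile.ZipFile(path) as archive:
            names = archive.namelist()
        # Every OOXML package has a [Content_Types].xml part at its root.
        if "[Content_Types].xml" in names:
            for prefix, label in OOXML_PARTS.items():
                if any(name.startswith(prefix) for name in names):
                    return label
        return "ZIP archive (not recognisably OOXML)"
    return "unknown"
```

A file called 00000001.DAT that is really a Word document would therefore still be reported as a .docx, which mirrors what Sean saw, even though nothing here recovers the original filename or dates.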

Another approach he took was to use the information in the manifest file to rename the files and give them their correct file extensions before pushing them into Archivematica. This might be a better solution, as the files served up in the dissemination information package (DIP) would be named correctly and be easier for users to locate and understand. However, this was a manual process and probably not scalable unless it could be automated in some way.
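One way that renaming step might be automated is sketched below in Python. It is a hypothetical illustration, not Sean's method. It assumes the .cmp package has already been unpacked into a folder, and that the manifest is XML with one `File` element per document, carrying the stored name in a `FileValue` attribute and the original name, last modified date and author in `Name`, `TimeLastModified` and `Author` attributes. These attribute names should be checked against a real export. Rather than renaming in place, the sketch copies the files into a new Archivematica transfer folder under their original names. It also writes the manifest metadata to `metadata/metadata.csv` using Archivematica's convention of a `filename` column plus Dublin Core `dc.*` columns, so the context is not left hidden in a separate file. The function names are my own.

```python
import csv
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path


def local_name(tag):
    """Drop any XML namespace, e.g. '{urn:example}File' -> 'File'."""
    return tag.rsplit("}", 1)[-1]


def read_manifest(manifest_path):
    """Map each stored name (lower-cased, e.g. '00000001.dat') to the
    original filename and metadata recorded in the manifest.

    ASSUMPTION: one <File> element per document, with FileValue, Name,
    TimeLastModified and Author attributes.
    """
    records = {}
    for element in ET.parse(manifest_path).getroot().iter():
        if local_name(element.tag) != "File":
            continue
        stored, original = element.get("FileValue"), element.get("Name")
        if stored and original:
            records[stored.lower()] = {
                "name": original,
                "modified": element.get("TimeLastModified", ""),
                "author": element.get("Author", ""),
            }
    return records


def build_transfer(export_dir, manifest_path, transfer_dir):
    """Copy exported files into transfer_dir/objects under their original
    names and record the manifest metadata in metadata/metadata.csv.
    Returns the rows written to the CSV."""
    export_dir, transfer_dir = Path(export_dir), Path(transfer_dir)
    objects_dir = transfer_dir / "objects"
    metadata_dir = transfer_dir / "metadata"
    objects_dir.mkdir(parents=True, exist_ok=True)
    metadata_dir.mkdir(parents=True, exist_ok=True)

    # Case-insensitive lookup: the files may be ".DAT" on disk while the
    # manifest says ".dat" (or vice versa).
    on_disk = {p.name.lower(): p for p in export_dir.iterdir() if p.is_file()}

    rows = []
    for stored, info in sorted(read_manifest(manifest_path).items()):
        source = on_disk.get(stored)
        if source is None:
            continue  # listed in the manifest but absent from the export
        target = objects_dir / Path(info["name"]).name  # ignore any folder part
        if target.exists():
            # Two documents share a name: keep both by prefixing the stored stem.
            target = objects_dir / f"{Path(stored).stem}_{target.name}"
        # copy2 carries over the (export-date) timestamps; the real
        # last modified date is recorded in the CSV instead.
        shutil.copy2(source, target)
        rows.append({
            "filename": f"objects/{target.name}",
            "dc.date": info["modified"],
            "dc.creator": info["author"],
        })

    with open(metadata_dir / "metadata.csv", "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["filename", "dc.date", "dc.creator"])
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

Something like `build_transfer("sharepoint_export", "sharepoint_export/Manifest.xml", "transfer")` (hypothetical paths) would then produce a folder ready to be started as a transfer. Copying rather than renaming leaves the original export untouched as a record of what was received, and if two documents share an original name the second is prefixed with its stored name so nothing is overwritten.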

He ended with lots of questions and would be very glad to hear from anyone who has done further work in this area.

Hrafn Malmquist from the University of Edinburgh talked about his use of Archivematica’s appraisal tab and described a specific use case with particular requirements. The records of the University Court have been deposited in born digital form since 2007 and need to be preserved and made accessible with full-text searching to aid retrieval. This has been achieved by combining Archivematica and DSpace, and by adding a package.csv file containing metadata that DSpace can understand.

Laura Giles from the University of Hull described ongoing work to establish a digital archive infrastructure for the Hull City of Culture archive. They had an appetite for open source and prior experience with Archivematica, so they were keen to use this solution, but they did not have the in-house resource to implement it. Hull are now working with CoSector at the University of London to plan and establish a digital preservation solution that works alongside their existing repository (Fedora and Samvera) and archives management system (CALM). Once this is in place, they hope to apply similar principles to other preservation use cases at Hull.

We then had time for a quick tour of the Westminster School archives, followed by more biscuits, before Sarah Romkey from Artefactual Systems joined us remotely to update us on the latest Archivematica release and future plans. The group is considering taking her up on her suggestion to provide more detailed and focused feedback on the appraisal tab within Archivematica, perhaps as a task for one of our future meetings.

Talking of future meetings... we have agreed that the next UK Archivematica meeting will be held at the University of Warwick at some point in the autumn.

Jenny Mitcham, Digital Archivist

Friday, 18 May 2018
