Friday 10 February 2017

Harvesting EAD from AtoM: a collaborative approach

In a previous blog post, "AtoM harvesting (part 1) - it works!", I described how archival descriptions within AtoM are being harvested as Dublin Core for inclusion within our University Library Catalogue.* I also hinted that this wouldn’t be the last you would hear from me on AtoM harvesting, and that plans were afoot to enable much richer metadata in EAD 2002 XML (Encoded Archival Description) format to be harvested via OAI-PMH.

I’m pleased to be able to report that this work is now underway.

The University of York, along with five other organisations in the UK, has clubbed together to sponsor Artefactual Systems to carry out the development work needed to make EAD harvesting possible. This work is scheduled for release in AtoM version 2.4 (due out in the spring).

The work is being jointly sponsored by:



We are also receiving much-needed support on this project from The Archives Hub, who are providing advice on the AtoM EAD and will be helping us test the EAD harvesting when it is ready. While the sponsoring institutions are all producers of AtoM EAD, The Archives Hub is a consumer of that EAD. We are keen to ensure that the archival descriptions we enter into AtoM can move smoothly to The Archives Hub (and potentially to other data aggregators in the future), allowing the richness of our collections to be signposted as widely as possible.

Adding this harvesting functionality to AtoM will enable The Archives Hub to gather data direct from us on a regular schedule or as and when updates occur, ensuring that:


  • Our data within the Archives Hub doesn’t stagnate
  • We manage our own master copy of the data and only need to edit this in one place
  • A minimum of human interaction is needed to incorporate our data into the Hub
  • It is easier for researchers to find information about the archives that we hold without having to search all of our individual catalogues
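
For readers curious about what a scheduled harvest might look like from the harvester's side, here is a minimal sketch in Python of an OAI-PMH ListRecords exchange. The `verb`, `metadataPrefix`, `from` and `resumptionToken` arguments come from the OAI-PMH specification. Everything else is an assumption for illustration only: the base URL is hypothetical, `oai_ead` is assumed as the metadata prefix for EAD (the development work described here is not yet released), the function names are invented, and the sample response is hand-written rather than real AtoM output.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace defined by the OAI-PMH 2.0 specification.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def build_list_records_url(base_url, metadata_prefix, from_date=None,
                           resumption_token=None):
    """Build an OAI-PMH ListRecords request URL (illustrative helper)."""
    if resumption_token:
        # In OAI-PMH, resumptionToken is an exclusive argument:
        # it is sent with the verb and nothing else.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date:
            # Only ask for records changed since the last harvest.
            params["from"] = from_date
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (record identifiers, resumption token or None)."""
    root = ET.fromstring(xml_text)
    identifiers = [
        el.text
        for el in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)
    ]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    token = token_el.text if token_el is not None and token_el.text else None
    return identifiers, token


# Hand-written sample response, NOT real AtoM output.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:fonds-1</identifier>
        <datestamp>2017-02-01T00:00:00Z</datestamp>
      </header>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

# Hypothetical base URL; "oai_ead" is an assumed metadata prefix.
first_url = build_list_records_url(
    "https://archives.example.org/;oai", "oai_ead", from_date="2017-02-01"
)
identifiers, token = parse_list_records(SAMPLE_RESPONSE)
next_url = build_list_records_url(
    "https://archives.example.org/;oai", "oai_ead", resumption_token=token
)
print(first_url)
print(identifiers, token)
print(next_url)
```

The `from` argument is what would let a harvester pick up only the descriptions that have changed since its last visit, and the resumption token is how a large set of records is delivered in pages.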


So, what are we doing at the moment?


  • Developers at Artefactual Systems are beavering away working on the initial development and getting the test site ready for us to play with.
  • The sponsoring institutions have been getting samples of their own AtoM data ready for loading up into the test deployment. It is always better when testing something to have some of your own data to mess around with.
  • The Borthwick has been in discussion with The Archives Hub about AtoM EAD for some time (since version 2.2). We’ve now picked up these discussions again, and other institutions have joined in by supplying their own EAD samples. This allows staff at the Hub to see how EAD has changed in version 2.3 of AtoM (it hasn’t very much) and also to see how consistent the EAD from AtoM is across different institutions. We have been having some pretty detailed discussions about how we can make the EAD better, cleaner and fuller, whether through data entry at the institutions, automated data cleaning at The Hub prior to display online, or further developments in AtoM.


What we are doing at the moment is good and a huge step in the right direction, but perhaps not perfect. As we work together on this project we are coming across areas where future work would be beneficial in order to improve the quality of the EAD that AtoM produces or to expand the scope of what can be harvested from AtoM. I hope to report on this in more detail at the end of the project, but in the meantime, do get in touch if you are interested in finding out more.







* It is great to see that this is working well and our Library Catalogue is now appearing in the referrer reports for the Borthwick Catalogue on Google Analytics. People are clearly following these new signposts to our archives!

Jenny Mitcham, Digital Archivist
