Digital Archiving at the University of York: January 2017

Tuesday, 24 January 2017

Creating an annual accessions report using AtoM

So, it is that time of year when we need to complete our annual report on accessions for The National Archives. Along with lots of other archives across the UK, we send The National Archives summary information about all the accessions we have received over the course of the previous year. This information is collated and published online on the Accessions to Repositories website for all to see.

The creation of this report has always been a bit time-consuming for our archivists, involving a lot of manual steps and some re-typing, but since we started using AtoM as our Archival Management System the process has become much more straightforward.

As I've reported in a previous blog post, AtoM does not do all that we want to do in the way of reporting via its front end.

However, AtoM has an underlying MySQL database and there is nothing to stop you bypassing the interface, looking at the data behind the scenes and pulling out all the information you need.

One of the things we got set up fairly early in our AtoM implementation project was a free MySQL client called Squirrel. Using Squirrel or another similar tool, you can view the database that stores all your AtoM data, browse the data and run queries to pull out the information you need. It is also possible to update the data using these SQL clients (very handy if you need to make any global changes to your data). All you need initially is a basic knowledge of SQL and you can start pulling some interesting reports from AtoM.
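
For anyone new to SQL, the kind of query involved can be tried out safely away from the live system. The sketch below is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for AtoM's MySQL database, a cut-down accession table containing just the identifier and date columns, and made-up rows.

```python
import sqlite3

# In-memory SQLite database standing in for AtoM's MySQL database.
# Assumption: the real accession table has more columns than the two used here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accession (id INTEGER PRIMARY KEY, identifier TEXT, date TEXT)"
)

# Hypothetical sample rows, invented purely for illustration.
conn.executemany(
    "INSERT INTO accession (identifier, date) VALUES (?, ?)",
    [
        ("ACC-2016-002", "2016-06-01"),
        ("ACC-2015-009", "2015-11-20"),
        ("ACC-2016-001", "2016-02-15"),
    ],
)

# Select only the accessions dated 2016, sorted by identifier.
rows = conn.execute(
    "SELECT identifier, date FROM accession "
    "WHERE date LIKE '2016%' ORDER BY identifier"
).fetchall()

for identifier, date in rows:
    print(identifier, date)
```

Running a read-only SELECT like this against a copy of the data is a low-risk way to get familiar with the tables before attempting anything that updates them.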

The downside of playing with the AtoM database is of course that it isn't nearly as user-friendly as the front end.

It is always a bit of an adventure navigating the database structure and trying to work out how the tables are linked. Even with the help of an Entity Relationship Diagram from Artefactual, creating more complex queries is... well... complex!

AtoM's database tables - there are a lot of them!

However, on a positive note, the AtoM user forum is always a good place to ask stupid questions and Artefactual staff are happy to dive in and offer advice on how to formulate queries. I'm also lucky to have help from more technical colleagues here in Information Services (who were able to help me get Squirrel set up and talking to the right database and can troubleshoot my queries) so what follows is very much a joint effort.

So for those AtoM users in the UK who are wrestling with their annual accessions report, here is a query that will pull out the information you need:

SELECT accession.identifier,
       accession.date,
       accession_i18n.title,
       accession_i18n.scope_and_content,
       accession_i18n.received_extent_units,
       accession_i18n.location_information,
       -- Show just the year when only a year was recorded (stored as YYYY-00-00)
       CASE WHEN CAST(event.start_date AS CHAR) LIKE '%-00-00'
            THEN LEFT(CAST(event.start_date AS CHAR), 4)
            ELSE CAST(event.start_date AS CHAR)
       END AS start_date,
       CASE WHEN CAST(event.end_date AS CHAR) LIKE '%-00-00'
            THEN LEFT(CAST(event.end_date AS CHAR), 4)
            ELSE CAST(event.end_date AS CHAR)
       END AS end_date,
       event_i18n.date
FROM accession
LEFT JOIN event ON event.object_id = accession.id
LEFT JOIN event_i18n ON event.id = event_i18n.id
JOIN accession_i18n ON accession.id = accession_i18n.id
-- Change the year here for each annual report
WHERE accession.date LIKE '2016%'
ORDER BY identifier

A couple of points to make here:

In a previous version of the query, we included some other tables so we could also capture information about the creator of the archive. The addition of the relation, actor and actor_i18n tables made the query much more complicated and for some reason it didn't work this year. I have not attempted to troubleshoot this in any great depth for the time being, as it turns out we are no longer recording creator information in our accession records. Adding a creator to an accession record creates an authority record for the creator that is automatically made public within the AtoM interface, and this ends up looking a bit messy (as we rarely have time at this point in the process to work it up into a full authority record that is worthy of publication). Since we leave this field blank in our accession records, there is no benefit in trying to extract this information from the database.
In an earlier version of this query there was something strange going on with the dates that were being pulled out of the event table. This seemed to be a quirk that was specific to Squirrel. A clever colleague solved this by casting the date to char format and including a case statement that will list the year when there's only a year and the full date when fuller information has been entered. This is useful because in our accession records we enter dates to different levels.
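
To make the date handling a little more concrete, here is a minimal Python sketch of what the case statement does, assuming (as the query's LIKE '%-00-00' test suggests) that a year-only date comes back as a string such as '2016-00-00'. The function name format_partial_date is made up for this illustration and is not part of AtoM.

```python
from typing import Optional


def format_partial_date(value: Optional[str]) -> Optional[str]:
    """Mirror the CASE expression used for start_date and end_date.

    A date with zeroed month and day (e.g. '2016-00-00') is cut down to its
    year; anything else is returned unchanged.
    """
    if value is None:
        # The LEFT JOIN gives NULL when an accession has no event dates.
        return None
    if value.endswith("-00-00"):
        # Equivalent to LEFT(CAST(... AS CHAR), 4) in the query.
        return value[:4]
    return value


print(format_partial_date("2016-00-00"))
print(format_partial_date("2016-03-14"))
```

Note that, like the query itself, this sketch leaves a year-and-month date such as '2016-05-00' untouched, since only the '-00-00' ending is checked.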

So, once I've exported the results of this query, put them in an Excel spreadsheet and sent them to one of our archivists, all that remains for her to do is to check through the data, do a bit of tidying up, ensure the column headings match what is required by The National Archives and the spreadsheet is ready to go!
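
The export and heading clean-up could in principle be scripted too. The sketch below is a rough illustration rather than what we actually do: it writes a few columns to CSV (which Excel will open) using Python's csv module. The sample row is invented, and the headings are placeholders, not the ones The National Archives actually asks for.

```python
import csv
import io

# Hypothetical row covering a subset of the columns returned by the query;
# a real run would take rows from the SQL client's export or a database cursor.
rows = [
    ("ACC-2016-001", "2016-02-15", "Example title", "1985", "1990"),
]

# Placeholder headings (assumption): replace these with the headings that
# The National Archives requires.
headings = ["Reference", "Date received", "Title", "Start date", "End date"]

# Write to an in-memory buffer here; swap in open("report.csv", "w", newline="")
# to produce a file instead.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(headings)
writer.writerows(rows)

csv_text = buffer.getvalue()
print(csv_text)
```

Someone still needs to check through the data by eye, so this would only save the re-typing of headings, not the archivist's review.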

Jenny Mitcham, Digital Archivist

Wednesday, 4 January 2017

Hello 2017

Looking back

2016 was a busy year.

I can tell that from just looking at my untidy desk...I was going to include a photo at this point but that would be too embarrassing.

The highlights of 2016 for me were getting our AtoM catalogue released and available to the world in April, completing Filling the Digital Preservation Gap (and seeing the project move from the early 'thinking' phases to actual implementation) and of course having our work on this project shortlisted in the Research and Innovation category of the Digital Preservation Awards.

...but other things happened too. Blogging really is a great way of keeping track of what I've been working on and of course what people are most interested to read about.

The top 5 most viewed posts from 2016 on this blog have been as follows:

Research Data - what does it *really* look like? - A post describing my (not entirely successful) efforts to automatically identify the file formats of research data deposited with Research Data York using DROID. This post spawned other similar posts profiling data using DROID and the cumulative value of all of these profiles is gradually increasing over time. I'm still keen to follow this up with a comparison using the born digital data that we hold at the Borthwick Institute so hopefully that is something for 2017.
A is for AtoM - An A-Z (actually I only got to 'Y'!) of implementing AtoM at the Borthwick. This post covers some of the problems and issues we have had to address and decisions we have made as we have gone through the process of getting our new archival management system up and running.
Modelling Research Data with PCDM - A guest post by Julie Allinson on some thinking carried out as part of the implementation work for the Filling the Digital Preservation Gap project. The post describes some preliminary work to define a data model for datasets using the Portland Common Data Model.
Why AtoM? - A look back at why we selected AtoM for our archival management system and how it meets our requirements. This post was in response to a question I was frequently asked and hopefully is useful to others who are going through a similar selection process.
From Old York to New York: PASIG 2016 - Quite a long summary of the highlights of the PASIG conference that I attended in New York in October 2016. There was some fantastic content at this event and my post really just scratches the surface of this!

Looking forward

So what is on the horizon for 2017?

Here are some of the things I'm going to be working on - expect blog posts on some or all of these things as the year progresses.

AtoM

I blogged about AtoM a fair bit last year as we prepared our new catalogue for release in the wild! I expect I'll be talking less about AtoM this year as it becomes business as usual at the Borthwick, but don't expect me to be completely silent on this topic.

A group of AtoM users in the UK is sponsoring some development work within AtoM to enable EAD to be harvested via OAI-PMH. This is a very exciting new collaboration and will see us being able to expose our catalogue entries to the wider world, enabling them to be harvested by aggregators such as the Archives Hub. I'm very much looking forward to seeing this take shape.

This year I'm also keen to explore the Locations functionality of AtoM to see whether it is fit for our purposes.

Archivematica

Work with Archivematica is of course continuing.

Now that Filling the Digital Preservation Gap is complete, we at York are working on moving our proof of concept into production. We are also continuing our work with Jisc on the Research Data Shared Service. York is a pilot institution for this project so we will be improving and refining our processes and workflows for the management and preservation of research data through this collaboration.

Another priority for the year is to make progress with the preservation of the born digital data that is held by the Borthwick Institute for Archives. Over the year we will be planning a different set of Archivematica workflows specifically for the archives. I'm really excited about seeing this take shape.

We are also thrilled to be hosting the first European ArchivematiCamp here in York in the Spring. This will be a great opportunity to get current and potential Archivematica users across the UK and the rest of Europe together to share experiences and find out more about the system. There will no doubt be announcements about this over the next couple of months once the details are finalised so watch this space.

Ingest processes

Last year a new ingest PC arrived on my desk. I haven't yet had much chance to play with this but the plan is to get this set up for digital ingest work.

I'm keen to get BitCurator installed and to refine our current digital ingest procedures. After some useful chats about BitCurator with colleagues in the UK and the US over 2016 I'm very much looking forward to getting stuck into this.

...but really the first challenge of 2017 is to tidy my desk!