Showing posts with label ISAD(G). Show all posts

Tuesday, 15 November 2016

AtoM harvesting (part 1) - it works!

When we first started using Access to Memory (AtoM) to create the Borthwick Catalogue we were keen to enable our data to be harvested via OAI-PMH (more about this feature of AtoM is available in the documentation). Indeed the ability to do this was one of our requirements when we were looking to select a new Archival Management System (read about our system requirements here).

Look! Archives now available in Library Catalogue search
So it is with great pleasure that I can announce that we are now exposing some of our data from AtoM through our University Library catalogue YorSearch. Dublin Core metadata is automatically harvested nightly from our production AtoM instance - so we don't need to worry about manual updates or old versions of our data hanging around.
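For readers who have not seen OAI-PMH in action, a nightly harvest boils down to an HTTP request with a `verb` and a `metadataPrefix`, and an XML response containing Dublin Core records. The sketch below is only an illustration of that exchange: the base URL and the sample response are made up for the example, and it is not a description of how YorSearch itself performs the harvest.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace used for simple Dublin Core elements in an oai_dc response.
DC_NS = "http://purl.org/dc/elements/1.1/"


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build an OAI-PMH ListRecords request URL.

    base_url is a placeholder here; substitute your own repository's endpoint.
    from_date (YYYY-MM-DD) limits the harvest to records changed since then.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date
    return base_url + "?" + urlencode(params)


def titles_from_response(xml_text):
    """Pull every dc:title out of an OAI-PMH response document."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iter("{%s}title" % DC_NS)]


# A cut-down, invented response, just enough to show the structure.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example family archive</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url("https://atom.example.org/oai", from_date="2016-11-14"))
    print(titles_from_response(SAMPLE_RESPONSE))
```

A real harvester would also need to follow resumption tokens for large result sets and handle deleted records, which this sketch leaves out.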

Our hope is that doing this will allow users of the Library Catalogue (primarily staff and students at the University of York) to happen upon relevant information about the archives that we hold here at the Borthwick whilst they are carrying out searches for other information resources.

We believe that enabling serendipitous discovery in this way will benefit those users of the Library Catalogue who may have no idea of the extent and breadth of our holdings and who may not know that we hold archives of relevance to their research interests. Increasing the visibility of the archives within the University of York is a useful way of signposting our holdings and we think this should bring benefits both to us and our potential user base.

A fair bit of thought (and a certain amount of tweaking within YorSearch) went into getting this set up. From the archives' perspective, the main decision was around exactly what should be harvested. It was agreed that only top level records from the Borthwick Catalogue should be made available in this way. If we had enabled the harvesting of all levels of records, there was a risk that search results would have been swamped by hundreds of lower level records from those archives that have been fully catalogued. This would have made the search results difficult to understand, particularly given that these results could not have been displayed in a hierarchical way, so the relationships between the different levels would have been unclear. We would still encourage users to go direct to the Borthwick Catalogue itself to search and browse lower levels of description.
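The selection rule itself is simple to state: keep a description only if it has no parent. The sketch below shows that rule on some invented records. The `Description` class and its `parent` field are hypothetical names used for illustration, and this is not a picture of how AtoM stores or filters its data internally.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Description:
    """A much simplified archival description (hypothetical structure)."""
    identifier: str
    title: str
    parent: Optional[str] = None  # identifier of the parent record, None at top level


def top_level_only(descriptions: List[Description]) -> List[Description]:
    """Return only those descriptions that sit at the top of a hierarchy."""
    return [d for d in descriptions if d.parent is None]


if __name__ == "__main__":
    records = [
        Description("A", "Example family archive"),
        Description("A/1", "Correspondence", parent="A"),
        Description("A/1/1", "Letter to a solicitor", parent="A/1"),
        Description("B", "Example company records"),
    ]
    for d in top_level_only(records):
        print(d.identifier, d.title)
```

With a fully catalogued archive the lower levels can easily outnumber the top level record by hundreds to one, which is why the filter makes such a difference to the search results.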

It should also be noted that only a subset of the metadata within the Borthwick Catalogue will be available through the Library Catalogue. The metadata we create within AtoM is compliant with ISAD(G): General International Standard Archival Description which contains 26 different data elements. In order to facilitate harvesting using OAI-PMH, data within AtoM is mapped to simple Dublin Core and this information is available for search and retrieval via YorSearch. As you can see from the screen shot below, Dublin Core does allow a useful level of information to be harvested, but it is not as detailed as the original record.

An example of one of our archival descriptions converted to Dublin Core within YorSearch
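To give a feel for why detail is lost in the conversion, here is a small sketch of a crosswalk from ISAD(G) style fields to simple Dublin Core elements. The mapping table is an illustrative assumption, not the actual mapping that AtoM applies, and the field names and sample record are invented for the example.

```python
# Illustrative crosswalk only: ISAD(G)-style field name -> simple Dublin Core element.
# This is NOT AtoM's real mapping, just a plausible subset for demonstration.
ISAD_TO_DC = {
    "reference_code": "identifier",
    "title": "title",
    "dates": "date",
    "name_of_creator": "creator",
    "scope_and_content": "description",
    "extent_and_medium": "format",
    "language": "language",
}


def to_simple_dc(isad_record):
    """Convert a dict of ISAD(G)-style fields to a dict of Dublin Core elements.

    Dublin Core elements are repeatable, so each value is stored in a list.
    Fields with no entry in the crosswalk are silently dropped.
    """
    dc = {}
    for field, value in isad_record.items():
        element = ISAD_TO_DC.get(field)
        if element and value:
            dc.setdefault(element, []).append(value)
    return dc


def unmapped_fields(isad_record):
    """List the fields that have no Dublin Core equivalent in the crosswalk."""
    return sorted(f for f in isad_record if f not in ISAD_TO_DC)


if __name__ == "__main__":
    record = {
        "reference_code": "EX/1",
        "title": "Example family archive",
        "dates": "1850-1920",
        "scope_and_content": "Letters, diaries and estate papers.",
        "archival_history": "Deposited by the family.",
        "conditions_governing_access": "Open.",
    }
    print(to_simple_dc(record))
    print(unmapped_fields(record))
```

In this sketch the archival history and access conditions simply have nowhere to go, which is the kind of gap you would expect when squeezing 26 elements into a much smaller set.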

Further work was necessary to change the default behaviour within Primo (the software that YorSearch runs on) which displayed results from the Borthwick Catalogue with the label Electronic resource. This is what it calls anything that is harvested as Dublin Core. We didn't think this would be helpful to users because even though the finding aid itself (within AtoM) is indeed an electronic resource, the actual archive that it refers to isn't. We were keen that users didn't come to us expecting everything to be digitised! Fortunately it was possible to change this label to Borthwick Finding Aid, a term that we think will be more helpful to users.
Searches within our library catalogue (YorSearch) now surface Borthwick finding aids, harvested from AtoM.
These are clearly labelled as Borthwick Finding Aids.


Click through to a Borthwick Finding Aid and you can see the full archival description in AtoM in an iFrame

Now this development has gone live we will be able to monitor the impact. It will be interesting to see whether traffic to the Borthwick Catalogue increases and whether a greater number of University of York staff and students engage with the archives as a result.

However, note that I called this blog post AtoM harvesting (part 1).

Of course that means we would like to do more.

Specifically we would like to move beyond just harvesting our top level records as Dublin Core and enable harvesting of all of our archival descriptions in full in Encoded Archival Description (EAD) - an XML standard that is closely modelled on ISAD(G).  This is currently not possible within AtoM but we are hoping to change this in the future.

Part 2 of this blog post will follow once we get further along with this aim...






Jenny Mitcham, Digital Archivist

Friday, 28 March 2014

Discovering archives: it's all about the standards

Yesterday at the UK Archives Discovery Forum we mostly talked about standards.*

Specifically metadata standards for resource discovery of archives, both physical and digital. Standards are key to making archival data discoverable and of course this is our main reason for being - we preserve things so that they can be reused, and they can only be reused if they can be discovered.

The day was really relevant to work we are currently doing at the Borthwick Institute, with the installation of a new archival management system (AtoM) underway and scoping work ongoing for a retroconversion project which will help us move our legacy catalogues into this new system - both major initiatives intended to make our catalogue data more widely discoverable.

Nick Poole from the Collections Trust talked about user focused design (both for physical buildings and digital interfaces) and how we should avoid putting barriers between our users and the information they need. The gov.uk website is an obvious example of how this approach to design can work in a digital sphere and their design principles are online. This is something I think we can all learn from.

He also touched on the Open Data agenda and how the principles of making data ‘open by default’ are sometimes seen as being at odds with traditional models for income generation in the archives sector. Nick argues that by opening up data we are allowing more people to find us and making way for new opportunities and transactions as they engage further with the other services we have to offer.

He also mentioned that we can be ‘digitally promiscuous’ - making our data available in many different ways via many different platforms. We do not need to keep our data close to our chests but should be signposting what we have and drawing people in. We can only really do this if we make use of data standards. Standards help us to exchange and share our data and allow others to find and interpret it.

Jane Stevenson talked about the importance of standards to the Archives Hub.  Aggregating data from multiple sources would be very tricky if no-one used metadata standards. The problem is that the standards that we have are not perfect. Encoded Archival Description (EAD), the XML realisation of ISAD(G), can be too flexible and thus is realised in different ways by different institutions. Even those archives using CALM as their archival cataloguing system may have individual differences in how they use the metadata fields available to them. This does make life as an aggregator more challenging.

Once data is standardised into the Archives Hub flavour of EAD it can be transformed again into other data standards allowing it to be cross searchable beyond the UK archives sector. Jane touched on their work with RDF and linked data and the opportunities this can bring.

We should make use of opportunities to join the European stage. The Archives Hub are 'country manager' for Archives Portal Europe (APE) thus making it a simple matter for Hub contributors to push their data out beyond national borders. For those archival descriptions that link directly to a digital object, the opportunity exists to make this data available through Europeana. This takes our data beyond the archives sector, allowing our collections to be cross-searched alongside other European digital cultural heritage resources. In my mind, this really is the start of ‘digital promiscuity’ and an opportunity I feel we should be embracing (if we can accept the necessity to open up our metadata with a CC0 licence).

Geoff Browell from King's College London talked about what we as archivists can offer our users over and above what they can get by visiting Google. He highlighted our years of experience at indexing data and pointed out that approximately half of the users of the AIM25 search interface appreciate our efforts in this area and use the index terms provided to browse for data in preference to the Google-style free text search. He thinks that we should be talking more closely with both our users and the interface developers to ensure we are giving people what they need. He mentioned that delivery of data to users should be a conversation, not a one-sided process.

The National Archives asked us for comment on a beta version of the new Discovery interface which will provide a new portal into selected UK archival holdings. They are encouraging conversation with users by encouraging ‘tagging’ of pages within the search interface.

Malcolm Howitt from Axiell discussed how systems and software can support standards. Standards are a topic that is often raised with them and they are asked to support many. They are keen to help where they can and need to work with the community to ensure that they know what is required of them. The different flavours of EAD were again raised as an issue but Malcolm pointed out that when standards work, the user doesn’t even need to be aware of them.

The National Archives, Kew, London by Jim Linwood on Flickr CC BY 2.0


Reflections

I think we are all in agreement that metadata standards are necessary and we need to work with them in order to make our catalogue data more visible. Some further issues were picked out in the final session of the day where attendees were invited to share their thoughts on the standards they use and the ones they would like to know more about.


  1. Do we need a standard for accessions data? Would this be a specific subset of ISAD(G) or does it need further definition? The next step in our planned implementation of AtoM is to populate it with accessions data from various different sources and I expect there will be some issues to deal with along the way as a result of lack of standards in this area.
  2. How do we describe digital material? Is ISAD(G) fit for this purpose? As born digital material becomes more and more prevalent in our collections this will become more of an issue. The use of PREMIS to hold technical preservation metadata will be essential alongside the resource discovery metadata but is this enough? This is undoubtedly an area for future exploration.
  3. Does the hierarchical nature of ISAD(G) and EAD hold us back? If we can’t create detailed resource discovery metadata for an archive until we know both the hierarchy and its place in the hierarchy does this slow us down in getting the information out there?



*…mostly standards - with the addition of a surprisingly entertaining session on copyright from Ronan Deazley – check out the CREATe project for more on this topic


Jenny Mitcham, Digital Archivist

The sustainability of a digital preservation blog...

So this is a topic pretty close to home for me. Oh the irony of spending much of the last couple of months fretting about the future prese...