Friday, 28 March 2014

Discovering archives: it's all about the standards

Yesterday at the UK Archives Discovery Forum we mostly talked about standards.*

Specifically, metadata standards for resource discovery of archives, both physical and digital. Standards are key to making archival data discoverable, and of course this is our main reason for being: we preserve things so that they can be reused, and they can only be reused if they can be discovered.

The day was really relevant to work we are currently doing at the Borthwick Institute, with the installation of a new archival management system (AtoM) underway and scoping work ongoing for a retroconversion project which will help us move our legacy catalogues into this new system - both major initiatives intended to make our catalogue data more widely discoverable.

Nick Poole from the Collections Trust talked about user-focused design (both for physical buildings and digital interfaces) and how we should avoid putting barriers between our users and the information they need. The website is an obvious example of how this approach to design can work in a digital sphere and their design principles are on-line. This is something I think we can all learn from.

He also touched on the Open Data agenda and how the principles of making data ‘open by default’ are sometimes seen as being at odds with traditional models for income generation in the archives sector. Nick argues that by opening up data we are allowing more people to find us and making way for new opportunities and transactions as they engage further with the other services we have to offer.

He also mentioned that we can be ‘digitally promiscuous’ - making our data available in many different ways via many different platforms. We do not need to keep our data close to our chests but should be signposting what we have and drawing people in. We can only really do this if we make use of data standards. Standards help us to exchange and share our data and allow others to find and interpret it.

Jane Stevenson talked about the importance of standards to the Archives Hub. Aggregating data from multiple sources would be very tricky if no-one used metadata standards. The problem is that the standards that we have are not perfect. Encoded Archival Description (EAD), the XML realisation of ISAD(G), can be too flexible and thus is realised in different ways by different institutions. Even those archives using CALM as their archival cataloguing system may have individual differences in how they use the metadata fields available to them. This does make life as an aggregator more challenging.
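To make the flexibility problem concrete, here is a minimal sketch in Python. It is an illustration only, not how the Archives Hub processes data: the two records, the reference code and the `normalise` function are all invented for this example. It relies on one real point of flexibility in EAD 2002, where a `<unitdate>` may sit either inside `<unittitle>` or alongside it within `<did>`.

```python
import xml.etree.ElementTree as ET

# Two hypothetical EAD 2002-style fragments describing the same item.
# Institution A nests the date inside the title; institution B keeps it separate.
RECORD_A = """
<c level="item">
  <did>
    <unitid>EX/1/1</unitid>
    <unittitle>Letter to the Dean, <unitdate>1872</unitdate></unittitle>
  </did>
</c>
"""

RECORD_B = """
<c level="item">
  <did>
    <unitid>EX/1/1</unitid>
    <unittitle>Letter to the Dean</unittitle>
    <unitdate>1872</unitdate>
  </did>
</c>
"""


def normalise(xml_text):
    """Return (reference, title, date) regardless of where the date sits."""
    did = ET.fromstring(xml_text).find("did")
    title_el = did.find("unittitle")
    # iter() walks all descendants, so it finds unitdate in either position.
    date_el = next(did.iter("unitdate"), None)
    date = date_el.text.strip() if date_el is not None and date_el.text else None
    # Keep only the title's own leading text, dropping a nested date
    # and any trailing comma left behind by it.
    title = (title_el.text or "").strip().rstrip(",").strip()
    return did.findtext("unitid"), title, date
```

Both `normalise(RECORD_A)` and `normalise(RECORD_B)` yield the same reference, title and date. An aggregator faces this kind of reconciliation for every field that contributors are free to encode in more than one way.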

Once data is standardised into the Archives Hub flavour of EAD it can be transformed again into other data standards allowing it to be cross searchable beyond the UK archives sector. Jane touched on their work with RDF and linked data and the opportunities this can bring.

We should make use of opportunities to join the European stage. The Archives Hub are 'country manager' for Archives Portal Europe (APE) thus making it a simple matter for Hub contributors to push their data out beyond national borders. For those archival descriptions that link directly to a digital object, the opportunity exists to make this data available through Europeana. This takes our data beyond the archives sector, allowing our collections to be cross-searched alongside other European digital cultural heritage resources. In my mind, this really is the start of ‘digital promiscuity’ and an opportunity I feel we should be embracing (if we can accept the necessity to open up our metadata with a CC0 licence).

Geoff Browell from King's College London talked about what we as archivists can offer our users over and above what they can get by visiting Google. He highlighted our years of experience at indexing data and pointed out that approximately half of the users of the AIM25 search interface appreciate our efforts in this area and use the index terms provided to browse for data in preference to the Google-style free text search. He thinks that we should be talking more closely with both our users and the interface developers to ensure we are giving people what they need. He mentioned that delivery of data to users should be a conversation, not a one-sided process.

The National Archives asked us for comment on a beta version of the new Discovery interface which will provide a new portal into selected UK archival holdings. They are promoting conversation with users by encouraging ‘tagging’ of pages within the search interface.

Malcolm Howitt from Axiell discussed how systems and software can support standards. Standards are a topic that is often raised with Axiell, and they are asked to support many of them. They are keen to help where they can and need to work with the community to ensure that they know what is required of them. The different flavours of EAD were again raised as an issue, but Malcolm pointed out that when standards work, the user doesn’t even need to be aware of them.

The National Archives, Kew, London by Jim Linwood on Flickr CC BY 2.0


I think we are all in agreement that metadata standards are necessary and we need to work with them in order to make our catalogue data more visible. Some further issues were picked out in the final session of the day where attendees were invited to share their thoughts on the standards they use and the ones they would like to know more about.

  1. Do we need a standard for accessions data? Would this be a specific subset of ISAD(G) or does it need further definition? The next step in our planned implementation of AtoM is to populate it with accessions data from various different sources and I expect there will be some issues to deal with along the way as a result of lack of standards in this area.
  2. How do we describe digital material? Is ISAD(G) fit for this purpose? As born digital material becomes more and more prevalent in our collections this will become more of an issue. The use of PREMIS to hold technical preservation metadata will be essential alongside the resource discovery metadata but is this enough? This is undoubtedly an area for future exploration.
  3. Does the hierarchical nature of ISAD(G) and EAD hold us back? If we can’t create detailed resource discovery metadata for an archive until we know both the hierarchy and its place in the hierarchy does this slow us down in getting the information out there?

*…mostly standards - with the addition of a surprisingly entertaining session on copyright from Ronan Deazley - check out the CREATe project for more on this topic

Jenny Mitcham, Digital Archivist

Monday, 17 March 2014

'Routine encounters with the unexpected' (or what we should tell our digital depositors)

I was very interested a few months back to hear about the release of a new and much-needed report on acquiring born-digital archives: Born Digital: Guidance for Donors, Dealers, and Archival Repositories published by the Council on Library and Information Resources. I read it soon after it was published and have been mulling over its content ever since.

The quote within the title of this post "routine encounters with the unexpected" is taken from the concluding section of the report and describes the stewardship of born-digital archival collections. The report intends to describe good practices that can help reduce these archival surprises.

The publication takes an interesting and inclusive approach, being aimed both at archivists who will be taking in born-digital material and at those individuals and organisations involved with offering born-digital material to an archive or repository.

It appeared at a time when I was developing content for our new website aimed specifically at donors and depositors, and a couple of weeks before I went on my first trip to collect someone's digital legacy for inclusion in our archive. Over the last few months, alongside archivist colleagues, I have also been planning and documenting our own digital accessions workflow. This report has been a rich source of information and advice and has helped inform all of these activities.

There is lots of food for thought within the publication but what I like best are the checklists at the end which neatly summarise many of the key issues highlighted within the report and provide a handy quick reference guide.

Much as I find this a very useful and interesting publication, it got me thinking about the alternative and apparently conflicting advice that I give depositors and how the two relate.

I have always thought that one of the most important things that anyone can do to ensure that their digital legacy survives into the future is to put into practice good data management strategies. These strategies are often just simple common sense rules, things like weeding out duplicate or unnecessary files, organising your data into sensible and logical directory structures and naming them well.
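As an illustration only (this is not taken from any of the guidance mentioned in this post), weeding out duplicates can start with a simple report of files that have identical content. The sketch below is in Python; the function names are invented for the example, and it only lists candidates. It never deletes or moves anything, so the decision about what to keep stays with the person who knows the data.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Checksum a file in chunks so that large files do not fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Return lists of paths under root whose contents are byte-for-byte identical."""
    by_hash = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            by_hash[sha256_of(path)].append(path)
    # Only checksums shared by more than one file indicate duplicates.
    return [paths for paths in by_hash.values() if len(paths) > 1]
```

Calling `find_duplicates` on a folder returns groups of paths that share a checksum. Files with the same name but different content are not grouped, and files with different names but the same content are.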

Where we have depositors who wish to give us born-digital material for our archive, I would like to encourage them to follow rules like these to help ensure that we can make better sense of their data when it comes our way. This also helps fulfil the OAIS responsibility to ensure the independent utility of data - the more we know about data from the original source, the greater the likelihood that others will be able to make sense of it in the future. I have put guidance to this effect on our new website which is based on an advice sheet from the Archaeology Data Service.

Screenshot of the donor and depositor FAQ page on the Borthwick Institute's new website

However, this goes against the advice in the 'Born Digital' report which states that "...donors and dealers should not manipulate, rearrange, extract, or copy files from their original sources in anticipation of offering the material for gift or purchase."

In a blog post last year I talked about a digital rescue project I had been working on, looking at the data on some 5 1/4 inch floppy disks from the Marks and Gran archive. This project would not have been nearly as interesting if someone had cleaned up the data before deposit - rationalising and re-naming files and deleting earlier versions. There would have been no detective story and information about the creative process would have been lost. However, if all digital deposits came to us like this would we be able to resource the amount of work required to make sense of them?

So, my question is as follows. What do we tell our depositors? Is there room for both sets of advice - the 'organise your data before deposit' approach aimed at those organisations who regularly deposit their administrative information with us, and the 'leave well alone' approach for the digital legacies of individuals? This is the route I have tried to take on our new website. However, I have concerns about whether it will be clear enough to donors and depositors which advice they should follow, especially where there are areas of cross-over. I'm interested to hear how other archives handle this question.

Jenny Mitcham, Digital Archivist