A non-archivist's perspective on cataloguing born digital archives

As blogged in my previous post, earlier this week I attended an ARA Section for Archives and Technology event on Born Digital Cataloguing and also had the opportunity to talk about some of the Borthwick's current work in this area.

I gave a non-archivist's perspective on born digital cataloguing. These were the main points I tried to put across, though some of the points below were also informed by discussions on the day:

  • Born digital cataloguing within a purely digital archive is reasonably straightforward. The real complexity comes when working with hybrid archives, where content is both physical and digital.
  • The Archaeology Data Service are good at born digital cataloguing. This is partly because they only have digital material to worry about, but also down to the fact that they have many years of experience and the necessary systems in place. Their new ADS Easy system allows depositors to submit data for archiving along with the required metadata (which they can enter both at project level and individual file level). A web interface for disseminating this data can then be created in a largely automated fashion. It makes sense to ask the person who knows the most about the data to catalogue it, freeing up the digital archivists' time to focus on checking the received data and metadata and more specialist digital preservation work.
  • Communication can be a problem between traditional archivists and digital archivists. We may use different metadata standards and we may not always know what the other is talking about. I was at the Borthwick Institute for approximately a year before I worked out that when my colleagues talked about describing archives at file level (which may cover multiple physical documents within the same physical file), they didn't mean the same as my perception of 'file level metadata' (which would apply to a single digital item). It is important to recognise these differences and try and work around them so that we understand each other better when working with hybrid archives.
A digital archivist may speak a different language to traditional archivists, but we can work around this

  • At the Borthwick we are in the process of implementing a new system for accessioning and cataloguing archives (both physical and digital archives). We have installed a version of AtoM (Access to Memory) and have imported one of our more complex catalogues into it. We now need to build on this proof of concept and fully establish and populate this system. As well as holding information about our physical holdings, this system will provide a means of cataloguing born digital data and also the foundations on which a digital archiving system can be built. It will also provide the means by which we can disseminate digital objects to our users.
  • There are other types of metadata that are required for digital material, and these are outside the scope of AtoM, which is primarily for resource discovery metadata. More technical metadata relating to digital objects and any transformations they undergo needs to reside within a digital archiving system. This is where Archivematica comes in. We are currently testing this digital preservation system to establish whether it meets our digital archiving needs.
  • I worry about the identifiers we use within archival catalogues. The traditional archival identifier is performing two jobs – firstly acting as a unique identifier or reference for an item or group of items, and secondly showing where those items sit within the archival hierarchy. This can lead to problems...
    • ...if the arrangement of the archive changes, which may lead to the identifier changing. This is never a good thing once the identifier has been published and made available, or if it is being used to link between different systems.
    • ...if we want to start describing objects before we know where they sit within the hierarchy. This may be the case in particular for digital material, which we may need to start working with more urgently than the physical element of the archive.*
  • We can argue that digital isn't different, but with digital we do tend to think more at item level. Digital preservation activities, and the technical and preservation metadata that they generate, are all at file (item) level, so perhaps it makes sense for the resource discovery metadata to follow this pattern. Unlike physical archives, for digital archives we can pretty easily generate a title (or file name) for every item. If we are to deal with digital archives at file (item) level, would this cause confusion when cataloguing a hybrid archive?
  • Before we incorporate digital material into a digital archive, some selection and appraisal needs to be carried out. Depending on the digital archiving system in use, it can be non-trivial to remove files from an AIP (archival information package) once they have been transferred, so we really do need to have a good idea of what is and isn't included before we carry out our preservation activities. In order to carry out this selection we may wish to start putting together a skeletal description of each item. Wouldn't it be nice if we could start to do this in a way that could be easily transferred into an archival management system? At the moment I have been doing this in a separate spreadsheet, but we need strategies that are more sustainable and scalable.
  • Workflows are crucially important. Who does the born digital cataloguing where hybrid archives are concerned? The digital material's place within the archive as a whole is key, so it should be catalogued in tandem with the physical material. But if we want to archive the digital material more rapidly than the physical, how do we ensure we have the right workflows and procedures in place? Much of this will come down to institutional policies and procedures and the capabilities of the technologies you are using. These are still issues we are grappling with here at the Borthwick as we try to establish a framework for carrying out born digital cataloguing.
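As a minimal sketch of the skeletal, item-level description that selection and appraisal might start from, the Python below walks a transferred folder and produces one row per file, using the file name as a draft title. The column names and the `skeletal_description` and `write_csv` functions are hypothetical. This is not AtoM's CSV import template, and mapping the columns onto whatever an archival management system expects would still be a separate step.

```python
import csv
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical columns for a skeletal, pre-appraisal description.
# These are NOT AtoM's CSV import template; mapping them is a separate step.
COLUMNS = ["relative_path", "title", "extension", "size_bytes", "last_modified", "keep"]


def skeletal_description(root):
    """Return one dict per file found under root, sorted by path."""
    root = Path(root)
    rows = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        info = path.stat()
        modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
        rows.append({
            "relative_path": path.relative_to(root).as_posix(),
            "title": path.stem,  # the file name gives a draft title for free
            "extension": path.suffix.lower(),
            "size_bytes": info.st_size,
            "last_modified": modified.isoformat(timespec="seconds"),
            "keep": "",  # left blank for the selection and appraisal decision
        })
    return rows


def write_csv(rows, out_path):
    """Write the rows to a UTF-8 CSV file with a header line."""
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(rows)
```

Something like `write_csv(skeletal_description("transfer_folder"), "skeleton.csv")` (both paths illustrative) gives a spreadsheet to work through before anything goes into an AIP, and because the rows are plain structured data they are easier to carry forward into a catalogue than hand-typed notes.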

* As an aside (and a bit off-topic), my other bugbear with archival identifiers is that they contain slashes (which means we can't use them in directory or file names) and that they don't sort correctly in a spreadsheet or database, as they are a mixture of numeric and alphabetical characters.
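Both complaints in the footnote can be worked around in code. Here is a small Python sketch: `natural_key` splits an identifier into text and number chunks so that sorting treats 9 as coming before 10, and `to_filename` swaps slashes for another character so an identifier can be used as a file or directory name. The identifiers (`ABC/1/9` and so on), the function names and the choice of underscore are all illustrative assumptions. The sketch only deals with forward slashes, and it assumes the replacement character never appears in a real identifier.

```python
import re

# Splits on runs of digits; the capturing group keeps the digits in the result.
_DIGITS = re.compile(r"([0-9]+)")


def natural_key(identifier):
    """Sort key that compares numeric parts as numbers, so 9 comes before 10."""
    parts = _DIGITS.split(identifier)
    # re.split with one capturing group alternates text, digits, text, ...,
    # so odd positions are always digit runs and even positions always text.
    return [int(part) if i % 2 else part.lower() for i, part in enumerate(parts)]


def to_filename(identifier, sep="_"):
    """Replace slashes so the identifier can be used as a file or directory name."""
    if sep in identifier:
        # Refuse rather than produce a name that cannot be mapped back.
        raise ValueError(f"{identifier!r} already contains {sep!r}")
    return identifier.replace("/", sep)


def from_filename(name, sep="_"):
    """Reverse to_filename, turning the separator back into slashes."""
    return name.replace(sep, "/")
```

With `ids = ["ABC/1/10", "ABC/2", "ABC/1/9"]`, plain `sorted(ids)` puts `ABC/1/10` before `ABC/1/9`, whereas `sorted(ids, key=natural_key)` gives `ABC/1/9`, `ABC/1/10`, `ABC/2`.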

Jenny Mitcham, Digital Archivist
