Tuesday, 31 May 2016

Research data - what does it *really* look like?

Work continues on our Filling the Digital Preservation Gap project and I thought it was about time we updated you on some of the things we have been doing.

While my colleague Julie has been focusing on the more technical issues of implementing Archivematica for research data, I have been looking at some real research data and exploring in more detail some of the issues we discussed in our phase 1 report.

For the past year, we have been accepting research data for longer term curation. Though the systems for preservation and access to this data are still in development, we are for the time being able to allocate a DOI for each dataset, manage access and store it safely (ensuring it isn't altered) and intend to ingest it into our data curation systems once they are ready.

Having this data in one place on our filestore does give me the opportunity to test the hypothesis in our first report about the wide range of file formats that will be present in a research dataset and also the assertion that many of these will not be identified by the tools and registries in use for the creation of technical metadata.

So, I have done a fairly quick piece of analysis on the research data, running a tool called Droid developed by The National Archives over the data to get an indication of whether the files can be recognised and identified in an automated fashion.

All the data in our research data sample has been deposited with us since May 2015. The majority of the data is scientific in nature - much of it coming from the departments of Chemistry and Physics (this may be a direct result of expectations from the EPSRC around data management). The data is mostly fairly recent, as suggested by the last modified dates on these files, which range from 2006 to 2016 with the vast majority having been modified in the last five years. The distribution of dates is illustrated below.

Here are some of the findings of this exercise:

Summary statistics

  • Droid reported that 3752 individual files were present*

  • 1382 (37%) of the files were given a file format identification by Droid

  • 1368 (99%) of those files that were identified were given just one possible identification. 12 files were given two possible identifications and a further two were given 18 possible identifications. In all these cases, the identification was done by file extension rather than signature - which perhaps explains the uncertainty
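For anyone wanting to repeat this, here is a minimal sketch of how figures like these could be pulled out of a Droid CSV export. The column names (TYPE, FORMAT_COUNT and so on) are what I would expect to see in an export but should be treated as assumptions and checked against your own file. The sample rows are made up, and the sketch assumes one row per file.

```python
import csv
import io
from collections import Counter

# Made-up sample in the rough shape of a Droid CSV export.
# Column names are assumptions: check them against your own export.
SAMPLE = """NAME,TYPE,EXT,METHOD,FORMAT_COUNT,PUID
results.xml,File,xml,Signature,1,fmt/101
run1.dat,File,dat,,0,
notes.txt,File,txt,Extension,1,x-fmt/111
data,Folder,,,,
paper.docx,File,docx,Container,1,fmt/412
README,File,,,0,
"""

def percent(part, whole):
    """Whole-number percentage, as quoted in the summary statistics."""
    return round(100 * part / whole) if whole else 0

def identification_summary(csv_text):
    """Count files, identified files and the spread of FORMAT_COUNT values."""
    rows = [r for r in csv.DictReader(io.StringIO(csv_text)) if r["TYPE"] != "Folder"]
    identified = [r for r in rows if int(r["FORMAT_COUNT"] or 0) > 0]
    return {
        "files": len(rows),
        "identified": len(identified),
        "percent_identified": percent(len(identified), len(rows)),
        "by_format_count": Counter(int(r["FORMAT_COUNT"]) for r in identified),
    }

summary = identification_summary(SAMPLE)
print(summary)

# The same arithmetic applied to the figures reported above.
print(percent(1382, 3752), percent(1368, 1382))
```

With a real export you would read the file from disk rather than from the SAMPLE string.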

Files that were identified

  • Of the 1382 files that were identified: 
    • 668 (48%) were identified by signature (which suggests a fairly accurate identification - if a file is identified by signature it means that Droid has looked inside the file and seen something that it recognises. I'm told it does this by some sort of magic!)
    • 648 (47%) were identified by extension alone (which implies a less accurate identification)
    • 65 (5%) were identified by container. These were all Microsoft Office files (xlsx and docx), which are in effect zip files. Identification by container suggests a high level of accuracy

  • 111 (8%) of the identified files had a file extension mismatch - this means that the file extension was not what you would expect given the identification by signature. 
    • All but 16 of these files were XML files that didn't have the .xml file extension (there were a range of extensions for these files including .orig, .plot, .xpr, .sc, .svg, .xci, .hwh, .bxml, .history). This isn't a very surprising finding given the ubiquity of XML files and the fact that applications often give their own XML output different extensions.

  • 34 different file formats were identified within the collection of research data

  • Of the identified files 360 (26%) were XML files. This was by far the most common file format identified within the research dataset. The top 10 identified files are as follows:
    • Extensible Markup Language - 360
    • Log File - 212
    • Plain Text File - 186
    • AppleDouble Resource Fork - 133
    • Comma Separated Values - 109
    • Microsoft Print File - 77
    • Portable Network Graphics - 73
    • Microsoft Excel for Windows - 57
    • ZIP Format - 23
    • XYWrite Document - 21
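The breakdown above (identification method, extension mismatches and most common formats) could be tallied along these lines. This is a sketch only: the METHOD, EXTENSION_MISMATCH and FORMAT_NAME column names are assumed from a Droid CSV export, and the sample rows are invented for illustration.

```python
import csv
import io
from collections import Counter

# Invented rows; column names are assumptions about the Droid CSV export.
SAMPLE = """NAME,TYPE,EXT,METHOD,EXTENSION_MISMATCH,FORMAT_NAME
a.xml,File,xml,Signature,false,Extensible Markup Language
b.plot,File,plot,Signature,true,Extensible Markup Language
c.log,File,log,Extension,false,Log File
d.xlsx,File,xlsx,Container,false,Microsoft Excel for Windows
e.dat,File,dat,,false,
"""

def breakdown(csv_text, top_n=10):
    """Tally identified files by method and format, and list mismatches."""
    # Rows with an empty METHOD are treated as unidentified and skipped.
    rows = [r for r in csv.DictReader(io.StringIO(csv_text)) if r["METHOD"]]
    methods = Counter(r["METHOD"] for r in rows)
    formats = Counter(r["FORMAT_NAME"] for r in rows).most_common(top_n)
    mismatches = [r["NAME"] for r in rows if r["EXTENSION_MISMATCH"].lower() == "true"]
    return methods, formats, mismatches

methods, top_formats, mismatches = breakdown(SAMPLE)
print(methods)
print(top_formats)
print(mismatches)
```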

Files that weren't identified

  • Of the 2370 files that weren't identified by Droid, 107 different file extensions were represented

  • 614 (26%) of the unidentified files had no file extension at all. This does rather limit the chance of identification, given that Droid and other similar tools rely quite heavily on identification by file extension. Of course it also limits our ability to actively curate this data unless we can identify it by another means.

  • The most common file extensions for the files that were not identified are as follows:
    • dat - 286
    • crl - 259
    • sd - 246
    • jdf - 130
    • out - 50
    • mrc - 47
    • inp - 46
    • xyz - 37
    • prefs - 32
    • spa - 32
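A list like the one above is easy to produce from the names of the unidentified files. Here is a minimal sketch with invented file names that groups files without an extension under their own label, since that group turned out to be so large for us.

```python
import os
from collections import Counter

def extension_tally(filenames):
    """Count files per lower-cased extension; label extensionless files."""
    tally = Counter()
    for name in filenames:
        ext = os.path.splitext(name)[1].lower().lstrip(".")
        tally[ext or "(no extension)"] += 1
    return tally

# Invented file names for illustration.
unidentified = ["run1.dat", "run2.DAT", "scan.crl", "README", "output"]
tally = extension_tally(unidentified)
for ext, count in tally.most_common():
    print(f"{ext} - {count}")
```

Note that the extension is lower-cased before counting, so .dat and .DAT are treated as the same thing.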

Some thoughts

  • This is all very interesting and does back up our assertions about the long tail of file formats within a collection of research data and the challenges of identifying this data using current tools. I'd be interested to know whether for other collections of born digital data (not research data) a higher success rate would be expected? Is identification of 37% of files a particularly bad result or is it similar to what others have experienced?

  • As mentioned in a previous blog post, one area of work for us is to get some more research data file formats into PRONOM (a key database of file formats that is utilised by digital preservation tools). Alongside previous work on the top software applications used by researchers (a little of this is reported here) this has been helpful in informing our priorities when considering which formats we would like The National Archives to develop signatures for in phase 3 of the project.

  • Given the point made above, it could be suggested that one of our priorities for file format research should be .dat files. This would make sense given that we have 286 of these files and they are not identified by any means. However, here lies a problem. This is actually a fairly common file extension. There are many different types of .dat files produced by many different applications. PRONOM already holds information on two varieties of .dat file and the .dat files that we hold appear to come from several different software applications. In short, solving the .dat problem is not a trivial exercise!

  • It strikes me that we are really just scratching the surface here. Though it is good we are getting a few file signatures developed as an output for this project, this is clearly not going to make a big impact given the size of the problem. We will need to think about how the community should continue this work going forward.

  • It has been really helpful having some genuine research data to investigate when thinking about preservation workflows - particularly those workflows for unidentified files that we were considering in some detail during phase 2 of our project. The unidentified file report that has been developed within Archivematica as a result of this project helpfully organises the files by extension and I had not envisaged at the time that so many files would have no file extension at all. We have discussed previously how useful it is to fall back on identification by file extension if identification by signature is unsuccessful but clearly for so many of our files this will not be worth attempting.

* Note that this turns out not to be entirely accurate given that eight of the datasets were zipped up into a .rar archive. Though Droid looks inside several types of archive file (including .zip, .gzip, .tar, and the web archival formats .arc and .warc), it does not yet look inside .rar, .7z, .bz, and .iso files. I didn't realise this until after I had carried out some analysis on the results. Consequently there are another 1291 files which I have not reported on here (though a quick run through with Droid after unzipping them manually identified 33% of the files, so a similar ratio to the rest of the data). Note that this functionality is something that the team at The National Archives intend to develop in the future.
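To avoid being caught out in the same way, a quick check for archive files that would not be opened could be run before the analysis. This is a rough sketch: the set of extensions is simply the list given in the note above, and the file paths are invented.

```python
import os

# Extensions the note above says Droid did not yet look inside.
NOT_EXPANDED = {".rar", ".7z", ".bz", ".iso"}

def unexpanded_archives(paths):
    """Return paths whose extension suggests an archive that was not opened."""
    return [p for p in paths if os.path.splitext(p)[1].lower() in NOT_EXPANDED]

# Invented paths for illustration.
sample = ["dataset1/readme.txt", "dataset2.RAR", "dataset3/images.zip", "dataset4.7z"]
flagged = unexpanded_archives(sample)
print(flagged)
```

Anything flagged would need unpacking by hand before being counted.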

Friday, 27 May 2016

Why AtoM?

A long time ago I started to talk about how we needed a new archival management system. I described the process of how we came up with a list of requirements to find a system that would meet our needs.

People sometimes ask me about AtoM and why we chose it so I thought it would be useful to publish our requirements and how the system performed against these.

It is also interesting at this point, post launch, to go back and review what we thought we knew about AtoM when we first made the selection back in 2014. Not only has our understanding of AtoM moved on since then, but the functionality of the system itself has moved on (and will move on again when version 2.3 is released). For several of the requirements, my initial assessment was negative or recorded only a partial success and coming back to it now, it is clear that things have improved.

I also discovered that for one of the requirements, I had recorded AtoM as having met it but experience and further research have demonstrated that it doesn't - this is something we are hoping to address with further funding in the future.

So here goes with the requirements and a revised and hopefully up-to-date commentary of how AtoM (version 2.2) meets them.

Cataloguing and accessioning

The system must allow us to record accessions

The short answer is Yes.

We looked at this one in some detail and produced a whole document specifically about whether AtoM met all of our requirements for accessioning. This included a mapping of the fields within AtoM to the data we held from previous systems and further thoughts related to our own accessioning workflows.

We have found that AtoM is a good tool for recording accessions but there are some issues. One of the problems we highlighted with the accessions module initially was the lack of ability to record covering dates of the material. We have been able to sponsor the development of covering dates fields by Artefactual Systems so this one is resolved now.

Another issue we have noted is the rather complex way that rights and licences are recorded within an accession record. I am still not sure we fully understand how best to use this section!

Need to be able to generate a receipt for accessioning

This one is still a No.

In our previous system for accessioning this was an auto generated report of key fields from the accessions data which could be printed out and sent to the donor or depositor with a covering letter as a record of us having received their archive. We did explore with Artefactual Systems the options around the creation of this feature within AtoM but were not able to find the money to follow up on this development.

The good news is that we have found a temporary workaround to this internally using a print style sheet to create a report. This solution isn't perfect but does at least mean we don't have to retype or copy and paste the accessions data into an accessions receipt. We are hoping a better solution emerges at some point in the future.

The system must allow us to enter catalogues/lists

Yes - this is very much what AtoM is designed to do.

Internally we are still discussing and testing out the methodology for data entry into AtoM. In some situations it makes sense to enter archival descriptions directly into the interface and in other situations the CSV or EAD import options are utilised. There are also situations where both of these methodologies might be used, for example importing the basic structure and then enhancing the records directly within AtoM.

Catalogue metadata should be ISAD(G) compatible

Yes - does what it says on the tin.

Data entry form for accessioning/cataloguing should include customisable drop down lists/controlled vocabularies where appropriate for ease and consistency of data entry

Yes - there is a taxonomies section of AtoM that allows you to manage the wordlists that are used within the system, editing and adding terms as appropriate. By default, some taxonomies (which contain controlled vocabularies directly from related standards) are locked, but can be edited by a developer.

Accessions data should be linked to catalogue data for ease of running queries and managing data (eg: producing lists of collections which haven't been catalogued)

Yes - Accessions data can be linked to archival descriptions within AtoM. This isn't mandatory (nor should it be) and since importing all of our accessions records into AtoM there would be quite a big piece of work involved in making these links. However, the functionality is there.

Running queries as described in this requirement isn't something that AtoM does currently, however access to a MySQL client (such as Squirrel) and a working knowledge of SQL does open up opportunities for querying the underlying data in a variety of ways.
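To give a flavour of the kind of query meant here, this is a minimal sketch of listing accessions that have no linked archival description. Please note the assumptions: the table and column names are hypothetical and are not AtoM's real schema, and an in-memory SQLite database stands in for MySQL so that the example is self-contained.

```python
import sqlite3

# Hypothetical, simplified tables: NOT the real AtoM schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE accession (id INTEGER PRIMARY KEY, accession_number TEXT, title TEXT);
CREATE TABLE description_link (accession_id INTEGER, description_id INTEGER);
INSERT INTO accession VALUES
    (1, '2015/001', 'Estate papers'),
    (2, '2015/002', 'Society minutes'),
    (3, '2016/001', 'Family letters');
INSERT INTO description_link VALUES (1, 100);
""")

# A LEFT JOIN keeps every accession; those without a link have NULLs on the right.
uncatalogued = conn.execute("""
    SELECT a.accession_number, a.title
    FROM accession a
    LEFT JOIN description_link l ON l.accession_id = a.id
    WHERE l.accession_id IS NULL
    ORDER BY a.accession_number
""").fetchall()
print(uncatalogued)
conn.close()
```

The same LEFT JOIN pattern should carry over to a real database once the actual table names have been worked out.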

It should be possible to create a collection record directly from the accessions record (avoiding the need for re-typing duplicate fields)

Yes - An accessions record can be turned into an archival description - and this is a feature we want to explore more as we incorporate AtoM fully into our internal workflows.

We should be able to create hard copy catalogues in a format suitable for search room use

Initially no...but now yes!

Functionality on this score was fairly limited when we assessed AtoM version 2.0 three years ago. However, we knew this was on the AtoM development roadmap and since adopting AtoM there has been further work in this area. In AtoM 2.2 it is possible to create finding aids in PDF or RTF format and administrators can select whether a full finding aid or inventory summary is required.

We would like to record locations of material within the archive

In our initial assessment of AtoM we recorded this as a Yes

...but I'm reserving my judgement about whether we can use this functionality ourselves until we've finished testing this out.

We have a locations register (currently a separate spreadsheet) that records what is where within the strongroom. We also have signs for the end of each aisle which record which archives you can find in that aisle. I'd like to be able to say we could do away with these separate systems and store this information within AtoM but I'm just not sure whether the locations section of AtoM meets our needs currently. Further investigation required on this one and I'll try and blog about this another time.

We would like the system to be able to manage and log de-accessioning

Yes we believe this to be so - after creating an accession record, a simple click of a button allows the archivist to deaccession the whole or a part of the accession.

As far as I know we haven't had cause to test this one just yet.

Import and export of data

Data should be portable. We need to be able to export it to other platforms/portals in EAD, EAC (for authority records) and other formats.

Yes - this is certainly something that AtoM can do.

Data should be portable. We would like to be able to set up OAI PMH targets (or equivalent) so that our data can be harvested and re-used by others 

Ahhhh - now this is an area where perhaps we weren't specific enough with our requirement!

On the surface, this is a 'yes' - we have been able to set up OAI-PMH harvesting to our Library Catalogue using Dublin Core.

...however, what we really wanted to be able to do, which isn't articulated well enough in the requirement was to be able to expose EAD metadata via OAI-PMH. This isn't currently something that AtoM can do but watch this space! We hope to be able to make this happen at some point in the future.
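For those not familiar with OAI-PMH, here is a minimal sketch of what a Dublin Core harvest looks like from the harvester's side. The verb, the oai_dc metadata prefix and the XML namespaces come from the OAI-PMH standard. The base URL is a made-up placeholder rather than our real endpoint, and the response is a hand-written sample instead of a live request.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

BASE_URL = "https://archives.example.org/oai"  # hypothetical endpoint

def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request for the given metadata format."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"

# Hand-written sample of the kind of response a harvester would receive.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example collection</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

def dc_titles(xml_text):
    """Pull every Dublin Core title out of an OAI-PMH response."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iter("{http://purl.org/dc/elements/1.1/}title")]

url = list_records_url(BASE_URL)
titles = dc_titles(SAMPLE_RESPONSE)
print(url)
print(titles)
```

Exposing EAD would in principle just mean a different metadataPrefix on the request, but only if the repository offers that format, which is the part AtoM cannot do yet.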

We need adequate import mechanisms for incorporating backlog data in a variety of formats. This should include EAD and a means of importing accessions data

Yes - EAD can be imported, as can metadata in other formats (for example Dublin Core, MODS, EAC, SKOS). Having tested this we have discovered that the EAD import works but isn't necessarily as simple and straightforward as we would like. We know there are good reasons why this is the case but it has proved to be a barrier for us in getting more of our existing structured finding aids into AtoM. We will be doing more data import work over the next year or so.

CSV import is also an option and this is the method we used to import all of our accessions data. We are currently testing how we can use this import functionality for archival descriptions and think this will be very useful to us.


We need to be able to run queries and reports on the data that we hold - both routine and custom

Yes and No - Reporting within AtoM itself is limited, however this doesn't matter if you have a MySQL client and the ability to query the underlying database. We have already had success at running specific reports on the data within AtoM (for our annual accessions return). Though this solution may not suit everyone, the ability to query the data outside of the web interface itself does offer flexibility above and beyond the functionality that could be programmed into the interface and is a really powerful tool.


Our users should be able to search and browse all of our collections on-line and access born-digital and digitised material that we hold

Yes absolutely - this is what AtoM does well. There are a variety of ways to search and browse the data within AtoM and one of the real strengths of the system is the ability to link between records using hyperlinked subject terms, places and authority records.

Our only limitation currently is that the vision of getting 'all' of our finding aids into AtoM may not be realised for some time!

We have not really started working with the digital object functionality within AtoM as we are waiting to get Archivematica installed and in use so that we can be sure that access to digital content is backed up by a preservation system.

The system should allow different levels of access, and provide a high degree of data security

Yes - there is scope to configure what public users can and can't see within AtoM. Accessions records (including details of donor names and addresses) are only visible to staff when logged in to the system. Public users only see the records you want them to see. Within AtoM you can keep things in draft until you are ready to publish them.

AtoM also allows you to hide certain fields from public view using the 'Visible Elements' feature. This means that within an archival description you can hide the physical location field for example or a notes field if you want to keep this field for internal use only.

AtoM also allows different levels of access to specific user groups. Staff can be given access as either contributors, editors or administrators as appropriate depending on the functions that they are required to carry out within the system.

We should be able to get usage statistics from the web interface

In hindsight this perhaps wasn't a very sensible requirement. Yes, we can see web statistics for our AtoM instance, but this is not through AtoM but through Google Analytics.

We should be able to get error reports from the web site so we know if there is a problem

As above - this requirement is not being met by AtoM itself but is met by other tools our systems team have at their disposal. For example, if the AtoM search facility breaks, we have an automatic notification that is sent out to those people that have the skills to fix it!


We need to be able to record born-digital material

Yes - but looking back, I am not entirely clear what I meant here. Using AtoM you can record the presence of born digital material in an accessions record (by describing the format in a free text field) and you can also present born digital material via the web interface and describe it as you would any other item within an archive.

Of course AtoM is not a digital archive and in order to fully record born digital material (specifically all of the technical metadata required) you also need a digital preservation system. AtoM integrates with Archivematica which ticks our boxes for digital archiving. For more information on this see my blog post about how Archivematica meets our digital archiving requirements.

We need to be able to associate digitised files with their analogue masters

Yes - AtoM allows you to upload or link digitised files to an archival description. We have done a bit of testing but not really started using this feature in earnest. Watch this space though - when we come to finish our ongoing project to digitise the archive of the Retreat we will carry out a piece of work to make the links between the archival catalogue and the digitised material.

We would like to be able to record preservation actions and other technical metadata for digital material

No - this is very much outside of scope for AtoM...and this is why we are also looking at Archivematica which does tick the boxes in these areas and is designed to work alongside AtoM.

The system should allow us to allocate unique identifiers to digital objects

Not really.

In AtoM the identifier would be the archival reference code - but this might not be a unique identifier as such as there may be more than one digital object associated with an archival description.

However, this is actually a job that could be performed elsewhere. Archivematica will allocate identifiers to digital objects and AtoM is designed to work alongside Archivematica.

Other modules

We need to create authority records where appropriate

Yes - Authority records are one of the core entity types within AtoM and are based on ICA standards. Now we are using AtoM we have found this feature to be very powerful.

We would like to be able to record conservation actions that we have carried out on a particular collection or item (analogue)

No - this is not a feature within AtoM and is not currently on the roadmap (though it could be in the future if someone was to fund the necessary development work).

There is a field in the accessions module to record the physical condition of an archive which is useful but isn't enough. We will continue to maintain a separate system for recording conservation actions for the time being.

Searchroom staff need to be able to log enquiries, access, register users etc

No - again this is not a feature of AtoM and will need substantial work (and financial resource) to develop it. In the meantime we are happy to continue to maintain separate systems and processes for the day-to-day work of the searchroom staff.

We would like to have a workflow or management actions checklist so we can keep track of what work needs to be done

No - without a conservation module being included within AtoM this requirement is perhaps a bit ambitious. What we were hoping for here was a feature that tells you where you are with an archive - for example, reporting on whether a new accession has been catalogued or not or has had the necessary conservation treatment.

As described elsewhere in this list, AtoM users can choose to run reports directly on the MySQL database using an SQL client. These could be tailored to help identify archives that fulfil particular requirements and could be used to help inform task allocation and team priorities.


The system should be under active development with established feedback routes for requesting enhancements

AtoM is under continuous active development. You can see what is coming up in the new release here. As well as showing how many new features and enhancements are being developed, it also shows the wide range of institutions who are involved in funding the development of AtoM.

Feedback and requests for enhancements are encouraged via the AtoM user forum. Requests for enhancements often lead to the same response from Artefactual Systems (the lead developers for AtoM) which goes along the lines of "yes we can do that if someone pays us to do it!" and that is fair enough really. It allows institutions to push various features to the top of the wishlist if they have the resources to pay for the development work. This does mean that these funded features are also available for other AtoM users to make use of if they wish.

The system should be flexible and customisable so it can be modified for our specific needs

Yes - AtoM is open source which does mean that we could tinker with the code and customise it as much as we like (assuming we have the resource and expertise to do so). We have done a bit of this - customising the way the global search works and the look and feel of the interface.

There is also scope within the AtoM interface to tweak the admin settings. Any AtoM user will want to spend a fair bit of time investigating these settings and considering what options will be best for their implementation.

The system should include technical support

Yes - technical support is available for AtoM. This can be bought as a package from Artefactual Systems and this will be particularly valuable if no technical staff resource is available in house.

Technical support is also available for free via the user forum - both from Artefactual Systems and from other AtoM users and all AtoM users are encouraged to join in the discussions and share their experiences.

The system should be used by other archives. This will provide us with another mechanism for advice, support and feedback

Yes - it is encouraging to see that AtoM is used by many institutions across the world. Some of these are listed on the Users page on the AtoM wiki.

For us it was also useful to make contact with a friendly AtoM user in the UK who we could talk to directly and get advice on how they use the system.

So, that was a run down of our requirements and how AtoM performs against them. 

As you will have seen, AtoM did not get full marks when we assessed it originally, and still doesn't do everything we had originally wanted it to. However, over the last few years I would say that we have revised our expectations and have accepted the fact that one system can't necessarily do everything! AtoM works alongside other systems and processes at the Borthwick Institute in order to meet our needs. For other requirements we have developed workarounds to ensure that we have a solution that works for us.

When people ask us why we selected AtoM as our archival management system I do mention the requirements assessment but I think ultimately it was the fact that we saw potential and wanted to get on board, be part of the user community and influence the future development of this system. We have seen numerous enhancements over the last couple of years and are looking forward to seeing many more developments in the future.

Monday, 11 April 2016

Responding to the results of user testing

Did you notice that we launched our new AtoM catalogue last week? I hope so!

In the month we spent preparing for launch we wanted to take the time to find out what a sample of users thought about our new catalogue. Here I will summarise some of the findings and the steps that we have taken to react to this feedback.

We had 14 people test the catalogue for us off-site and fill out an online questionnaire which was put together using Google forms. Testing was carried out on AtoM version 2.2.0. The volunteers for user testing were found by putting out a call on Twitter and the results were helpful and constructive (though one user could not access the site so was not able to answer the questions in any meaningful way). Despite the small sample size there were several themes that were mentioned more than once. Interestingly these weren't necessarily the themes that we thought would be mentioned more than once!

Let's start with the positives....

The good things

It's always nice to receive positive feedback and we were encouraged to see that there was plenty of this to come out of the user testing - things that were praised fell into the following categories:

Look and feel - The vast majority of users found the catalogue visually appealing. A couple of people mentioned that they liked the colour scheme and one appreciated the fact that it flowed nicely from our website. The image on the home page was also praised. Others commented on the fact that it was well set out with a clean and clear appearance. One respondent compared it very favourably with other leading archival catalogues.

Our home page image

Functionality - The search functionality of the catalogue was praised as was the faceted classification that allows you to filter your search results. The browse by subject feature had several positive mentions and one person liked the ability to download XML files. Navigation within the catalogue was praised, including a specific comment about the tree-view feature on the left side of the interface.

The data - We were pleased to hear people saying good things about the quality of the data that we have in the catalogue. The information was described as being 'full' and 'comprehensive'. The level of detail held in the Conditions of Access and Use field was mentioned specifically and the fact that you could see when each description was last updated. One respondent stated that they liked the fact the catalogue conformed to recognised archival standards and that it was clear from the interface which rules had been used to create the data.

Digital objects - Several of the testers mentioned specifically that they liked the inclusion of digital objects within the catalogue. We have not utilised this feature to full effect just yet, but for some of our descriptions a finding aid or an image is available. Users liked the way that AtoM displays the thumbnails in the results list. An archival catalogue can be quite text-heavy so using digital objects to break the text up was seen as a good thing.

The help pages - Our glossary page had a positive mention. We put this together as we recognised that archival terminology can be a bit of a mystery to non-archivists (myself included) so being able to define some of the key terms we use was a priority for us.

My favourite comment under the question "What did you like about the catalogue?" was "Almost everything". This highlights to me that we have pretty much got it right but of course we shouldn't put our feet up - there is always room for improvement!

The not-so-good things

We also received comments about the things which weren't working so well in our new catalogue:

Look and feel - Of the users who did not think the catalogue was visually appealing, one comment was that it was 'bland' and that too much space on the front page was taken up by the image. The same person didn't like the fact that all the navigation was on the left and they couldn't find the search box. Another respondent thought that the links on the left hand side were too small and their eye wasn't drawn to them because of the large image on the front page. It was thought by one person that the location of the main image on the front page looked odd because it wasn't central.

Our response: We wondered about trying to increase the size of the text in the left hand navigation bar in order to make these links stand out a bit more but concluded that this may well upset the balance of the current design. Given that the majority of respondents were very happy with the visual appearance of the site, we decided that no changes were needed at this point in time.

Search box - The visibility of the search box was an issue that was raised a couple of times. We are using a slightly customised version of the default Dominion theme within AtoM and this puts the search box at the top of the screen. One person didn't find the search box at all whilst testing the catalogue. Another found it but wasn't immediately sure of its purpose as its location and proximity to the University of York logo suggested it would search our website rather than our catalogue. This may have been a direct result of our decision to style the catalogue to mirror the look and feel of our website as we do have a similar sized website search box in the top bar of our website.

Our response: We have given some serious thought to how to make the search box more prominent within AtoM but I'm not convinced there is an obvious solution to this. Prior to the user testing we had already changed the colour of the search box from dark grey to white to make it more visible. We have since made another minor tweak to the default theme to turn the 'Search' text within the search box from grey to black to make it stand out more. We considered making the search box bigger (longer) but our top bar is already getting quite crowded and filling it up any more than necessary does have knock-on effects on the responsive design when viewed on smaller screens.

While I can see a benefit to having the main search box taking centre stage on the catalogue front page, I also see it is useful having it up in the top bar so it is always accessible wherever a user is within the catalogue. We don't intend to make any further changes for the time being.

Search results - Several people mentioned that there were simply too many results when you carry out a search ...and the results that come up are not always relevant. We had already been discussing this very issue on the AtoM mailing list and were not surprised that our users were struggling with this. 

Our response: We are hoping that this is something that will be resolved in future versions of AtoM, but in the meantime we are focusing on educating our users by giving them the information they need in order to run more effective and precise searches (even just using the powerful functionality that is available within the basic search box). 

We think that a change to AtoM's default behaviour which currently searches for multiple words by default with an 'OR' operator rather than an 'AND' would produce search results that were more in line with what our users were expecting. Also, although users of Google will happily run a search that produces many thousands of results and feel comfortable not moving beyond the first ten 'hits', users of archival catalogues do not necessarily take the same approach. There seems to be more of an assumption that the list of results will be relevant and each should be worked through in turn. This is something we are definitely hoping for a solution to in the future.
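
To illustrate why the choice of operator matters so much, here is a minimal sketch in Python. It is not AtoM's search code, just a toy matcher over four made-up titles, and the function name and records are invented for this example.

```python
# Toy catalogue: made-up titles, purely for illustration.
RECORDS = [
    "York Minster fabric rolls",
    "York diocesan probate registers",
    "Minster library accounts",
    "Rowntree family papers",
]


def search(query, records, operator="OR"):
    """Return records matching any (OR) or all (AND) of the query words."""
    words = query.lower().split()
    combine = any if operator == "OR" else all
    return [r for r in records if combine(w in r.lower().split() for w in words)]


or_hits = search("york minster", RECORDS, operator="OR")
and_hits = search("york minster", RECORDS, operator="AND")
print(len(or_hits), "results with OR")
print(len(and_hits), "results with AND")
```

Even in a catalogue of four records, 'OR' returns three results where 'AND' returns only the one that mentions both words. Scale that up to a full catalogue and it is easy to see where the long lists of loosely relevant results come from.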

Filtering the search results - One person expressed a desire to be able to filter a search by date.

Our response: We agree that this would be a really useful feature and we were pleased to hear from Artefactual Systems that this will be possible within the next version of AtoM (2.3) which is due out soon. This will also introduce the ability to search within the date field in the advanced search and order results by start date in the results list. I think these features are going to be really valuable to our users.

Navigation - One person reported that the catalogue was hard to navigate but didn't give further details. Another struggled with navigation and described a scenario in which they had got lost within the catalogue. 

Our response: I can easily understand how someone could get lost within our catalogue - it has happened to me too! In some respects this problem is directly related to the powerful functionality of the AtoM interface and relational nature of the underlying data structure. Searching and browsing AtoM isn't a linear journey but rather an opportunity to follow links between one record and another based on shared subject terms or creators. Getting lost is a fairly inevitable consequence of this functionality and I struggle to think of an effective solution (apart from encouraging repeated use of the browser 'back' button to get back to where you started!)

The data - One user reported that there is "not much material yet" and another asked for more digitised documents. It was also mentioned that there were "not enough categories for searching" (we speculate that this might relate to the subject terms we have entered). Another comment received was about the term 'accrual' which is used as a field name within AtoM and also within the data that we enter in that field. It was suggested that this word might be a bit off-putting for some users. It was also mentioned that the lists within the Scope and Content field were "pretty hard reading" and a suggestion was made that this would be more user-friendly if presented as a bulleted list rather than a paragraph of text.

Our response: We did expect to get comments about our data. Just because we have launched our catalogue we do not consider it to be a finished piece of work. Further work on populating the catalogue and a fuller exploration of the functionality around digital objects will follow over the next couple of years. It was interesting to get the feedback about the word 'accrual' - we had actually anticipated much more feedback about the terminology that we use but hadn't considered this word in particular. I do agree that this word is a tricky one for non-archivists and I'm pretty sure I had not encountered it before I came to work at the Borthwick Institute. We don't want to change it on the basis of one comment but did decide to add the term to our glossary (one of the help pages we have created within AtoM) and hope that this helps our users.

The help pages - In our questionnaire we asked people specifically whether they used the catalogue help pages. The majority of users surveyed didn't use the help pages and this was not a surprising result. One person's reason for not using the help was because they "should not need to in a well designed information system". Another person stated that they preferred to "just see if I could use the catalogue instinctively". A couple of people mentioned that the page was too text heavy and someone else reported that they didn't know there were any help pages. Someone also suggested that the help pages should open in a new window.

Our response: As a result of the user testing we have made several changes to our help pages. We have updated the text (specifically to explain how to reduce the number of search results) and added a number of screenshots to help convey the information in a more visual way. 

Our help pages are now more visual and include screenshots - the first graphic simply shows how to access the search box. We have also created some printed and laminated copies of these for use in the searchroom.

Of course we can put a lot of effort into putting the right level of information into our help pages but we cannot force people to use them! So, over the last couple of weeks we have been ensuring that our searchroom assistants (the people who will be providing front line support to our users as they grapple with our new catalogue) are aware of the different search options within AtoM and understand how they can be used to best effect.

There are also things we can do to make it clearer to users where the help pages are so that they can easily find them if they want to. By default the help pages in AtoM appear under an 'i' icon alongside other static pages. Replacing this 'i' icon with a '?' seemed to be a sensible step to take in order to make it clearer where help could be found. Artefactual Systems were able to point us to the relevant icon in Font Awesome which was just what we needed to implement this little change. 

We agreed that it may be useful for the help pages to open in a new tab so that someone could access them without losing their place within the catalogue (particularly given that 'getting lost' was also an issue that had been reported). Our help pages now open within a separate tab. We will monitor how users respond to this and whether the potential proliferation of tabs becomes a problem.

It has been a useful exercise reviewing this initial sample of responses and giving some thought to how AtoM and our own implementation of it can be improved. We will be continuing to gather user feedback through further more detailed testing with a smaller sample of users and by pulling together the ad hoc comments we are likely to receive now our catalogue is live. 

Thursday, 7 April 2016

Our catalogue is now live!

Was it really 3.5 years ago when I first blogged about requirements for a new archival management system?

My main aim in getting involved in this project was to create a stable base to build a digital archive on.

If you build a digital archive on wobbly foundations there is a strong chance that it will fall over.

Much safer to build it on top of a system established as the single point of truth for all accessions information your organisation holds. A system which will become the means by which you disseminate information about your digital holdings (alongside the physical ones) and enable users to access copies of born digital and digitised material.

Finally we have such a solution in place!

We chose Access to Memory (AtoM) as our new archival management system, and over the last few years there has been a huge amount of work going on behind the scenes getting it up and running. I'm so pleased that today we are in a position to unveil the results of all of that hard work.

Our new catalogue can be viewed at

In a previous blog post "A is for AtoM" I talked about some of the tasks that have been going on and decisions that have been made to get us up and running, so I won't repeat all of that here.

Suffice to say that a considerable amount of work has gone into getting AtoM installed, configured and styled. While this has been going on, Project Genesis has been key to getting the catalogue populated with archival descriptions. The task of populating our catalogue will continue via Project Genesis until April 2017 and by other channels beyond that.

While our initial focus has been to get a collection level description for each of our archives into the catalogue, further work is required on the wider task of retroconversion - getting a variety of finding aids in a range of different formats into the system. We have managed to tackle some of this in an ad hoc way but there is still much to do.

Our AtoM catalogue is live, but our work is not yet done. I need to start thinking about how we can build digital preservation functionality on top of this (via Archivematica) and of course how we can start to provide more access to our digital holdings through the catalogue interface. Watch this space!

In the meantime, we'd be happy to hear any feedback about our catalogue so do get in touch.

Friday, 1 April 2016

Kicking off phase 3 of "Filling the Digital Preservation Gap"

I realise I've gone a bit quiet on "Filling the Digital Preservation Gap" since the release of our phase 2 project report. I am pleased to pass on the news that we have been funded by Jisc to continue some of our work into Phase 3.

Our Research Data Spring phase 3 kick off meeting was held yesterday at the Hull History Centre and we celebrated with a suitably spring-themed cake!

Our Research Data Spring chicken cake

So here is a run down of what we are planning to do in phase 3:

The big one at the top of the list is Archivematica implementation. Both York and Hull are going to be working on their own proof of concept implementations of Archivematica integrated with their existing repositories (and potentially other systems within the RDM workflow). We may not be able to follow the implementation plans from our phase 2 report in full (as we have not been funded in full) but both institutions plan to get an implementation up and running with a focus on a single use case.

I for one am very excited about this implementation phase. This is what our work over the previous two phases has been leading up to. The ground work laid in phases 1 and 2 has been incredibly valuable, but it will be great to move from talking about Archivematica to actually working with it!

We are also going to continue to look at the issue of unidentified file formats. This has been a recurring theme during phases 1 and 2 and is particularly pertinent for research data which comes in such a huge variety of formats. We are going to work with The National Archives to ensure a few more research data file formats are represented in PRONOM. We will also give further thought to our workflows for handling unidentified files and how tools such as Archivematica can help.

We will of course be continuing our dissemination and outreach work. Some of this has already happened over the last couple of months.
  • I gave a presentation at the IDCC16 conference in Amsterdam in February and discussed why active digital preservation is often left out of RDM workflows - the slides can be viewed here
  • Julie Allinson presented a case study about our project at a workshop entitled 'Digital Preservation: Strategic Issues' at the National Library of Wales in February
  • Simon Wilson from Hull and I produced a poster for the UK Archives Discovery Forum last month to promote some of the themes of our project so far and make sure the wider archives community is aware of our work

Our UKAD 2016 poster
  • At the UK Archivematica meeting last month I gave a presentation which summarised the outcomes of the development work we funded in phase 2. This can be found here
Watch out for us at 'Research Data, Records and Archives: Breaking the Boundaries' in Edinburgh later this month and Open Repositories in Dublin in June.

Of course we will also be keeping you posted on this blog as phase 3 of our project progresses, so watch this space.

Friday, 18 March 2016

'A' is for AtoM

Over the past couple of years we have been busy working away behind the scenes on our implementation of Access to Memory (AtoM) at the Borthwick Institute for Archives and very soon we will be launching our new catalogue to the public.

I haven’t said much about AtoM on this blog thus far but it has been a huge preoccupation over the last couple of years. Here I attempt to redress that balance.

It turns out that deciding to adopt a system is relatively simple; working out exactly how you are going to use it is far more complex!

What follows is a list of just some of the things we have been thinking about and working on over the last couple of years as we move towards launch. I present you with the A to Z* of implementing AtoM….

A is for Accession Mask

We were very keen to use AtoM for accessioning…in fact the need to urgently find a new system for recording our accessions was the key driver for getting us moving with AtoM in the first place.

As we started using AtoM for recording new accessions we realised we needed to get the accessions mask right. This is just one of the configuration options within AtoM and it enables you to create unique references for your accessions. We wanted to ensure that our new accession numbers were in the same format as our previous ones so, with a little bit of help and advice from the AtoM user forum, we settled on the mask “%Y/%iii” which creates numbers in our preferred format of [yyyy]/[no]. Now we just need to remember to reset the accession number to ‘0’ at the start of each new year so that our running number sequence starts again. This is just one of the ways that an institution can configure AtoM to suit local preferences.
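
For anyone who finds it easier to see the numbering scheme written down, here is a small sketch in Python of the sequence we are aiming for. It does not reproduce how AtoM interprets the mask. The class name is hypothetical and the lack of zero-padding on the running number is an assumption made for the example.

```python
import datetime


class AccessionCounter:
    """Hypothetical helper: issues numbers as [yyyy]/[no], restarting each year."""

    def __init__(self):
        self.year = None
        self.last_number = 0

    def next_number(self, received):
        # Start the running sequence again when the year changes.
        if received.year != self.year:
            self.year = received.year
            self.last_number = 0
        self.last_number += 1
        # Assumption: the running number is not zero-padded.
        return f"{self.year}/{self.last_number}"


counter = AccessionCounter()
numbers = [
    counter.next_number(datetime.date(2015, 12, 18)),
    counter.next_number(datetime.date(2015, 12, 21)),
    counter.next_number(datetime.date(2016, 1, 5)),
]
print(numbers)
```

The sketch restarts the sequence by itself when the year changes. In our AtoM set-up that reset is the manual step described above, which is why it is worth putting a reminder in the diary.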

B is for Business as Usual

Any organisation adopting a new and complex system like AtoM needs to think beyond initial implementation and consider how the solution can be embedded into their workflows for the longer term. The ultimate goal for us is getting AtoM seen as 'business as usual' at the Borthwick. We are not there yet (though perhaps we almost are when it comes to working with accessions data). Getting us to the point where AtoM is not a standing item on our meeting agendas is something to aim for in the future!

C is for Customising the look and feel

AtoM gives you some options for customising the look and feel of the front end. Given that the AtoM interface is going to be the primary means through which our users will browse and view information about our holdings, we want the interface to look consistent with our other communications. It needs to be clear that it belongs to us. Using our brand colours was a quick win and we also put some additional effort into creating an attractive image for the home page to make it look more visually appealing.

Note that there is a limit to the level of customisation that can be done without developer support. Within the admin interface of AtoM some basic changes to theme colours can be made, but I quickly found that changing the background colour to our Borthwick orange did not look pretty! Much better to call in our local technical experts to tweak the CSS behind the scenes.

D is for Drop Down Lists

AtoM comes ready populated with wordlists (called taxonomies) that populate the drop down lists to support data entry, however, institutions can change these to meet their own local needs. We have had to tweak a few of the taxonomies within AtoM, for example the deposit types in the accessions section and the levels of description (after much internal debate!).

E is for Experimenting

In order to understand AtoM we knew we really needed to get some of our data into it. We experimented with some structured finding aids that already existed in EAD format and had a go at importing them. We discovered that data may not always import in the way you expect.

One of the key problem areas for us has been the way AtoM handles the <bioghist> element in our EAD files. The issue is documented here. Essentially what it tends to mean for us is that we end up with lots of untitled authority records when we import an EAD finding aid. This has been a bit of a barrier for us in getting more of our existing catalogues into AtoM. Experimenting and carrying out tests to check the behaviour, however, does allow us to consider how we can tackle the issue and work towards a solution for future data imports.
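
One simple check that can be run before an import is to count how many <bioghist> elements a finding aid contains. The sketch below uses Python's standard XML library on a tiny made-up EAD fragment. The sample data and function name are invented for illustration, and it does not model what AtoM actually does with each element on import.

```python
import xml.etree.ElementTree as ET

# Made-up EAD fragment, purely for illustration.
SAMPLE_EAD = """<ead>
  <archdesc level="fonds">
    <did><unittitle>Example family papers</unittitle></did>
    <bioghist><p>Biographical note for the family.</p></bioghist>
    <dsc>
      <c level="series">
        <did><unittitle>Correspondence</unittitle></did>
        <bioghist><p>Note about the correspondent.</p></bioghist>
      </c>
    </dsc>
  </archdesc>
</ead>"""


def count_bioghist(ead_xml):
    """Count <bioghist> elements, ignoring any XML namespace prefix."""
    root = ET.fromstring(ead_xml)
    return sum(1 for el in root.iter() if el.tag.split("}")[-1] == "bioghist")


bioghist_total = count_bioghist(SAMPLE_EAD)
print(bioghist_total)
```

A count like this at least tells you which finding aids are likely to need attention after import and which can go straight in.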

F is for Friendly Advice

Though there is much detail in the AtoM documentation, anyone starting to use a new system such as AtoM will inevitably get to the point where they need to speak to someone, or see another implementation. The AtoM mailing list and the staff at Artefactual Systems are friendly and helpful and it is easy to get quick answers to specific questions. It is also incredibly valuable to have a local AtoM user to talk to, to bounce random questions off (particularly ones that may sound too silly or trivial for the mailing list!).

G is for Give it a Name

In the last few weeks before AtoM launch it occurred to us that we needed to decide what to call it. Internally we have simply been calling it ‘AtoM’ but we realised that this label is of little use to our users. As we started to finalise the interface and prepare the publicity for launch date we agreed that we would call it the ‘Borthwick Catalogue’. Perhaps not very imaginative but it is at the very least a concise description of its content and purpose!

H is for Help Pages

An online archival catalogue is quite a complex thing and we are aware that some of our users may be a bit daunted by it. Help pages are therefore really important to describe how to search and filter the results.

AtoM comes with some standard static pages that can be edited very easily. We've been working on our help pages and expect we will be editing these further once we have completed our user testing. We have also created another static page to act as a glossary of archival terms. Although one of AtoM's big selling points for us was the fact it was aligned with archival standards and terminology, we are concerned that our users may struggle with some of the language used.

I is for ISDIAH

Within AtoM the archival descriptions from an institution all link back to an ISDIAH record that describes the archival institution. This record is useful for users of our data, whether browsing within the AtoM interface directly or through aggregators.

We have had some internal debate on the extent to which we should replicate information that is on our website, but have decided that providing links to the relevant content would be better in many cases. For information about access and opening hours and the extent of our holdings, we want to ensure that the information is accurate and up to date, and having another place where this information would need to be edited adds an extra overhead.

J is for Just Start!

For a while we were stuck in a chicken and egg situation. Not sure how to use AtoM until it was set up properly and ready to go, and not sure how to set it up until we had started using it and fully understood the issues we would encounter.

Reading the documentation is essential but testing and experimenting with AtoM are really the best ways of working it out. Only by importing different datasets into AtoM or by creating new ones direct into the web form did we really understand how it worked and how this impacted on our own internal workflows. Learn by doing!

K is for Kittens (because they are never really free)

AtoM is open source and freely available for all. However, Artefactual Systems who support it stress it is “free as in free kittens”. In other words, you can have AtoM for free but it isn’t cost neutral - you need someone to install it, manage the server, configure it, and administer it. Populating it is also going to require a huge outlay of staff time.

On top of this, there will undoubtedly be things that you want AtoM to do that it doesn't yet do. If you are implementing AtoM, have a budget for funding further developments. Sponsored developments will then benefit the wider AtoM community and together we can make AtoM better and better. Quite early on in our AtoM implementation project we funded a small piece of work to include covering dates within the accessions module of AtoM as we felt that this was important information to record during the accessioning process and we did not want to lose this data from our existing accessions records when we imported them into the system. Of course we are hoping this feature will also be valuable for other AtoM users. There will undoubtedly be other feature developments we will sponsor in the future.

L is for Local Guidance

One of AtoM’s key selling points to us was the fact that it was created in association with the International Council on Archives (ICA) and is closely aligned with their metadata standards. There is however still a need for local guidance on how we intend to use some of these metadata fields.

In response to this we have created our own AtoM handbook to sit alongside the documentation that Artefactual provides. The handbook doesn't duplicate the official documentation, but describes our local procedures and requirements for data entry. This is all the more necessary given the fact that the majority of the data fields within AtoM are free text fields. With multiple users entering data into AtoM, it is important to have local guidance to ensure we maintain some consistency in the way we describe our archives.

M is for MySQL access

When we initially assessed AtoM against our requirements for an archival management system, it performed well but it didn't do everything we needed it to do. Searching and reporting functionality within AtoM does not currently meet all of our needs. It was considered essential then that we had another method of querying the data within AtoM and producing reports and statistics. To do this, we need access to the MySQL database that sits behind AtoM.

Access to the data via a free tool (I use Squirrel but there are other options out there) and a working knowledge of Structured Query Language allows you to pull out exactly the data you require.

AtoM has quite a complex and involved data structure so getting to grips with this was a bit of a learning curve, but having now got a working query to enable me to extract an annual summary of all accessions we have received over a given year I feel ready for the next challenge that is thrown my way!
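
To give a flavour of the kind of query involved, here is a minimal sketch of an annual accessions summary. It is not my working query and it does not use AtoM's real data structure. To keep it self-contained it runs against an in-memory SQLite database (rather than MySQL) from Python, and the table, column names and sample rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, much simplified table: not AtoM's actual schema.
conn.execute("CREATE TABLE accession (identifier TEXT, received_date TEXT)")
conn.executemany(
    "INSERT INTO accession VALUES (?, ?)",
    [
        ("2014/1", "2014-11-03"),
        ("2015/1", "2015-01-12"),
        ("2015/2", "2015-06-30"),
        ("2015/3", "2015-12-01"),
    ],
)


def accessions_per_year(connection):
    """Return (year, number of accessions) rows, ordered by year."""
    query = """
        SELECT substr(received_date, 1, 4) AS yr, COUNT(*) AS total
        FROM accession
        GROUP BY yr
        ORDER BY yr
    """
    return connection.execute(query).fetchall()


summary = accessions_per_year(conn)
print(summary)
```

The real thing needs joins across several tables, which is where the learning curve comes in, but the grouping and counting at the heart of it look much like this.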

N is for Not Perfect

AtoM (like all complex systems) has its limitations. It ticks many boxes for us but it does not tick them all. There are several areas where we think it could improve and we have been discussing these with the user community and developers and hope to influence its roadmap. As with all open source solutions, rather than complaining about what it doesn't do well, the user community should be working together to solve problems and support improvements. AtoM is not perfect but we are confident that it is moving in the right direction and getting better all the time.

O is for Objects (digital ones!)

One of the main reasons I got involved with AtoM implementation was because I wanted a stable base to build a digital archive on – a single point of truth about our holdings and a single system through which our users could access information about our holdings. Being able to expose access copies of our born digital archives and digitised content via AtoM is something we haven’t yet explored in full but this work will become a priority over the next couple of years. Once AtoM is launched I will be turning my attention back to Archivematica in order to help get this moving.

P is for Populating AtoM

This is undoubtedly the biggest challenge we have. Over the course of the 60 years we have been in existence, the Borthwick has created a wealth of catalogues and finding aids. Of course, these are in a range of different formats and states of completion. Some are digital, some are not. Of the digital ones, some are structured data and some are not. Some comply with modern archival standards and some don’t. Some are complete but some do not include information about more recent accruals to the archives. Just working out the current state of play is a challenge in itself.

Being both pragmatic and realistic about what is achievable is a good place to start. Getting all of this information into AtoM is a huge task and not something we can do quickly. While we have managed to enter some full finding aids into AtoM, we have not had the staff time to do as much as we would have liked. What we have prioritised though, is the creation of a collection level description for each archive that we hold and this is being achieved through Project Genesis.

Populating AtoM with our accessions data was also not without its problems but now this has been achieved we are able to browse and search all of our accessions data in one place for the first time - a really important step for us!

Q is for Quality

In an ideal world, all our data within AtoM would be of a high quality.

...but we do not live in an ideal world.

Accepting that legacy data will not always meet current standards or be as accurate as we would like is key to moving forward with a system such as this.

We are striving for a full range of high quality and standards compliant finding aids within AtoM but difficult decisions have to be made. Is it better to expose a small number of perfect catalogues or a larger number of catalogues that don’t contain all the mandatory ISAD(G) fields? The second option gets my vote.

R is for Reference Codes

Quite early on, we had to make a decision about whether or not to inherit reference codes. This is a setting you can change within the admin section of AtoM and a very important one to give some thought to before you go too far down the data entry or import route.

AtoM can either be set up so that you enter the full reference code for each level of the hierarchy of archival description, or it can be set up to inherit previous levels of its reference code depending on its position within the hierarchy.
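
The difference between the two approaches can be sketched in a few lines of Python. This is an illustration of the idea rather than AtoM's own logic. The function name, the "/" separator and the reference codes are all assumptions made for the example.

```python
def full_reference(own_code, ancestor_codes, inherit, separator="/"):
    """Return the reference code displayed for one archival description.

    own_code is what was typed into the record; ancestor_codes are the codes
    typed into the levels above it, top level first.
    """
    if inherit:
        # Only the local part is entered; the rest comes from the hierarchy.
        return separator.join(list(ancestor_codes) + [own_code])
    # The full code is entered by hand at every level.
    return own_code


# Inheriting: each level stores only its own part of the code.
inherited = full_reference("3", ["ABC", "1"], inherit=True)
# Not inheriting: each level stores the complete code.
entered_in_full = full_reference("ABC/1/3", ["ABC", "ABC/1"], inherit=False)
print(inherited, entered_in_full)
```

Both routes arrive at the same full reference for the record itself. What differs is how much has to be typed at each level and what is stored against each level of the hierarchy, and switching later means re-entering the codes in the other style.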

There is no right or wrong answer here and each institution will need to work out what will suit them best. It can be hard to make a decision like this at the point where you are just starting out. Until you start to use AtoM in earnest you may not understand the full implications of your decision. Having initially agreed internally that we were going to inherit the reference code to save time with data entry and help guard against human errors, we subsequently changed our minds and decided not to inherit. This decision was influenced heavily by the way AtoM displays the reference numbers to the end user and how the archival hierarchy appears on the left side of the interface. We wanted the full reference to be displayed alongside each element of the hierarchy to help our users interpret the data and more easily see how the different levels relate to each other.

Time will tell whether we've made the right decision or not, but I imagine that once we have a substantial quantity of data within AtoM, this will become a harder decision to change!

S is for Session Timeout

Beware the inactive session timeout! AtoM times out by default after 30 minutes of inactivity. This has caused us problems when creating detailed descriptions within AtoM. If completing the Scope and Content field for a large and complex archive, it is necessary to spend some time consulting the physical archives and composing a description. Colleagues sometimes found that by the time they came to save their record the session had timed out. Naturally this was the source of great frustration.

We experimented with trying to extend the inactive session timeout period but these efforts were not successful. To avoid data loss we do encourage staff to regularly save their work. A text editor can also be used to compose descriptions. With an autosave function and no timeout, data is safer here and can be pasted into AtoM once it is complete.

T is for Training

Artefactual Systems offer introductory training sessions in AtoM and delivered one of these to Borthwick staff via WebEx at the start of our implementation project. This was well worth the expense, ensuring that staff understood the capabilities of the system and had a basic grounding in how to use it. I had my reservations about how well a training session via WebEx would work, but needn't have worried on that score. We heard Sarah Romkey from Artefactual Systems in Canada loud and clear and she was able to maintain a high level of enthusiasm throughout the session despite the fact that we had got her out of bed very early in the morning.

Training is not just a one off exercise. Now we are further along in our AtoM implementation we will be arranging further staff training to focus more on our local use of AtoM and internal processes and workflows.

U is for User Profiles and Roles

We have been giving some thought to who needs to do what within AtoM.

  • Who should have access to the import and export functions?
  • Who will be able to add new users to the system?
  • Who needs the ability to edit the static pages?
  • Who can publish and delete archival descriptions?
  • Who can change the accessions counter?

We are keen that AtoM is widely used by our staff and want to ensure that everyone has the necessary permissions to be able to carry out their work. User roles may evolve over time but some initial decisions do need to be made in the early stages of implementation.

V is for Volunteers

Prior to the release of AtoM we have been calling for volunteers from our user base to help us test AtoM and give us their feedback.

We have put a lot of work into getting our AtoM instance ready to release and we have had our users in mind at many stages of the process. We now need to find out whether we have got it right. User testing is ongoing and we envisage we will be making some changes to AtoM once the feedback is collated.

We are really looking forward to seeing what people think.

W is for Web Address

We have made some decisions about the web address we will use for our production version of AtoM. The default URL had ‘atom’ in it, but we wanted to change this to something more meaningful. AtoM means something to us and perhaps to other archives professionals but not to our users.

So, we have replaced ‘atom’ with something more descriptive and meaningful to our users – we will be plastering this URL over the bookmarks and other publicity we are creating for our scheduled launch date so we want to get it right!

X is for XML

We do not want people to have to come to us to find out what we hold; we want our data to be signposted as widely as possible via other portals and aggregators, both nationally and internationally. By doing so we facilitate serendipitous discovery and attract new users.

To this end we have been talking to external aggregators such as the Archives Hub to find out whether our AtoM data can be incorporated into their portal. We have been exporting sample data as EAD XML files so that the Archives Hub can assess it. A few initial problems with the EAD that AtoM creates have been ironed out and we are moving closer to being able to make this a reality over the next few months.

Y is for YorSearch

One of the features of AtoM we have been looking at before launch is the OAI-PMH functionality. We have used this to enable our AtoM data to be surfaced as simple Dublin Core metadata via our University Library Catalogue, YorSearch. It will be interesting to see whether students and staff members from the University (who may not have thought to consult our catalogue directly) will be approaching us in the future to consult our archives.
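For anyone unfamiliar with OAI-PMH, a harvester simply sends an HTTP request with a `verb` parameter and parses the XML that comes back. Below is a minimal sketch in Python of what that might look like. The base URL is a hypothetical placeholder (a real AtoM endpoint will differ) and the sample response is made up for illustration; only the `ListRecords` verb, the `oai_dc` metadata prefix and the two namespaces come from the OAI-PMH and Dublin Core specifications.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Hypothetical endpoint, for illustration only; substitute a real OAI-PMH base URL.
BASE_URL = "https://archive.example.org/oai"

# Dublin Core element namespace, as used inside oai_dc records.
NS = {"dc": "http://purl.org/dc/elements/1.1/"}

# Made-up response, trimmed to the parts a harvester cares about.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example collection</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build a ListRecords request URL asking for simple Dublin Core."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return base_url + "?" + query


def extract_titles(xml_text):
    """Return the text of every dc:title element found in an OAI-PMH response."""
    root = ET.fromstring(xml_text.strip())
    return [title.text for title in root.iterfind(".//dc:title", NS)]


print(list_records_url(BASE_URL))
print(extract_titles(SAMPLE_RESPONSE))
```

A real harvester would also need to fetch the URL and follow resumption tokens for large result sets, which this sketch leaves out.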

So, these are some of the things we have been thinking about and working on over the last year or so whilst moving our AtoM implementation from idea to reality. Hopefully it is of use to others who are embarking on the same process.

And of course, watch this space for news of launch!

* Actually an A-Y ...did anyone notice that there was no letter 'Z'?

Wednesday, 10 February 2016

I'll show you my research data if you show me yours...

My research data
A few months ago I was having a clear out at home and came across a bunch of floppy disks in the drawer of my bedside table.

This is my research data...

Actually, that is not strictly true. I did a taught masters course and my research consisted of just a short dissertation at the end of the course. Most of these disks contain files from the taught element of my course and the subsequent dissemination of results. 

I published a paper at the end of the masters on the findings of my dissertation. 

If you are interested in the placement of Iron Age hillforts in the landscape then this is the book to look for.

No-one has since approached me and asked if they can see the data that underlies this publication...

...but this was the 1990s!

Times are different now. We expect our researchers to be able to produce the data and share it (where appropriate) so that others can build on their research. 

I'm now involved in teaching researchers here at York about Research Data Management (RDM) and how they should look after their data for future re-use.

When I created and stored this data I was not a digital archivist. I had no idea I would become a digital archivist. I like to think I would have managed my data differently if I had known more. 

Let's start with documentation. Much of the documentation for this data is what is actually written on the disk labels. I gave myself a little pat on the back for having recorded what was on the floppies so well on *most* of the disks. This of course was particularly useful in those days: file names were restricted to the 8.3 format (up to eight characters plus a three-character extension), so very little detail about the files could be incorporated into the name. Documenting things on disk labels helps add a bit of context. All well and good until you notice the disk on the far right with no label at all. This one remains a mystery!
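To make the 8.3 restriction concrete, here is a rough sketch in Python of a check for whether a file name would have fitted. It is a deliberate simplification: the function name is made up for illustration, and the pattern only accepts letters, digits, underscores and hyphens, whereas the real DOS rules allowed a few more symbols.

```python
import re

# Simplified 8.3 pattern: a base name of 1-8 characters, optionally followed
# by a dot and an extension of 1-3 characters. This is an approximation of
# the DOS rules, not a complete implementation of them.
EIGHT_DOT_THREE = re.compile(r"[A-Za-z0-9_-]{1,8}(\.[A-Za-z0-9_-]{1,3})?")


def fits_8_3(filename):
    """Return True if the name fits the simplified 8.3 pattern above."""
    return EIGHT_DOT_THREE.fullmatch(filename) is not None


for name in ["DISTEX", "CV.DOC", "hillfort_distance_analysis.doc"]:
    print(name, fits_8_3(name))
```

A descriptive modern name of the kind we now recommend to researchers fails the check straight away, which is why the disk labels had to carry the context instead.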

So what are the issues here? First and most obviously, as a student in the 90s I was using cutting edge storage technology - the floppy disk! Can we read these today? Yes and no. Floppy disks fall firmly into the category of 'obsolete media', which is a topic that we digital archivists like to talk about. I found I could read about a quarter of these using the USB floppy reader that is attached to my PC. For the others I saw a lot of error messages like this:

The answer is "No"!

Fortunately I had more success using an old PC I keep in my office for the very purpose of reading old floppy disks - all but two of the floppies could be read and copied using this PC. On one disk I could view the list of files on it but couldn't copy all of them off the disk so I considered this to be a partial success. The one disk which I couldn't access at all was interestingly the one with no label. Perhaps this mystery disk was in fact never formatted or put into active use. 

Not too bad a result so far?

So what about the contents of the disks?

The contents of one of the floppy disks. Windows Explorer identifies the DOC files as Microsoft Word 97-2003 but they are likely to be an earlier version of Word than this.

As mentioned above, the file and folder naming is noticeably brief (as is the way with media from this period). Today we talk to our researchers about the importance of naming files in such a way that you know what each one is before you double click on it. This was nigh on impossible when faced with only 8 characters. I created this data but have no idea what I might expect to find in a directory called 'DISTEX' (though the label on the disk does help give a clue).

Note too the lack of organisation of the contents. At the end of my masters degree whilst finishing off my papers and publications I was also clearly focusing on what my next steps would be. Personal data (my CV for job applications) is stored alongside data relating to my research*. This again is something we discourage when we talk to researchers about data management. It is much easier when working with filestore to organise and categorise data more effectively, keeping personal data separate from research data. We have come a long way since the days when we were squashing any files that would fit on to a floppy disk regardless of content or context.

Here is some data on another of the disks (viewed in Windows Explorer as tiles). I have no idea what possessed me to store scanned photographs as GIF images. They look terrible! Did they always look this bad? Choosing the right file format is something we also cover in our RDM training and though file size is still a consideration for today's research students, at least they don't have to try and fit numerous images for one presentation on a single floppy disk.

More coded file names - this was a necessity when you had so few characters available. I still remember what these mean but very much doubt anyone else would.

Some of my files are fairly easy to read, others less so (more detective work is required to find the right software). The Word documents are OK but come up in 'Protected View' (which means I'm not allowed to edit them). The default settings here are to treat a Word 6 or 95 document with suspicion but this can be easily resolved by editing these settings.

These old MS Word docs are still readable (and editable if I change the policy settings)

So, digging out my old research data has been an interesting diversion. I now use this as an example at the beginning of RDM teaching sessions and ask the students to imagine how their research data might look 20 years from now. 

Another added bonus from this exercise is that I now have even more files that I can play with as I test Archivematica and file identification tools.
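File identification tools work largely by matching byte signatures at known positions in a file rather than trusting the file extension. Here is a much-simplified sketch of the idea in Python, covering just two kinds of file mentioned above: GIF images, and the OLE2 compound file container used by legacy Word documents (Word 97-2003 and, as far as I am aware, Word 6 and 95 too). The function names and the tiny signature table are illustrative assumptions; real tools such as Droid draw on a far larger registry of signatures.

```python
# A tiny, illustrative signature table: (leading bytes, label).
SIGNATURES = [
    (b"GIF87a", "GIF image (87a)"),
    (b"GIF89a", "GIF image (89a)"),
    (bytes.fromhex("D0CF11E0A1B11AE1"), "OLE2 compound file (e.g. legacy MS Word)"),
]


def identify(header):
    """Match the first bytes of a file against the signature table."""
    for magic, label in SIGNATURES:
        if header.startswith(magic):
            return label
    return "unidentified"


def identify_file(path):
    """Read the first eight bytes of a file on disk and identify them."""
    with open(path, "rb") as handle:
        return identify(handle.read(8))


print(identify(b"GIF89a" + b"\x00\x00"))
print(identify(b"plain text"))
```

Note that the container signature alone cannot say which version of Word wrote a document, so telling the versions apart needs a closer look inside the file.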

*Interesting to note that a first (unsuccessful) attempt to get a job in York occurred in 1998. I got here 5 years later!