Friday, 18 May 2018

UK Archivematica meeting at Westminster School

Yesterday the UK Archivematica user group meeting was held in the historic location of Westminster School in central London.

A pretty impressive location for a meeting!
(credit: Elizabeth Wells)

In the morning, once fuelled with tea, coffee and biscuits, we set about discussing our infrastructures and workflows. It was great to hear from a range of institutions and how Archivematica fits into the bigger picture for them. One of the points that lots of attendees made was that progress can be slow. Many of us were slightly frustrated that we aren't making faster progress in establishing our preservation infrastructures but I think it was a comfort to know that we were not alone in this!

I kicked things off by showing a couple of diagrams of our proposed and developing workflows at the University of York. Firstly illustrating our infrastructure for preserving and providing access to research data and secondly looking at our hypothetical workflow for born digital content that comes to the Borthwick Institute.

Now that our AtoM upgrade is complete and Archivematica 1.7 has been released, I am hoping that colleagues can set up a test instance of AtoM talking to Archivematica that I can start to play with. In a parallel strand, I am encouraging colleagues to consider and document access requirements for digital content. This will be invaluable when thinking about what sort of experience we are trying to implement for our users. The decision is yet to be made around whether AtoM and Archivematica will meet our needs on their own or whether additional functionality is needed through an integration with Fedora and Samvera (the software on which our digital library runs)...but that decision will come once we better understand what we are trying to achieve and what the solutions offer.

Elizabeth Wells from Westminster School talked about the different types of digital content that she would like Archivematica to handle and different workflows that may be required depending on whether it is born digital or digitised content, whether a hybrid or fully digital archive and whether it has been catalogued or not. She is using Archivematica alongside AtoM and considers that her primary problems are not technical but revolve around metadata and cataloguing. We had some interesting discussion around how we would provide access to digital content through AtoM if the archive hadn't been catalogued.

Anna McNally from the University of Westminster reminded us that information about how they are using Archivematica is already well described in a webinar that is now available on YouTube: Work in Progress: reflections on our first year of digital preservation. They are using the PERPETUA service from Arkivum and they use an automated upload folder in NextCloud to move digital content into Archivematica. They are in the process of migrating from CALM to AtoM to provide access to their digital content. One of the key selling points of AtoM for them is its support for different languages and character sets.

Chris Grygiel from the University of Leeds showed us some infrastructure diagrams and explained that this is still very much a work in progress. Alongside Archivematica, he is using BitCurator to help appraise the content and EPrints and EMU for access.

Rachel MacGregor from Lancaster University updated us on work with Archivematica at Lancaster. They have been investigating both Archivematica and Preservica as part of the Jisc Research Data Shared Service pilot. The system that they use has to be integrated in some way with PURE for research data management.

After lunch in the dining hall (yes it did feel a bit like being back at school), Rachel MacGregor (shouting to be heard over the sound of the bells at Westminster) kicked off the afternoon with a presentation about DMAonline. This tool, originally created as part of the Jisc Research Data Spring project, is under further development as part of the Jisc Research Data Shared Service pilot.

It provides reporting functionality for a range of systems in use for research data management including Archivematica. Archivematica itself does not come with advanced reporting functionality - it is focused on the primary task of creating an archival information package (AIP).

The tool (once in production) could be used by anyone regardless of whether they are part of the Jisc Shared Service or not. Rachel also stressed that it is modular - though it can gather data from a whole range of systems, it could also work just with Archivematica if that is the only system you are interested in reporting on.

An important part of developing a tool like this is to ensure that communication is clear - if you don’t adequately communicate to the developers what you want it to do, you won’t get what you want. With that in mind, Rachel has been working collaboratively to establish clear reporting requirements for preservation. She talked us through these requirements and asked for feedback. They are also available online for people to comment on.

Sean Rippington from the University of St Andrews talked us through some testing he has carried out, looking at how files in SharePoint could be handled by Archivematica. St Andrews are one of the pilot organisations for the Jisc Research Data Shared Service, and they are also interested in the preservation of their corporate records. There doesn’t seem to be much information out there about how SharePoint and Archivematica might work together, so it was really useful to hear about Sean’s work.

He showed us inside a sample SharePoint export file (a .cmp file). It consisted of various office documents (the documents that had been put into SharePoint) and other metadata files. The office documents themselves had lost much of their original metadata - they had been renamed with a consecutive number and given a .DAT file extension. The date last modified had changed to the date of export from SharePoint. However, all was not lost: a manifest file was included in the export and contained lots of valuable metadata, including the last modified date, the filename, the file extension and the names of the people who created and last modified the file.

Sean tried putting the .cmp file through Archivematica to see what happens. He found that Archivematica correctly identified the MS Office files (regardless of change of file extension) but obviously the correct (original) metadata was not associated with the files. This continued to be stored in the associated manifest file. This has potential for confusing future users of the digital archive - the metadata gives useful context to the files and if hidden in a separate manifest file it may not be discovered.

Another approach he took was to use the information in the manifest file to rename the files and assign them with their correct file extensions before pushing them into Archivematica. This might be a better solution in that the files that will be served up in the dissemination information package (DIP) will be named correctly and be easier for users to locate and understand. However, this was a manual process and probably not scalable unless it could be automated in some way.
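As a rough illustration of how that renaming step might be automated, here is a minimal Python sketch. It is not Sean's method. It assumes the manifest has already been parsed into a simple mapping from exported name to original filename; the function names and the example filenames are hypothetical, and reading the real manifest format is deliberately left out.

```python
from pathlib import Path


def plan_renames(export_dir, manifest_entries):
    """Work out (source, target) pairs without touching the disk.

    manifest_entries is an assumed, pre-parsed mapping from the exported
    name (e.g. "00000001.dat") to the original filename recorded in the
    manifest (e.g. "Minutes.docx").
    """
    export_dir = Path(export_dir)
    taken = set()
    plan = []
    for exported_name, original_name in sorted(manifest_entries.items()):
        source = export_dir / exported_name
        if not source.is_file():
            continue  # listed in the manifest but missing from the export
        original = Path(original_name)
        candidate = original.name
        counter = 1
        # Documents from different folders may share a filename, so add a
        # numeric suffix rather than overwrite an earlier file.
        while candidate.lower() in taken:
            candidate = f"{original.stem}_{counter}{original.suffix}"
            counter += 1
        taken.add(candidate.lower())
        plan.append((source, export_dir / candidate))
    return plan


def apply_renames(plan):
    """Carry out a plan produced by plan_renames."""
    for source, target in plan:
        if target.exists():
            raise FileExistsError(target)
        source.rename(target)
```

Separating the plan from the rename means the proposed names can be reviewed (or logged as preservation metadata) before anything on disk changes.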

He ended with lots of questions and would be very glad to hear from anyone who has done further work in this area.

Hrafn Malmquist from the University of Edinburgh talked about his use of Archivematica’s appraisal tab and described a specific use case for Archivematica with its own particular requirements. The records of the University court have been deposited as born digital since 2007 and need to be preserved and made accessible with full text searching to aid retrieval. This has been achieved using a combination of Archivematica and DSpace and by adding a package.csv file containing appropriate metadata that can be understood by DSpace.

Laura Giles from the University of Hull described ongoing work to establish a digital archive infrastructure for the Hull City of Culture archive. They had an appetite for open source and prior experience with Archivematica so they were keen to use this solution, but they did not have the in-house resource to implement it. Hull are now working with CoSector at the University of London to plan and establish a digital preservation solution that works alongside their existing repository (Fedora and Samvera) and archives management system (CALM). Once this is in place they hope to use similar principles for other preservation use cases at Hull.

We then had time for a quick tour of Westminster School archives followed by more biscuits before Sarah Romkey from Artefactual Systems joined us remotely to update us on the recent new Archivematica release and future plans. The group is considering taking her up on her suggestion to provide some more detailed and focused feedback on the appraisal tab within Archivematica - perhaps a task for one of our future meetings.

Talking of future meetings ...we have agreed that the next UK Archivematica meeting will be held at the University of Warwick at some point in the autumn.

Friday, 4 May 2018

The anatomy of an AtoM upgrade

Yesterday we went live with our new upgraded production version of AtoM.

We've been using AtoM version 2.2 since we first unveiled the Borthwick Catalogue to the world two years ago. Now we have finally taken the leap to version 2.4.

We are thrilled to benefit from some of the new features - including the clipboard, being able to search by date range and the full width treeview. Of course we are also keen to test the work we jointly sponsored last year around exposing EAD via OAI-PMH for harvesting.

But what has taken us so long you might ask?

...well, upgrading AtoM has been a new experience for us and one that has involved a lot of planning behind the scenes. The technical process of upgrading has been ably handled by our systems administrator. Much of his initial work behind the scenes has been on 'puppetising' AtoM to make it easier to manage multiple versions of AtoM going forward. In this post though I will focus on the less technical steps we have taken to manage the upgrade and the decisions we have made along the way.

Checking the admin settings

One of the first things I did when I was given a test version of 2.4 to play with was to check out all of the admin settings to see what had changed.

All of our admin settings for AtoM are documented in a spreadsheet alongside a rationale for our decisions. I wanted to take some time to understand the new settings, read the documentation and decide what would work for us.

Some of these decisions were taken to a meeting for a larger group of staff to discuss. I've got a good sense of how we use AtoM but I am not really an AtoM user so it was important that others were involved in the decision making.

Most decisions were relatively straightforward and uncontroversial but the one that we spent most time on was deciding whether or not to change the slugs...


In AtoM, the 'slug' is the last element of the url for each individual record within the catalogue - it has to be unique so that all the urls go to the right place. In previous versions of AtoM the slugs were automatically generated from the title of each record. This led to some interesting and varied urls.

  • Some of them were really long - if the title of the record was really long
  • Some of them were short and very cryptic - if the record hadn't been given a title prior to the first save
  • Many of our titles are not unique - for example, we have lots of records simply called 'correspondence' in the catalogue. Where titles are not unique, AtoM will use the title and then append it with a number in order to create a unique slug (eg: correspondence-150)

Slugs are therefore hard to predict ...and it is not always possible to look at a slug and know which archive it refers to.

This possibly doesn't matter, but could become an issue for us in the future should we wish to carry out more automated data manipulation or system integrations.

AtoM 2.4 now allows you to choose which fields your slugs are generated from. We have decided that it would be better if ours were generated from the identifier of the record rather than the title. The reason being that identifiers are generally quite short and sweet and of course should be unique (though we recently realised that this isn't enforced in AtoM).
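To make the difference concrete, here is a small Python sketch of title-based versus identifier-based slugs. It is only an approximation of the idea: AtoM's own slug rules and numbering are not reproduced here, and the function names and sample records are made up for illustration.

```python
import re


def slugify(text):
    """Lowercase, and collapse each run of non-alphanumerics into a hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def assign_slugs(records, field):
    """Give every record a unique slug based on the chosen field."""
    used = set()
    slugs = []
    for record in records:
        base = slugify(record.get(field, "")) or "untitled"
        slug = base
        suffix = 2
        # A number is appended only when the base slug is already taken.
        while slug in used:
            slug = f"{base}-{suffix}"
            suffix += 1
        used.add(slug)
        slugs.append(slug)
    return slugs


# Hypothetical records: three different archives, all titled the same.
records = [
    {"identifier": "ABC/1/1", "title": "Correspondence"},
    {"identifier": "ABC/1/2", "title": "Correspondence"},
    {"identifier": "XYZ/3", "title": "Correspondence"},
]
print(assign_slugs(records, "title"))
# ['correspondence', 'correspondence-2', 'correspondence-3']
print(assign_slugs(records, "identifier"))
# ['abc-1-1', 'abc-1-2', 'xyz-3']
```

The uniqueness check is still needed with identifiers, because nothing guarantees that two records have not been given the same one.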

But of course this is not a decision that can be taken lightly. Our catalogue has been live for 2 years now and users will have set up links and bookmarks to particular records within it. On balance we decided that it would be better to change the slugs and do our best to limit the impact on users.

So, we have changed the admin setting to ensure future slugs are generated using the identifier. We have run a script provided by Artefactual Systems that changed all the slugs that are already in the database. We have set up a series of redirects from all the old urls of top level descriptions in the catalogue to the new urls (note that having had a good look at the referrer report in Google Analytics it was apparent that external links to the catalogue generally point at top level descriptions).
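A set of redirects like this could be generated from a list of old and new slugs. The Python sketch below is one possible approach under stated assumptions: the post does not say which web server or URL layout is in use, so the nginx-style rule format, the bare "/slug" paths, the function name and the sample slugs are all hypothetical.

```python
import re

# Slugs are assumed to contain only lowercase letters, digits and hyphens.
SAFE_SLUG = re.compile(r"^[a-z0-9-]+$")


def redirect_rules(slug_changes):
    """Build one permanent-redirect rule per changed slug.

    slug_changes is an iterable of (old_slug, new_slug) pairs.
    """
    rules = []
    for old, new in slug_changes:
        if old == new:
            continue  # nothing to redirect
        if not (SAFE_SLUG.match(old) and SAFE_SLUG.match(new)):
            raise ValueError(f"unexpected characters in {old!r} or {new!r}")
        # Exact-match location, so only the old url itself is redirected.
        rules.append(f"location = /{old} {{ return 301 /{new}; }}")
    return rules


for rule in redirect_rules([("correspondence-150", "abc-1-2")]):
    print(rule)
# location = /correspondence-150 { return 301 /abc-1-2; }
```

Using a permanent (301) redirect signals to search engines and browsers that the old address has moved for good.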

Playing and testing

It was important to do a certain amount of testing and playing around with AtoM 2.4 and it was important that it wasn't just myself who did this - I encouraged all my colleagues to also have a go.

First I checked the release notes for versions 2.3 and 2.4 so I had a good sense of what had changed and where I should focus my attention. I was then able to test these new features and direct colleagues to them as appropriate for further testing or discussion.

While doing so, I tried to think about whether any of these changes would necessitate changes in our workflows and processes or updates to our staff handbook.

As an example - it was noted that there was a new field to record occupations for authority records. Rather than letting individuals decide how to use this field, it is important to agree an institutional approach and consider an appropriate methodology or taxonomy. As it happens, we have decided not to use this field for the time being and this will be documented accordingly.

Assessing known bugs

Being a bit late to the upgrade party gives us the opportunity to assess known bugs and issues with a release. I spent some time looking at Artefactual's issues log for AtoM to establish whether any of them were going to cause us major problems or require a workaround to be put in place.

There are lots of issues recorded and I looked through many of them (but not all!). Fortunately, very few looked like they would have an impact on us. Most related to functionality we don't utilise - such as the ability to use AtoM with multiple institutions or translate it into multiple languages.

The one bug that I thought would be irritating for us was related to the accessions counter which was not incrementing in version 2.4. Having spent a bit of time testing, it seemed that this wasn't a deal breaker for us and there was a workaround we could put in place to enable staff to continue to create accession records with a unique identifier relatively easily.

Testing local workarounds

Next I tested one of the local workarounds we have for AtoM. We use a CSS print stylesheet to help us to generate an accessions report to send donors and depositors to confirm receipt of an archive. This still worked in the new version of AtoM with no issues. Hoorah!

Look and feel

We gave a bit of thought to how AtoM should be styled. Two years ago we went live with a slightly customised version of the Dominion theme. This had been styled to look similar to our website (which at the time was branded orange).

In the last year, the look and feel of the University website has changed and we are no longer orange! Some thought needed to be given to whether we should change the look of our catalogue now to keep it consistent with our website. After some discussion it was agreed that our existing AtoM theme should be maintained for the time being.

We did however think it was a good idea to adopt the font of the University website, but when we tested this out on our AtoM instance it didn't look right, so that decision was quickly reversed.

Usability testing

When we first launched our catalogue we carried out a couple of rounds of user testing (read about it here and here) but this was quite a major piece of work and took up a substantial amount of staff time.

With this upgrade we were keen to give some consideration to the user experience but didn't have resource to invest in more user testing.

Instead we recruited the Senior User Experience Designer at our institution to cast his eye over our version of AtoM 2.4 and give us some independent feedback on usability and accessibility. It was really useful to get a fresh pair of eyes to look at our site, but as this could be a whole blog post in itself I won't say any more for now...watch this space!

Updating our help pages

Another job was to update both the text and the screenshots on our static help pages within AtoM. There have been several changes since 2.2 and some of these are reflected in the look and feel of the interface. 

The advanced search looks a bit different in version 2.4 - here is the refreshed screenshot for our help pages

We were also keen to add in some help for our users around the clipboard feature and to explain how the full width treeview works.

The icons for different types of information within AtoM have also been brought out more strongly in this version, so we also wanted to flag up what these meant for our users.

...and that reminds me, we really do need a less Canada-centric way to indicate place!

Updating our staff handbook

Since we adopted AtoM a few years ago we have developed a whole suite of staff manuals which record how we use AtoM, including tips for carrying out certain procedures and information about what to put in each field. With the new changes brought in with this upgrade, we of course had to update our internal documentation.

When to upgrade?

As we drew ever closer to our 'go live' date for the upgrade we were aware that Artefactual were busy preparing their 2.4.1 bug fix release. We were very keen to get the bug fixes (particularly for that accessions counter bug that I mentioned) but were not sure how long we were prepared to wait.

Luckily with helpful advice from Artefactual we were able to follow some instructions from the user forum and install from the GitHub code repository instead of the tarball download on the website. This meant we could benefit from those bug fixes that were already stable (and pull others to test as they become available) without having to wait for the formal 2.4.1 release.

No need to delay our upgrade further!

As it happens it was good news we upgraded when we did. The day before the upgrade we hit a bug in version 2.2 during a re-index of Elasticsearch. Good to know we had a nice clean version of 2.4 ready to go the next day!

Finishing touches

On the 'go live' date we'd put word around to staff not to edit the catalogue while we did the switch. Our systems administrator got all the data from our production version of 2.2 freshly loaded into 2.4, ran the scripts to change the slugs and re-indexed the database. I just needed to do a few things before we asked IT to do the Domain Name System switch.

First I needed to check all the admin settings were right - a few final tweaks were required here and there. Second I needed to load up the Borthwick logo and banner to our archival institution record. Thirdly I needed to paste the new help and FAQ text into the static pages (I already had this prepared and saved elsewhere).

Once the DNS switch was done we were live at last! 

Sharing the news

Of course we wanted to publicise the upgrade to our users and tell them about the new features that it brings.

We've put AtoM back on the front page of our website and added a news item.

Let's tell the world all about it, with a catalogue banner and news item

My colleague has written a great blog post aimed at our users and telling them all about the new features, and of course we've all been enthusiastically tweeting!

...and a whole lot of tweeting

Future work

The upgrade is done but work continues. We need to ensure harvesting to our library catalogue still works and of course test out the new EAD harvesting functionality. Later today we will be looking at Search Engine Optimisation (particularly important since we changed our slugs). We also have some remaining tasks around finding aids - uploading pdfs of finding aids for those archives that aren't yet fully catalogued in AtoM using the new functionality in 2.4.

But right now I've got a few broken links to fix...

Friday, 20 April 2018

The 2nd UK AtoM user group meeting

I was pleased to be able to host the second meeting of the UK AtoM user group here in York at the end of last week. AtoM (or Access to Memory) is the Archival Management System that we use here at the Borthwick Institute and it seems to be increasing in popularity across the UK.

We had 18 attendees from across England, Scotland and Wales representing both archives and service providers. It was great to see several new faces and meet people at different stages of their AtoM implementation.

We started off with introductions and everyone had the chance to mention one recent AtoM triumph and one current problem or challenge. A good way to start the conversation and perhaps a way of considering future development opportunities and topics for future meetings.

Here is a selection of the successes that were mentioned:

  • Establishing a search facility that searches across two AtoM instances
  • Getting senior management to agree to establishing AtoM
  • Getting AtoM up and running
  • Finally having an online catalogue
  • Working with authority records in AtoM
  • Working with other contributors and getting their records displaying on AtoM
  • Using the API to drive another website
  • Upgrading to version 2.4
  • Importing legacy EAD into AtoM
  • Uploading finding aids into AtoM 2.4
  • Adding 1000+ urls to digital resources into AtoM using a set of SQL update statements

...and here are some of the current challenges or problems users are trying to solve:
  • How to bar code boxes - can this be linked to AtoM?
  • Moving from CALM to AtoM
  • Not being able to see the record you want to link to when trying to select related records
  • Using the API to move things into an online showcase
  • Advocacy for taking the open source approach
  • Working out where to start and how best to use AtoM
  • Sharing data with the Archives Hub
  • How to record objects alongside archives
  • Issues with harvesting EAD via OAI-PMH
  • Building up the right level of expertise to be able to contribute code back to AtoM
  • Working out what to do when AtoM stops working
  • Discovering that AtoM doesn't enforce uniqueness in identifiers for archival descriptions
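The last point on that list is fairly easy to check for outside AtoM. As a minimal sketch, the Python below counts repeated identifiers in a CSV export of archival descriptions. It assumes the export has a column named "identifier"; the function name and sample data are hypothetical.

```python
import csv
import io
from collections import Counter


def duplicate_identifiers(csv_file):
    """Return {identifier: count} for identifiers used more than once."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        identifier = (row.get("identifier") or "").strip()
        if identifier:  # blank identifiers are ignored, not treated as clashes
            counts[identifier] += 1
    return {i: n for i, n in counts.items() if n > 1}


# Made-up export with one identifier used twice.
sample = io.StringIO(
    "identifier,title\n"
    "ABC/1,Correspondence\n"
    "ABC/2,Accounts\n"
    "ABC/1,Photographs\n"
)
print(duplicate_identifiers(sample))
# {'ABC/1': 2}
```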

After some discussion about some of the issues that had been raised, Louise Hughes from the University of Gloucestershire showed us her catalogue and talked us through some of the decisions they had made as they set this up. 

The University of Gloucestershire's AtoM instance

She praised the digital object functionality and has been using this to add images and audio to the archival descriptions. She was also really happy with the authority records, in particular, being able to view a person and easily see which archives relate to them. She discussed ongoing work to enable records from AtoM to be picked up and displayed within the library catalogue. She hasn't yet started to use AtoM for accessioning but hopes to do so in the future. Adopting all the functionality available within AtoM needs time and thought and tackling it one step at a time (particularly if you are a lone archivist) makes a lot of sense.

Tracy Deakin from St John's College, Cambridge talked us through some recent work to establish a shared search page for their two institutional AtoM instances. One holds the catalogue of the college archives and the other is for the Special Collections Library. They had taken the decision to implement two separate instances of AtoM as they required separate front pages and the ability to manage the editing rights separately. However, as some researchers will find it helpful to search across both instances a search page has been developed that accesses the Elasticsearch index of each site in order to cross search.
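How the results from the two indexes are combined was not described in detail, but the general idea of a cross search can be sketched in a few lines of Python. This is an illustration only: it assumes each site's search has already returned a list of hits with a title and a relevance score, and the function name, site labels and data are invented.

```python
def merge_results(hits_by_site, limit=10):
    """Combine per-site hit lists into one list ordered by score.

    hits_by_site maps a site label to a list of {"title", "score"} dicts.
    """
    merged = []
    for site, hits in hits_by_site.items():
        for hit in hits:
            # Keep track of which catalogue each hit came from.
            merged.append({"site": site, "title": hit["title"], "score": hit["score"]})
    merged.sort(key=lambda h: h["score"], reverse=True)
    return merged[:limit]


results = merge_results(
    {
        "archives": [{"title": "College accounts", "score": 4.2}],
        "special-collections": [
            {"title": "Account book", "score": 5.1},
            {"title": "Letters", "score": 1.3},
        ],
    },
    limit=2,
)
print([r["title"] for r in results])
# ['Account book', 'College accounts']
```

One caveat with this approach is that relevance scores from separate indexes are not strictly comparable, so a real implementation might prefer to interleave results or show them side by side.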

The interface for a shared search across St John's College AtoM sites

Vicky Phillips from the National Library of Wales talked us through their processes for upgrading their AtoM instance to version 2.4 and discussed some of the benefits of moving to 2.4. They are really happy to have the full width treeview and the drag and drop functionality within it.

The upgrade has not been without its challenges though. They have had to sort out some issues with invalid slugs, ongoing issues due to the size of some of their archives (they think the XML caching functionality will help with this) and sometimes find that MySQL gets overwhelmed with the number of queries and needs a restart. They still have some testing to do around bilingual finding aids and have also been working on testing out the new functionality around OAI-PMH harvesting of EAD.

Following on from this I gave a presentation on upgrading AtoM to 2.4 at the Borthwick Institute. We are not quite there yet but I talked about the upgrade plan and process and some decisions we have made along the way. I won't say any more for the time being as I think this will be the subject of a future blog post.

Before lunch my colleague Charles Fonge introduced VIAF (Virtual International Authority File) to the group. This initiative will enable Authority Records created by different organisations across the world to be linked together more effectively. Several institutions may create an authority record about the same individual and currently it is difficult to allow these to be linked together when data is aggregated by services such as The Archives Hub. It is worth thinking about how we might use VIAF in an AtoM context. At the moment there is no place to store a VIAF ID in AtoM and it was agreed this would be a useful development for the future.

After lunch Justine Taylor from the Honourable Artillery Company introduced us to the topic of back up and disaster recovery of AtoM. She gave the group some useful food for thought, covering techniques and the types of data that would need to be included (hint: it's not solely about the database). This was particularly useful for those working in small institutions who don't have an IT department that just does all this for them as a matter of course. Some useful and relevant information on this subject can be found in the AtoM documentation.

Max Communications are a company who provide services around AtoM. They talked through some of their work with institutions and what services they can offer.  As well as being able to provide hosting and support for AtoM in the UK, they can also help with data migration from other archival management systems (such as CALM). They demonstrated their crosswalker tool that allows archivists to map structured data to ISAD(G) before import to AtoM.

They showed us an AtoM theme they had developed to allow Vimeo videos to be embedded and accessible to users. Although AtoM does have support for video, the files can be very large in size and there are large overheads involved in running a video server if substantial quantities are involved. Keeping the video outside of AtoM and managing the permissions through Vimeo provided a good solution for one of their clients.

They also demonstrated an AtoM plugin they had developed for Wordpress. Though they are big fans of AtoM, they pointed out that it is not the best platform for creating interesting narratives around archives. They were keen to be able to create stories about archives by pulling in data from AtoM where appropriate.

At the end of the meeting Dan Gillean from Artefactual Systems updated us (via Skype) about the latest AtoM developments. It was really interesting to hear about the new features that will be in version 2.5. Note that none of this is ever a secret - Artefactual make their road map and release notes publicly available on their wiki - however it is still helpful to hear it enthusiastically described.

The group was really pleased to hear about the forthcoming audit logging feature, the clever new functionality around calculating creation dates, and the ability for users to save their clipboard across sessions (and share it with the searchroom when they want to access the items). Thanks to those organisations that are funding this exciting new functionality. Also worth a mention is the slightly less sexy, but very valuable work that Artefactual is doing behind the scenes to upgrade Elasticsearch.

Another very useful meeting and my thanks go to all who contributed. It is certainly encouraging to see the thriving and collaborative AtoM community we have here in the UK.

Our next meeting will be in London in the autumn.

Back to the classroom - the Domesday project

Yesterday I was invited to speak to a local primary school about my job. The purpose of the event was to inspire kids to work in STEM subjects (science, technology, engineering and maths) and I was faced with an audience of 10 and 11 year old girls.

One member of the audience (my daughter) informed me that many of the girls were only there because they had been bribed with cake.

This could be a tough gig!

On a serious note, there is a huge gender imbalance in STEM careers with women only making up 23% of the workforce in core STEM occupations. In talking to the STEM ambassador who was at this event, it was apparent that recruitment in engineering is quite hard, with not enough boys OR girls choosing to work in this area. This is also true in my area of work and is one of the reasons we are involved in the "Bridging the Digital Gap" project led by The National Archives. They note in a blog post about the project that:

"Digital skills are vital to the future of the archives sector ...... if archives are going to keep up with the pace of change, they need to attract members of the workforce who are confident in using digital technology, who not only can use digital tools, but who are also excited and curious about the opportunities and challenges it affords."

So why not try and catch them really young and get kids interested in our profession?

There were a few professionals speaking at the event and subjects were varied and interesting. We heard from someone who designed software for cars (who knew how many different computers are in a modern car?), someone who had to calculate exact mixes of seed to plant in Sites of Special Scientific Interest in order to encourage the right wild birds to nest there, a scientist who tested gelatin in sweets to find out what animal it was made from, an engineer who uses poo to heat houses....I had some pretty serious competition!

I only had a few minutes to speak so my challenge was to try and make digital preservation accessible, interesting and relevant in a short space of time. You could say that this was a bit of an elevator pitch to school kids.

Once I got thinking about this I had several ideas of different angles I could take.

I started off looking at the Mount School Archive that is held at the Borthwick. This is not a digital archive but was a good introduction to what archives are all about and why they are interesting and important. Up until 1948 the girls at this school created their own school magazine that is beautifully illustrated and gives a fascinating insight into what life was like at the school. I wanted to compare this with how schools communicate and disseminate information today and discuss some of the issues with preserving this more modern media (websites, twitter feeds, newsletters sent to parents via email).

Several PowerPoint slides down the line I realised that this was not going to be short and snappy enough.

I decided to change my plans completely and talk about something that they may already know about, the Domesday Book.

I began by asking them if they had heard of the Domesday Book. Many of them had. I asked what they knew about it. They thought it was from 1066 (not far off!), someone knew that it had something to do with William the Conqueror, they guessed it was made of parchment (and they knew that parchment was made of animal skin). They were less certain of what it was actually for. I filled in the gaps for them.

I asked them whether they thought this book (that was over 900 years old) could still be accessed today and they weren't so sure about this. I was able to tell them that it is being well looked after by The National Archives and can still be accessed in a variety of ways. The main barrier to understanding the information is that it is written in Latin.

I talked about what the Domesday Book tells us about our local area. A search on Open Domesday tells us that Clifton only had 12 households in 1086. Quite different from today!

We then moved forward in time, to a period of history known as 'The 1980s' (a period that the children had recently been studying at school - now that makes me feel old!). I introduced them to the BBC Domesday Project of 1986. Without a doubt one of digital preservation's favourite case studies!

I explained how school children and communities were encouraged to submit information about their local areas. They were asked to include details of everyday life and anything they thought might be of interest to people 1000 years from then. People took photographs and wrote information about their lives and their local area. The data was saved on to floppy disks (what are they?) and posted to the BBC (this was before email became widely available). The BBC collated all the information on to laser disc (something that looks a bit like a CD but with a diameter of about 30cm).

I asked the children to consider the fact that the 900 year old Domesday Book is still accessible and to think about whether the 30 year old BBC Domesday Project discs were equally accessible. In discussion this gave me the opportunity to finally mention what digital archivists do and why it is such a necessary and interesting job. I didn't go into much technical detail but all credit to the folks who actually rescued the Domesday Project data. There is lots more information here.

Searching the Clifton and Rawcliffe area on Domesday Reloaded

Using the Domesday Reloaded website I was then able to show them what information is recorded about their local area from 1986. There was a picture of houses being built, and narratives about how a nearby lake was created. There were pieces written by a local school child and a teacher describing their typical day. I showed them a piece that was written about 'Children's Crazes' which concluded with:

" Another new activity is break-dancing
 There is a place in York where you can
 learn how to break-dance. Break     
 dancing means moving and spinning on
 the floor using hands and body. Body-
 popping is another dance craze where
 the dancer moves like a robot."

Disappointingly the presentation didn't entirely go to plan - my PowerPoint only partially worked and the majority of my carefully selected graphics didn't display.

A very broken PowerPoint presentation

There was thus a certain amount of 'winging it'!

This did however allow me to make the point that working with technology can be challenging as well as perhaps frustrating and exciting in equal measure!

Thursday, 29 March 2018

Digital preservation begins at home

A couple of things happened recently to remind me of the fact that I sometimes need to step out of my little bubble of digital preservation expertise.

It is a bubble in which I assume that everyone knows what language I'm speaking, in which everyone knows how important it is to back up your data, knows where their digital assets are stored, how big they might be and even what file formats they hold.

But in order to communicate with donors and depositors I need to move outside that bubble, otherwise opportunities may be missed.

A disaster story

Firstly a relative of mine lost their laptop...along with all their digital photographs, documents etc.

I won't tell you who they are or how they lost it for fear of embarrassing them...

It wasn’t backed up...or at least not in a consistent way.

How can this have happened?

I am such a vocal advocate of digital preservation and do try and communicate outside my echo chamber (see for example my blog for International Digital Preservation Day "Save your digital stuff!") but perhaps I should take this message closer to home.

Lesson #1:

Digital preservation advocacy should definitely begin at home

When a back up is not a back up...

In a slightly delayed response to this sad event I resolved to help another family member ensure that their data was 'safe'. I was directed to their computer and a portable hard drive that is used as their back up. They confessed that they didn’t back up their digital photographs very often...and couldn’t remember the last time they had actually done so.

I asked where their files were stored on the computer and they didn’t know (well at least, they couldn’t explain it to me verbally).

They could however show me how they get to them, so from that point I could work it out. Essentially everything was in ‘My Documents’ or ‘My Pictures’.

Lesson #2:

Don’t assume anything. Just because someone uses a computer regularly it doesn’t mean they know where they put things.

Having looked firstly at what was on the computer and then what was on the hard drive it became apparent that the hard drive was not actually a ‘back up’ of the PC at all, but contained copies of data from a previous PC.

Nothing on the current PC was backed up and nothing on the hard drive was backed up.

There were however multiple copies of the same thing on the portable hard drive. I guess some people might consider that a back up of sorts but certainly not a very robust one.

So I spent a bit of time ensuring that there were two copies of everything (one on the PC and one on the portable hard drive) and promised to come back and do it again in a few months' time.

Lesson #3:

Just because someone says they have 'a back up' it does not mean it actually is a back up.

Talking to donors and depositors

All of this made me re-evaluate my communication with potential donors and depositors.

Not everyone is confident in communicating about digital archives. Not everyone speaks the same language or uses the same words to mean the same thing.

In a recent example of this, someone who was discussing the transfer of a digital archive to the Borthwick talked about a 'database'. I prepared myself to receive a set of related tables of structured data alongside accompanying documentation to describe field names and table relationships. However, as the conversation evolved it became apparent that there was actually no database at all. The term database had simply been used to describe a collection of unstructured documents and images.

I'm taking this as a timely reminder that I should try and leave my assumptions behind me when communicating about digital archives or digital housekeeping practices from this point forth.

Thursday, 15 February 2018

Feel the love for digital archives!

Yesterday was Valentine's Day.

I spent most of the day at work thinking about advocacy for digital preservation. I've been pretty quiet this month, beavering away at a document that I hope might help persuade senior management that digital preservation matters. That digital archives are important. That despite their many flaws and problems, we should look after them as best we can.

Yesterday I also read an inspiring blog post by William Kilbride: A foot in the door is worth two on the desk. So many helpful messages around digital preservation advocacy in here but what really stuck with me was this:

"Digital preservation is not about data loss, it’s about coming good on the digital promise. It’s not about the digital dark age, it’s about a better digital future."

Perhaps we should stop focusing on how flawed and fragile and vulnerable digital archives are, but instead celebrate all that is good about them! Let's feel the love for digital archives!

So whilst cycling home (in the rain) I started thinking about Valentine's cards that celebrate digital archives. Then with a glass of bubbly in one hand and a pen in the other I sketched out some ideas.

Let's celebrate that obsolete media that is still in good working order (against all odds)

Even file migration can be romantic...

A card to celebrate all that is great about Broadcast WAV format

Everybody loves a well-formed XML file

I couldn't resist creating one for all you PREMIS fans out there

I was also inspired by a Library of Congress blog post by Abbie Grotke that I keep going back to: Dear Husband: I’m So Sorry for Your Data Loss. I've used these fabulous 'data loss' cards several times over the years to help illustrate the point that we need to look after our digital stuff.

I'm happy for you to use these images if you think they might help with your own digital preservation advocacy. An acknowledgement is always appreciated!

I don't think I'll give up my day job just yet though...

Best get back to the more serious advocacy work I have to do today.

Friday, 12 January 2018

New year, new tool - TeraCopy

For various reasons I'm not going to start 2018 with an ambitious to-do list as I did in 2017...I've still got to do much of what I said I was going to do in 2017 and my desk needs another tidy!

In 2017 I struggled to make as much progress as I would have liked - that old problem of having too much to do and simply not enough hours in the day.

So it seems like a good idea to blog about a new tool I have just adopted this week to help me use the limited amount of time I've got more effectively!

The latest batch of material I've been given to ingest into the digital archive consists of 34 CD-ROMs and I've realised that my current ingest procedures are not as efficient as they could be. Virus checking, copying files over from one CD and then verifying the checksums is not very time consuming, but when you have to do this 34 times, you do start to wonder whether your processes could be improved!

In my previous ingest processes, copying files and then verifying checksums had been a two-stage process. I would copy files over using Windows Explorer and then use FolderMatch to confirm (using checksums) that my copy was identical to the original.
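For anyone curious about what that second, separate verification stage involves, here is a minimal sketch in Python. It is not how FolderMatch works internally (I don't know that), just an illustration of comparing two folders by checksum. The function names and the choice of SHA-256 are my own assumptions for the example.

```python
import hashlib
from pathlib import Path


def file_checksum(path: Path, algorithm: str = "sha256") -> str:
    """Return the hex digest of a file, read in chunks to keep memory use low."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tree_checksums(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its checksum."""
    return {
        path.relative_to(root).as_posix(): file_checksum(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def compare_trees(original: Path, copy: Path) -> dict[str, list[str]]:
    """Report files that are missing from, extra in, or changed in the copy."""
    source = tree_checksums(original)
    target = tree_checksums(copy)
    return {
        "missing": sorted(set(source) - set(target)),
        "extra": sorted(set(target) - set(source)),
        "changed": sorted(
            name
            for name in source.keys() & target.keys()
            if source[name] != target[name]
        ),
    }
```

If all three lists in the report come back empty, the copy matches the original file for file. Note that this only looks at file contents and relative paths, not at timestamps or empty folders.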

But why use a two-stage process when you can do it in one go?

The dialog that pops up when you copy

I'd seen TeraCopy last year whilst visiting The British Library (thanks Simon!) so decided to give it a go. It is a free file transfer utility with a focus on data integrity.

So, I've installed it on my PC. Now, whenever I try and copy anything in Windows it pops up and asks me whether I want to use TeraCopy to make my copy.

One of the nice things about this is that this will also pop up when you accidentally click and drop a directory into another directory in Windows Explorer (who hasn't done that at least once?) and gives you the opportunity to cancel the operation.

When you copy with TeraCopy it doesn't just copy the files for you, but also creates checksums as it goes along and then at the end of the process verifies that the checksums are the same as they were originally. Nice! You need to tweak the settings a little to get this to work.

TeraCopy busy copying some files for me and creating checksums as it goes

When copying and verifying is complete it tells you how many files it has verified and shows matching checksums for both copies - job done!
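The general idea of copying and verifying in one go can be sketched in a few lines of Python. This is only an illustration of the technique under my own assumptions (SHA-256, hypothetical function names) and says nothing about how TeraCopy itself is implemented. The source checksum is calculated while the file is being read for copying, and the copy is then read back and compared.

```python
import hashlib
import shutil
from pathlib import Path

CHUNK_SIZE = 1024 * 1024  # read and write 1 MiB at a time


def copy_and_verify(source: Path, destination: Path, algorithm: str = "sha256") -> str:
    """Copy one file, hashing the source as it is read, then re-read the copy."""
    source_digest = hashlib.new(algorithm)
    destination.parent.mkdir(parents=True, exist_ok=True)
    with source.open("rb") as src, destination.open("wb") as dst:
        for chunk in iter(lambda: src.read(CHUNK_SIZE), b""):
            source_digest.update(chunk)  # checksum created as we go along
            dst.write(chunk)
    shutil.copystat(source, destination)  # carry over timestamps where possible

    # Verification pass: read the new copy back and hash it independently.
    copy_digest = hashlib.new(algorithm)
    with destination.open("rb") as dst:
        for chunk in iter(lambda: dst.read(CHUNK_SIZE), b""):
            copy_digest.update(chunk)

    if source_digest.hexdigest() != copy_digest.hexdigest():
        raise IOError(f"Checksum mismatch for {source}")
    return source_digest.hexdigest()


def copy_tree_and_verify(source_root: Path, destination_root: Path) -> dict[str, str]:
    """Copy every file under source_root and return a manifest of checksums.

    Empty directories are not recreated in this sketch.
    """
    manifest = {}
    for path in sorted(source_root.rglob("*")):
        if path.is_file():
            relative = path.relative_to(source_root)
            manifest[relative.as_posix()] = copy_and_verify(
                path, destination_root / relative
            )
    return manifest
```

Calling something like `copy_tree_and_verify(Path("D:/"), Path("E:/ingest/cd01"))` (both paths made up for illustration) would copy a whole disc and hand back a manifest of relative paths and checksums that could be kept alongside the files. One caveat: reading the copy back straight after writing it may be served from the operating system's cache rather than from the disk itself, so this is a weaker check than re-verifying at a later date.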

So, this has made the task of copying data from 34 CDs into the digital archive a little bit less painful and has made my digital ingest process a little bit more efficient.

...and that from my perspective is a pretty good start to 2018!