Digital Archiving at the University of York: upgrade


Friday, 8 June 2018

An imperfect migration story

Over the past six years as a digital archivist at the Borthwick Institute I have carried out only a very small amount of file migration. The focus here has been on getting things 'safe', backed up and documented (along with running a few tools to find out what exactly we have and ensure that what we have doesn't change).

I've been deliberately avoiding file migration because:

there is little time to do this sort of stuff
we don't have a digital archiving system in place
we don't have a means to record the PREMIS metadata about the migrations (and who wants to create PREMIS by hand?)

The catalyst for a file migration

Recently I had to update my work PC to Windows 10.

Whereas colleagues might be able to just set this upgrade off and get it done while they had lunch, I left myself a big chunk of time to try and manage the process. As a digital archivist I have downloaded and installed lots of tools to help me do my job - some I rely on quite heavily to help me ingest digital content, monitor files over time and understand the born digital archives that I work with.

So, I wanted to spend some time capturing all the information about the tools I use and how I have them set up before I can upgrade, and then more time post-upgrade to get them all installed and configured again.

...so with a bit of thought and preparation, everything should be fine...shouldn't it?

Well it turns out everything wasn't fine.

Backwards compatibility is not always guaranteed

One of the tools I rely on and have blogged about previously is Quick View Plus. I have been using Quick View Plus version 12 for the last 6 years and it is a great tool for viewing a range of files that I might not have the software to read otherwise.

In particular it was invaluable in allowing me to access and identify a set of WordStar 4.0 files from the Marks and Gran archive. These files were not accessible through any of the other software that was available to me (apart from in a version of WordStar I have installed on an old Windows 98 PC that I keep under my desk for special occasions).

But when I tried to install Quick View Plus 12 on my PC after upgrading to Windows 10 I discovered it was not compatible with Windows 10.

This was an opportunity to try out a newer version of the Quick View Plus software, so I duly downloaded an evaluation copy of Quick View Plus 2017. My first impressions were good. It seemed the tool had come along a bit in the last few years and there was some nice new functionality around the display of metadata (a potential big selling point for digital archivists).

However, when I tried to open some of the 120 or so WordStar files we have in our digital archive I discovered they were no longer supported.

They were no longer identified as WordStar 4.0.

They were no longer displaying correctly in the viewer.

They looked just like they do in a basic text processing application

...which isn't ideal because as described in the PRONOM record for WordStar 4.0 files:

"On the surface it's a plain text file, however the format 'shifts' the last byte of each word. Effectively it is 'flipping' the first bit of the ASCII character from 0 to 1. so a lower case 'r' (hex value 0x72) becomes 'ò' (hex value 0xF2); lower case 'd' (hex 0x64) becomes 'ä' (hex 0xE4) and so on."

This means that viewing a WordStar file in an application that doesn't interpret and decode this behaviour can be a bit taxing for the brain.
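To make the bit-flipping in that PRONOM description concrete, here is a minimal Python sketch that clears the top bit of every byte so the last letter of each word becomes readable again. It is an illustration only: the function name and sample bytes are invented for this example, it is not how Quick View Plus decodes these files, and it makes no attempt to interpret the control codes and formatting commands that real WordStar documents also contain.

```python
def clear_high_bits(raw: bytes) -> str:
    """Clear the top bit of every byte, then decode the result as ASCII.

    Assumption: the only obstacle to reading the text is the high bit
    set on the last character of each word. Formatting codes are left
    untouched and will still appear as control characters.
    """
    return bytes(b & 0x7F for b in raw).decode("ascii")


# Invented sample: 'd' (0x64) stored as 0xE4 and 'r' (0x72) stored as 0xF2
sample = b"wor\xe4 processo\xf2"
print(clear_high_bits(sample))  # prints: word processor
```

A file read with `open(path, "rb").read()` could be passed through the same function to get a rough, readable view of the text.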

Having looked back at the product description for Quick View Plus 2017 I discovered that WordStar for DOS is one of their supported file formats. It seems this functionality had not been intentionally deprecated.

I emailed Avantstar Customer Technical Support to report this issue and with a bit of testing they confirmed my findings. However, they were not able to tell me whether this would be fixed or not in a future release.

A 'good enough' rescue

This prompted me to kick off a little rescue mission. Whilst we still had one or two computers in the building on Windows 7, I installed Quick View Plus 12 on one of them and started a colleague off on a basic file migration task to ensure we have a copy of the files that can be more easily accessed on current software.

A two-pronged attack using limited resources is described below:

Open file in QVP12 and print to PDF/A-1b. This appears to effectively capture the words on the page, the layout and the pagination of the document as displayed in QVP12.
Open file in QVP12, select all the text and copy and paste into MS Word (keeping the source formatting). The file is then saved as DOCX. Although this doesn’t maintain the pagination, it does effectively capture the content and some of the formatting of the document in a reusable format and gives us an alternative preservation version of the document that we can work with in the future.

Files were saved with the same names as the originals (including the use of SHOUTY 1980s upper case) but with new file extensions. Original file extensions were also captured in the names of these migrated files. This is because (as described in a previous post) users of early WordStar for DOS packages were encouraged to make use of the 3 character file extension to add additional contextual information related to the file (gulp!).
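The naming rule can be sketched in a few lines of Python. The exact pattern used for the migrated files isn't spelled out here, so the underscore separator, the function name and the example file names below are all assumptions made for illustration.

```python
from pathlib import PurePath


def migrated_name(original: str, new_ext: str) -> str:
    """Build a name for a migrated copy that keeps the original extension.

    Hypothetical convention: ORIGINALNAME_OLDEXT.newext. The underscore
    is an assumption, not necessarily the separator used in the archive.
    """
    path = PurePath(original)
    old_ext = path.suffix.lstrip(".")
    base = f"{path.stem}_{old_ext}" if old_ext else path.stem
    return f"{base}.{new_ext}"


# Invented file names, shown for both migration routes
print(migrated_name("SCRIPT.EP1", "pdf"))   # prints: SCRIPT_EP1.pdf
print(migrated_name("SCRIPT.EP1", "docx"))  # prints: SCRIPT_EP1.docx
```

Keeping the old extension inside the name means two originals that differ only by extension don't collide once they share a new one.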

The methodology was fully documented and progress has been noted on a spreadsheet. In the absence of a system for me to record PREMIS metadata, all of this information will be stored alongside the migrated files in the digital archive.

Future work

We've still got some work to do. For example some spot checking against the original files in their native WordStar environment - I believe that the text has been captured well but that there are a few formatting issues that I'd like to investigate.

I'd also like to use VeraPDF to check whether the PDF/A files that we have created are actually valid (am keeping my fingers firmly crossed!).

This was possibly not the best thought out migration strategy but as there was little time available my focus was to come up with a methodology that was 'good enough' for enabling continued access to the content of these documents. Of course the original files are also retained and we can go back to these at any time to carry out further (better?) migrations in the future.*

In the meantime, a follow up e-mail from Avantstar Technical Support has given me an alternative solution. Apparently, Quick View Plus version 13 (which our current licence for version 12 enables us to install at no extra cost) is compatible with Windows 10 and will enable me to continue to view WordStar 4.0 files on my PC. Good news!

* I'm very interested in the work carried out at the National Library of New Zealand to convert WordStar to HTML and would be interested in exploring this approach at a later date if resources allow.

Jenny Mitcham, Digital Archivist

Friday, 4 May 2018

The anatomy of an AtoM upgrade

Yesterday we went live with our new upgraded production version of AtoM.

We've been using AtoM version 2.2 since we first unveiled the Borthwick Catalogue to the world two years ago. Now we have finally taken the leap to version 2.4.

We are thrilled to benefit from some of the new features - including the clipboard, being able to search by date range and the full width treeview. Of course we are also keen to test the work we jointly sponsored last year around exposing EAD via OAI-PMH for harvesting.

But what has taken us so long you might ask?

...well, upgrading AtoM has been a new experience for us and one that has involved a lot of planning behind the scenes. The technical process of upgrading has been ably handled by our systems administrator. Much of his initial work behind the scenes has been on 'puppetising' AtoM to make it easier to manage multiple versions of AtoM going forward. In this post though I will focus on the less technical steps we have taken to manage the upgrade and the decisions we have made along the way.

Checking the admin settings

One of the first things I did when I was given a test version of 2.4 to play with was to check out all of the admin settings to see what had changed.

All of our admin settings for AtoM are documented in a spreadsheet alongside a rationale for our decisions. I wanted to take some time to understand the new settings, read the documentation and decide what would work for us.

Some of these decisions were taken to a meeting for a larger group of staff to discuss. I've got a good sense of how we use AtoM but I am not really an AtoM user so it was important that others were involved in the decision making.

Most decisions were relatively straightforward and uncontroversial but the one that we spent most time on was deciding whether or not to change the slugs...

Slugs

In AtoM, the 'slug' is the last element of the url for each individual record within the catalogue - it has to be unique so that all the urls go to the right place. In previous versions of AtoM the slugs were automatically generated from the title of each record. This led to some interesting and varied urls.

Some of them were really long - if the title of the record was really long
Some of them were short and very cryptic - if the record hadn't been given a title prior to the first save
Many of our titles are not unique - for example, we have lots of records simply called 'correspondence' in the catalogue. Where titles are not unique, AtoM will use the title and then append a number to it in order to create a unique slug (e.g. correspondence-150)

Slugs are therefore hard to predict ...and it is not always possible to look at a slug and know which archive it refers to.
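To show why title-based slugs end up hard to predict, here is a small Python sketch of the general technique: normalise the title, then add a number if the result is already taken. This is not AtoM's actual code. The function names, the 'untitled' fallback and the numbering that starts at 2 are all assumptions.

```python
import re


def slugify(title):
    """Lower-case the title and replace runs of other characters with '-'."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def unique_slug(title, taken):
    """Return a slug not already in `taken`, adding a numeric suffix if needed.

    `taken` is a set of slugs already in use; it is updated in place.
    The 'untitled' fallback is a placeholder for records with no title.
    """
    base = slugify(title) or "untitled"
    candidate = base
    n = 2
    while candidate in taken:
        candidate = f"{base}-{n}"
        n += 1
    taken.add(candidate)
    return candidate


taken = set()
print(unique_slug("Correspondence", taken))  # prints: correspondence
print(unique_slug("Correspondence", taken))  # prints: correspondence-2
```

The suffix a record receives depends entirely on the order in which records were saved, which is why the slug alone tells you so little about which archive it belongs to.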

This possibly doesn't matter, but could become an issue for us in the future should we wish to carry out more automated data manipulation or system integrations.

AtoM 2.4 now allows you to choose which fields your slugs are generated from. We have decided that it would be better if ours were generated from the identifier of the record rather than the title, because identifiers are generally quite short and sweet and of course should be unique (though we recently realised that this isn't enforced in AtoM).
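Since uniqueness of identifiers isn't enforced, a quick check for duplicates before generating slugs from them seems sensible. The Python sketch below assumes the identifiers have already been exported from the catalogue into a simple list. The function name and sample identifiers are invented, and treating identifiers that differ only in case or surrounding spaces as duplicates is an assumption too.

```python
from collections import Counter


def duplicate_identifiers(identifiers):
    """Return the identifiers that occur more than once, sorted.

    Assumption: identifiers are compared after trimming whitespace and
    lower-casing, since they would otherwise produce the same slug.
    """
    counts = Counter(i.strip().lower() for i in identifiers)
    return sorted(i for i, n in counts.items() if n > 1)


# Invented identifiers for illustration
print(duplicate_identifiers(["AB/1", "AB/2", "ab/1 ", "CD/7"]))  # prints: ['ab/1']
```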

But of course this is not a decision that can be taken lightly. Our catalogue has been live for 2 years now and users will have set up links and bookmarks to particular records within it. On balance we decided that it would be better to change the slugs and do our best to limit the impact on users.

So, we have changed the admin setting to ensure future slugs are generated using the identifier. We have run a script provided by Artefactual Systems that changed all the slugs that are already in the database. We have set up a series of redirects from all the old urls of top level descriptions in the catalogue to the new urls (note that having had a good look at the referrer report in Google Analytics it was apparent that external links to the catalogue generally point at top level descriptions).
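The redirects boil down to a list of old-path and new-path pairs. Here is a hedged Python sketch of how such a list could be built from pairs of old and new slugs. The function name, the sample slugs and the '/index.php/' path prefix are assumptions, and turning the pairs into real redirect rules depends on the web server in use.

```python
def redirect_pairs(records, base_path="/index.php/"):
    """Build (old_path, new_path) pairs for records whose slug changed.

    `records` is an iterable of (old_slug, new_slug) tuples. Records
    whose slug did not change are skipped, as they need no redirect.
    `base_path` is an assumed URL prefix, not a confirmed setting.
    """
    pairs = []
    for old_slug, new_slug in records:
        if old_slug != new_slug:
            pairs.append((base_path + old_slug, base_path + new_slug))
    return pairs


# Invented slugs for illustration
changes = [("correspondence-150", "ab-1-2"), ("ab-3", "ab-3")]
for old, new in redirect_pairs(changes):
    print(old, "->", new)  # prints: /index.php/correspondence-150 -> /index.php/ab-1-2
```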

Playing and testing

It was important to do a certain amount of testing and playing around with AtoM 2.4 and it was important that it wasn't just myself who did this - I encouraged all my colleagues to also have a go.

First I checked the release notes for versions 2.3 and 2.4 so I had a good sense of what had changed and where I should focus my attention. I was then able to test these new features and direct colleagues to them as appropriate for further testing or discussion.

While doing so, I tried to think about whether any of these changes would necessitate changes in our workflows and processes or updates to our staff handbook.

As an example - it was noted that there was a new field to record occupations for authority records. Rather than letting individuals decide how to use this field, it is important to agree an institutional approach and consider an appropriate methodology or taxonomy. As it happens, we have decided not to use this field for the time being and this will be documented accordingly.

Assessing known bugs

Being a bit late to the upgrade party gives us the opportunity to assess known bugs and issues with a release. I spent some time looking at Artefactual's issues log for AtoM to establish whether any of them were going to cause us major problems or require a workaround to be put in place.

There are lots of issues recorded and I looked through many of them (but not all!). Fortunately, very few looked like they would have an impact on us. Most related to functionality we don't utilise - such as the ability to use AtoM with multiple institutions or translate it into multiple languages.

The one bug that I thought would be irritating for us was related to the accessions counter which was not incrementing in version 2.4. Having spent a bit of time testing, it seemed that this wasn't a deal breaker for us and there was a workaround we could put in place to enable staff to continue to create accession records with a unique identifier relatively easily.

Testing local workarounds

Next I tested one of the local workarounds we have for AtoM. We use a CSS print stylesheet to help us to generate an accessions report to send donors and depositors to confirm receipt of an archive. This still worked in the new version of AtoM with no issues. Hoorah!

Look and feel

We gave a bit of thought to how AtoM should be styled. Two years ago we went live with a slightly customised version of the Dominion theme. This had been styled to look similar to our website (which at the time was branded orange).

In the last year, the look and feel of the University website has changed and we are no longer orange! Some thought needed to be given to whether we should change the look of our catalogue now to keep it consistent with our website. After some discussion it was agreed that our existing AtoM theme should be maintained for the time being.

We did however think it was a good idea to adopt the font of the University website, but when we tested this out on our AtoM instance it didn't look as clear...so that decision was quickly reversed.

Usability testing

When we first launched our catalogue we carried out a couple of rounds of user testing (read about it here and here) but this was quite a major piece of work and took up a substantial amount of staff time.

With this upgrade we were keen to give some consideration to the user experience but didn't have resource to invest in more user testing.

Instead we recruited the Senior User Experience Designer at our institution to cast his eye over our version of AtoM 2.4 and give us some independent feedback on usability and accessibility. It was really useful to get a fresh pair of eyes to look at our site, but as this could be a whole blog post in itself I won't say any more here...watch this space!

Updating our help pages

Another job was to update both the text and the screenshots on our static help pages within AtoM. There have been several changes since 2.2 and some of these are reflected in the look and feel of the interface.

The advanced search looks a bit different in version 2.4 - here is the refreshed screenshot for our help pages

We were also keen to add in some help for our users around the clipboard feature and to explain how the full width treeview works.

The icons for different types of information within AtoM have also been brought out more strongly in this version, so we also wanted to flag up what these meant for our users.

...and that reminds me, we really do need a less Canada-centric way to indicate place!

Updating our staff handbook

Since we adopted AtoM a few years ago we have developed a whole suite of staff manuals which record how we use AtoM, including tips for carrying out certain procedures and information about what to put in each field. With the new changes brought in with this upgrade, we of course had to update our internal documentation.

When to upgrade?

As we drew ever closer to our 'go live' date for the upgrade we were aware that Artefactual were busy preparing their 2.4.1 bug fix release. We were very keen to get the bug fixes (particularly for that accessions counter bug that I mentioned) but were not sure how long we were prepared to wait.

Luckily with helpful advice from Artefactual we were able to follow some instructions from the user forum and install from the GitHub code repository instead of the tarball download on the website. This meant we could benefit from those bug fixes that were already stable (and pull others to test as they become available) without having to wait for the formal 2.4.1 release.

No need to delay our upgrade further!

As it happens it was good news we upgraded when we did. The day before the upgrade we hit a bug in version 2.2 during a re-index of elasticsearch. Nice to know we had a nice clean version of 2.4 ready to go the next day!

Finishing touches

On the 'go live' date we'd put word around to staff not to edit the catalogue while we did the switch. Our systems administrator got all the data from our production version of 2.2 freshly loaded into 2.4, ran the scripts to change the slugs and re-indexed the database. I just needed to do a few things before we asked IT to do the Domain Name System switch.

First I needed to check all the admin settings were right - a few final tweaks were required here and there. Second I needed to load up the Borthwick logo and banner to our archival institution record. Thirdly I needed to paste the new help and FAQ text into the static pages (I already had this prepared and saved elsewhere).

Once the DNS switch was done we were live at last!

Sharing the news

Of course we wanted to publicise the upgrade to our users and tell them about the new features that it brings.

We've put AtoM back on the front page of our website and added a news item.

Let's tell the world all about it, with a catalogue banner and news item

My colleague has written a great blog post aimed at our users and telling them all about the new features, and of course we've all been enthusiastically tweeting!

...and a whole lot of tweeting

Future work

The upgrade is done but work continues. We need to ensure harvesting to our library catalogue still works and of course test out the new EAD harvesting functionality. Later today we will be looking at Search Engine Optimisation (particularly important since we changed our slugs). We also have some remaining tasks around finding aids - uploading pdfs of finding aids for those archives that aren't yet fully catalogued in AtoM using the new functionality in 2.4.

But right now I've got a few broken links to fix...

Jenny Mitcham, Digital Archivist

Friday, 20 April 2018

The 2nd UK AtoM user group meeting

I was pleased to be able to host the second meeting of the UK AtoM user group here in York at the end of last week. AtoM (or Access to Memory) is the Archival Management System that we use here at the Borthwick Institute and it seems to be increasing in popularity across the UK.

We had 18 attendees from across England, Scotland and Wales representing both archives and service providers. It was great to see several new faces and meet people at different stages of their AtoM implementation.

We started off with introductions and everyone had the chance to mention one recent AtoM triumph and one current problem or challenge. A good way to start the conversation and perhaps a way of considering future development opportunities and topics for future meetings.

Here is a selection of the successes that were mentioned:

Establishing a search facility that searches across two AtoM instances
Getting senior management to agree to establishing AtoM
Getting AtoM up and running
Finally having an online catalogue
Working with authority records in AtoM
Working with other contributors and getting their records displaying on AtoM
Using the API to drive another website
Upgrading to version 2.4
Importing legacy EAD into AtoM
Uploading finding aids into AtoM 2.4
Adding 1000+ urls to digital resources into AtoM using a set of SQL update statements

...and here are some of the current challenges or problems users are trying to solve:

How to bar code boxes - can this be linked to AtoM?
Moving from CALM to AtoM
Not being able to see the record you want to link to when trying to select related records
Using the API to move things into an online showcase
Advocacy for taking the open source approach
Working out where to start and how best to use AtoM
Sharing data with the Archives Hub
How to record objects alongside archives
Issues with harvesting EAD via OAI-PMH
Building up the right level of expertise to be able to contribute code back to AtoM
Working out what to do when AtoM stops working
Discovering that AtoM doesn't enforce uniqueness in identifiers for archival descriptions

After some discussion about some of the issues that had been raised, Louise Hughes from the University of Gloucestershire showed us her catalogue and talked us through some of the decisions they had made as they set this up.

The University of Gloucestershire's AtoM instance

She praised the digital object functionality and has been using this to add images and audio to the archival descriptions. She was also really happy with the authority records, in particular, being able to view a person and easily see which archives relate to them. She discussed ongoing work to enable records from AtoM to be picked up and displayed within the library catalogue. She hasn't yet started to use AtoM for accessioning but hopes to do so in the future. Adopting all the functionality available within AtoM needs time and thought and tackling it one step at a time (particularly if you are a lone archivist) makes a lot of sense.

Tracy Deakin from St John's College, Cambridge talked us through some recent work to establish a shared search page for their two institutional AtoM instances. One holds the catalogue of the college archives and the other is for the Special Collections Library. They had taken the decision to implement two separate instances of AtoM as they required separate front pages and the ability to manage the editing rights separately. However, as some researchers will find it helpful to search across both instances a search page has been developed that accesses the Elasticsearch index of each site in order to cross search.

The interface for a shared search across St John's College AtoM sites

Vicky Phillips from the National Library of Wales talked us through their processes for upgrading their AtoM instance to version 2.4 and discussed some of the benefits of moving to 2.4. They are really happy to have the full width treeview and the drag and drop functionality within it.

The upgrade has not been without its challenges though. They have had to sort out some issues with invalid slugs, ongoing issues due to the size of some of their archives (they think the XML caching functionality will help with this) and sometimes find that MySQL gets overwhelmed with the number of queries and needs a restart. They still have some testing to do around bilingual finding aids and have also been working on testing out the new functionality around OAI-PMH harvesting of EAD.

Following on from this I gave a presentation on upgrading AtoM to 2.4 at the Borthwick Institute. We are not quite there yet but I talked about the upgrade plan and process and some decisions we have made along the way. I won't say any more for the time being as I think this will be the subject of a future blog post.

Before lunch my colleague Charles Fonge introduced VIAF (Virtual International Authority File) to the group. This initiative will enable Authority Records created by different organisations across the world to be linked together more effectively. Several institutions may create an authority record about the same individual and currently it is difficult to allow these to be linked together when data is aggregated by services such as The Archives Hub. It is worth thinking about how we might use VIAF in an AtoM context. At the moment there is no place to store a VIAF ID in AtoM and it was agreed this would be a useful development for the future.

After lunch Justine Taylor from the Honourable Artillery Company introduced us to the topic of back up and disaster recovery of AtoM. She gave the group some useful food for thought, covering techniques and the types of data that would need to be included (hint: it's not solely about the database). This was particularly useful for those working in small institutions who don't have an IT department that just does all this for them as a matter of course. Some useful and relevant information on this subject can be found in the AtoM documentation.

Max Communications are a company who provide services around AtoM. They talked through some of their work with institutions and what services they can offer. As well as being able to provide hosting and support for AtoM in the UK, they can also help with data migration from other archival management systems (such as CALM). They demonstrated their crosswalker tool that allows archivists to map structured data to ISAD(G) before import to AtoM.

They showed us an AtoM theme they had developed to allow Vimeo videos to be embedded and accessible to users. Although AtoM does have support for video, the files can be very large in size and there are large overheads involved in running a video server if substantial quantities are involved. Keeping the video outside of AtoM and managing the permissions through Vimeo provided a good solution for one of their clients.

They also demonstrated an AtoM plugin they had developed for Wordpress. Though they are big fans of AtoM, they pointed out that it is not the best platform for creating interesting narratives around archives. They were keen to be able to create stories about archives by pulling in data from AtoM where appropriate.

At the end of the meeting Dan Gillean from Artefactual Systems updated us (via Skype) about the latest AtoM developments. It was really interesting to hear about the new features that will be in version 2.5. Note that none of this is ever a secret - Artefactual make their road map and release notes publicly available on their wiki - however it is still helpful to hear it enthusiastically described.

The group was really pleased to hear about the forthcoming audit logging feature, the clever new functionality around calculating creation dates, and the ability for users to save their clipboard across sessions (and share them with the searchroom when they want to access the items). Thanks to those organisations that are funding this exciting new functionality. Also worth a mention is the slightly less sexy, but very valuable work that Artefactual is doing behind the scenes to upgrade Elasticsearch.

Another very useful meeting and my thanks go to all who contributed. It is certainly encouraging to see the thriving and collaborative AtoM community we have here in the UK.

Our next meeting will be in London in the autumn.

Jenny Mitcham, Digital Archivist
