Friday, 8 June 2018

An imperfect migration story

Over the past six years as a digital archivist at the Borthwick Institute I have carried out only a very small amount of file migration. The focus has instead been on getting things 'safe', backed up and documented (along with running a few tools to find out exactly what we have and to ensure that it doesn't change).

I've been deliberately avoiding file migration because:

  1. there is little time to do this sort of stuff 
  2. we don't have a digital archiving system in place
  3. we don't have a means to record the PREMIS metadata about the migrations (and who wants to create PREMIS by hand?)

The catalyst for a file migration

Recently I had to update my work PC to Windows 10.

Whereas colleagues might be able to just set this upgrade off and get it done while they had lunch, I left myself a big chunk of time to try and manage the process. As a digital archivist I have downloaded and installed lots of tools to help me do my job - some I rely on quite heavily to help me ingest digital content, monitor files over time and understand the born digital archives that I work with.

So, I wanted to spend some time capturing all the information about the tools I use and how I have them set up before upgrading, and then more time post-upgrade to get them all installed and configured again. With a bit of thought and preparation, everything should be fine...shouldn't it?

Well it turns out everything wasn't fine.

Backwards compatibility is not always guaranteed

One of the tools I rely on and have blogged about previously is Quick View Plus. I have been using Quick View Plus version 12 for the last 6 years and it is a great tool for viewing a range of files that I might not have the software to read otherwise.

In particular it was invaluable in allowing me to access and identify a set of WordStar 4.0 files from the Marks and Gran archive. These files were not accessible through any of the other software that was available to me (apart from in a version of WordStar I have installed on an old Windows 98 PC that I keep under my desk for special occasions).

But when I tried to install Quick View Plus 12 on my PC after upgrading to Windows 10 I discovered it was not compatible with Windows 10.

This was an opportunity to try out a newer version of the Quick View Plus software, so I duly downloaded an evaluation copy of Quick View Plus 2017. My first impressions were good. It seemed the tool had come along a bit in the last few years and there was some nice new functionality around the display of metadata (a potential big selling point for digital archivists).

However, when I tried to open some of the 120 or so WordStar files we have in our digital archive I discovered they were no longer supported.

They were no longer identified as WordStar 4.0.

They were no longer displaying correctly in the viewer.

They looked just like they do in a basic text processing application

...which isn't ideal because as described in the PRONOM record for WordStar 4.0 files:

"On the surface it's a plain text file, however the format 'shifts' the last byte of each word. Effectively it is 'flipping' the first bit of the ASCII character from 0 to 1. so a lower case 'r' (hex value 0x72) becomes 'ò' (hex value 0xF2); lower case 'd' (hex 0x64) becomes 'ä' (hex 0xE4) and so on."

This means that viewing a WordStar file in an application that doesn't interpret and decode this behaviour can be a bit taxing for the brain.
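To make the behaviour in the PRONOM description concrete, here is a minimal Python sketch that clears the high bit of every byte so the text reads normally again. The function name and sample bytes are my own illustration, and dropping the remaining control codes is a simplifying assumption: this recovers the words only, not the formatting that Quick View Plus displays.

```python
def decode_wordstar_text(raw: bytes) -> str:
    """Clear the high bit of every byte and return the readable text."""
    # Clearing bit 7 turns 0xF2 back into 0x72 ('r'), 0xE4 into 0x64 ('d'), etc.
    cleared = bytes(b & 0x7F for b in raw)
    # Assumption: we only want the words, so control codes (anything below
    # 0x20 other than tab, line feed and carriage return) are simply dropped.
    kept = bytes(b for b in cleared if b >= 0x20 or b in (0x09, 0x0A, 0x0D))
    return kept.decode("ascii")


# Hypothetical sample: the last letter of each word has its high bit set.
sample = b"wor\xe4 processo\xf2"
print(decode_wordstar_text(sample))  # prints "word processor"
```

A viewer that skips this step shows 'worä processoò' (when read as Latin-1) instead of 'word processor', which is exactly the effect described above.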

Having looked back at the product description for Quick View Plus 2017 I discovered that WordStar for DOS is one of their supported file formats. It seems this functionality had not been intentionally deprecated.

I emailed Avantstar Customer Technical Support to report this issue and with a bit of testing they confirmed my findings. However, they were not able to tell me whether this would be fixed or not in a future release.

A 'good enough' rescue

This prompted me to kick off a little rescue mission. Whilst we still had one or two computers in the building on Windows 7, I installed Quick View Plus 12 on one of them and started a colleague off on a basic file migration task to ensure we have a copy of the files that can be more easily accessed on current software.

A two-pronged attack using limited resources is described below:

  • Open file in QVP12 and print to PDF/A-1b. This appears to effectively capture the words on the page, the layout and the pagination of the document as displayed in QVP12.
  • Open file in QVP12, select all the text and copy and paste into MS Word (keeping the source formatting). The file is then saved as DOCX. Although this doesn’t maintain the pagination, it does effectively capture the content and some of the formatting of the document in a reusable format and gives us an alternative preservation version of the document that we can work with in the future.

Files were saved with the same names as the originals (including the use of SHOUTY 1980s upper case) but with new file extensions. Original file extensions were also captured in the names of these migrated files. This is because (as described in a previous post) users of early WordStar for DOS packages were encouraged to make use of the 3 character file extension to add additional contextual information related to the file (gulp!).
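As a rough illustration of that naming rule, the Python sketch below builds a migrated file name that keeps the original name and folds the original extension into it. The underscore separator and the example file names are assumptions made for this sketch; they are not necessarily the convention we used.

```python
from pathlib import PurePath


def migrated_name(original: str, new_ext: str) -> str:
    """Keep the original name, fold the old extension into it, add the new one."""
    path = PurePath(original)
    old_ext = path.suffix.lstrip(".")
    if old_ext:
        # Assumption: an underscore joins the original name and extension.
        return f"{path.stem}_{old_ext}.{new_ext}"
    return f"{path.stem}.{new_ext}"


# Hypothetical file names, upper case preserved as in the originals.
print(migrated_name("SCRIPT.EP1", "pdf"))   # prints "SCRIPT_EP1.pdf"
print(migrated_name("SCRIPT.EP1", "docx"))  # prints "SCRIPT_EP1.docx"
```

Keeping the old extension in the name means two originals that differ only by extension cannot collide once they are both saved as PDF or DOCX.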

The methodology was fully documented and progress has been noted on a spreadsheet. In the absence of a system for me to record PREMIS metadata, all of this information will be stored alongside the migrated files in the digital archive.
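For anyone in a similar position, a progress log of this kind could also be produced by a short script rather than by hand. The Python sketch below appends one row per migrated file to a CSV, with checksums for the original and the new version. The column names, file names and tool label are all hypothetical; this is not a description of our actual spreadsheet and it is no substitute for proper PREMIS metadata.

```python
import csv
import hashlib
from datetime import date
from pathlib import Path

# Hypothetical column names for the migration log.
FIELDS = ["source_file", "source_sha256", "migrated_file",
          "migrated_sha256", "tool", "date"]


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file as a hex string."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def log_migration(log_path: Path, source: Path, migrated: Path, tool: str) -> None:
    """Append one migration event to the CSV log, adding a header if new."""
    is_new = not log_path.exists()
    with log_path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow({
            "source_file": source.name,
            "source_sha256": sha256_of(source),
            "migrated_file": migrated.name,
            "migrated_sha256": sha256_of(migrated),
            "tool": tool,
            "date": date.today().isoformat(),
        })
```

A call such as `log_migration(Path("migration_log.csv"), Path("SCRIPT.EP1"), Path("SCRIPT_EP1.pdf"), "QVP12 print to PDF/A-1b")` would add one row, and the resulting CSV can sit alongside the migrated files in the archive.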

Future work

We've still got some work to do, for example some spot checking against the original files in their native WordStar environment. I believe that the text has been captured well but there are a few formatting issues that I'd like to investigate.

I'd also like to use VeraPDF to check whether the PDF/A files that we have created are actually valid (am keeping my fingers firmly crossed!).

This was possibly not the best thought out migration strategy but as there was little time available my focus was to come up with a methodology that was 'good enough' for enabling continued access to the content of these documents. Of course the original files are also retained and we can go back to these at any time to carry out further (better?) migrations in the future.*

In the meantime, a follow up e-mail from Avantstar Technical Support has given me an alternative solution. Apparently, Quick View Plus version 13 (which our current licence for version 12 enables us to install at no extra cost) is compatible with Windows 10 and will enable me to continue to view WordStar 4.0 files on my PC. Good news!

* I'm very interested in the work carried out at the National Library of New Zealand to convert WordStar to HTML and would be interested in exploring this approach at a later date if resources allow.
