Pages

Friday, 23 June 2017

Emulation for preservation - is it for me?

I’ve previously been of the opinion that emulation isn’t really for me.

I’ve seen presentations about emulation at conferences such as iPRES and it is fair to say that much of it normally goes over my head.

This hasn’t been helped by the fact that I’ve not really had a concrete use case for it in my own work - I find it so much easier to relate and engage to a topic or technology if I can see how it might be directly useful to me.

However, for a while now I’ve been aware that emulation is what all the ‘cool kids’ in the digital preservation world seem to be talking about. From the very migration heavy thinking of the 2000’s it appears that things are now moving in a different direction.

This fact first hit my radar at the 2014 Digital Preservation Awards where the University of Freiburg won the The OPF Award for Research and Innovation award for their work on Emulation as a Service with bwFLA Functional Long Term Archiving and Access.

So I was keen to attend the DPC event Halcyon, On and On: Emulating to Preserve to keep up to speed... not only because it was hosted on the doorstep in the centre of my home town of York!

It was an interesting and enlightening day. As usual the Digital Preservation Coalition did a great job of getting all the right experts in the room (sometimes virtually) at the same time, and a range of topics and perspectives were covered.

After an introduction from Paul Wheatley we heard from the British Library about their experiences of doing emulation as part of their Flashback project. No day on emulation would be complete without a contribution from the University of Freiburg. We had a thought provoking talk via WebEx from Euan Cochrane of Yale University Library and an excellent short film created by Jason Scott from the Internet Archive. One of the highlights for me was Jim Boulton talking about Digital Archaeology - and that wasn’t just because it had ‘Archaeology’ in the title (honest!). His talk didn’t really cover emulation, it related more to that other preservation strategy that we don’t talk about much anymore - hardware preservation. However, many of the points he raised were entirely relevant to emulation - for example, how to maintain an authentic experience, how you define what the significant properties of an item actually are and what decisions you have to make as a curator of the digital past. It was great to see how engaged the public were with his exhibitions and how people interacted with it.

Some of the themes of the day and take away thoughts for me:


  • Choosing the best strategy - It is not all about which preservation strategy to use it is more about how we can use them together - as Paul Wheatley pointed out - emulation is a good partner to migration as it can help you to test a migration strategy. The British Library showed off their lab of old hardware - they use this to check whether their emulators are working OK. As digital archivists we can (and should) use all of the tools at our disposal to make sure we are doing the job well.
  • A window of emulation opportunity? - Simon Whibley from the British Library mentioned that older material tends to emulate better than the more recent material they worked with. Later on in the day Euan Cochrane talked about the ways technology is rapidly moving forward (see for example The Internet of Things). This offers up new challenges for those working in digital preservation, whatever strategy they employ. Will there be a relatively small window of opportunity for emulation (from the 1980's to the 2000's)? Beyond that point, will it all get just too complex?
  • Software is a problem - setting up the emulation environments is easy (in that some people have this solved) but if you don’t have the necessary software to install in order to read your files then you are stuck. Obviously this is a thorny problem due to licencing and IPR and not one which has been systematically solved. The British Library have been ‘accidentally’ collecting software but this area continues to be a problematic one.
  • What constitutes an 'authentic experience'? - most of the presentations mentioned this idea of the authentic experience - ultimately this is what we are trying to provide. Simon Whibley asked whether an emulation that appears in full colour is authentic if it would have been monochrome on the original hardware? Jim Boulton mentioned that some of the artists he worked with wanted the bandwidth to be throttled on their historic websites to recreate the authentic speed (or lack of it!). Some of the emulators demonstrated over the course of the day also provided the original sounds of the operating system and this is an important element in providing an authentic experience. It isn't just about serving up the data.
Thinking about how this all relates to me and my work, I am immediately struck by two use cases.

Firstly research data - we are taking great steps forward in enabling this data to be preserved and maintained for the long term but will it be re-usable? For many types of research data there is no clear migration strategy. Emulation as a strategy for accessing this data ten or twenty years from now needs to be seriously considered. In the meantime we need to ensure we can identify the files themselves and collect adequate documentation - it is these things that will help us to enable reuse through emulators in the future.

Secondly, there are some digital archives that we hold at the Borthwick Institute from the 1980's. For example I have been working on a batch of WordStar files in my spare moments over the last few years. I'd love to get a contemporary emulator fired up and see if I could install WordStar and work with these files in their native setting. I've already gone a little way down the technology preservation route, getting WordStar installed on an old Windows 98 PC and viewing the files, but this isn't exactly contemporary. These approaches will help to establish the significant properties of the files and assess how successful subsequent migration strategies are....but this is a future blog post.

It was a fun event and it was clear that everybody loves a bit of nostalgia. Jim Boulton ended his presentation saying "There is something quite romantic about letting people play with old hardware".

We have come a long way and this is most apparent when seeing artefacts (hardware, software, operating systems, data) from early computing. Only this week whilst taking the kids to school we got into a conversation about floppy disks (yes, I know...). I asked the kids if they knew what they looked like and they answered "Yes, it is the save icon on the computer"(see Why is the save icon still a floppy disk?)...but of course they've never seen a real one. Clearly some obsolete elements of our computer history will remain in our collective consciousness for many years and perhaps it is our job to continue to keep them alive in some form.


Friday, 16 June 2017

A typical week as a digital archivist?

Sometimes (admittedly not very often) I'm asked what I actually do all day. So at the end of a busy week being a digital archivist I've decided to blog about what I've been up to.

Monday

Today I had a couple of meetings. One specifically to talk about digital preservation of electronic theses submissions. I've also had a work experience placement in this week so have set up a metadata creation task which he has been busy working on.

When I had a spare moment I did a little more testing work on the EAD harvesting feature the University of York is jointly sponsoring Artefactual Systems to develop in AtoM. Testing this feature from my perspective involves logging into the test site that Artefactual has created for us and tweaking some of the archival descriptions. Once those descriptions are saved, I can take a peek at the job scheduler and make sure that new EAD files are being created behind the scenes for the Archives Hub to attempt to harvest at a later date.

This piece of development work has been going on for a few months now and communications have been technically quite complex so I'm also trying to ensure all the organisations involved are happy with what has been achieved and will be arranging a virtual meeting so we can all get together and talk through any remaining issues.

I was slightly surprised today to have a couple of requests to talk to the media. This has sprung from the news that the Queen's Speech will be delayed. One of the reasons for the delay relates to the fact that the speech has to be written on goat's skin parchment, which takes a few days to dry. I had previously been interviewed for a article entitled Why is the UK still printing its laws on vellum? and am now mistaken for someone who knows about vellum. I explained to potential interviewers that this is not my specialist subject!

Tuesday

In the morning I went to visit a researcher at the University of York. I wanted to talk to him about how he uses Google Drive in relation to his research. This is a really interesting topic to me right now as I consider how best we might be able to preserve current research datasets. Seeing how exactly Google Drive is used and what features the researcher considers to be significant (and necessary for reuse) is really helpful when thinking about a suitable approach to this problem. I sometimes think I work a little bit too much in my own echo chamber, so getting out and hearing different perspectives is incredibly valuable.

Later that afternoon I had an unexpected meeting with one of our depositors (well, there were two of them actually). I've not met them before but have been working with their data for a little while. In our brief meeting it was really interesting to chat and see the data from a fresh perspective. I was able to reunite them with some digital files that they had created in the mid 1980's, had saved on to floppy disk and had not been able to access for a long time.

Digital preservation can be quite a behind the scenes sort of job - we always give a nod to the reason why we do what we do (ie: we preserve for future reuse), but actually seeing the results of that work unfold in front of your eyes is genuinely rewarding. I had rescued something from the jaws of digital obsolescence so it could now be reused and revitalised!

At the end of the day I presented a joint webinar for the Open Preservation Foundation called 'PRONOM in practice'. Alongside David Clipsham (The National Archives) and Justin Simpson (Artefactual Systems), I talked about my own experiences with PRONOM, particularly relating to file signature creation, and ending with a call to arms "Do try this at home!". It would be great if more of the community could get involved!

I was really pleased that the webinar platform worked OK for me this time round (always a bit stressful when it doesn't) and that I got to use the yellow highlighter pen on my slides.

In my spare moments (which were few and far between), I put together a powerpoint presentation for the following day...

Wednesday

I spent the day at the British Library in Boston Spa. I'd been invited to speak at a training event they regularly hold for members of staff who want to find out a bit more about digital preservation and the work of the team.

I was asked specifically to talk through some of the challenges and issues that I face in my work. I found this pretty easy - there are lots of challenges - and I eventually realised I had too many slides so had to cut it short! I suppose that is better than not having enough to say!

Visiting Boston Spa meant that I could also chat to the team over lunch and visit their lab. They had a very impressive range of old computers and were able to give me a demonstration of Kryoflux (which I've never seen in action before) and talk a little about emulation. This was a good warm up for the DPC event about emulation I'm attending next week: Halcyon On and On: Emulating to Preserve.

Still left on my to do list from my trip is to download Teracopy. I currently use Foldermatch for checking that files I have copied have remained unchanged. From the quick demo I saw at the British Library I think that Teracopy would be a more simple one step solution. I need to have a play with this and then think about incorporating it into the digital ingest workflow.

Sharing information and collaborating with others working in the digital preservation field really is directly beneficial to the day to day work that we do!

Thursday

Back in the office today and a much quieter day.

I extracted some reports from our AtoM catalogue for a colleague and did a bit of work with our test version of Research Data York. I also met with another colleague to talk about storing and providing access to digitised images.

In the afternoon I wrote another powerpoint presentation, this time for a forthcoming DPC event: From Planning to Deployment: Digital Preservation and Organizational Change.

I'm going to be talking about our experiences of moving our Research Data York application from proof of concept to production. We are not yet in production and some of the reasons why will be explored in the presentation! Again I was asked to talk about barriers and challenges and again, this brief is fairly easy to fit! The event itself is over a week away so this is unprecedentedly well organised. Long may it continue!


Friday

On Fridays I try to catch up on the week just gone and plan for the week ahead as well as reading the relevant blogs that have appeared over the week. It is also a good chance to catch up with some admin tasks and emails.

Lunch time reading today was provided by William Kilbride's latest blog post. Some of it went over my head but the final messages around value and reuse and the need to "do more with less" rang very true.

Sometimes I even blog myself - as I am today!




Was this a typical week - perhaps not, but in this job there is probably no such thing! Every week brings new ideas, challenges and surprises!

I would say the only real constant is that I've always got lots of things to keep me busy.