Monday 19 November 2018

The sustainability of a digital preservation blog...

So this is a topic pretty close to home for me.

Oh, the irony of spending much of the last couple of months fretting about the future preservation of my digital preservation blog...!

I have invested a fair bit of time over the last 6 or so years writing posts for this blog. Not only has it been useful for me as a way of documenting what I've done or what I've learnt, but it has also been of interest to a much wider audience.

The most viewed posts have been on the topic of preserving Google Drive formats and disseminating the outputs of the Filling the Digital Preservation Gap project. Access statistics show that the audience is truly international.

When I decided to accept a job elsewhere I was, of course, concerned about what would happen to my blog. I hoped that all would be well, given that Blogger is a Google-supported solution and part of the suite of Google tools that University of York staff and students use. But what would happen when my institutional Google account was closed down?

Initially I believed that as long as I handed over ownership of the blog to another member of staff who remained at the University, then all would be well. However, I soon realised that there were going to be some bigger challenges.

The problem

Once I leave the institution and my IT account is closed, Blogger will no longer have a record of who I am.

All posts that have been written by me will be marked as 'Unknown'. They will no longer have my name automatically associated with them. Not ideal from my perspective, and also not ideal for anyone who might want to cite the blog posts in the future.

The other problem is that once my account is closed down, all the images I have added to blog posts will disappear completely.

This is pretty bad news!

When a member of staff adds images to a blog post, the usual method is to select an image from the local PC or a network drive. Google then stores a copy of that image in https://get.google.com/albumarchive/ (in a location that is tied to that individual's account). When the account is closed, all of these blog-related images are also wiped, and they cannot be recovered.

So, I could make copies of all my images now and hand them to my colleagues, so that they could put them all back in again once I leave...but who is going to want to do that?

A solution of sorts

I asked IT Support for help, and a colleague has had some success in extracting the contents of my blog, amending the image urls in the XML and importing the posts back into a test Blogger account, with images hosted in a location that isn't associated with an individual staff account.

There is a description of how this result was achieved here and I'm hugely grateful for all of the time that was spent trying to fix this problem for me.

The XML was also amended directly to add the words 'Jenny Mitcham, Digital Archivist' to the end of every blog post, which saved me from having to open each of the 120 posts in turn and add my name to them. That was a big help too.
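For anyone wanting to try something similar, here is a minimal Python sketch of the two XML tweaks described above: swapping the image location and appending a name to each post. It is not a reconstruction of the code that was actually used; it assumes a standard Blogger Atom export in which post entries (as opposed to comments, settings or the template) carry a `category` with the `kind#post` term, and `amend_export`, `old_prefix`, `new_prefix` and the `<p>` signature markup are hypothetical choices made for illustration.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
# Assumption: Blogger exports tag each post (as opposed to comments,
# settings or the template) with this scheme/term pair.
KIND_SCHEME = "http://schemas.google.com/g/2005#kind"
POST_TERM = "http://schemas.google.com/blogger/2008/kind#post"


def is_post(entry):
    """Return True if an Atom <entry> is a blog post rather than a comment etc."""
    return any(
        cat.get("scheme") == KIND_SCHEME and cat.get("term") == POST_TERM
        for cat in entry.findall(f"{{{ATOM}}}category")
    )


def amend_export(xml_text, old_prefix, new_prefix, signature):
    """Rewrite image URL prefixes and append a signature to every post.

    Returns the amended XML as a string and the number of posts changed.
    """
    # Serialise Atom elements as the default namespace (no "ns0:" prefix).
    ET.register_namespace("", ATOM)
    root = ET.fromstring(xml_text)
    changed = 0
    for entry in root.findall(f"{{{ATOM}}}entry"):
        if not is_post(entry):
            continue  # leave comments, settings and the template alone
        content = entry.find(f"{{{ATOM}}}content")
        if content is None or content.text is None:
            continue
        # Post bodies are stored as escaped HTML, so .text is an HTML string.
        html = content.text.replace(old_prefix, new_prefix)
        content.text = html + f"<p>{signature}</p>"
        changed += 1
    return ET.tostring(root, encoding="unicode"), changed
```

In practice you would read the downloaded export, call `amend_export` with the old and new image locations, write the result to a new file and, as was done here, try the import on a test blog before touching the real one.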

So, in my last couple of weeks at work I have been experimenting with importing the tweaked XML file back into Blogger.

Initially, I just imported the XML file back into the blog without deleting the original blog posts. I had understood that the imported posts would merge with the original ones and that all would be well. Unfortunately, though, I ended up with two versions of each blog post: the original one and the new one at a slightly different url.

So, I held my breath, took the plunge and deleted everything and then re-imported the amended XML.

I had envisaged that the imported blog posts would be assigned their original urls, but was disappointed to see that this was not the case. Despite the url being included within the XML, Blogger clearly had a record that these urls had already been used and would not re-use them.

I know some people link to the blog posts from other blogs and websites. I also interlink between blog posts from within the blog, so a change to all the urls will lead to lots of broken links. Bad news!

I tried going into individual posts and changing the permalink by hand back to the original link, but Blogger would not accept this and kept adding a different number to the end of the url to ensure it did not replicate the url of one of my deleted posts. Hugely frustrating!

Luckily my colleague in IT came up with an alternative solution, adding some clever code to the header of the blog which carries out a search every time a page is requested. This seems to work well, serving up one or more posts based on the url that is requested. Because the new urls are very similar to the old ones (essentially the same but with some numbers added to the end), the search is very effective and the right post is served up at the top of the page. Hopefully this will work for the foreseeable future and lead to minimal impact for users of the blog.
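The header code itself isn't reproduced here, and on Blogger it would be JavaScript in the blog template. Purely to illustrate the idea, this is a Python sketch of the core step: turning an old post url into a search of the blog. It assumes Blogger's `/search?q=` query form and a post slug made of hyphen-separated words; `search_url_for` is a hypothetical name, not part of any Blogger API.

```python
import posixpath
from urllib.parse import urlencode, urlparse


def search_url_for(requested_url):
    """Build a blog search URL from the slug of an old post URL.

    e.g. /2018/11/some-post-title.html -> /search?q=some+post+title
    Returns None if the URL has no usable slug.
    """
    path = urlparse(requested_url).path
    slug = posixpath.basename(path)
    if slug.endswith(".html"):
        slug = slug[: -len(".html")]
    # Hyphens (and any underscores) become spaces so each word is searched for.
    words = " ".join(part for part in slug.replace("_", "-").split("-") if part)
    if not words:
        return None
    return "/search?" + urlencode({"q": words})
```

Since Blogger typically builds the slug from the post title, the words in an old url usually also appear in the title and new url of the re-imported post, which is presumably why a search like this tends to put the right post at the top.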


Advice for Blogger users

If you are using Blogger from an institutional Google account, think about what will happen to your posts after your account is closed down.

There are a few things you can do to help future proof the blog:
  • Host images externally in a location that isn't tied to your institutional account - for example a Google Team Drive or an institutional website - link to this location from the blog post rather than uploading images directly.
  • Ensure that your name is associated with the blog posts you write by hard-coding it into the text of each post - don't rely on Blogger knowing who you are forever.
  • Ensure that there are others who have administrative control of the blog so that it continues after your account has been closed.
And lastly - if just starting out, consider using a different blogging platform. Presumably they are not all this unsustainable...!
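To act on the first piece of advice for a blog that already exists, it can help to list which images in a post still point at Google-managed image hosts. Here is a minimal sketch using only Python's standard library; the host suffixes in `ACCOUNT_TIED_SUFFIXES` are an assumption about where Blogger usually serves uploaded images from, so check them against the image urls in your own posts. `risky_images` is a hypothetical helper.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Assumption: hosts Blogger commonly serves uploaded images from.
# Adjust this list to match what actually appears in your posts.
ACCOUNT_TIED_SUFFIXES = ("bp.blogspot.com", "googleusercontent.com")


class ImgCollector(HTMLParser):
    """Collect the src attribute of every <img> tag in a chunk of HTML."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        # Self-closing <img .../> tags are also routed here by HTMLParser.
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.srcs.append(value)


def risky_images(post_html, suffixes=ACCOUNT_TIED_SUFFIXES):
    """Return the image URLs in post_html whose host matches one of suffixes."""
    parser = ImgCollector()
    parser.feed(post_html)
    parser.close()
    risky = []
    for src in parser.srcs:
        host = urlparse(src).hostname or ""
        if any(host == s or host.endswith("." + s) for s in suffixes):
            risky.append(src)
    return risky
```

Any url this returns is a candidate for copying to an institutional location and re-linking before the account that uploaded it is closed.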

Apologies...

Unfortunately, with the tweak that has been made to how the images are hosted and pulled in to the posts, some of them appear to have degraded in quality. I began editing each post and resizing the images (which appears to fix the problem) but have run out of time to work through all 120 posts before my account is closed.

Generally, if an image looks bad in the blog, you can see a clearer version of it by clicking on it, so this isn't a disaster.

Also, there may be some images that are out of place - I have found (and fixed) one example of this but have not had time to check all of them.

Apologies to anyone who subscribes to this blog - I understand you may have had a lot of random emails as a result of me re-importing or republishing blog posts over the last few weeks!

Thanks to...

As well as thanking Tom Smith at the University of York for his help with fixing the blog, I'd also like to thank the Web Archiving team at the British Library who very promptly harvested my blog before we started messing around with it. Knowing that it was already preserved and available within a web archive did give me comfort as I repeatedly broke it!

A plea to Google

Blogger could (and should) be a much more sustainable blogging platform. It should be able to handle situations where someone's account closes down. It should be possible to make the blogs (including images) more portable. It should be possible for an institution to create a blog that can be handed from one staff member to another without breaking it. A blog should be able to outlive its primary author.

I genuinely don't think these things would be that hard for a clever developer at Google to fix. The current situation creates a very real headache for those of us who have put a lot of time and effort into creating content within this platform.

It really doesn't need to be this way!

