Showing posts with label disaster. Show all posts

Friday, 20 April 2018

Back to the classroom - the Domesday project

Yesterday I was invited to speak to a local primary school about my job. The purpose of the event was to inspire kids to work in STEM subjects (science, technology, engineering and maths) and I was faced with an audience of 10 and 11 year old girls.

One member of the audience (my daughter) informed me that many of the girls were only there because they had been bribed with cake.

This could be a tough gig!

On a serious note, there is a huge gender imbalance in STEM careers with women only making up 23% of the workforce in core STEM occupations. In talking to the STEM ambassador who was at this event, it was apparent that recruitment in engineering is quite hard, with not enough boys OR girls choosing to work in this area. This is also true in my area of work and is one of the reasons we are involved in the "Bridging the Digital Gap" project led by The National Archives. They note in a blog post about the project that:

"Digital skills are vital to the future of the archives sector ...... if archives are going to keep up with the pace of change, they need to attract members of the workforce who are confident in using digital technology, who not only can use digital tools, but who are also excited and curious about the opportunities and challenges it affords."

So why not try and catch them really young and get kids interested in our profession?

There were a few professionals speaking at the event and subjects were varied and interesting. We heard from someone who designed software for cars (who knew how many different computers are in a modern car?), someone who had to calculate exact mixes of seed to plant in Sites of Special Scientific Interest in order to encourage the right wild birds to nest there, a scientist who tested gelatin in sweets to find out what animal it was made from, an engineer who uses poo to heat houses....I had some pretty serious competition!

I only had a few minutes to speak so my challenge was to try and make digital preservation accessible, interesting and relevant in a short space of time. You could say that this was a bit of an elevator pitch to school kids.

Once I got thinking about this I had several ideas of different angles I could take.

I started off looking at the Mount School Archive that is held at the Borthwick. This is not a digital archive but was a good introduction to what archives are all about and why they are interesting and important. Up until 1948 the girls at this school created their own school magazine that is beautifully illustrated and gives a fascinating insight into what life was like at the school. I wanted to compare this with how schools communicate and disseminate information today and discuss some of the issues with preserving this more modern media (websites, twitter feeds, newsletters sent to parents via email).

Several PowerPoint slides down the line I realised that this was not going to be short and snappy enough.

I decided to change my plans completely and talk about something that they may already know about, the Domesday Book.

I began by asking them if they had heard of the Domesday Book. Many of them had. I asked what they knew about it. They thought it was from 1066 (not far off!), someone knew that it had something to do with William the Conqueror, they guessed it was made of parchment (and they knew that parchment was made of animal skin). They were less certain of what it was actually for. I filled in the gaps for them.

I asked them whether they thought this book (that was over 900 years old) could still be accessed today and they weren't so sure about this. I was able to tell them that it is being well looked after by The National Archives and can still be accessed in a variety of ways. The main barrier to understanding the information is that it is written in Latin.

I talked about what the Domesday Book tells us about our local area. A search on Open Domesday tells us that Clifton only had 12 households in 1086. Quite different from today!

We then moved forward in time, to a period of history known as 'The 1980s' (a period that the children had recently been studying at school - now that makes me feel old!). I introduced them to the BBC Domesday Project of 1986. Without a doubt one of digital preservation's favourite case studies!

I explained how school children and communities were encouraged to submit information about their local areas. They were asked to include details of everyday life and anything they thought might be of interest to people 1000 years from then. People took photographs and wrote information about their lives and their local area. The data was saved on to floppy disks (what are they?) and posted to the BBC (this was before email became widely available). The BBC collated all the information on to laser disc (something that looks a bit like a CD but with a diameter of about 30cm).

I asked the children to consider the fact that the 900 year old Domesday Book is still accessible and think about whether the 30 year old BBC Domesday Project discs were equally accessible. In discussion this gave me the opportunity to finally mention what digital archivists do and why it is such a necessary and interesting job. I didn't go into much technical detail but all credit to the folks who actually rescued the Domesday Project data. There is lots more information here.

Searching the Clifton and Rawcliffe area on Domesday Reloaded


Using the Domesday Reloaded website I was then able to show them what information is recorded about their local area from 1986. There was a picture of houses being built, and narratives about how a nearby lake was created. There were pieces written by a local school child and a teacher describing their typical day. I showed them a piece that was written about 'Children's Crazes' which concluded with:

" Another new activity is break-dancing
 There is a place in York where you can
 learn how to break-dance. Break     
 dancing means moving and spinning on
 the floor using hands and body. Body-
 popping is another dance craze where
 the dancer moves like a robot."


Disappointingly the presentation didn't entirely go to plan - my PowerPoint only partially worked and the majority of my carefully selected graphics didn't display.

A very broken powerpoint presentation

There was thus a certain amount of 'winging it'!

This did however allow me to make the point that working with technology can be challenging as well as perhaps frustrating and exciting in equal measure!



Jenny Mitcham, Digital Archivist

Thursday, 29 March 2018

Digital preservation begins at home

A couple of things happened recently to remind me of the fact that I sometimes need to step out of my little bubble of digital preservation expertise.

It is a bubble in which I assume that everyone knows what language I'm speaking, in which everyone knows how important it is to back up your data, knows where their digital assets are stored, how big they might be and even what file formats they hold.

But in order to communicate with donors and depositors I need to move outside that bubble otherwise opportunities may be missed.

A disaster story

Firstly a relative of mine lost their laptop...along with all their digital photographs, documents etc.

I won't tell you who they are or how they lost it for fear of embarrassing them...

It wasn’t backed up...or at least not in a consistent way.

How can this have happened?

I am such a vocal advocate of digital preservation and do try and communicate outside my echo chamber (see for example my blog for International Digital Preservation Day "Save your digital stuff!") but perhaps I should take this message closer to home.

Lesson #1:

Digital preservation advocacy should definitely begin at home

When a back up is not a back up...

In a slightly delayed response to this sad event I resolved to help another family member ensure that their data was 'safe'. I was directed to their computer and a portable hard drive that is used as their back up. They confessed that they didn’t back up their digital photographs very often...and couldn’t remember the last time they had actually done so.

I asked where their files were stored on the computer and they didn’t know (well at least, they couldn’t explain it to me verbally).

They could however show me how they get to them, so from that point I could work it out. Essentially everything was in ‘My Documents’ or ‘My Pictures’.

Lesson #2:

Don’t assume anything. Just because someone uses a computer regularly it doesn’t mean they know where they put things.

Having looked firstly at what was on the computer and then what was on the hard drive it became apparent that the hard drive was not actually a ‘back up’ of the PC at all, but contained copies of data from a previous PC.

Nothing on the current PC was backed up and nothing on the hard drive was backed up.

There were however multiple copies of the same thing on the portable hard drive. I guess some people might consider that a back up of sorts but certainly not a very robust one.

So I spent a bit of time ensuring that there were 2 copies of everything (one on the PC and one on the portable hard drive) and promised to come back and do it again in a few months' time.
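For anyone wanting to do something similar without dragging folders around by hand, here is a minimal sketch in Python. It is only an illustration of the idea: the function name `copy_missing_files` and the two folder arguments are made up for the example, and it assumes the backup folder sits outside the source folder. It copies anything that is on the PC but not yet on the backup drive, and never overwrites what is already there.

```python
import shutil
from pathlib import Path


def copy_missing_files(source: Path, backup: Path) -> list[str]:
    """Copy files that exist under source but not under backup.

    shutil.copy2 is used because it attempts to preserve file metadata
    such as the last modified time. Files already present in the backup
    are left untouched, so nothing is ever overwritten.
    """
    copied = []
    for path in sorted(source.rglob("*")):
        if not path.is_file():
            continue
        relative_path = path.relative_to(source)
        target = backup / relative_path
        if not target.exists():
            # Create any missing sub-folders before copying the file.
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)
            copied.append(relative_path.as_posix())
    return copied
```

Note that this only checks whether a file of the same name exists in the backup, not whether its content matches, so it is a starting point rather than a robust back up routine.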

Lesson #3:

Just because someone says they have 'a back up' it does not mean it actually is a back up.

Talking to donors and depositors

All of this made me re-evaluate my communication with potential donors and depositors.

Not everyone is confident in communicating about digital archives. Not everyone speaks the same language or uses the same words to mean the same thing.

In a recent example of this, someone who was discussing the transfer of a digital archive to the Borthwick talked about a 'database'. I prepared myself to receive a set of related tables of structured data alongside accompanying documentation to describe field names and table relationships. However, as the conversation evolved it became apparent that there was actually no database at all. The term database had simply been used to describe a collection of unstructured documents and images.

I'm taking this as a timely reminder that I should try and leave my assumptions behind me when communicating about digital archives or digital housekeeping practices from this point forth.











Jenny Mitcham, Digital Archivist

Monday, 4 December 2017

Cakes, quizzes, blogs and advocacy

Last Thursday was International Digital Preservation Day and I think I needed the weekend to recover.

It was pretty intense...

...but also pretty amazing!

Amazing to see what a fabulous international community there is out there working on the same sorts of problems as me!

Amazing to see quite what a lot of noise we can make if we all talk at once!

Amazing to see such a huge amount of advocacy and awareness raising going on in such a small space of time!

International Digital Preservation Day was crazy but now I have had a bit more time to reflect, catch up...and of course read a selection of the many blog posts and tweets that were posted.

So here are some of my selected highlights:

Cakes

Of course the highlights have to include the cakes and biscuits including those produced by Rachel MacGregor and Sharon McMeekin. Turning the problems that we face into something edible does seem to make our challenges easier to digest!

Quizzes and puzzles

A few quizzes and puzzles were posed on the day via social media - a great way to engage the wider world and have a bit of fun in the process.


There was a great quiz from the Parliamentary Archives (the answers are now available here) and a digital preservation pop quiz from Ed Pinsent of CoSector which started here. Also for those hexadecimal geeks out there, a puzzle from the DPOC Fellows at Oxford and Cambridge which came just at the point that I was firing up a hexadecimal viewer as it happens!

In a blog post called Name that item in...? Kirsty Chatwin-Lee at Edinburgh University encourages the digital preservation community to help her to identify a mysterious large metal disk found in their early computing collections. Follow the link to the blog to see a picture - I'm sure someone out there can help!

Announcements and releases

There were lots of big announcements on the day too. IDPD just kept on giving!

Of course the 'Bit List' (a list of digitally endangered species) was announced and I was able to watch this live. Kevin Ashley from the Digital Curation Centre discusses this in a blog post. It was interesting to finally see what was on the list (and then think further about how we can use this for further advocacy and awareness raising).

I celebrated this fact with some Fake News but to be fair, William Kilbride had already been on the BBC World Service the previous evening talking about just this so it wasn't too far from the truth!

New versions of JHOVE and VeraPDF were released as well as a new PRONOM release. A digital preservation policy for Wales was announced and a new course on file migration was launched by CoSector at the University of London. Two new members also joined the Digital Preservation Coalition - and what a great day to join!

Roadshows

Some institutions did a roadshow or a pop up museum in order to spread the message about digital preservation more widely. This included the revival of the 'fish screensaver' at Trinity College Dublin and a pop up computer museum at the British Geological Survey.

Digital Preservation at Oxford and Cambridge blogged about their portable digital preservation roadshow kit. I for one found this a particularly helpful resource - perhaps I will manage to do something similar myself next IDPD!

A day in the life

Several institutions chose to mark the occasion by blogging or tweeting about the details of their day. This gives an insight into what we DP folks actually do all day and can be really useful given that the processes behind digital preservation work are often less tangible and understandable than those used for physical archives!

I particularly enjoyed the nostalgia of following ex-colleagues at the Archaeology Data Service for the day (including references to those much loved checklists!) and hearing from Artefactual Systems about the testing, breaking and fixing of Archivematica that was going on behind the scenes.

The Danish National Archives blogged about 'a day in the life' and I was particularly interested to hear about the life-cycle perspective they have as new software is introduced, assessed and approved.

Exploring specific problems and challenges

Plans are my reality from Yvonne Tunnat of the ZBW Leibniz Information Centre for Economics was of particular interest to me as it demonstrates just how hard the preservation tasks can be. I like it when people are upfront and honest about the limitations of the tools or the imperfections of the processes they are using. We all need to share more of this!

In Sustaining the software that preserves access to web archives, Andy Jackson from the British Library tells the story of an attempt to maintain a community of practice around open source software over time and shares some of the lessons learned - essential reading for any of us that care about collaborating to sustain open source.

Kirsty Chatwin-Lee from Edinburgh University invites us to head back to 1985 with her as she describes their Kryoflux-athon challenge for the day. What a fabulous way to spend the day!

Disaster stories

Digital Preservation Day wouldn't be Digital Preservation Day without a few disaster stories too! Despite our desire to move beyond the 'digital dark age' narrative, it is often helpful to refer to worst-case scenarios when advocating for digital preservation.

Cees Hof from DANS in the Netherlands talks about the loss of digital data related to rare or threatened species in The threat of double extinction, Sarah Mason from Oxford University uses the recent example of the shutdown of DCist to discuss institutional risk, José Borbinha from Lisbon University, Portugal talks about his own experiences of digital preservation disaster and Neil Beagrie from Charles Beagrie Ltd highlights the costs of inaction.

The bigger picture

Other blogs looked at the bigger picture:

Preservation as a present by Barbara Sierman from the National Library of the Netherlands is a forward thinking piece about how we could communicate and plan better in order to move forward.

Shira Peltzman from the University of California, Los Angeles tries to understand some of the results of the 2017 NDSA Staffing Survey in It's difficult to solve a problem if you don't know what's wrong.

David Minor from the University of San Diego Library provides his thoughts on What we’ve done well, and some things we still need to figure out.

I enjoyed reading a post from Euan Cochrane from Yale University Library on The Emergence of “Digital Patinas”. A really interesting piece... and who doesn't like to be reminded of the friendly and helpful Word 97 paperclip?

In Towards a philosophy of digital preservation, Stacey Erdman from Beloit College, Wisconsin USA asks whether archivists are born or made and discusses her own 'archivist "gene"'.




So much going on and there were so many other excellent contributions that I missed.

I'll end with a tweet from Euan Cochrane which I thought nicely summed up what International Digital Preservation Day is all about and of course the day was also concluded by William Kilbride of the DPC with a suitably inspirational blog post.



Congratulations to the Digital Preservation Coalition for organising the day and to the whole digital preservation community for making such a lot of noise!




Jenny Mitcham, Digital Archivist

Monday, 31 July 2017

The mysterious case of the changed last modified dates

Today's blog post is effectively a mystery story.

Like any good story it has a beginning (the problem is discovered, the digital archive is temporarily thrown into chaos), a middle (attempts are made to solve the mystery and make things better, several different avenues are explored) and an end (the digital preservation community come to my aid).

This story has a happy ending (hooray) but also includes some food for thought (all the best stories do) and as always I'd be very pleased to hear what you think.

The beginning

I have probably mentioned before that I don't have a full digital archive in place just yet. While I work towards a bigger and better solution, I have a set of temporary procedures in place to ingest digital archives on to what is effectively a piece of locked down university filestore. The procedures and workflows are both 'better than nothing' and 'good enough' as a temporary measure and actually appear to take us pretty much up to Level 2 of the NDSA Levels of Preservation (and beyond in some places).

One of the ways I ensure that all is well in the little bit of filestore that I call 'The Digital Archive' is to run frequent integrity checks over the data, using a free checksum utility. Checksums (effectively unique digital fingerprints) for each file in the digital archive are created when content is ingested and these are checked periodically to ensure that nothing has changed. IT keep back-ups of the filestore for a period of three months, so as long as this integrity checking happens within this three month period (in reality I actually do this 3 or 4 times a month) then problems can be rectified and digital preservation nirvana can be seamlessly restored.
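For anyone curious about what this kind of integrity checking involves, here is a minimal sketch in Python. It is an illustration only and not the checksum utility mentioned above: the function names are made up for the example, and the choice of SHA-256 is an assumption. It builds a manifest of checksums at 'ingest' and later reports any file that has gone missing or whose content no longer matches.

```python
import hashlib
from pathlib import Path


def file_checksum(path: Path, algorithm: str = "sha256") -> str:
    """Return the hex digest of one file, read in chunks to limit memory use."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its checksum."""
    return {
        path.relative_to(root).as_posix(): file_checksum(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def verify_manifest(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means every file still matches."""
    problems = []
    for relative_path, expected in manifest.items():
        target = root / relative_path
        if not target.is_file():
            problems.append(f"missing: {relative_path}")
        elif file_checksum(target) != expected:
            problems.append(f"changed: {relative_path}")
    return problems
```

A checksum like this is calculated from the content of the file only. File system metadata such as the last modified date plays no part in it, so a date can change while the checksum stays exactly the same.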

Checksum checking is normally quite dull. Thankfully it is an automated process that runs in the background and I can just get on with my work and cheer when I get a notification that tells me all is well. Generally all is well, it is very rare that any errors are highlighted - when that happens I blog about it!

I have perhaps naively believed for some time that I'm doing everything I need to do to keep those files safe and unchanged because if the checksum is the same then all is well. However, this month I encountered a problem...

I've been doing some tidying of the digital archive structure and alongside this have been gathering a bit of data about the archives, specifically looking at things like file formats, number of unidentified files and last modified dates.

Whilst doing this I noticed that one of the archives that I had received in 2013 contained 26 files with a last modified date of 18th January 2017 at 09:53. How could this be so if I have been looking after these files carefully and the checksums are the same as they were when the files were deposited?

The 26 files were all EML files - email messages exported from Microsoft Outlook. These were the only EML files within the whole digital archive. The files weren't all in the same directory and other files sitting in those directories retained their original last modified dates.

The middle

So this was all a bit strange...and worrying too. Am I doing my job properly? Is this something I should be bringing to the supportive environment of the DPC's Fail Club?

The last modified dates of files are important to us as digital archivists. This is part of the metadata that comes with a file. It tells us something about the file. If we lose this date are we losing a little piece of the authentic digital object that we are trying to preserve?

Instead of beating myself up about it I wanted to do three things:

  1. Solve the mystery (find out what happened and why)
  2. See if I could fix it
  3. Stop it happening again

So how could it have happened? Has someone tampered with these 26 files? Perhaps unlikely considering they all have the exact same date/time stamp which to me suggests a more automated process. Also, the digital archive isn't widely accessible. Quite deliberately it is only really me (and the filestore administrators) who have access.

I asked IT whether they could explain it. Had some process been carried out across all filestores that involved EML files specifically? They couldn't think of a reason why this may have occurred. They also confirmed my suspicions that we have no backups of the files with the original last modified dates.

I spoke to a digital forensics expert from the Computer Science department and he said he could analyse the files for me and see if he could work out what had acted on them and also suggest a methodology of restoring the dates.

I have a record of the last modified dates of these 26 files when they arrived - the checksum tool that I use writes the last modified date to the hash file it creates. I wondered whether manually changing the last modified dates back to what they were originally was the right thing to do or whether I should just accept and record the change.
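Having a record like this is what makes the change detectable in the first place. As a rough sketch of the idea in Python (not the checksum tool described above; the function names and the two second tolerance are assumptions made for the example), the dates can be captured at ingest and compared against the file system later on:

```python
from datetime import datetime, timezone
from pathlib import Path


def record_modified_times(root: Path) -> dict[str, float]:
    """Capture each file's last modified time (seconds since the epoch)."""
    return {
        path.relative_to(root).as_posix(): path.stat().st_mtime
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def find_changed_dates(
    root: Path, recorded: dict[str, float], tolerance: float = 2.0
) -> list[str]:
    """List files whose current last modified time differs from the record.

    The tolerance allows for file systems that store times at a coarse
    resolution (FAT, for example, works in two second steps).
    """
    changed = []
    for relative_path, old_mtime in recorded.items():
        target = root / relative_path
        if not target.is_file():
            continue  # Missing files are a job for the checksum check.
        current_mtime = target.stat().st_mtime
        if abs(current_mtime - old_mtime) > tolerance:
            now = datetime.fromtimestamp(current_mtime, tz=timezone.utc)
            changed.append(f"{relative_path}: now {now.isoformat()}")
    return changed
```

Run alongside the regular checksum checks, a comparison like this would flag a changed date within days rather than months.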

...but I decided to sit on it until I understood the problem better.

The end

I threw the question out to the digital preservation community on Twitter and as usual I was not disappointed!




In fact, along with a whole load of discussion and debate, Andy Jackson was able to track down what appears to be the cause of the problem.


He very helpfully pointed me to a thread on StackExchange which described the issue I was seeing.

It was a great comfort to discover that the cause of this problem was apparently a bug and not something more sinister. It appears I am not alone!

...but what now?

So now I think I know what caused the problem but questions remain around how to catch issues like this more quickly (not six months after it has happened) and what to do with the files themselves.

IT have mentioned to me that an OS upgrade may provide us with better auditing support on the filestore. Being able to view reports on changes made to digital objects within the digital archive would be potentially very useful (though perhaps even that wouldn't have picked up this Windows bug?). I'm also exploring whether I can make particular directories read only and whether that would stop issues such as this occurring in the future.

If anyone knows of any other tools that can help, please let me know.

The other decision to make is what to do with the files themselves. Should I try and fix them? More interesting debate on Twitter on this topic and even on the value of these dates in the first place. If we can fudge them then so can others - they may have already been fudged before they got to the digital archive - in which case, how much value do they really have?


So should we try and fix last modified dates or should we focus our attention on capturing and storing them within the metadata? The latter may be a more sustainable solution in the longer term, given their slightly slippery nature!
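To show just how slippery these dates are, here is a minimal Python sketch of what 'fixing' a date would involve. The function name is hypothetical and this is not a recommendation either way; it simply sets a file's last modified time back to a previously recorded value.

```python
import os
from pathlib import Path


def restore_modified_time(path: Path, recorded_mtime: float) -> None:
    """Set a file's last modified time back to a recorded value.

    The access time is kept as it currently is; only the modified time
    changes. The content, and therefore the checksum, is not touched.
    """
    current = path.stat()
    os.utime(path, (current.st_atime, recorded_mtime))
```

That it takes so little to rewrite a date is perhaps the strongest argument for keeping the authoritative copy of the date in the metadata rather than relying on the file system.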

I know there are lots of people interested in this topic - just see this recent blog post by Sarah Mason and in particular the comments - When was that?: Maintaining or changing ‘created’ and ‘last modified’ dates. It is great that we are talking about real nuts and bolts of digital preservation and that there are so many people willing to share their thoughts with the community.

...and perhaps if you have EML files in your digital archive you should check them too!




Jenny Mitcham, Digital Archivist

Tuesday, 5 January 2016

When digital preservation really matters...

Of course digital preservation always matters* but recent events in York and beyond over the festive period really do highlight the importance of looking after your stuff - both physical and digital.

Not everyone is lucky enough to get much warning before a disaster of any type strikes but in some situations (such as that which I found myself in just after Christmas) we have some time to prepare.

Hang on...there isn't normally a lake near my house

Beyond relocating important things such as the hamster and photo albums upstairs and moving the Christmas decorations higher up the tree, it is also important to remember the digital....



Digital is robust in some respects but perhaps more at risk in others. Robust in that it is possible to very quickly make as many additional copies as you like and store them in different places (perfect for a disaster scenario such as this), but the risk is that it is more easily forgotten.

Of course I back up my personal data (digital photographs mostly) regularly, but with the chaos of the build up to Christmas I had not done so for a few weeks, so was prompted to do so before unplugging the PC and moving it to higher ground.

We were some of the lucky ones in York - the water levels didn't reach us so the preparations were not necessary but others were not so lucky. Many houses and businesses in York and in other areas of the country were flooded and many did not have the luxury of time to prepare for the worst. The very basics of digital preservation (maintaining a regular back up strategy and storing copies of the data in different locations) really are something that should happen in a proactive way, not just in response to specific threats.



* I have to say that - it is in my job description

Jenny Mitcham, Digital Archivist

The sustainability of a digital preservation blog...

So this is a topic pretty close to home for me. Oh the irony of spending much of the last couple of months fretting about the future prese...