To reorganise or to not reorganise?
A blog post from Jim Costin, our Bridging the Digital Gap trainee, written for International Digital Preservation Day
Last year, Jenny Mitcham, our former digital archivist, posted about how to stop your digital files from becoming impossible to open and how to manage a personal digital archive. This post follows on from that and looks at an unusual issue which we both came across recently when adding some new items to our digital archive. You could call it the problem of folders within folders within folders...
I’ll forgive you if you’ve never heard that term before. Put simply, it refers to having multiple folders nested within each other. The picture below, which was created just for this post, gives an example:
Now, whilst the example above shows how not to name your folders, it does show how easily you can end up nesting them. Nesting can be a very good way of organising data and knowing exactly what it is, but it can create problems when you are trying to preserve the data in an archival sense. The operating systems we currently use to manage our digital archive can only cope with folders nested so deep. So now that I have explained a little bit about nesting, let me explain the problem that can occur.
“I can’t delete?”
As the section title says, that was the problem. You may never have been unable to delete a file for the reason I am about to explain, and might well never be, but it was an issue for us because we needed to remove that file. The traditional method of file deletion would have worked in any other circumstance: click on the file and press delete. Simple, right? Well, not in this case.
Windows decided to throw an error saying:
‘Unable to find file’
and as such Jenny and I were a little confused as to why it was occurring. Neither of us had come across this issue before, until I remembered hearing something a while ago about reaching a limit on path length. When you nest folders within folders within folders, at a certain point Windows Explorer stops being able to find the file. The maximum path length for Explorer is around 260 characters (the path is where the file lives, e.g. C:\John\Documents\Letters\2018), which as you can imagine can cause issues when items are nested so deeply. Thankfully our IT team were able to fix the issue, but how can we avoid it in future? To answer that question, we first need to answer another one, which is a lot harder.
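One way to catch this before it bites is to scan a folder for paths that are close to the limit before adding it to the archive. The Python below is only a rough sketch: the function names are made up for this post, and it assumes the classic Windows limit of 260 characters (which counts the drive letter and a hidden terminating character). Folders that the script itself cannot open are silently skipped, so treat an empty result as a hint rather than a guarantee.

```python
import os

# Assumed limit: the classic Windows MAX_PATH of 260 characters,
# which includes the drive letter and a terminating null character.
MAX_PATH = 260


def is_too_long(path, limit=MAX_PATH):
    """Return True if the full path would reach or exceed the limit."""
    return len(path) >= limit


def find_long_paths(root, limit=MAX_PATH):
    """Walk the tree under root and list (length, path) for overlong paths."""
    too_long = []
    for folder, subfolders, files in os.walk(root):
        for name in subfolders + files:
            full_path = os.path.join(folder, name)
            if is_too_long(full_path, limit):
                too_long.append((len(full_path), full_path))
    # Longest paths first, so the worst offenders are at the top.
    return sorted(too_long, reverse=True)


if __name__ == "__main__":
    # Scan the current folder; swap "." for the folder you want to check.
    for length, path in find_long_paths("."):
        print(length, path)
```

Running a check like this on a donation before ingest would tell you which folders need attention while the files can still be moved or renamed.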
Do you preserve the structure or not?
From a paper records perspective, preserving a structure is a little easier, as you have catalogues and indexes which can be used to show where an item is and what it is. When dealing with digital data, however, that becomes a little more complicated. The data we receive might be an incoherent mess which makes no sense to anyone but the donor. We then have to decide whether to preserve the data in that form or to reset the structure into something more logical. But just what is more logical?
That question is a little easier to answer, as it can come down to personal preference and is governed under very similar laws to traditional archival items. The issue comes when you try to catalogue and preserve the data. Do you keep it nested within folders and run the risk of Windows Explorer being unable to find it, or do you take the approach of using labels and tags?
The latter approach might make a lot of sense in circumstances where you are dealing with a large amount of data. To do this, you would organise the data into sectors and build relationships between the data. The easiest way to describe this is in the form of a diagram.
The above diagram shows how a typical business might break down its records. It will have employee records and records on its assets. The overarching hierarchy defines the data set, and tags on the data can then be used to build the relationships between the data. For example, a part time staff member might help with a project that creates a physical asset. A tag would therefore be added to that member's record saying that they worked on X project. Using those tags, a database can be created which gives easy access to the files, meaning that not only can researchers more easily find what they want, but we also avoid the issue of excessively long path names.
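To make the tagging idea a little more concrete, here is a minimal sketch in Python. Everything in it is invented for illustration (the record identifiers, the category names and the "project:X" tag format), and a real archive would use a proper database or cataloguing system rather than a list in memory. The point is simply that each record keeps a short place in the hierarchy, while the relationships live in the tags.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Record:
    identifier: str                         # hypothetical reference number
    category: str                           # place in the overarching hierarchy
    tags: set = field(default_factory=set)  # relationships to other data


def build_tag_index(records):
    """Map each tag to the identifiers of the records that carry it."""
    index = defaultdict(set)
    for record in records:
        for tag in record.tags:
            index[tag].add(record.identifier)
    return index


# Made-up example data: a part time staff member and a physical asset
# are linked through the project they share, not through nested folders.
records = [
    Record("staff-001", "Employees/Part time", {"project:X"}),
    Record("asset-042", "Assets/Physical", {"project:X"}),
    Record("staff-002", "Employees/Full time", {"project:Y"}),
]

index = build_tag_index(records)
print(sorted(index["project:X"]))  # ['asset-042', 'staff-001']
```

Looking up a tag returns everything connected to that project, however shallow the folder structure underneath.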
Now whilst that might seem like problem solved, it isn’t really. What happens when that business decides to reorganise and, as a result, that part time staff member gets reassigned and is no longer working on that project? Do you preserve the data that says they worked on that project, or do you modify your records to suit what the business is like now?
Those are all questions which a digital archivist must answer, and they make the task of preserving digital data all the more challenging!