My walk to the conference on the first day
On the conference website PASIG is described as "a place to learn from each other's practical experiences, success stories, and challenges in practising digital preservation." This sounded right up my street and I was not disappointed. The practical focus proved to be a real strength.
The conference was three days long and I took pages of notes (and lots of photographs!). As always, it would be impossible to cover everything in one blog post so here is a round-up of some of my highlights. Apologies to all of those speakers I haven't mentioned.
Bootcamp!
The highlight of the first day for me was an excellent talk by Bert Lyons from AVPreserve called "The Anatomy of Digital Files". This talk was a bit of a whirlwind (I couldn't type my notes fast enough) but it was so informative and hugely valuable. Bert talked us through the binary and hexadecimal notation systems and how they relate to content within a file. This information backed up some of the things I had learnt when investigating how file format signatures are created and really should be essential learning for all digital archivists. If we don't really understand what digital files are made up of then it is hard to preserve them.
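To make this concrete, here is a minimal Python sketch of the kind of byte-level inspection Bert was describing: reading the first few bytes of a file, viewing them in hexadecimal notation, and comparing them against a known format signature. The file name is just a placeholder.

```python
# A minimal sketch of byte-level file inspection: the same bytes can be
# read as hex and matched against a known format signature (a "magic
# number"). "example.png" is a placeholder path.
with open("example.png", "rb") as f:
    header = f.read(8)

# View the raw bytes in hexadecimal notation
print(header.hex(" "))  # a real PNG prints: 89 50 4e 47 0d 0a 1a 0a

# The PNG format's well-known signature
PNG_SIGNATURE = bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A])
print("Looks like a PNG:", header == PNG_SIGNATURE)
```

Tools like DROID do essentially this (with a far more sophisticated signature database), which is why understanding the underlying bytes helps you understand your tools.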
Bert also went on to talk about file system information - which is additional to the bytes within the file - and how crucial it is to preserve this information alongside the file itself. If you want to know more, there is a great blog post by Bert that I read earlier this year - What is the chemistry of digital preservation?. It includes a comparison highlighting the need to understand the materials you are working with, whether you are working in physical conservation or digital preservation. One of the best blog posts I've read this year, so I was pleased to get the chance to shout about it here!
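As a rough sketch of what capturing that file system information might look like in practice, the Python below records a few of the attributes that live outside the file's own bytes; the field names are illustrative, not any formal metadata standard.

```python
# A rough sketch of capturing file system metadata at the point of
# ingest, since these details are not stored within the file's bytes
# and are easily lost when files are copied. Field names are
# illustrative only.
import json
import os
from datetime import datetime, timezone

def capture_fs_metadata(path):
    st = os.stat(path)
    return {
        "path": os.path.abspath(path),
        "size_bytes": st.st_size,
        "last_modified": datetime.fromtimestamp(st.st_mtime, tz=timezone.utc).isoformat(),
        "last_accessed": datetime.fromtimestamp(st.st_atime, tz=timezone.utc).isoformat(),
    }

# Store the metadata alongside the file itself ("example.png" is a placeholder)
with open("example.png.metadata.json", "w") as out:
    json.dump(capture_fs_metadata("example.png"), out, indent=2)
```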
Hands up if you love ISO 16363!
Jon Tilbury from Preservica gave a thought-provoking talk entitled "Preservation Architectures - Now and in the Future". He talked about how tool provision has evolved, from individual tools (like PRONOM and DROID) to integrated tools designed for an institution, to out-of-the-box solutions. He suggested that the fourth age of digital preservation will be embedded tools - with digital preservation being seamless, invisible and very much business as usual. This will take digital preservation from the libraries and archives sector to the business world. Users will expect systems to be intuitive and highly automated - they won't want to think in OAIS terms. He went on to suggest that the fifth age will be when everyday consumers (specifically his mum!) are using the tools without even thinking about it! This is a great vision - I wonder how long it will take us to get there?
Erin O'Meara from University of Arizona Libraries gave an interesting talk entitled "Digital Storage: Choose your own adventure". She discussed how we select suitable preservation storage and how we can get a seat at the table for storage discussions and decisions within our institutions. She suggested that often we are just getting what we are given rather than what we actually need. She referenced the excellent NDSA Levels of Digital Preservation, which are a good starting point when trying to articulate preservation storage needs (and a resource I have used myself). Further discussions on Twitter following on from this presentation highlighted the work on preservation storage requirements being carried out as a result of a workshop at iPRES 2016, so this is well worth following up on.
A talk from Amy Rushing and Julianna Barrera-Gomez from the University of Texas at San Antonio entitled "Jumping in and Staying Afloat: Creating Digital Preservation Capacity as a Balancing Act" really highlighted for me one of the key messages that has come out of our recent project work for Filling the Digital Preservation Gap: choosing a digital preservation system is relatively easy, but deciding how to use it is much harder! After ArchivesDirect (a hosted service combining Archivematica and DuraSpace's DuraCloud) was selected as their preservation system (which included 6TB of storage), Amy and Julianna had a lot of decisions to make in order to balance the needs of their collections with the available resources. It was a really interesting case study and valuable to hear how they approached the problem and prioritised their collections.
The Museum of Modern Art in New York
In order to meet the needs of a wider range of users, Ex Libris's Rosetta is moving towards greater openness, enabling institutions to swap out any of the tools for ingest, preservation, deposit or publication. This flexibility allows the system to be better suited to a greater range of use cases. They are also being more open with their documentation and this is a very encouraging sign. The Rosetta Developer Network documentation is open to all and includes information, case studies and workflows from Rosetta users that help describe how Rosetta can be used in practice. We can all learn a lot from other people even if we are not using the same digital preservation system, so this kind of sharing is really great to see.
MOMA in the rain on day 2! |
One of the most valuable talks of the day for me was from Fernando Chirigati from New York University. He introduced us to a useful new tool called ReproZip. He made the point that the computational environment is as important as the data itself for the reproducibility of research data. This could include information about libraries used, environment variables and options. You cannot expect your depositors to find or document all of these dependencies (or your future users to install them). What ReproZip does is package up all the necessary dependencies along with the data itself. This package can then be archived, and ReproZip can later be used to unpack and re-use the data. I can see a very real use case for this for researchers within our institution.
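For anyone curious about what this looks like in practice, here is a rough sketch of the ReproZip workflow, driven from Python for consistency with the other examples; "analysis.py" is a hypothetical script, and the exact commands and options should be checked against the ReproZip documentation.

```python
# A rough sketch of the ReproZip workflow ("analysis.py" is a
# hypothetical script; see the ReproZip docs for definitive usage).
import subprocess

# 1. Trace the script's execution, recording the files, libraries and
#    environment variables it actually uses.
subprocess.run(["reprozip", "trace", "python", "analysis.py"], check=True)

# 2. Pack the traced dependencies and data into a single .rpz bundle
#    that can be archived alongside the dataset.
subprocess.run(["reprozip", "pack", "analysis.rpz"], check=True)

# 3. Later, a future user unpacks and re-runs the experiment.
subprocess.run(["reprounzip", "directory", "setup", "analysis.rpz", "unpacked"], check=True)
subprocess.run(["reprounzip", "directory", "run", "unpacked"], check=True)
```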
Another engaging talk from Joanna Phillips from the Guggenheim Museum and Deena Engel of New York University described a really productive collaboration between the two institutions. Computer Science students from NYU have been working closely with the time-based media conservator at the museum on the digital artworks in their care. This symbiotic relationship enables the students to earn credit towards their academic studies whilst the museum receives valuable help towards understanding and preserving some of their complex digital objects. Work that the students carry out includes source code analysis and the creation of full documentation of the code so that it can be understood by others. Some also engage with the unique preservation challenges within the artwork, considering how it could be migrated or exhibited again. It was clear from the speakers that both institutions get a huge amount of benefit from this collaboration. A great case study!
Karen Cariani from WGBH Educational Foundation talked about their work (with Indiana University Libraries) to build HydraDAM2. This presentation was of real interest to me given our recent Filling the Digital Preservation Gap project in which we introduced digital preservation functionality to Hydra by integrating it with Archivematica. HydraDAM2 was a different approach, building a preservation head for audio-visual material within Hydra itself. Interesting to see a contrasting solution and to note the commonalities between their project and ours (particularly around the data modelling work and difficulties recruiting skilled developers).
More rain at the end of day 2 |
The lightning talks on the afternoon of the second day were also of interest. Great to hear from such a range of practitioners... though I did feel guilty that I didn't volunteer to give one myself! Next time!
On the morning of day 3 we were treated to an excellent presentation by Dragan Espenschied from Rhizome, who showed us Webrecorder. Webrecorder is a new open source tool for creating web archives. It uses a single system both for initial capture and subsequent access. One of its many strengths appears to be the ability to capture dynamic websites as you browse them, and it looks like it will be particularly useful for websites that are also digital artworks. This is definitely one to watch!
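One nice consequence of Webrecorder's approach is that it captures to the standard WARC format, so its output can be read by other tools too. As a quick sketch, the Python below uses the warcio library (also from the Webrecorder project) to list the URLs captured in a session; the filename is a placeholder.

```python
# A quick sketch of reading a Webrecorder capture with the warcio
# library; "capture.warc.gz" is a placeholder filename.
from warcio.archiveiterator import ArchiveIterator

with open("capture.warc.gz", "rb") as stream:
    for record in ArchiveIterator(stream):
        # 'response' records hold the captured web resources
        if record.rec_type == "response":
            print(record.rec_headers.get_header("WARC-Target-URI"))
```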
MOMA again! |
Eira Tansey from the University of Cincinnati gave a very thought-provoking talk with a key question for us to consider - why do we continue to buy more storage rather than appraise? This is particularly important considering the environmental costs of continuing to store more and more data of unknown value.
Ben Goldman of Penn State University also picked up this theme, looking at the carbon footprint of digital preservation. He pointed out the paradox that we are preserving data for future generations while powering this work with fossil fuels. Won't preserving the environment be more important to future generations than our digital data? He suggested that we consider the long-term impacts of our decision making and examine our own professional assumptions. Are there things we currently do that we could do with less impact? Are we saving too many copies of things? Are we running too many integrity checks? Is capturing a full disk image wasteful? He ended his talk by suggesting that we should engage in a debate about the impacts of what we do.
Amelia Acker from the University of Texas at Austin presented another interesting perspective on digital preservation in mobile networks, asking how our collections will change as we move from an information society to a networked era, and how mobile phones change the ways we read, write and create the cultural record. On mobile devices, the file is no longer the atomic unit. Most people don't really know where the actual data on their phones or tablets lives; they can't show you the file structure. Data is typically tied up with an app and stored in the cloud, and apps come and go rapidly. There are obvious preservation challenges here! She also mentioned the concept of the legacy contact on Facebook... something which had passed me by, but which will be of interest to many of us who care about our own personal digital legacy.
Yes, there really is steam coming out of the pavements in NYC |
Another talk that stayed with me looked at diacritics in file names, which can cause problems when trying to open the files or use our preservation tools (for example Bagger). When the speaker encountered problems like these she put a question out to the digital preservation community asking how to solve them, and she was grateful to receive so many responses but at the same time was concerned about the language used. It was suggested that she 'scrub', 'clean' or 'detox' the file names in order to remove the 'illegal characters', but she was concerned that our attitudes towards accented characters further marginalise those who do not fit into our western ideals.
She also explored how removing or replacing these accented characters would impact the files themselves, and it was clear that meaning would change significantly. The word for 'campaign' (which appeared in so many of the filenames) would change into the word for 'bell'. She decided not to change the file names but to find a work-around, and she was eventually successful in finding a way to keep the filenames as they were (using the command line to convert the Latin-encoded characters to UTF-8). The message that she ended on was that we as archivists should do no harm, whether we are dealing with physical or digital archives. We must juggle our priorities but think hard about where we compromise and what is important to preserve. It is possible to work through problems rather than around them, and we need to be conscious of the needs of collections that fall outside our defaults. This was real food for thought and prompted an interesting conversation on Twitter afterwards.
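As a small illustration of how much meaning that suggested 'clean-up' would destroy, here is a Python sketch of the transformation in question (assuming, as the campaign/bell example implies, the Spanish word 'campaña'):

```python
# A minimal sketch of the suggested file name "clean-up", illustrating
# why the speaker refused to do it: stripping the diacritic changes the
# word's meaning entirely. Assumes the word was the Spanish "campaña".
import unicodedata

filename = "campaña.pdf"  # "campaña" = campaign

# Decompose accented characters, then drop anything outside ASCII
ascii_name = (
    unicodedata.normalize("NFKD", filename)
    .encode("ascii", "ignore")
    .decode("ascii")
)

print(ascii_name)  # campana.pdf - "campana" = bell, a different word
```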
Times Square selfie! |
Jenny Mitcham, Digital Archivist
Thank you for taking the time to summarize your experience from PASIG. For those of us who weren't able to attend, it's the next best thing and gives a great overview of what is being discussed. Especially the bits about understanding what you are trying to preserve are of interest, both in terms of files and the materials themselves. I also finished reading the Filling the Digital Preservation Gap report trilogy from York/Hull last week (would have read it earlier but was busy with my MSc diss on DP...) which perfectly captures the complexity of what we're trying to achieve in terms of preserving digital research data and has given us some ideas for the future.
Thanks Jaana - that is the first time I've thought of those reports as being a trilogy. Makes them sound so much more exciting I think! Glad the summary of PASIG is of interest. My only regret is that I couldn't mention more of the talks.
Thanks so much for this, Jen. I wasn't able to attend, and trying to keep up on Twitter didn't work out very well. Really appreciate the time you've taken to write this up. So many useful links and projects in one place :)
Thank you for writing up and sharing your takeaways with us. I found the information very helpful!