King's Lynn's borough archives are cared for jointly by the Borough Council and the Norfolk Record Office
Profiling Digital Records with DROID
With any local authority archive there is an assumption that a deposited accession might contain literally anything. In digital terms, this means it is impossible to predict what sort of data might arrive in the future. That is why NRO have been actively developing their digital preservation strategy, with the aim of building enough capability to be able to choose digital records over their paper-based equivalents (hard copies or printouts).
The archive service has been receiving accessions of digital records since the late 1990s. The majority of digitally born archives came in as hybrid accessions from local schools that were being closed down, and for many of these records there were no paper equivalents. Other deposits containing digital records include architectural surveys and the archives of private individuals and local organisations (for example, Parish Council meeting minutes).
The archive service have been using DROID as part of their archival processing procedure for digital records, as it draws on PRONOM, the most comprehensive and continuously updated file format registry. Archivematica, an ingest system that also uses the PRONOM registry, is currently being introduced at NRO. It includes other file format identification tools such as FIDO and Siegfried (both of which use PRONOM identifiers).
The results of the DROID survey were as follows:
With the latest signature file (v.86), DROID successfully identified 96.46% of the 49,117 files profiled.
DROID identified 107 different file formats. The ten most frequently occurring were:
Classification | File Format Name | Versions | PUIDs |
Image (Raster) | JPEG File Interchange Format | 1.01, 1.02 | fmt/43, fmt/44 |
Image (Raster) | Exchangeable Image File Format (Compressed) | 2.1, 2.2 | x-fmt/390, x-fmt/391 |
Image (Raster) | Windows Bitmap | 3 | fmt/116 |
Text (Mark-up) | Hypertext Markup Language | 4 | fmt/96, fmt/99 |
Word Processor | Microsoft Word Document | 97-2003 | fmt/40 |
Image (Raster) | Tagged Image File Format | | fmt/353 |
Email | Microsoft Outlook Email Message | 97-2003 | x-fmt/430 |
Miscellaneous | AppleDouble Resource Fork | | fmt/503 |
Image (Raster) | Graphics Interchange Format | 89a | fmt/4 |
Image (Raster) | Exchangeable Image File Format (Compressed) | 2.2.1 | fmt/645 |
Identification method breakdown:
- 83.31% were identified by signature
- 14.95% by container
- 1.73% by extension
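A breakdown like this can be recomputed from a DROID CSV export. What follows is only a minimal sketch: it assumes the export has `TYPE` and `METHOD` columns (as DROID exports generally do, but check the header of your own file), and the sample rows and the helper name `method_breakdown` are made up for illustration.

```python
import csv
import io
from collections import Counter


def method_breakdown(csv_text):
    """Return {identification method: percentage} over identified files.

    Assumes DROID-style TYPE and METHOD columns; rows with an empty
    METHOD (folders and unidentified files) are left out of the total.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(
        row["METHOD"]
        for row in reader
        if row["TYPE"] != "Folder" and row["METHOD"]
    )
    total = sum(counts.values())
    return {method: round(100 * n / total, 2) for method, n in counts.items()}


# Invented sample rows, not taken from the NRO survey.
SAMPLE = """TYPE,NAME,METHOD,PUID
Folder,photos,,
File,a.jpg,Signature,fmt/43
File,b.jpg,Signature,fmt/44
File,c.docx,Container,fmt/412
File,d.cmp,,
File,e.txt,Extension,x-fmt/111
"""

print(method_breakdown(SAMPLE))
```

For a real export, the text would be read from the CSV file on disk rather than from an inline string.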
458 files had mismatched extensions, which amounts to less than one per cent (0.97%). These were a variety of common raster image formats (JPEG, PNG, TIFF), word processor formats (Microsoft Word Document, ClarisWorks Word Processor) and desktop publishing formats (Adobe Illustrator, Adobe InDesign Document, Quark Xpress Data File).
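The percentages quoted so far can be cross-checked with a little arithmetic. The sketch below uses only the figures given in this post; the variable names are my own. It suggests, though the post does not state this, that the 0.97% figure is a share of the identified files rather than of all files profiled.

```python
TOTAL_FILES = 49_117      # files profiled
IDENTIFIED_RATE = 0.9646  # 96.46% identified
MISMATCHED = 458          # files with mismatched extensions

# Approximate counts implied by the published percentages.
identified = round(TOTAL_FILES * IDENTIFIED_RATE)
unidentified = TOTAL_FILES - identified

# Mismatch rate under two possible denominators.
pct_of_identified = round(100 * MISMATCHED / identified, 2)
pct_of_all = round(100 * MISMATCHED / TOTAL_FILES, 2)

# Share of files left unidentified.
pct_unidentified = round(100 * unidentified / TOTAL_FILES, 2)

print(identified, unidentified)       # 47378 1739
print(pct_of_identified, pct_of_all)  # 0.97 0.93
print(pct_unidentified)               # 3.54
```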
Among the 3.54% of files that remained unidentified there were 160 different unknown file extensions. The top five were:
- .cmp
- .mov
- .info
- .eml
- .mdb
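A ranking like the one above could be produced by tallying the extensions of rows that have no PUID. This is a rough sketch rather than the method actually used at NRO: it assumes `TYPE`, `EXT` and `PUID` columns in the DROID CSV export, and both the sample rows and the function name `top_unidentified_extensions` are hypothetical.

```python
import csv
import io
from collections import Counter


def top_unidentified_extensions(csv_text, n=5):
    """Return the n most common extensions among files without a PUID."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(
        "." + row["EXT"].lower()  # fold case so .CMP and .cmp count together
        for row in reader
        if row["TYPE"] == "File" and not row["PUID"] and row["EXT"]
    )
    return counts.most_common(n)


# Invented sample rows, not taken from the NRO survey.
SAMPLE = """TYPE,NAME,EXT,PUID
File,scan1.cmp,cmp,
File,scan2.CMP,CMP,
File,scan3.cmp,cmp,
File,clip1.mov,mov,
File,clip2.mov,mov,
File,notes.info,info,
File,photo.jpg,jpg,fmt/43
Folder,scans,,
"""

print(top_unidentified_extensions(SAMPLE, n=2))
```

A list like this is a useful starting point for deciding which formats to research and, where appropriate, submit to PRONOM.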
Two files returned more than one identification:
A spreadsheet file with an .xls extension (last modified 2006-12-17) had three possible file format matches:
- fmt/175 Microsoft Excel for Macintosh 2001
- fmt/176 Microsoft Excel for Macintosh 2002
- fmt/177 Microsoft Excel for Macintosh 2004
An image file with a .bmp extension (last modified 2007-02-06) received two file format matches:
- fmt/116 Windows Bitmap 3
- fmt/625 Apple Disk Copy Image 4.2
After closer inspection, the file proved to be a bitmap image, so PUID fmt/116 was the correct one.
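Files with several candidate formats are worth pulling out for manual checking like this. One possible way to find them, sketched below, assumes a DROID export written with one row per format (so a file with several matches appears on several rows) and with `FILE_PATH` and `PUID` columns. The file paths and the function name are invented; only the PUIDs come from the two cases described above.

```python
import csv
import io
from collections import defaultdict


def multiple_identifications(csv_text):
    """Map each file path to its PUIDs, keeping only files with 2+ matches."""
    puids_by_file = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["PUID"]:
            puids_by_file[row["FILE_PATH"]].append(row["PUID"])
    return {
        path: puids
        for path, puids in puids_by_file.items()
        if len(puids) > 1
    }


# Hypothetical paths; PUIDs as reported in the survey.
SAMPLE = """FILE_PATH,PUID
/acc/budget.xls,fmt/175
/acc/budget.xls,fmt/176
/acc/budget.xls,fmt/177
/acc/plan.bmp,fmt/116
/acc/plan.bmp,fmt/625
/acc/photo.jpg,fmt/43
"""

for path, puids in multiple_identifications(SAMPLE).items():
    print(path, puids)
```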
Understanding the Results
DROID offers a very useful classification of file formats, assigning each one to a broader group such as Audio, Word Processor, Page Description or Aggregate. Putting all results into these categories gives an overview of the digital collection and makes it easy to see what sort of content predominates within a digitally born accession, archive or collection. The categories help enormously in getting a grasp of the variety of digital records. For example, it was interesting to discover that over half of our digitally born archives are in various raster image file formats.
Files profiled at Norfolk Record Office as classified by DROID
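To show how an overview by classification can be tallied, here is a small sketch that groups PUIDs into the broader categories. The lookup table is copied from the top-ten table earlier in this post and is nowhere near complete; the sample list, the "Unclassified" fallback and the function name are my own assumptions.

```python
from collections import Counter

# Classification per PUID, copied from the top-ten table in this post.
CLASSIFICATION = {
    "fmt/43": "Image (Raster)",
    "fmt/44": "Image (Raster)",
    "x-fmt/390": "Image (Raster)",
    "x-fmt/391": "Image (Raster)",
    "fmt/116": "Image (Raster)",
    "fmt/353": "Image (Raster)",
    "fmt/4": "Image (Raster)",
    "fmt/645": "Image (Raster)",
    "fmt/96": "Text (Mark-up)",
    "fmt/99": "Text (Mark-up)",
    "fmt/40": "Word Processor",
    "x-fmt/430": "Email",
    "fmt/503": "Miscellaneous",
}


def classification_shares(puids):
    """Return {classification: percentage of files} for a list of PUIDs."""
    counts = Counter(CLASSIFICATION.get(p, "Unclassified") for p in puids)
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


# Invented sample; "fmt/999999" is a made-up PUID to show the fallback.
sample = ["fmt/43", "fmt/44", "fmt/116", "fmt/40", "x-fmt/430", "fmt/999999"]
print(classification_shares(sample))
```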
I am of course also interested in the levels of risk associated with particular formats, so I have started to work on an additional classification for the data, creating further categories that can help with preservation planning. This would help demonstrate where preservation efforts should be focused in the future.
Jenny Mitcham, Digital Archivist