Digital Archiving at the University of York: accessioning


Tuesday, 24 January 2017

Creating an annual accessions report using AtoM

So, it is that time of year where we need to complete our annual report on accessions for the National Archives. Along with lots of other archives across the UK we send The National Archives summary information about all the accessions we have received over the course of the previous year. This information is collated and provided online on the Accessions to Repositories website for all to see.

The creation of this report has always been a bit time consuming for our archivists, involving a lot of manual steps and some re-typing, but since we started using AtoM as our Archival Management System the process has become much more straightforward.

As I've reported in a previous blog post, AtoM does not do all that we want to do in the way of reporting via its front end.

However, AtoM has an underlying MySQL database and there is nothing to stop you bypassing the interface, looking at the data behind the scenes and pulling out all the information you need.

One of the things we got set up fairly early in our AtoM implementation project was a free MySQL client called Squirrel. Using Squirrel or another similar tool, you can view the database that stores all your AtoM data, browse the data and run queries to pull out the information you need. It is also possible to update the data using these SQL clients (very handy if you need to make any global changes to your data). All you need initially is a basic knowledge of SQL and you can start pulling some interesting reports from AtoM.

The downside of playing with the AtoM database is of course that it isn't nearly as user friendly as the front end.

It is always a bit of an adventure navigating the database structure and trying to work out how the tables are linked. Even with the help of an Entity Relationship Diagram from Artefactual, creating more complex queries is... well... complex!

AtoM's database tables - there are a lot of them!

However, on a positive note, the AtoM user forum is always a good place to ask stupid questions and Artefactual staff are happy to dive in and offer advice on how to formulate queries. I'm also lucky to have help from more technical colleagues here in Information Services (who were able to help me get Squirrel set up and talking to the right database and can troubleshoot my queries) so what follows is very much a joint effort.

So for those AtoM users in the UK who are wrestling with their annual accessions report, here is a query that will pull out the information you need:

SELECT
    accession.identifier,
    accession.date,
    accession_i18n.title,
    accession_i18n.scope_and_content,
    accession_i18n.received_extent_units,
    accession_i18n.location_information,
    -- Show only the year when month and day are zero (e.g. 1950-00-00)
    CASE
        WHEN CAST(event.start_date AS CHAR) LIKE '%-00-00'
            THEN LEFT(CAST(event.start_date AS CHAR), 4)
        ELSE CAST(event.start_date AS CHAR)
    END AS start_date,
    CASE
        WHEN CAST(event.end_date AS CHAR) LIKE '%-00-00'
            THEN LEFT(CAST(event.end_date AS CHAR), 4)
        ELSE CAST(event.end_date AS CHAR)
    END AS end_date,
    event_i18n.date
FROM accession
-- LEFT JOINs keep accessions that have no covering dates recorded
LEFT JOIN event ON event.object_id = accession.id
LEFT JOIN event_i18n ON event.id = event_i18n.id
JOIN accession_i18n ON accession.id = accession_i18n.id
WHERE accession.date LIKE '2016%'
ORDER BY accession.identifier

A couple of points to make here:

In a previous version of the query, we included some other tables so we could also capture information about the creator of the archive. The addition of the relation, actor and actor_i18n tables made the query much more complicated and for some reason it didn't work this year. I have not attempted to troubleshoot this in any great depth for the time being as it turns out we are no longer recording creator information in our accessions records. Adding a creator record to an accessions entry creates an authority record for the creator that is automatically made public within the AtoM interface and this ends up looking a bit messy (as we rarely have time at this point in the process to work this into a full authority record that is worthy of publication). Thus as we leave this field blank in our accession record there is no benefit in trying to extract this bit of the database.
In an earlier version of this query there was something strange going on with the dates that were being pulled out of the event table. This seemed to be a quirk that was specific to Squirrel. A clever colleague solved this by casting the date to char format and including a case statement that will list the year when there's only a year and the full date when fuller information has been entered. This is useful because in our accession records we enter dates to different levels.
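
The logic of that case statement can also be sketched outside the database. The Python below is an illustration only, assuming the date arrives as a string in YYYY-MM-DD form; the function name display_date is made up for this example and is not part of AtoM.

```python
def display_date(raw_date):
    """Mirror the CASE statement: year only when month and day are zero."""
    if raw_date is None:
        return None  # the LEFT JOIN can return no event row at all
    text = str(raw_date)
    if text.endswith("-00-00"):
        return text[:4]  # e.g. '1950-00-00' becomes '1950'
    return text  # anything more precise is passed through unchanged


print(display_date("1950-00-00"))
print(display_date("1987-06-15"))
```

As in the SQL, a date recorded to the month only (such as 1950-06-00) is passed through as it is, so it may still need tidying by hand in the spreadsheet.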

So, once I've exported the results of this query, put them in an Excel spreadsheet and sent them to one of our archivists, all that remains for her to do is to check through the data, do a bit of tidying up, ensure the column headings match what is required by The National Archives and the spreadsheet is ready to go!

Jenny Mitcham, Digital Archivist

Friday, 18 March 2016

'A' is for AtoM

Over the past couple of years we have been busy working away behind the scenes on our implementation of Access to Memory (AtoM) at the Borthwick Institute for Archives and very soon we will be launching our new catalogue to the public.

I haven’t said much about AtoM on this blog thus far but it has been a huge preoccupation over the last couple of years. Here I attempt to redress that balance.

It turns out that deciding to adopt a system is relatively simple; working out exactly how you are going to use it is far more complex!

What follows is a list of just some of the things we have been thinking about and working on over the last couple of years as we move towards launch. I present you with the A to Z* of implementing AtoM….

A is for Accession Mask

We were very keen to use AtoM for accessioning…in fact the need to urgently find a new system for recording our accessions was the key driver for getting us moving with AtoM in the first place.

As we started using AtoM for recording new accessions we realised we needed to get the accessions mask right. This is just one of the configuration options within AtoM and it enables you to create unique references for your accessions. We wanted to ensure that our new accession numbers were in the same format as our previous ones so, with a little bit of help and advice from the AtoM user forum, we settled on the mask “%Y/%iii” which creates numbers in our preferred format of [yyyy]/[no]. Now we just need to remember to reset the accession number to ‘0’ at the start of each new year so that our running number sequence starts again. This is just one of the ways that an institution can configure AtoM to suit local preferences.
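
To make the numbering scheme concrete, here is a small Python sketch of a running number in the [yyyy]/[no] format that starts again after a manual reset. It is not how AtoM implements its mask; the class name AccessionCounter and the three-digit zero padding are assumptions made purely for illustration.

```python
class AccessionCounter:
    """Hypothetical running counter producing numbers like '2016/001'."""

    def __init__(self):
        self.counter = 0

    def reset(self):
        # Mirrors the manual step of setting the counter back to 0 each new year
        self.counter = 0

    def next_number(self, year):
        self.counter += 1
        # Three-digit zero padding is an assumption, not a documented AtoM rule
        return f"{year}/{self.counter:03d}"


numbers = AccessionCounter()
print(numbers.next_number(2016))
print(numbers.next_number(2016))
numbers.reset()
print(numbers.next_number(2017))
```

If the reset is forgotten, the first accession of the new year simply carries on from the previous year's sequence, which is why it is worth a reminder in the diary.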

B is for Business as Usual

Any organisation adopting a new and complex system like AtoM needs to think beyond initial implementation and consider how the solution can be embedded into their workflows for the longer term. The ultimate goal for us is getting AtoM seen as 'business as usual' at the Borthwick. We are not there yet (though perhaps we almost are when it comes to working with accessions data). Getting us to the point where AtoM is not a standing item on our meeting agendas is something to aim for in the future!

C is for Customising the look and feel

AtoM gives you some options for customising the look and feel of the front end. Since the AtoM interface is going to be the primary means through which our users will browse and view information about our holdings, we want the interface to look consistent with our other communications. It needs to be clear that it belongs to us. Using our brand colours was a quick win and we also put some additional effort into creating an attractive image for the home page to make it more visually appealing.

Note that there is a limit to the level of customisation that can be done without developer support. Within the admin interface of AtoM some basic changes to theme colours can be made, but I quickly found that changing the background colour to our Borthwick orange did not look pretty! Much better to call in our local technical experts to tweak the CSS behind the scenes.

D is for Drop Down Lists

AtoM comes ready populated with wordlists (called taxonomies) that populate the drop down lists to support data entry. However, institutions can change these to meet their own local needs. We have had to tweak a few of the taxonomies within AtoM, for example the deposit types in the accessions section and the levels of description (after much internal debate!).

E is for Experimenting

In order to understand AtoM we knew we really needed to get some of our data into it. We experimented with some structured finding aids that already existed in EAD format and had a go at importing them. We discovered that data may not always import in the way you expect.

One of the key problem areas for us has been the way AtoM handles the <bioghist> element in our EAD files. The issue is documented here. Essentially what it tends to mean for us is that we end up with lots of untitled authority records when we import an EAD finding aid. This has been a bit of a barrier for us in getting more of our existing catalogues into AtoM. Experimenting and carrying out tests to check the behaviour does, however, allow us to consider how we can tackle the issue and work towards a solution for future data imports.

F is for Friendly Advice

Though there is much detail in the AtoM documentation, anyone starting to use a new system such as AtoM will inevitably get to the point where they need to speak to someone, or see another implementation. The AtoM mailing list and the staff at Artefactual Systems are friendly and helpful and it is easy to get quick answers to specific questions. It is also incredibly valuable to have a local AtoM user to talk to, to bounce random questions off (particularly ones that may sound too silly or trivial for the mailing list!).

G is for Give it a Name

In the last few weeks before AtoM launch it occurred to us that we needed to decide what to call it. Internally we have simply been calling it ‘AtoM’ but we realised that this label is of little use to our users. As we started to finalise the interface and prepare the publicity for launch date we agreed that we would call it the ‘Borthwick Catalogue’. Perhaps not very imaginative but it is at the very least a concise description of its content and purpose!

H is for Help Pages

An online archival catalogue is quite a complex thing and we are aware that some of our users may be a bit daunted by it. Help pages are therefore really important to describe how to search and filter the results.

AtoM comes with some standard static pages that can be very easily edited. We've been working on our help pages and expect we will be editing these further once we have completed our user testing. We have also created another static page to act as a glossary of archival terms. Although one of AtoM's big selling points for us was the fact it was aligned with archival standards and terminology, we are concerned that our users may struggle with some of the language used.

I is for ISDIAH

Within AtoM the archival descriptions from an institution all link back to an ISDIAH record that describes the archival institution. This record is useful for users of our data, whether browsing within the AtoM interface directly or through aggregators.

We have had some internal debate on the extent to which we should replicate information that is on our website, but have decided that providing links to the relevant content would be better in many cases. For information about access and opening hours and the extent of our holdings, we want to ensure that the information is accurate and up to date, and having another place where this information would need to be edited adds an extra overhead.

J is for Just Start!

For a while we were stuck in a chicken and egg situation. Not sure how to use AtoM until it was set up properly and ready to go, and not sure how to set it up until we had started using it and fully understood the issues we would encounter.

Reading the documentation is essential but testing and experimenting with AtoM are really the best ways of working it out. Only by importing different datasets into AtoM or by creating new ones direct into the web form did we really understand how it worked and how this impacted on our own internal workflows. Learn by doing!

K is for Kittens (because they are never really free)

AtoM is open source and freely available for all. However, Artefactual Systems who support it stress it is “free as in free kittens”. In other words, you can have AtoM for free but it isn’t cost neutral - you need someone to install it, manage the server, configure it, and administer it. Populating it is also going to require a huge outlay of staff time.

On top of this, there will undoubtedly be things that you want AtoM to do that it doesn't yet do. If you are implementing AtoM, have a budget for funding further developments. Sponsored developments will then benefit the wider AtoM community and together we can make AtoM better and better. Quite early on in our AtoM implementation project we funded a small piece of work to include covering dates within the accessions module of AtoM as we felt that this was important information to record during the accessioning process and we did not want to lose this data from our existing accessions records when we imported them into the system. Of course we are hoping this feature will also be valuable for other AtoM users. There will undoubtedly be other feature developments we will sponsor in the future.

L is for Local Guidance

One of AtoM’s key selling points to us was the fact that it was created in association with the International Council on Archives (ICA) and is closely aligned with their metadata standards. There is however still a need for local guidance on how we intend to use some of these metadata fields.

In response to this we have created our own AtoM handbook to sit alongside the documentation that Artefactual provides. The handbook doesn't duplicate the official documentation, but describes our local procedures and requirements for data entry. This is all the more necessary given the fact that the majority of the data fields within AtoM are free text fields. With multiple users entering data into AtoM, it is important to have local guidance to ensure we maintain some consistency in the way we describe our archives.

M is for MySQL access

When we initially assessed AtoM against our requirements for an archival management system, it performed well but it didn't do everything we needed it to do. Searching and reporting functionality within AtoM does not currently meet all of our needs. It was considered essential then that we had another method of querying the data within AtoM and producing reports and statistics. To do this, we need access to the MySQL database that sits behind AtoM.

Access to the data via a free tool (I use Squirrel but there are other options out there) and a working knowledge of Structured Query Language allows you to pull out exactly the data you require.

AtoM has quite a complex and involved data structure so getting to grips with this was a bit of a learning curve, but having now got a working query to enable me to extract an annual summary of all accessions we have received over a given year I feel ready for the next challenge that is thrown my way!

N is for Not Perfect

AtoM (like all complex systems) has its limitations. It ticks many boxes for us but it does not tick them all. There are several areas where we think it could improve and we have been discussing these with the user community and developers and hope to influence its roadmap. As with all open source solutions, rather than complaining about what it doesn't do well, the user community should be working together to solve problems and support improvements. AtoM is not perfect but we are confident that it is moving in the right direction and getting better all the time.

O is for Objects (digital ones!)

One of the main reasons I got involved with AtoM implementation was because I wanted a stable base to build a digital archive on – a single point of truth about our holdings and a single system through which our users could access information about our holdings. Being able to expose access copies of our born digital archives and digitised content via AtoM is something we haven’t yet explored in full but this work will become a priority over the next couple of years. Once AtoM is launched I will be turning my attention back to Archivematica in order to help get this moving.

P is for Populating AtoM

This is undoubtedly the biggest challenge we have. Over the course of the 60 years we have been in existence, the Borthwick has created a wealth of catalogues and finding aids. Of course, these are in a range of different formats and states of completion. Some are digital, some are not. Of the digital ones, some are structured data and some are not. Some comply with modern archival standards and some don’t. Some are complete but some do not include information about more recent accruals to the archives. Just working out the current state of play is a challenge in itself.

Being both pragmatic and realistic about what is achievable is a good place to start. Getting all of this information into AtoM is a huge task and not something we can do quickly. While we have managed to enter some full finding aids into AtoM, we have not had the staff time to do as much as we would have liked. What we have prioritised though, is the creation of a collection level description for each archive that we hold and this is being achieved through Project Genesis.

Populating AtoM with our accessions data was also not without its problems but now this has been achieved we are able to browse and search all of our accessions data in one place for the first time - a really important step for us!

Q is for Quality

In an ideal world, all our data within AtoM would be of a high quality.

...but we do not live in an ideal world.

Accepting that legacy data will not always meet current standards or be as accurate as we would like is key to moving forward with a system such as this.

We are striving for a full range of high quality and standards compliant finding aids within AtoM but difficult decisions have to be made. Is it better to expose a small number of perfect catalogues or a larger number of catalogues that don’t contain all the mandatory ISAD(G) fields? The second option gets my vote.

R is for Reference Codes

Quite early on, we had to make a decision about whether or not to inherit reference codes. This is a setting you can change within the admin section of AtoM and a very important one to give some thought to before you go too far down the data entry or import route.

AtoM can either be set up so that you enter the full reference code for each level of the hierarchy of archival description, or it can be set up to inherit previous levels of its reference code depending on its position within the hierarchy.

There is no right or wrong answer here and each institution will need to work out what will suit them best. It can be hard to make a decision like this at the point where you are just starting out. Until you start to use AtoM in earnest you may not understand the full implications of your decision. Having initially agreed internally that we were going to inherit the reference code to save time with data entry and help guard against human errors, we subsequently changed our minds and decided not to inherit. This decision was influenced heavily by the way AtoM displays the reference numbers to the end user and how the archival hierarchy appears on the left side of the interface. We wanted the full reference to be displayed alongside each element of the hierarchy to help our users interpret the data and more easily see how the different levels relate to each other.

Time will tell whether we've made the right decision or not, but I imagine that once we have a substantial quantity of data within AtoM, this will become a harder decision to change!

S is for Session Timeout

Beware the inactive session timeout! AtoM times out by default after 30 minutes of inactivity. This has caused us problems when creating detailed descriptions within AtoM. If completing the Scope and Content field for a large and complex archive, it is necessary to spend some time consulting the physical archives and composing a description. Colleagues sometimes found that by the time they came to save their record the session had timed out. Naturally this was the source of great frustration.

We experimented with trying to extend the inactive session timeout period but these efforts were not successful. To avoid data loss we do encourage staff to regularly save their work. A text editor can also be used to compose descriptions. With an autosave function and no timeout, data is safer here and can be pasted into AtoM once it is complete.

T is for Training

Artefactual Systems offer introductory training sessions in AtoM and delivered one of these to Borthwick staff via WebEx at the start of our implementation project. This was well worth the expense, ensuring that staff understood the capabilities of the system and had a basic grounding in how to use it. I had my reservations about how well a training session via WebEx would work, but needn't have worried on that score. We heard Sarah Romkey from Artefactual Systems in Canada loud and clear and she was able to maintain a high level of enthusiasm throughout the session despite the fact that we had got her out of bed very early in the morning.

Training is not just a one off exercise. Now we are further along in our AtoM implementation we will be arranging further staff training to focus more on our local use of AtoM and internal processes and workflows.

U is for User Profiles and Roles

We have been giving some thought to who needs to do what within AtoM.

Who should have access to the import and export functions?
Who will be able to add new users to the system?
Who needs the ability to edit the static pages?
Who can publish and delete archival descriptions?
Who can change the accessions counter?

We are keen that AtoM is widely used by our staff and want to ensure that everyone has the necessary permissions to be able to carry out their work. User roles may evolve over time but some initial decisions do need to be made in the early stages of implementation.

V is for Volunteers

Prior to release of AtoM we have been calling for volunteers from our user base to help us test AtoM and give us their feedback.

We have put a lot of work into getting our AtoM instance ready to release and we have had our users in mind at many stages of the process. We now need to find out whether we have got it right. User testing is ongoing and we envisage we will be making some changes to AtoM once the feedback is collated.

We are really looking forward to seeing what people think.

W is for Web Address

We have made some decisions about the web address we will use for our production version of AtoM. The default url had ‘atom’ in it, but we wanted to change this to something more meaningful. AtoM means something to us and perhaps to other archives professionals but not to our users.

So, we have replaced ‘atom’ with something more descriptive and meaningful to our users – we will be plastering this url over the bookmarks and other publicity we are creating for our scheduled launch date so we want to get it right!

X is for XML

We do not want people to have to come to us to find out what we hold; we want our data to be signposted as widely as possible via other portals and aggregators both nationally and internationally. By doing so we facilitate serendipitous discovery and attract new users.

To this end we have been talking to external aggregators such as the Archives Hub to find out whether our AtoM data can be incorporated into their portal. We have been exporting sample data as EAD XML files so that the Archives Hub can assess it. A few initial problems with the EAD that AtoM creates have been ironed out and we are moving closer to being able to make this a reality over the next few months.

Y is for YorSearch

One of the features of AtoM we have been looking at before launch is the OAI-PMH functionality. We have used this to enable our AtoM data to be surfaced as simple Dublin Core metadata via our University Library Catalogue, YorSearch. It will be interesting to see whether students and staff members from the University (who may not have thought to consult our catalogue directly) will be approaching us in the future to consult our archives.
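
For readers unfamiliar with OAI-PMH, a harvester simply requests records over HTTP using a small set of standard parameters. The Python sketch below builds a ListRecords request for simple Dublin Core (oai_dc); the base URL is a placeholder rather than a real endpoint, and the helper name list_records_url is made up for this example.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL (no network access here)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec  # optional: restrict the harvest to one set
    return f"{base_url}?{urlencode(params)}"


# Placeholder endpoint for illustration only
print(list_records_url("https://catalogue.example.org/oai"))
```

A library discovery system would fetch a URL like this on a schedule and index the Dublin Core records it gets back.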

So, these are some of the things we have been thinking about and working on over the last year or so whilst moving our AtoM implementation from idea to reality. Hopefully it is of use to others who are embarking on the same process.

And of course, watch this space for news of launch!

* Actually an A-Y ...did anyone notice that there was no letter 'Z'?

Jenny Mitcham, Digital Archivist

Thursday, 22 November 2012

The first accession!

I am pleased to report that last week I accessioned the first files into the digital archive here at the Borthwick Institute!

This may sound like a rather grand claim at the moment. I will admit that we do not have a 'digital archive' infrastructure in place yet and we are still in the very early stages of considering how best to treat digital material. 'Accessioning' of digital data is not the formal process that I would like it to be, but I am setting up some basic procedures to tide us over until a more cohesive system of managing digital archives alongside their analogue friends and relations is established.

It has been said many times that with digital preservation there is no point waiting for the perfect solution because this may be a long time coming. If we keep on waiting, the problem will get bigger and crucially, data loss may occur ...so this is the methodology I have established so far.

Once I have checked that the media is readable and free from viruses, my first priority is to ensure that new digital data deposited with us is copied on to our digital archive server storage space (and securely backed up). This is of utmost importance and is the first step to ensuring longevity of digital data. If data exists only on one device (whether it is a floppy disc, a DVD or a hard drive) we cannot assume it will still be readable or usable next time we need to access it. Ensuring we have more than one copy of the digital data is a key step towards preserving that data.

The next step is to find out exactly what we have. File identification and characterisation tools such as DROID are really helpful here. Running DROID over the files will produce a list of technical metadata about each file. This will include the file name and file size alongside a checksum (we can use this over time to check that a file hasn’t corrupted or been accidentally altered). DROID also tries to identify the exact file type and version of your files. Very useful information as this can provide a starting point for making decisions about future file migrations.
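
The idea behind the checksum can be shown in a few lines of Python. This is only a sketch of the principle, not what DROID does internally; the choice of MD5 and the function names file_checksum and is_unchanged are assumptions for the example.

```python
import hashlib


def file_checksum(path, algorithm="md5", chunk_size=65536):
    """Return the hex digest of a file, reading it in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read in chunks so large files do not need to fit in memory
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(path, recorded_checksum, algorithm="md5"):
    """Compare the current checksum with the one recorded at accession."""
    return file_checksum(path, algorithm) == recorded_checksum
```

Recording the value at the point of deposit and re-running the comparison later is what lets us spot a file that has been corrupted or altered in the meantime.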

It is also important to maintain a record of the digital deposit process and the provenance of the data. Keeping copies of relevant correspondence about the process and any other documentation submitted which describes the data is crucial as it is hard to recreate this if not captured at the time of deposit.

This is just the starting point for me - the first steps towards preserving the material. However, the small steps we can take now should ensure that the files can be more easily incorporated into a fuller digital archiving solution in the future.

Jenny Mitcham, Digital Archivist
