Digital Archiving at the University of York: Understanding WordStar

Last month I was pleased to be able to give a presentation at 'After the Digital Revolution' about some of the work I have been doing on the WordStar 4.0 files in the Marks and Gran digital archive that we hold here at the Borthwick Institute for Archives. This event specifically focused on literary archives.

It was some time ago now that I first wrote about these files that were recovered from 5.25 inch floppy (really floppy) disks deposited with us in 2009.

My original post described the process of re-discovery, data capture and file format identification - basically the steps that were carried out to get some level of control over the material and put it somewhere safe.

I recorded some of my initial observations about the files but offered no conclusions about the reasons for the idiosyncrasies.

I’ve since been able to spend a bit more time looking at the files and investigating the creating application (WordStar) so in my presentation at this event I was able to talk at length (too long as usual) about WordStar and early word processing. A topic guaranteed to bring out my inner geek!

WordStar is not an application I had any experience with in the past. I didn’t start word processing until the early 90s, when my archaeology essays and undergraduate dissertation were typed up in a DOS version of WordPerfect. Prior to that I used a typewriter (now I feel old!).

WordStar by all accounts was ahead of its time. It was the first word processing application to include mail merge functionality. It was hugely influential, introducing a number of keyboard conventions that are still echoed in modern word processing applications (for example the use of B for bold, which in WordStar was entered as control-P followed by B). Users interacted with WordStar using their keyboard, selecting the necessary keystrokes from a set of different menus. The computer mouse (if it was present at all) was entirely redundant.

WordStar was widely used as home computing and word processing increased in popularity through the 1980s and into the early 90s. However, with the introduction of Word for Windows in 1989 and Windows 3.0 in 1990, WordStar gradually fell out of favour (info from Wikipedia).

Despite this it seems that WordStar had a loyal band of followers, particularly among writers. Of course the word processor was the key tool of their trade so if they found an application they were comfortable with it is understandable that they might want to stick with it.

I was therefore not surprised to hear that others presenting at 'After the Digital Revolution' also had WordStar files in their literary archives. Clear opportunities for collaboration here! If we are all thinking about how to provide access to and preserve these files for the future then wouldn't it be useful to talk about it together?

I've already learnt a lot through conversations with the National Library of New Zealand who have been carrying out work in this area (read all about it here: Gattuso J, McKinney P (2014) Converting WordStar to HTML4. iPres.)

However, this blog post is not about defining a preservation strategy for the files; it is about better understanding them. My efforts have been greatly helped by finding a copy of both a WordStar 3 manual and a WordStar 4 manual online.

As noted in my previous post on this subject, there are a few things that stand out when first looking at the recovered WordStar files, and I've used the manuals and other research avenues to try to understand these better.

Created and last modified dates

The Marks and Gran digital archive consists of 174 files, most of which are WordStar files (and I believe them to be WordStar version 4).

Looking at the details that appear on the title pages of some of the scripts, the material appears to be from the period 1984 to 1987 (though not everything is dated).

However the system dates associated with the files themselves tell a different story.

The majority of files in the archive have a creation date of 1st January 1980.

This was odd. Not only would that have been a very busy New Year's Day for the screen writing duo, but the timestamps on the files suggest that they were also working in the very early hours of the morning - perhaps unexpected when many people are out celebrating having just seen in the New Year!

This is the point at which I properly lost my faith in technical metadata!

In this period computers weren't quite as clever as they are today. When you switched them on they would ask you what date it was. If you didn't tell them the date, the PC would fall back to a system default, which just so happens to be 1st January 1980.
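To see why this particular date is the fallback, it helps to look at how DOS-era FAT file systems pack a date into 16 bits: the year is stored as an offset from 1980, so the earliest date that can be recorded is 1st January 1980. The following is a minimal Python sketch of that decoding. The function name `decode_dos_datetime` is purely illustrative and is not taken from any tool used on the archive.

```python
from datetime import datetime

def decode_dos_datetime(date_word: int, time_word: int) -> datetime:
    """Decode the 16-bit date and time fields used in FAT directory entries."""
    # Date word: bits 15-9 = years since 1980, bits 8-5 = month, bits 4-0 = day
    year = 1980 + ((date_word >> 9) & 0x7F)
    month = (date_word >> 5) & 0x0F
    day = date_word & 0x1F
    # Time word: bits 15-11 = hours, bits 10-5 = minutes, bits 4-0 = seconds / 2
    hour = (time_word >> 11) & 0x1F
    minute = (time_word >> 5) & 0x3F
    second = (time_word & 0x1F) * 2
    return datetime(year, month, day, hour, minute, second)

# Year offset 0, month 1, day 1 is the lowest valid date value (0x0021)
earliest = decode_dos_datetime(0x0021, 0x0000)
print(earliest)  # 1980-01-01 00:00:00
```

A clock that was never set simply starts counting from this point, which would also explain why the times on the files fall in the early hours of the morning.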

I was interested to see Abby Adams from the Harry Ransom Center, University of Texas at Austin (also presenting at 'After the Digital Revolution') flag up some similarly suspicious dates on files in a digital archive held at her institution. Her dates differed just slightly from mine, falling on the evening of the 31st December 1979. Again, these dates looked unreliable as they were clearly out of line with the rest of the collection.

This is the same issue as mine; the difference comes down to the timezone. There is a further explanation here, highlighted by David Clipsham when I threw the question out to Twitter. Thanks!
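One way to picture the shift: FAT timestamps carry no timezone, so whatever tool reads them has to assume one. In the sketch below the stored time is assumed (purely for illustration) to be read as UTC and then displayed in US Central time at a fixed six hours behind UTC. The actual tools and settings involved are not known, and the half-past-midnight timestamp is a made-up example.

```python
from datetime import datetime, timedelta, timezone

# A hypothetical file saved half an hour after an unset clock started counting
stored = datetime(1980, 1, 1, 0, 30, tzinfo=timezone.utc)

# Assumed display timezone: US Central Standard Time, six hours behind UTC
central = timezone(timedelta(hours=-6), "CST")

displayed = stored.astimezone(central)
print(displayed.isoformat())  # 1979-12-31T18:30:00-06:00
```

The same default date therefore shows up as the early hours of New Year's Day in one place and as New Year's Eve in another.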

Fragmentation

Another thing I had noticed about the files was the way that they were broken up into fragments. The script for a single episode was not saved as a single file but typically as 3 or 4 separate files. These files were named in such a way that it was clear both that they were related and in what order they should be viewed or accessed - for example GINGER1, GINGER2 or PILOT and PILOTB.

This seemed curious to me - why not just save the document as a single file? The WordStar 4 manual didn't offer any clues but I found this piece of information in the WordStar 3 manual which describes how files should be split up to help manage the storage space on your diskettes:

From the WordStar 3 manual

Perhaps some of the files in the digital archive are from WordStar 3, or perhaps Marks and Gran had been previously using WordStar 3 and had just got into the habit of splitting a document into several files in order to ensure they didn't run out of space on their floppy disks.

I cannot imagine working this way today! Technology really has come a long way. Imagine trying to format, review or spell check a document that exists as several discrete files potentially sitting on different media!
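For anyone wanting to reassemble the fragments, the naming pattern described above (GINGER1, GINGER2 or PILOT, PILOTB) can be used to group related files. This is a rough Python sketch of one possible heuristic, not a description of how the archive was actually processed. The function name `group_fragments` and the grouping rules are assumptions, and real filenames would need checking by eye.

```python
import re
from collections import defaultdict

def group_fragments(names):
    """Group filenames that look like numbered or lettered parts of one document."""
    # First pass: strip any trailing digits (GINGER1 -> GINGER)
    stems = {name: re.sub(r"\d+$", "", name) or name for name in names}
    known_stems = set(stems.values())

    groups = defaultdict(list)
    for name in sorted(names):
        stem = stems[name]
        # Second pass: treat PILOTB as part of PILOT, but only when PILOT
        # itself exists, so that ordinary words are not truncated
        if len(stem) > 1 and stem[-1].isalpha() and stem[:-1] in known_stems:
            stem = stem[:-1]
        groups[stem].append(name)
    return dict(groups)

print(group_fragments(["GINGER2", "PILOTB", "GINGER1", "PILOT"]))
```

Sorting the names within each group puts the fragments in the order they should be read, at least for the simple cases mentioned here.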

Filenames

One thing that stands out when browsing the disks is that all the filenames are in capital letters. DOES ANYONE KNOW WHY THIS WAS THE CASE?

File names in this digital archive were also quite cryptic. This is the 1980s so filenames conform to the 8.3 limit. Only 8 characters are allowed in a filename and it *may* also include a 3 character file extension.

Note that the file extension really is optional and WordStar version 4 doesn’t enforce the use of a standard file extension. Users were encouraged to use those last 3 characters of the file name to give additional context to the file content rather than to describe the file format itself.
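The 8.3 rule is simple enough to express in a few lines of Python. This is only a sketch: the function name `split_8_3` is illustrative, the permitted character set is deliberately simplified (the real rules allowed a few more punctuation characters), and the example names are chosen for illustration rather than taken from the disks.

```python
import re

# Up to 8 characters, then optionally a dot and up to 3 more.
# Simplifying assumption: only capital letters, digits, underscore and hyphen.
_EIGHT_DOT_THREE = re.compile(r"([A-Z0-9_-]{1,8})(?:\.([A-Z0-9_-]{1,3}))?")

def split_8_3(filename: str):
    """Return (name, extension) for an 8.3 filename; the extension may be empty."""
    match = _EIGHT_DOT_THREE.fullmatch(filename)
    if match is None:
        raise ValueError(f"not a valid 8.3 filename: {filename!r}")
    name, extension = match.groups()
    return name, extension or ""

print(split_8_3("GINGER1"))     # ('GINGER1', '')
print(split_8_3("PILOTB.BAK"))  # ('PILOTB', 'BAK')
```

Note that a name with no extension at all is perfectly valid, which is exactly the situation for most of the WordStar files in this archive.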

Guidance on file naming from the WordStar 4 manual

Some of the tools and processes we have in place to analyse and process the files in our digital archives use the file extension information to help understand the format. The file naming methodology described here therefore makes me quite uncomfortable!

Marks and Gran tended not to use the file extension in this way (though there are a few examples of this in the archive). The majority of WordStar files have no extension at all. The only really consistent use of file extensions relates to their backup files.

Backup files

Scattered amongst the recovered data was a set of files that had the extension BAK. This is evidently a file extension that WordStar creates and uses consistently. These files contained very similar content to other documents within the archive, typically with just a few differences. They were clearly backup files of some sort but I wondered whether they had been created automatically or by the writers themselves.

Again the manual was helpful in moving forward my understanding on this:

Backup files from the WordStar 4 manual

This backup procedure is also summarised with the help of a diagram in the WordStar 3 manual:

The backup procedure from WordStar 3 manual

This does help explain why there were so many backup files in the archive. I guess the next question is 'should we keep them?'. It does seem that they are an artefact of the application rather than representing a conscious process by the writers to back their files up at a particular point in time, and that may affect their value. However, as discussed in a previous post on preserving Google documents, there could be some benefit in preserving revision history (even if only partial).
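If the backup files are kept as a partial revision history, a first step would be to match each BAK file to the document it belongs to and see what changed. The Python sketch below shows one way this might be done using the standard library. The function names `pair_backups` and `changed_lines` are hypothetical, the matching is based only on the part of the filename before the dot, and the text is assumed to have already been extracted from the WordStar files.

```python
import difflib

def pair_backups(names):
    """Map each .BAK filename to the other files sharing its base name."""
    by_base = {}
    for name in names:
        base, _, extension = name.partition(".")
        by_base.setdefault(base, []).append((name, extension))

    pairs = {}
    for entries in by_base.values():
        currents = sorted(n for n, ext in entries if ext != "BAK")
        for backup in (n for n, ext in entries if ext == "BAK"):
            pairs[backup] = currents
    return pairs

def changed_lines(backup_text, current_text):
    """List lines removed ('- ') from the backup or added ('+ ') in the current file."""
    diff = difflib.ndiff(backup_text.splitlines(), current_text.splitlines())
    return [line for line in diff if line.startswith(("- ", "+ "))]

print(pair_backups(["GINGER1", "GINGER1.BAK", "PILOT"]))
```

A backup with no matching document (an empty list in the result) might itself be worth flagging, since it could be the only surviving version of that text.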

...and finally

My understanding of these WordStar files has come on in leaps and bounds by doing a bit of research and in particular through finding copies of the manuals.

The manuals even explain why alongside the scripts within the digital archive we also have a disk that contains a copy of the WordStar application itself.

The very first step in the manual asks users to make a copy of the software:

I do remember having to do this sort of thing in the past! From WordStar 4 manual

Of course the manuals themselves are also incredibly useful in teaching me how to actually use the software. Keystroke-based navigation is hardly intuitive to those of us who are now used to using a mouse, but I think that might be the subject of another blog post!

Jenny Mitcham, Digital Archivist

2 comments:

Arthur, 20 October 2017 at 15:45
If this was the 80s, they may have come from a CP/M system that only supported capital letters for file names (https://en.wikipedia.org/wiki/CP/M#File_system)

The old DOS 8.3 file format comes from CP/M

Jenny Mitcham, 23 October 2017 at 10:21
Just had an email from someone (who used to teach WordStar at the University of York back in the eighties) and he has provided a link which mentions the upper case filenames of the eighties: https://en.wikipedia.org/wiki/8.3_filename
In summary, they are upper case because they just are!
Thanks!


Friday, 20 October 2017

Understanding WordStar - check out the manuals!
