This post describes a prototype tool I've designed for historians working with digitised historical texts. This timeline tool can be used to trace commentary and attitudes around themes through time across a collection of texts.
As digitised historical documents and archives are increasingly available to humanities researchers:
Overviewing commentary across texts over time is an important activity early in the historical research process. This timeline tool is designed to offer humanities researchers a qualitative feel for changing narratives across texts rather than, as is more common with text visualisation tools, quantitative data. (This goal comes from my own experiences conducting historical research as a Master’s student and through conversations with scholars.)
I’ve built a prototype timeline tool, demonstrated here with a collection of 5,500 historical public health reports from London covering a 140-year span. (The document collection is the Medical Officer of Health (or MOH) reports, digitised by the Wellcome Library. Optical Character Recognition and post-OCR text proofing have been applied to these texts, and the text data is of high accuracy.)
Imagine a user is interested in tracing the roles of nurses over time through these documents (remember I'm demonstrating the tool with public health reports!). They could search for the keyword 'nurse'.
This generates a visualisation of every instance of 'nurse' across all the documents with a snippet of surrounding text at legible size (see below). The snippets are horizontally centred by date, and vertically ordered from old to new.
The user might be interested to know:
Each keyword instance is displayed within a snippet of context from the document of sufficient length to get a sense of how the term is being used. In this way, it is possible to see the context of what is being said around the keyword at a given time, and to quickly compare those contexts over time. The user can conduct a rich overview of what is being said around 'nurse' across time and, with new keyword searches, any other themes they are interested in.
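To make the snippet idea concrete, here is a minimal Python sketch of how keyword instances and their surrounding context could be pulled out and ordered by date. It is not the prototype's actual code: the `Snippet` type, the `find_snippets` function, the `(doc_id, year, text)` input shape, the whole-word matching and the `width` parameter are all assumptions made for illustration.

```python
import re
from typing import NamedTuple


class Snippet(NamedTuple):
    year: int      # publication year of the source document
    doc_id: str    # hypothetical identifier for the source document
    text: str      # keyword plus surrounding context


def find_snippets(docs, keyword, width=40):
    """Collect every whole-word match of `keyword` with `width` characters
    of context on each side, ordered from oldest to newest.

    `docs` is assumed to be an iterable of (doc_id, year, text) tuples.
    """
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    snippets = []
    for doc_id, year, text in docs:
        for match in pattern.finditer(text):
            start = max(0, match.start() - width)
            end = min(len(text), match.end() + width)
            snippets.append(Snippet(year, doc_id, text[start:end].strip()))
    # Python's sort is stable, so snippets from the same year keep
    # the order in which they were found.
    snippets.sort(key=lambda s: s.year)
    return snippets


docs = [
    ("report-b", 1950, "The district nurse visited daily."),
    ("report-a", 1900, "A nurse was appointed. The nurse resigned."),
]
for snippet in find_snippets(docs, "nurse", width=15):
    print(snippet.year, snippet.doc_id, snippet.text)
```

A fixed character window is the simplest choice here. Cutting at word or sentence boundaries would probably read better on screen, at the cost of snippets of uneven length.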
The sloping shape also helps users get a sense of the volume and patterns of keyword occurrences. For instance, the sharp turn at the bottom of the resulting visualisation for 'putrid' (see below) reveals a steep drop in the frequency of this term from 1920 onwards. Could this mean there was a decline in occurrences of putridity? Or maybe the language changed?
But what if the user wanted to know if the snippets from a given year originate from the same document, rather than many documents? A grey vertical bracket to the left of the snippets indicates when more than one snippet has been generated from the same document.
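One way the bracket positions could be worked out is to look for runs of adjacent snippets that share a source document. The Python sketch below is a guess at the logic rather than the prototype's implementation: `bracket_spans` is a hypothetical name, snippets are assumed to be `(year, doc_id, text)` tuples, and they are assumed to be already in display order with snippets from the same document next to each other.

```python
from itertools import groupby


def bracket_spans(snippets):
    """Return (first_row, last_row, doc_id) for each run of two or more
    adjacent snippets that come from the same document.

    `snippets` is assumed to be a list of (year, doc_id, text) tuples
    in display order.
    """
    spans = []
    row = 0
    for doc_id, run in groupby(snippets, key=lambda s: s[1]):
        count = len(list(run))
        if count > 1:
            # A bracket is only drawn when a document yields several snippets.
            spans.append((row, row + count - 1, doc_id))
        row += count
    return spans


snippets = [
    (1900, "report-a", "a nurse was appointed"),
    (1900, "report-a", "the nurse resigned"),
    (1901, "report-b", "the school nurse reported"),
    (1902, "report-c", "one nurse attended"),
    (1902, "report-c", "a second nurse was engaged"),
    (1902, "report-c", "the nurse's salary"),
]
print(bracket_spans(snippets))
```

Each returned span gives the first and last row a bracket would cover, so a renderer only needs the row height to draw it.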
And if the user spots a snippet of text of particular interest and wishes to see its wider context in the document, each text snippet is hyperlinked to its report on the Wellcome Library website. In this way, a user can trivially retrieve the complete text source as photographed pages.
Here are a couple of examples showing how powerful this tool design can be for exploring qualitative trends over time:
The visualisation when searching for ‘heroin’ (see below) includes an almost vertical column to the right, indicating a sudden surge in instances of the term from 1960 onwards.
The snippets of surrounding text provide valuable context for a shift in how the term is used at different times. The two pre-1940 occurrences of ‘heroin’ talk about regulations and a surgical technique. The post-1960 text snippets, however, reveal that the narrative suddenly shifts to drug addiction/abuse and its links to criminality. Is the shape indicating a moral panic around drug use at this time?
The resulting visualisation for ‘blitz’ (see below) also has a distinct shape. There are sparse pre-WWII instances of the term (names and an instrument for stunning animals). Only in the strong column of snippets from 1940 does ‘blitz’ refer to the Nazi air raids on Britain starting that year.
A user can also observe the gradual adoption of the word into accepted language. At first it appears within quotation marks: “our most severe time was during the "blitz" on the London area”. Over time, instances of use without quotation marks increase and become the norm as the word is accepted: “demands for housing by a returning pre-blitz population”. At the bottom of the slope, by the 1950s and 1960s, the term is even used metaphorically: “what has been described as the "bed bug blitz" took place”.
To sum up, by mapping keyword instances against time with enough context to make sense of them, this novel visualisation design can be used to overview commentary across a collection of documents. And, while the tool was designed with historians in mind, there are other situations where it could be useful for analysing texts over time: for example, analysing social media data or reviewing transcripts from interviews conducted over a time span.
For a more detailed write-up of this work (including feedback from historians) or to share any thoughts, do send me an email.