Explore themes across historical documents

December 2016

This post describes a prototype timeline tool I've designed for historians working with digitised historical texts. It can be used to trace commentary and attitudes around themes through time across a collection of texts.

'typhoid carrier' timeline, Medical Officer of Health reports

As digitised historical documents and archives become increasingly available to humanities researchers, they create both a need and an opportunity:

  • there’s a need for interface designs that serve the requirements of scholarly research → helping historians with the tasks they already do, but at a larger scale
  • there’s also potential for novel interfaces that offer new ways of accessing and exploring these documents by exploiting the affordances of texts as digital data → enabling scholars to ask new research questions of historical texts

Overviewing commentary across texts over time is an important activity early in the historical research process. This timeline tool is designed to give humanities researchers a qualitative feel for changing narratives across texts, rather than the quantitative data more commonly offered by text visualisation tools. (This goal comes from my own experience of conducting historical research as a Master’s student and from conversations with scholars.)

I’ve built a prototype timeline tool and demonstrated it with a collection of 5,500 historical public health reports from London spanning 140 years. (The collection is the Medical Officer of Health (MOH) reports, digitised by the Wellcome Library. Optical Character Recognition and post-OCR text proofing have been applied to these texts, so the text data is highly accurate.)

Example page images from the Medical Officer of Health reports. Image credit: Wellcome Library

Visualisation design

Imagine a user is interested in tracing the roles of nurses over time through these documents (remember I'm demonstrating the tool with public health reports!). They could search for the keyword 'nurse'.

This generates a visualisation of every instance of 'nurse' across all the documents, each shown within a snippet of surrounding text at a legible size (see below). Each snippet is centred horizontally on its date, and the snippets are ordered vertically from oldest to newest.

'nurse' timeline, MOH reports

Arranging text snippets by date: 'typhoid carrier' timeline, MOH reports
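Generating the snippets amounts to a keyword-in-context (KWIC) listing laid out on a time axis. Below is a minimal Python sketch of that idea. It is not the tool's actual code, which uses JavaScript, d3.js and Elasticsearch. The `(doc_id, year, text)` input format, the `Snippet` class, the `keyword_snippets` and `x_centre` helpers and the 60-character context window are all hypothetical choices for illustration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Snippet:
    year: int
    doc_id: str
    text: str


def keyword_snippets(docs, keyword, context=60):
    """Return one snippet per keyword occurrence, ordered oldest first.

    Hypothetical input: `docs` is an iterable of (doc_id, year, text) tuples.
    Matching is a case-insensitive substring match, so 'nurse' also finds 'nurses'.
    """
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    snippets = []
    for doc_id, year, text in docs:
        for match in pattern.finditer(text):
            # Take `context` characters either side of the match, clipped to the text.
            start = max(0, match.start() - context)
            end = min(len(text), match.end() + context)
            # Collapse line breaks and repeated spaces so each snippet fits on one row.
            snippet_text = " ".join(text[start:end].split())
            snippets.append(Snippet(year, doc_id, snippet_text))
    # Python's sort is stable: snippets sharing a year keep their original order.
    snippets.sort(key=lambda s: s.year)
    return snippets


def x_centre(year, first_year, last_year, axis_width):
    """Linear time scale: the horizontal centre of a snippet for a given year."""
    return (year - first_year) / (last_year - first_year) * axis_width
```

Each snippet's row index in the sorted list gives its vertical position, and `x_centre` gives its horizontal one, which together produce the sloping layout.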

The user might be interested to know:

  • What sorts of things are being said in the text about nurses at particular times?
  • Is the context of the keyword within the documents' narrative similar over time?
  • Are there patterns in shifting descriptions, associations or use of vocabulary?
  • Are the document authors all saying similar things?

Each keyword instance is displayed within a snippet of context from the document long enough to give a sense of how the term is being used. In this way, it is possible to see what is being said around the keyword at a given time, and to quickly compare those contexts over time. The user can gain a rich overview of what is being said around 'nurse' across time and, with new keyword searches, around any other themes they are interested in.

The sloping shape also gives users a sense of the volume and pattern of keyword occurrences. For instance, the sharp turn at the bottom of the visualisation for 'putrid' (see below) reveals a steep drop in the use of this term from 1920 onwards. Could this mean there was a decline in occurrences of putridity? Or did the language change?

'putrid' timeline 1900-1972, MOH reports.
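Because each occurrence occupies one row of the timeline, the slope is effectively a frequency chart turned on its side: a near-vertical column means many snippets at almost the same date, and a shallow slope means few occurrences spread over a long period. A visual impression such as the post-1920 drop for 'putrid' could be checked numerically with a tally like the one below. This is a hypothetical Python helper (`counts_by_period` is not part of the tool), assuming a list of snippet years is available.

```python
from collections import Counter


def counts_by_period(years, period=10):
    """Tally keyword occurrences into fixed-length periods (decades by default).

    Each occurrence is one row in the timeline, so these tallies are the number
    of rows that each period contributes to the visual slope.
    """
    counts = Counter((year // period) * period for year in years)
    # Return periods in chronological order.
    return dict(sorted(counts.items()))
```

Changing `period` trades detail for readability: yearly counts show individual spikes, while decades smooth over gaps in the collection.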

But what if the user wants to know whether the snippets from a given year come from a single document or from many? A grey vertical bracket to the left of the snippets indicates when more than one snippet comes from the same document.

Brackets on 'nurse' timeline, MOH reports.
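Deciding where to draw these brackets reduces to finding runs of consecutive snippets that share a document. A possible Python sketch follows, under hypothetical assumptions: the snippets are already in display order, snippets from one report sit next to each other (which holds if they are sorted stably by year and every snippet from a report carries that report's year), and rows are numbered from 0. The `bracket_spans` helper is illustrative, not the tool's code.

```python
from itertools import groupby


def bracket_spans(doc_ids):
    """Return (first_row, last_row) for every run of two or more consecutive
    snippets that come from the same document.

    `doc_ids` holds the document id of each snippet, in display order.
    """
    spans = []
    row = 0
    # groupby groups *consecutive* equal ids, i.e. adjacent snippets from one document.
    for _, run in groupby(doc_ids):
        length = sum(1 for _ in run)
        if length > 1:
            spans.append((row, row + length - 1))
        row += length
    return spans
```

Single snippets get no bracket, so the absence of a bracket signals that neighbouring snippets come from different reports.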

And if the user spots a snippet of particular interest and wants to see its wider context, each snippet is hyperlinked to its report on the Wellcome Library website. In this way, the complete source can easily be retrieved as photographed pages.

Here are a couple of examples showing how powerful this tool design can be for exploring qualitative trends over time:

'Heroin' example

The visualisation when searching for ‘heroin’ (see below) includes an almost vertical column to the right, indicating a sudden surge in instances of the term from 1960 onwards.

The snippets of surrounding text provide valuable context for a shift in how the term is used at different times. The two pre-1940 occurrences of ‘heroin’ concern regulations and a surgical technique. The post-1960 snippets, however, reveal that the narrative suddenly shifts to drug addiction and abuse and their links to criminality. Does the shape indicate a moral panic around drug use at this time?

'heroin' timeline, MOH reports.

Detail from 'heroin' timeline post-1960s, MOH reports.

'Blitz' example

The resulting visualisation for ‘blitz’ (see below) also has a distinct shape. There are sparse pre-WWII instances of the term (names and an instrument for stunning animals). Only in the strong column of snippets from 1940 does ‘blitz’ refer to the Nazi air raids on Britain starting that year.

A user can also observe the gradual adoption of the word into accepted usage. At first it appears within quotation marks: “our most severe time was during the "blitz" on the London area”. Over time, instances without quotation marks increase and become the norm as the word is accepted: “demands for housing by a returning pre-blitz population”. At the bottom of the slope, by the 1950s and 1960s, the term is even used metaphorically: “what has been described as the "bed bug blitz" took place”.

'blitz' timeline, MOH reports.

Detail from 'blitz' timeline, MOH reports.

To sum up, by plotting keyword instances against time with enough context to make sense of each one, this novel visualisation design can be used to overview commentary across a collection of documents. And while the tool was designed with historians in mind, it could be useful in other situations that involve analysing texts over time: for example, analysing social media data or reviewing transcripts of interviews conducted over a period of time.

For a more detailed write-up of this work (including feedback from historians) or to share any thoughts, do send me an email.

[ Technical details: the front-end of the tool is built in JavaScript and d3.js. The back-end is an Elasticsearch index of the MOH reports raw text. ]
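To make the back-end detail more concrete, here is a sketch of an Elasticsearch search body for one timeline, written as a Python dictionary. The field names `text` and `year`, the helper `timeline_query` and the highlight settings are assumptions, not the tool's actual index mapping or query. `match_phrase` keeps multi-word searches such as 'typhoid carrier' together, and the `highlight` section asks Elasticsearch for text fragments around each match.

```python
def timeline_query(phrase, size=10000):
    """Build a hypothetical Elasticsearch search body for one keyword timeline.

    Assumes each indexed report has a full-text `text` field and a numeric `year` field.
    """
    return {
        # 10,000 is the default index.max_result_window, so one request can cover it.
        "size": size,
        # match_phrase requires the words to appear together and in order.
        "query": {"match_phrase": {"text": phrase}},
        # Oldest reports first, matching the timeline's vertical order.
        "sort": [{"year": {"order": "asc"}}],
        # Only the year is needed from the stored document; the snippets come from highlights.
        "_source": ["year"],
        # Fragments around each match serve as candidate snippets.
        "highlight": {
            "fields": {
                "text": {"fragment_size": 150, "number_of_fragments": 50}
            }
        },
    }
```

The body would be sent to the index's `_search` endpoint. Highlighted fragments only approximate one snippet per occurrence, since nearby matches can share a fragment; extracting snippets client-side from the returned text is an alternative. Collections larger than 10,000 documents would need pagination, for example with `search_after`.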