Digital scholarship blog

Enabling innovative research with British Library digital collections


14 May 2020

Searching eTheses for the openVirus project

This is a guest post by Andy Jackson (@anjacks0n), Technical Lead for the UK Web Archive and enthusiastic data-miner.

Introduction

The COVID-19 outbreak is an unprecedented global crisis that has prompted an unprecedented global response. I’ve been particularly interested in how academic scholars and publishers have responded.

It’s impressive how much has been done in such a short time! But I also saw one comment that really stuck with me:

“Our digital libraries and archives may hold crucial clues and content about how to help with the #covid19 outbreak: particularly this is the case with scientific literature. Now is the time for institutional bravery around access!”
– @melissaterras

Clearly, academic scholars and publishers are already collaborating. What could digital libraries and archives do to help?

Scale, Audience & Scope

Almost all the efforts I’ve seen so far are focused on helping scientists working on the COVID-19 response to find information from publications that are directly related to coronavirus epidemics. The outbreak is much bigger than this. In terms of scope, it’s not just about understanding the coronavirus itself. The outbreak raises many broader questions, like:

  • What types of personal protective equipment are appropriate for different medical procedures?
  • How effective are the different kinds of masks when it comes to protecting others?
  • What coping strategies have proven useful for people in isolation?

(These are just the examples I’ve personally seen requests for. There will be more.)

Similarly, the audience is much wider than the scientists working directly on the COVID-19 response, from medical professionals wanting to know more about protective equipment to journalists looking for context and counter-arguments.

As a technologist working at the British Library, I felt there must be some way I could help: some way to enable a wider audience to dig out any potentially relevant material we might hold.

The openVirus Project

While looking out for inspiration, I found Peter Murray-Rust’s openVirus project. Peter is a vocal supporter of open source and open data, and had launched an ambitious attempt to aggregate information relating to viruses and epidemics from scholarly publications.

In contrast to the other efforts I’d seen, Peter wanted to focus on novel data-mining methods, and on pulling in less well-known sources of information. This dual focus on text analysis and on opening up underutilised resources appealed to me. And I already had a particular resource in mind…

EThOS

Of course, the British Library has a very wide range of holdings, but as an ex-academic scientist I’ve always had a soft spot for EThOS, which provides electronic access to UK theses.

Through the web interface, users can search the metadata and abstracts of over half a million theses. Furthermore, to support data mining and analysis, the EThOS metadata has been published as a dataset. This dataset includes links to institutional repository pages for many of the theses.

Although doctoral theses are not generally considered to be as important as journal articles, they are a rich and underused source of information, capable of carrying much more context and commentary than a brief article[1].

The Idea

Having identified EThOS as a source of information, the idea was to see if I could use our existing UK Web Archive tools to collect and index the full text of these theses, build a simple faceted search interface, and perform some basic data-mining operations. If that worked, it would allow relevant theses to be discovered and passed to the openVirus tools for more sophisticated analysis.

Preparing the data sources

The links in the EThOS dataset point to the HTML landing page for each thesis, rather than to the full text itself. To get to the text, the best approach would be to write a crawler to find the PDFs. However, it would take a while to create something that could cope with the variety of ways the landing pages tend to be formatted. For machines, it’s not always easy to find the link to the actual thesis!

However, many of the universities involved have given the EThOS team permission to download a copy of their theses for safe-keeping. The URLs of the full-text files are only used once (to collect each thesis shortly after publication), but have nevertheless been kept in the EThOS system since then. These URLs are considered transient (i.e. likely to ‘rot’ over time) and come with no guarantees of longer-term availability (unlike the landing pages), so are not included in the main EThOS dataset. Nevertheless, the EThOS team were able to give me the list of PDF URLs, making it easier to get started quickly.

This is far from ideal: we will miss theses that have been moved to new URLs, and from universities that do not take part (which, notably, includes Oxford and Cambridge). This skew would be avoided if we were to use the landing-page URLs provided for all UK digital theses to crawl the PDFs. But we need to move quickly.

So, while keeping these caveats in mind, the first task was to crawl the URLs and see if the PDFs were still there…

Collecting the PDFs

A simple Scrapy crawler was created, one that could read the PDF URLs and download them without overloading the host repositories. The crawler itself does nothing with them, but by running behind warcprox the web requests and responses (including the PDFs) can be captured in the standardised Web ARChive (WARC) format.
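To make this concrete, here is a minimal sketch of the kind of spider involved, assuming a plain-text file of URLs and a local warcprox instance; the file name, proxy address and politeness settings are illustrative assumptions, not the actual openVirus code.

```python
# Minimal sketch: read PDF URLs from a file and fetch them through a local
# warcprox instance, which records every request and response as WARCs.
# File name, proxy address and settings are illustrative assumptions.
import scrapy

class ThesisPdfSpider(scrapy.Spider):
    name = "thesis_pdfs"
    custom_settings = {
        "DOWNLOAD_DELAY": 2.0,                # be polite to repositories
        "CONCURRENT_REQUESTS_PER_DOMAIN": 1,
    }

    def start_requests(self):
        with open("pdf_urls.txt") as f:
            for url in f:
                # Route the request through warcprox so it is archived.
                yield scrapy.Request(
                    url.strip(),
                    meta={"proxy": "http://localhost:8000"},
                )

    def parse(self, response):
        # The spider does nothing with the PDF itself; warcprox has
        # already captured the response in WARC format.
        self.logger.info("Fetched %s (%d bytes)", response.url, len(response.body))
```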

For 35 hours, the crawler attempted to download the 130,330 PDF URLs. Quite a lot of URLs had already changed, but 111,793 documents were successfully downloaded. Of these, 104,746 were PDFs.

All the requests and responses generated by the crawler were captured in 1,433 WARCs each around 1GB in size, totalling around 1.5TB of data.

Processing the WARCs

We already have tools for handling WARCs, so the task was to re-use them and see what we get. As this collection is mostly PDFs, Apache Tika and PDFBox are doing most of the work, but the webarchive-discovery wrapper helps run them at scale and add in additional metadata.

The WARCs were transferred to our internal Hadoop cluster, and in just over an hour the text and associated metadata were available as about 5GB of compressed JSON Lines.

A Legal Aside

Before proceeding, there’s a legal problem that we need to address. Although these documents are freely available over the open web, the rights and licences under which they are made available can be extremely varied and complex.

There’s no problem gathering the content and using it for data mining. The problem is that there are limitations on what we can redistribute without permission: we can’t redistribute the original PDFs, or any close approximation.

However, collections of facts about the PDFs are fine.

But for the other openVirus tools to do their work, we need to be able to find out what each thesis is about. So how can we make this work?

One answer is to generate statistical summaries of the contents of the documents. For example, we can break the text of each document up into individual words, and count how often each word occurs. These word frequencies are no substitute for the real text, but they are redistributable and suitable for answering simple queries.
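As a rough illustration of the idea (the real processing ran as a Hadoop job over the extracted text), the core transformation might look like this in Python:

```python
# Sketch of the word-frequency summarisation: split extracted text into
# words, count occurrences, and emit one JSON Lines record per document.
# This shows only the core idea, not the production Hadoop job.
import json
import re
from collections import Counter

def word_frequencies(doc_id: str, text: str) -> str:
    words = re.findall(r"[a-z]+", text.lower())  # crude tokenisation
    counts = Counter(words)
    return json.dumps({"id": doc_id, "word_freq": dict(counts)})

print(word_frequencies("thesis-001", "Face masks and face shields."))
# {"id": "thesis-001", "word_freq": {"face": 2, "masks": 1, "and": 1, "shields": 1}}
```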

These simple queries can be used to narrow down the overall dataset, picking out a relevant subset. Once the list of documents of interest is down to a manageable size, an individual researcher can download the original documents themselves, from the original hosts[2]. As the researcher now has local copies, they can run their own tools over them, including the openVirus tools.

Word Frequencies

A second, simpler Hadoop job was created, post-processing the raw text and replacing it with the word frequency data. This produced 6GB of uncompressed JSON Lines data, which could then be loaded into an instance of the Apache Solr search tool [3].

While Solr provides a user interface, it’s not really suitable for general users, nor is it entirely safe to expose to the World Wide Web. To mitigate this, the index was built on a virtual server well away from any production systems, and wrapped with a web server configured in a way that should prevent problems.

The API this provides (see the Solr documentation for details) enables us to find which theses include which terms. Here are some example queries:
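For illustration, such queries can be issued with a few lines of Python against Solr’s standard select handler; the endpoint URL and field name below are placeholders, not the actual service address or schema.

```python
# Example queries against the word-frequency index, using Solr's standard
# select handler. Endpoint and field name are illustrative placeholders.
import requests

SOLR = "https://example.org/solr/etheses/select"  # hypothetical endpoint

# Theses whose word-frequency data includes 'ventilator':
r = requests.get(SOLR, params={"q": "text:ventilator", "wt": "json"})

# Terms can be combined with standard Lucene syntax, e.g. face AND mask:
r = requests.get(SOLR, params={"q": "text:(face AND mask)", "wt": "json"})
print(r.json()["response"]["numFound"])
```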

This is fine for programmatic access, but with a little extra wrapping we can make it more useful to more people.

APIs & Notebooks

For example, I was able to create live API documentation and a simple user interface using Google’s Colaboratory:

Using the openVirus EThOS API

Google Colaboratory is a proprietary platform, but those notebooks can be exported as more standard Jupyter Notebooks. See here for an example.

Faceted Search

Having carefully exposed the API to the open web, I was also able to take an existing browser-based faceted search interface and modify it to suit our use case:

EThOS Faceted Search Prototype

Best of all, this is running on the Glitch collaborative coding platform, so you can go look at the source code and remix it yourself, if you like:

EThOS Faceted Search Prototype – Glitch project

Limitations

The main limitation of using word frequencies instead of full text is that phrase search is broken: because word order is discarded, searching for face AND mask will work as expected, but searching for the exact phrase “face mask” doesn’t.

Another problem is that the EThOS metadata has not been integrated with the raw text search. This would give us a much richer experience, like accurate publication years and more helpful facets[4].

In terms of user interface, the faceted search UI above is very basic, but for the openVirus project the API is likely to be of more use in the short term.

Next Steps

To make the search more usable, the next logical step is to attempt to integrate the full-text search with the EThOS metadata.

Then, if the results look good, we can start to work out how to feed the results into the workflow of the openVirus tool suite.

 


1. Even things like negative results, which are informative but can be difficult to publish in article form. ↩︎

2. This is similar to the data sharing pattern used by Twitter researchers. See, for example, the DocNow Catalogue. ↩︎

3. We use Apache Solr a lot so this was the simplest choice for us. ↩︎

4. Note that since writing this post, this limitation has been rectified. ↩︎

 

07 May 2020

How to make art when we’re working apart

Like many at the British Library, the last couple of months have seen my role as a Senior Imaging Support Officer drastically change. While working from home, it has been impossible for me to carry on my normal digitisation work. Despite this, the time spent at home has reminded me why the work I do can be so important. Digitised collections have become essential during this time, allowing institutions to carry on engaging with the public at a time when their doors are closed. The potential of accessing heritage collections from the comfort of your own home is being highlighted every day.

Research and academic use of online collections is well recognised. However, their vast creative potential is still not always identified by users and institutions. This is something I have been interested in for a long time, and I am always keen to produce work that changes people’s perceptions of how online collections can be used. The Hack Days I have taken part in over the last couple of years at the British Library have given me a chance to explore this, producing creative responses to the Qatar Digital Library. Whether through zine making, print making or colourisation, I have been able to produce this work thanks to the digitised material made available for free.

The format and layout of websites designed to showcase digitised collections is normally more geared towards researchers and academics. When using online collections creatively, I want to work with a website that is image focused rather than text based. As a result, I have found institutions’ Flickr accounts to be the easiest to use. The Library of Congress, The National Archives and The National Archives of Estonia are just some of the accounts I have used in the past. This meant that when I found myself working from home, one of the first things I wanted to do was work with the British Library’s Flickr account. With over a million images uploaded in 2013, there are some incredible images to use, and I have long recognised its potential for use in creative projects. All of these images have been uploaded without copyright restrictions so that they can be used by anyone for free - yes, including commercially! The images are taken from the pages of 17th, 18th and 19th century books. They were digitised by Microsoft, which gifted the scanned images to the British Library, allowing them to be released into the public domain. You can read more about this collection here.

It’s also important to note this certainly isn’t the first time the British Library Flickr images have been used creatively. Among others, artists like David Normal, Michael Takeo Magruder, Mario Klingemann and Jiayi Chong have all produced very different work using the book illustrations from this collection.

Animated collage created by Jiayi Chong, using Creature, a cutting-edge 2D animation software

Getting Started

I knew I wanted to produce a series of collages but without access to a printer I was unable to print any of the images out. However, downloading the images allowed me to begin making digital collages. I used both Microsoft Word and Adobe Photoshop to create them. The collages I produced were in a similar vein to some of the images and artwork included in zines I have made in the past. These were normally made to illustrate a piece of text or data. However, this time I wanted to do something purely for fun, bringing random images together with lots of colour to create weird and wonderful images.

I began scrolling through the photo stream, collecting images into a folder on my computer. I enjoyed the serendipitous nature the Flickr interface brings and there were so many images I found just by chance rather than by using the search bar. There are also useful albums under different themes, including cats and cycling, which helped during my initial search. You can see some of the collages I created below.

4 collage images, a woman holding a large pair of scissors, a classical sculpture of a man holding a snake, an angel on a map and a profile of a man's face in front of a mountain range
Collages created by Hannah Nagle using the British Library's Flickr image collection

As I was creating them, I realised that not only were they enjoyable to make, the process was giving me something to put my mind to. It took me away from the stress and the uncertainty of everything going on around me and once again proved the benefits creativity has for mental health. Studies have shown that getting creative can have a similar effect to meditating. It helps you to relax, lifts your mood and allows you to express yourself in ways you wouldn’t normally. Additionally, there’s no wrong or right when making art. It can be a bit of freedom without any pressure or stress - something so many of us need right now.

The Guide

Having realised and experienced the benefits of this myself, I began putting together a guide on how to make collages using images from the British Library Flickr page. ‘How to make art when we’re working apart’ contains simple instructions on how to produce a collage, both digital and physical. If you don’t have access to a printer, the guide focuses on using Microsoft Word to produce a digital collage. I am also currently producing a second guide for anyone interested in gaining a basic understanding of how to use Adobe Photoshop. Included is a list of artists who use collage in their work, along with a simple ‘formula’ for making a collage, for those who are feeling a bit stuck. The guide is for anyone interested in creating a bit of art while stuck indoors. Even if you’re not creatively minded, it’s easy to use and ideal for anyone looking to give themselves a small distraction from the situation we all find ourselves in.

Cover image for "How to make art when we’re working apart" guide featuring a drawing of a row of houses
Cover image for "How to make art when we’re working apart" guide

Click here to download the "How to make art when we're working apart" guide.

This is a guest post by Hannah Nagle (@hannagle) who works in the Imaging Team for the British Library Qatar National Library Partnership. You can follow the British Library Qatar National Library Partnership on Twitter at @BLQatar.

06 May 2020

What did you call me?!

This guest blog post is by Michael St John-McAlister, Western Manuscripts Cataloguing Manager at the British Library.

The coronavirus lockdown is a good opportunity to carry out some of those house-keeping tasks that would never normally get done (and I do not mean re-grouting the bathroom). Anticipating that we would be sent home and knowing I would be limited in the work I could do at home, I asked IT to download all the name authorities in our archives and manuscripts cataloguing system (all 324,106 of them) into a spreadsheet that I would be able to work on at home.

Working through the names, looking for duplicate records, badly-formed names, and typos, my eye was caught by the variety of epithets that have been used over 267 years of manuscripts cataloguing.

For the uninitiated, an epithet is part of a name authority or index term, in the form of a short descriptive label, used to help distinguish people of the same name. Imagine you are writing a biography of a John Smith. You search the Explore Archives and Manuscripts catalogue for any relevant primary sources, only to find three entries for Smith, John, 1800-1870. How would you know which John Smith’s letters and diaries to call up for your research? (Humour me: let us assume our three Smiths all have the same vital dates, unlikely I know, and that the papers are not fully catalogued so the catalogue descriptions of the papers themselves cannot help you decide as they would normally).

Now imagine your catalogue search for John Smith turned up the following entries instead:

Smith, John, 1810-1880, baker

Smith, John, 1810-1880, butcher

Smith, John, 1810-1880, candlestick maker

Instantly, you can see which of the three John Smiths is relevant to your ground-breaking research into the history of candlestick making in the West Riding in the early Victorian era.

The epithet is one element of a well-formed index term and it tends to be a position in life (King of Jordan; Queen of Great Britain and Ireland), a former or alternative name (née Booth; pseudonym ‘Jane Duncan’), a career or occupation (soldier; writer), or a relationship to another person (husband of Rebecca West; son of Henry VII).

Scrolling through the spreadsheet, in amongst the soldiers, writers, composers, politicians, Earls of this, and Princesses of that, I stumbled across a fascinating array of epithets, some obvious, some less so.

There are plenty of examples of the perhaps slightly everyday, but important all the same: bricklayer; plumber; glazier; carpenter. As well as the trades common to us today, some of the trades used as epithets seem very much of their times: button-maker; coach and harness maker; dealer in fancy goods; butterman; copperplate printer; hackney coachman.

Those from the edges of law-abiding society loom large, with people described as burglar and prisoner (presumably the former led to his becoming the latter), convict, assassin, murderer, pickpocket, forger, felon, regicide, and rioter. There are even 50 pirate’s wives in the catalogue (but only seven pirates!). The victims of conflict and persecution also crop up, including prisoner of war, martyr, and galley slave, as well as, occasionally, their tormentors (inquisitor, head jailer, arms dealer).

Some of the epithets have a distinct air of mystery about them (codebreaker; conspirator; spy; alchemist; child prodigy; fugitive; renegade priest; hermit; recluse; mystic; secret agent; intercept operator; dream interpreter) whilst others exude a certain exoticism or loucheness: casino owner; dance band leader; acrobat; mesmerist; jazz poet; pearl fisher; showman; diamond tycoon; charioteer.

Many of the epithets relate to services provided to others. Where would the great and the good be without people to drive them around, manage their affairs, assist in their work, take their letters, make their tea, cook their food, and treat them when they fall ill? So, Marcel Proust’s chauffeur, Charlie Chaplin’s business manager, Gustav Holst’s many amanuenses, Laurence Olivier’s secretary, Virginia Woolf’s charwoman, as well as her cook, and HG Wells’s physician all make appearances in the catalogue.

Then there are the epithets which are less than useful and do not really enlighten us about their subjects: appraiser (of what?); connoisseur (ditto); purple dyer (why only purple?); political adventurer; official. The less said about the usefulness, or otherwise, of epithets such as Mrs, widow, Mr Secretary, and Libyan the better. Some fall into the ‘What is it?’ category: coastwaiter (and landwaiter, for that matter); pancratiast; paroemiographer; trouvère.*

Another interesting category contains epithets of people with more than one string to their bow. One’s mind boggles at the career path of the ‘music scribe and spy’, or the ‘inn-keeper, gunner, and writer on mathematics’; is awed by the variety of skills of the ‘composer and physician’; marvels at the multi-talented ‘army officer, footballer, and Conservative politician’; and wonders what occurred in someone’s life to earn them the epithet ‘coach-painter and would-be assassin’.

As we have discovered, an epithet can help identify individuals, thus making the reader’s life easier, but if all else fails, and it is not possible to say who someone is, you can always say who they are not. Hence one of our manuscripts cataloguing forbears leaving us with Barry, Garrett; not Garrett Barry of Lisgriffin, county Cork as an index term.

* a type of Customs officer; ditto; a participant in a boxing or wrestling contest, esp. in ancient Greece; a writer or collector of proverbs; a medieval epic poet.


 

04 May 2020

VisibleWikiWomen 2020 Campaign

May the 4th be with you!

When I think of Star Wars, one of the first characters that comes to mind is the brave, quick-witted and feisty Princess Leia, General of the Resistance, played by the unforgettable Carrie Fisher. Leia is a role model for nerdy girls throughout the galaxy! Sadly I don’t have any photos of the time I went to a friend’s fancy dress party as Leia, wearing a long floaty white high-necked gown and sporting the cinnamon bun hairstyle (this was when I had much longer hair), but I remember having an absolute blast pretending to be one of my heroes for an evening :-)

However, we don’t have to look as far as the fictional planet of Alderaan to find female heroes and role models. #VisibleWikiWomen is an annual campaign to make all women, especially black, brown, indigenous and trans women, visible on Wikipedia and the broader internet. This global campaign brings together Wikimedians, feminist and women’s organisations, and cultural institutions in a worldwide effort to reduce the gender gap and the lack of images of women in the biggest online free encyclopedia.

#VisibleWikiWomen campaign logo image; silhouette of a woman taking a photograph with a camera
#VisibleWikiWomen campaign logo image

Due to COVID-19, the world is going through a collective experience of deep anxiety and uncertainty. It is a deeply important time for collective solidarity and support. The work of female artists, actresses, writers and musicians is entertaining us and lifting our spirits during the long days of lockdown. However, we often miss “seeing” and appreciating the women who are part of the critical infrastructure of care that keeps us going in times like this: health workers, carers, cashiers, cleaners, cooks, activists, scientists, policy-makers and so many more. 

Next weekend, 9-12 May 2020, is the #VisibleWikiWomen Edit-a-thon: Women in critical infrastructures of care, which aims to acknowledge, affirm, support and raise awareness of these incredible women. During a time when we isolate ourselves physically, #VisibleWikiWomen is an opportunity to come together virtually, to introduce and celebrate online the faces, work, and wisdom of women who have often been missing from the world’s shared knowledge and histories.

The goal of this online event is to gather and upload good quality images of women, which are in the public domain or under a free licence, to Wikimedia Commons (the image file repository for Wikipedia) under the VisibleWikiWomen category - and have fun! These images could be photographs or drawings of women, as well as images of their work, with proper consent. If you are not sure where to start, there will be some online training sessions on how to upload images to Commons, and also group conversations where participants can ask questions and share their experiences of participating in the campaign.

The Edit-a-thon is being organised by:

The schedule for the online event is:

  • May 9 (Saturday) - online training at 12pm UTC (English session) and 3pm UTC (Spanish session). Each session will be a 90-minute video call
  • From May 9 to May 12 - uploading images to Wikimedia Commons at each participant's preferred time
  • May 11 (Monday) - Q&A online session for troubleshooting and discussing issues, at 2pm UTC (English session) and 5pm UTC (Spanish session)

Many other organisations have joined as institutional partners, including Wikimedia UK and the International Image Interoperability Framework (IIIF) Consortium, who have asked their member institutions, including the British Library, to identify and encourage reuse of openly licensed digitised images that fit the criteria for this campaign. For more information, check out the “Guide for Cultural and Memory Institutions to make women visible on Wikipedia” created by Whose Knowledge?. If you use any digitised British Library images, please let us know (by emailing digitalresearch(at)bl(dot)uk), as we always love to hear how people have used our collections.

Logo images of the VisibleWikiWomen partner organisations
Logos of some of the VisibleWikiWomen partner organisations

At the British Library we have some experience of running Wikipedia edit-a-thons to help address the gender imbalance; we have held a number of successful Wiki-Food and (mostly) Women edit-a-thons, led by Polly Russell. Also, for International Women’s Day in 2019, the British Library & Qatar National Library Partnership organised an Imaging Hack Day, which produced interactive photographs, story maps and a zine.

People editing Wikipedia pages
Photograph of the second British Library Wiki-Food and (mostly) Women edit-a-thon on 6th July 2015

Our landmark exhibition, Unfinished Business: The Fight for Women’s Rights, was due to open in the Library last month. Unfortunately due to the COVID-19 lockdown, the on-site exhibition is postponed. However, in the meantime, we are exploring women’s rights via our online channels, alongside writers, artists and activists. Our first offering is a tribute to writer Mary Wollstonecraft, a podcast featuring historian Dan Snow, Lady Hale, campaigner Bee Rowlatt, scholar Professor Emma Clery, actor Saffron Burrows and musician Jade Ellins, paying homage to the foremother of feminism.

Good luck to all those taking part in the #VisibleWikiWomen 2020 campaign, May the FORCE be with you!

This post is by Jedi Librarian Jocasta Nu... sorry, I just wanted to link to Wookieepedia! It is actually written by Digital Curator (which is just as cool a job title as Jedi Librarian) Stella Wisdom (@miss_wisdom).

24 April 2020

BL Labs Learning & Teaching Award Winners - 2019 - The Other Voice - RCA

Innovations in sound and art

Dr Matt Lewis, Tutor of Digital Direction, and Dr Eleanor Dare, Reader of Digital Media, both at the School of Communication at the Royal College of Art, and Mary Stewart, Curator of Oral History and Deputy Director of National Life Stories at the British Library, reflect on an ongoing and award-winning collaboration (posted on their behalf by Mahendra Mahey, BL Labs Manager).

In spring 2019, based in both the British Library and the Royal College of Art School of Communication, seven students from the MA Digital Direction course participated in an elective module entitled The Other Voice. After listening in-depth to a selection of oral history interviews, the students learnt how to edit and creatively interpret oral histories, gaining insight into the complex and nuanced ethical and practical implications of working with other people’s life stories. The culmination of this collaboration was a two-day student-curated showcase at the British Library, where the students displayed their own creative and very personal responses to the oral history testimonies.

The module was led by Eleanor Dare (Head of Programme for MA Digital Direction, RCA), Matt Lewis (Sound Artist and Musician and RCA Tutor) and Mary Stewart (British Library Oral History Curator). We were really pleased that over 100 British Library staff took the time to come to the showcase, engage with the artwork and discuss their responses with the students.

Eleanor reflects:

“The students have benefited enormously from this collaboration, gaining a deeper understanding of the ethics of editing, the particular power of oral history and of course, the feedback and stimulation of having a show in the British Library.”

We were all absolutely delighted that the Other Voice group were the winners of the BL Labs Teaching and Learning Award 2019, presented in November 2019 at a ceremony at the British Library Knowledge Centre. Two students, Karthika Sakthivel and Giulia Brancati, also showcased their work at the 2019 annual Oral History Society Regional Network Event at the British Library, and contributed to a wide-ranging discussion reflecting on their practice and the power of oral history with a group of 35 oral historians from all over the UK. The collaboration has continued as Mary and Matt ran ‘The Other Voice’ elective in spring 2020, where the students adapted to the Covid-19 pandemic, producing work under lockdown from different locations around the world.

Here is just a taster of the amazing works the students created in 2019, which made them worthy winners of the BL Labs Teaching and Learning Award 2019.

Karthika Sakthivel and Giulia Brancati were both inspired by the testimony of Irene Elliot, who was interviewed by Dvora Liberman in 2014 for an innovative project on Crown Court Clerks. They were both moved by Irene’s rich description of her mother’s hard work bringing up five children in 1950s Preston.

On the way back by Giulia Brancati

Giulia created On the way back, an installation featuring two audio points: one with excerpts of Irene’s testimony and another with an audio collage inspired by Irene’s description. Two old-fashioned telephones played the audio, which the listener absorbed while curled up in an armchair in a fictional front room. It was a wonderfully immersive experience.

Irene Elliot's testimony interwoven with the audio collage (C1674/05)
Audio collage and photography © Giulia Brancati.
Listen here

Giulia commented:

“In a world full of noise and overwhelming information, to sit and really pay attention to someone’s personal story is an act of mindful presence. This module has been a continuous learning experience in which ‘the other voice’ became a trigger for creativity and personal reflection.”

Memory Foam by Karthika Sakthivel

Inspired by Irene’s testimony Karthika created a wonderful sonic quilt, entitled Memory Foam.

Karthika explains,

“There was power in Irene’s voice, enough to make me want to sew - something I’d never really done on my own before. But in her story there was comfort, there was warmth and that kept me going.”

Illustrated with objects drawn from Irene's memories, each square of the patchwork quilt encased conductive fabric that triggered audio clips. Upon touching each square, the corresponding story would play.

Karthika further commented,

“The initial visitor interactions with the piece gave me useful insights that enabled me to improve the experience in real time by testing alternate ways of hanging and displaying the quilt. After engaging with the quilt, guests walked up to me with recollections of their own mothers and grandmothers – and these emotional connections were deeply rewarding.”

Karthika, Giulia and the whole group were honoured that Irene and her daughter Jayne travelled from Preston to come to the exhibition. Karthika said:

"It was the greatest honour to have her experience my patchwork of her memories. This project for me unfurled yards of possibilities, the common thread being - the power of a voice.”

Irene and her daughter Jayne experiencing Memory Foam © Karthika Sakthivel.
Irene's words activated by touching the lime green patch with lace and a zip (top left of the quilt) (C1674/05)
Listen here

Meditations in Clay by James Roadnight and David Sappa

Listening to ceramicist Walter Keeler's memories of making a pot inspired James Roadnight and David Sappa to travel to Cornwall and record new oral histories to create Meditations in Clay, an immersive documentary exploring what we, as members of modern society, can learn from the craft of pottery - a technology as old as time itself. The film combines interviews conducted at the Bernard Leach pottery with audio-visual documentation of the St Ives studio and its rugged Cornish surroundings.


Meditations in Clay, video montage © James Roadnight and David Sappa.

Those attending the showcase were bewitched as they watched the landscape documentary on the large screen and engaged with the selection of listening pots, which when held to the ear played excerpts of the oral history interviews.

James and David commented,

“This project has taught us a great deal about the deep interview techniques involved in Oral History. Seeing visitors at the showcase engage deeply with our work, watching the film and listening to our guided meditation for 15, 20 minutes at a time was more than we could have ever imagined.”

Beyond Form

Raf Martins responded innovatively to Jonathan Blake’s interview describing his experiences as one of the first people in the UK to be diagnosed with HIV. In Beyond Form Raf created an audio soundscape of environmental sounds and excerpts from the interview, which played alongside a projected 3D hologram based on the cellular structure of the HIV virus. The hologram changed form and shape when activated by the audio – an intriguing visual artefact that translated the vibrant individual story into a futuristic medium.

Jonathan Blake's testimony interwoven with environmental soundscape (C456/104) Soundscape and image © Raf Martins.
Listen here

Stiff Upper Lip

Also inspired by Jonathan Blake’s interview was Stiff Upper Lip by Kingsley Tao, a short film which used clips of the interview to explore sexuality, identity and reactions to health and sickness.

Donald in Wonderland

Donald Palmer’s interview with Paul Merchant contained a wonderful and warm description of the front room that his Jamaican-born parents ‘kept for best’ in 1970s London. Alex Remoleux created a virtual reality tour of the reimagined space, entitled Donald in Wonderland, where the viewer could point to various objects in the virtual space and launch the corresponding snippet of audio.

Alex commented,

“I am really happy that I provided a Virtual Reality experience, and that Donald Palmer himself came to see my work. In the picture below you can see Donald using the remote in order to point and touch the objects represented in the virtual world.”

Donald Palmer describes his parents' front room (C1379/102)
Interviewee Donald Palmer wearing the virtual reality headset, exploring the virtual reality space (pictured) created by Alex Remoleux.
Listen here

Showcase at the British Library

The reaction to the showcase from the visitors and British Library staff was overwhelmingly positive, as shown by this small selection of comments. We were incredibly grateful to interviewees Irene and Donald for attending the showcase too. This was an excellent collaboration: RCA students and staff alike gained new insights into the significance and breadth of the British Library Oral History collection and the British Library staff were bowled over by the creative responses to the archival collection.

Examples of feedback from British Library showcase of 'The Other Voice' by Royal College of Art

With thanks to the MA Other Voice cohort Giulia Brancati, Raf Martins, Alexia Remoleux, James Roadnight, Karthika Sakthivel, David Sappa and Kingsley Tao, RCA staff Eleanor Dare and Matt Lewis & BL Oral History Curator Mary Stewart, plus all the interviewees who recorded their stories and the visitors who took the time to attend the showcase.

21 April 2020

Clean. Migrate. Validate. Enhance. Processing Archival Metadata with Open Refine

This blogpost is by Graham Jevon, Cataloguer, Endangered Archives Programme 

Creating detailed and consistent metadata is a challenge common to most archives. Many rely on an army of volunteers with varying degrees of cataloguing experience. And no matter how diligent any team of cataloguers is, human error and individual idiosyncrasies are inevitable.

This challenge is particularly pertinent to the Endangered Archives Programme (EAP), which has hitherto funded in excess of 400 projects in more than 90 countries. Each project is unique and employs its own team of one or more cataloguers based in the particular country where the archival content is digitised. But all this disparately created metadata must be uniform when ingested into the British Library’s cataloguing system and uploaded to eap.bl.uk.

Finding an efficient, low-cost method to process large volumes of metadata generated by hundreds of unique teams is a challenge; one that, in 2019, EAP sought to alleviate using the freely available open source software Open Refine – a power tool for processing data.

This blog highlights some of the ways that we are using Open Refine. It is not an instructional how-to guide (though we are happy to follow-up with more detailed blogs if there is interest), but an introductory overview of some of the Open Refine methods we use to process large volumes of metadata.

Initial metadata capture

Our metadata is initially created by project teams using an Excel spreadsheet template provided by EAP. In the past year we have completely redesigned this template in order to make it as user friendly and controlled as possible.

Screenshot of spreadsheet

But while Excel is perfect for metadata creation, it is not best suited for checking and editing large volumes of data. This is where Open Refine excels (pardon the pun!), so when the final completed spreadsheet is delivered to EAP, we use Open Refine to clean, validate, migrate, and enhance this data.

Workflow diagram

Replicating repetitive tasks

Open Refine came to the forefront of our attention after a one-day introductory training session led by Owen Stephens where the key takeaway for EAP was that a sequence of functions performed in Open Refine can be copied and re-used on subsequent datasets.

Screenshot of Open Refine software

This encouraged us to design and create a sequence of processes that can be re-applied every time we receive a new batch of metadata, thus automating large parts of our workflow.

No computer programming skills required

Building this sequence required no computer programming experience (though this can help); just logical thinking, a generous online community willing to share their knowledge and experience, and a willingness to learn Open Refine’s GREL language and generic regular expressions. Some functions can be performed simply by using Open Refine’s built-in menu options. But the limits of Open Refine’s capabilities are almost infinite; the more you explore and experiment, the further you can push the boundaries.

Initially, it was hoped that our whole Open Refine sequence could be repeated in one single large batch of operations. However, the complexity of the data and the need for archivist intervention meant that it was more appropriate to divide the process into several steps. Our workflow is divided into seven stages:

  1. Migration
  2. Dates
  3. Languages and Scripts
  4. Related subjects
  5. Related places and other authorities
  6. Uniform Titles
  7. Digital content validation

Each of these stages performs one or more of four tasks: clean, migrate, validate, and enhance.

Task 1: Clean

The first part of our workflow provides basic data cleaning. Across all columns it trims any white space at the beginning or end of a cell, removes any double spaces, and capitalises the first letter of every cell. In just a few seconds, this tidies the entire dataset.

Task 1 Example: Trimming white space (menu option)

Trimming whitespace on an individual column is an easy function to perform, as Open Refine has a built-in “Common transform” that performs this function.

Screenshot of Open Refine software

Although this is a simple function to perform, we no longer need to repeatedly select this menu option for each column of each dataset we process because this task is now part of the workflow that we simply copy and paste.

Task 1 Example: Capitalising the first letter (using GREL)

Capitalising the first letter of each cell is less straightforward for a new user as it does not have a built-in function that can be selected from a menu. Instead it requires a custom “Transform” using Open Refine’s own expression language (GREL).

Screenshot of Open Refine software


Having to write an expression like this should not put off any Open Refine novices. This is an example of Open Refine’s flexibility and many expressions can be found and copied from the Open Refine wiki pages or from blogs like this. The more you copy others, the more you learn, and the easier you will find it to adapt expressions to your own unique requirements.

Moreover, we do not have to repeat this expression again. Just like the trim whitespace transformation, this is also now part of our copy and paste workflow. One click performs both these tasks and more.
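For illustration only, here is roughly what those basic cleaning steps amount to, sketched in Python rather than as the GREL transforms the workflow actually uses:

```python
# The three basic cleaning steps described above, sketched in Python for
# illustration (inside Open Refine they are GREL transforms).
import re

def clean_cell(value: str) -> str:
    value = value.strip()                     # trim leading/trailing whitespace
    value = re.sub(r"  +", " ", value)        # collapse double spaces
    if value:
        value = value[0].upper() + value[1:]  # capitalise the first letter
    return value

print(clean_cell("  the  endangered archives programme "))
# "The endangered archives programme"
```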

Task 2: Migrate

As previously mentioned, the listing template used by the project teams is not the same as the spreadsheet template required for ingest into the British Library’s cataloguing system. But Open Refine helps us convert the listing template to the ingest template. In just one click, it renames, reorders, and restructures the data from the human friendly listing template to the computer friendly ingest template.

Task 2 example: Variant Titles

The ingest spreadsheet has a “Title” column and a single “Additional Titles” column where all other title variations are compiled. It is not practical to expect temporary cataloguers to understand how to use the “Title” and “Additional Titles” columns on the ingest spreadsheet. It is much more effective to provide cataloguers with a listing template that has three prescriptive title columns. This helps them clearly understand what type of titles are required and where they should be put.

Screenshot of spreadsheet

The EAP team then uses Open Refine to move these titles into the appropriate columns (illustrated above). It places one in the main “Title” field and concatenates the other two titles (if they exist) into the “Additional Titles” field. It also creates two new title type columns, which the ingest process requires so that it knows which title is which.

This is just one part of the migration stage of the workflow, which performs several renaming, re-ordering, and concatenation tasks like this to prepare the data for ingest into the British Library’s cataloguing system.
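A rough sketch of this title migration, using pandas with illustrative column names (not the actual template headings), might look like this:

```python
# Sketch of the variant-titles migration: keep the first title, concatenate
# any others into "Additional Titles". Column names and the separator are
# illustrative assumptions, not the actual EAP template or ingest format.
import pandas as pd

df = pd.DataFrame({
    "Title (English)": ["Baptism register"],
    "Title (original script)": ["Registro de bautismo"],
    "Alternative title": [None],
})

variants = df[["Title (original script)", "Alternative title"]]
df["Title"] = df["Title (English)"]
df["Additional Titles"] = variants.apply(
    lambda row: "|".join(v for v in row if pd.notna(v)), axis=1
)
print(df[["Title", "Additional Titles"]])
```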

Task 3: Validate

While cleaning and preparing the data for migration is important, it is also vital that we check that the data is accurate and reliable. But who has the time, inclination, or eye stamina to read thousands of rows of data in an Excel spreadsheet? What we require is a computational method to validate data. Perhaps the best way of doing this is to write a bespoke computer program. This indeed is something that I am now working on while learning to write computer code using the Python language (look out for a further blog on this later).

In the meantime, though, Open Refine has helped us to validate large volumes of metadata with no programming experience required.

Task 3 Example: Validating metadata-content connections

When we receive the final output from a digitisation project, one of our most important tasks is to ensure that all of the digital content (images, audio and video recordings) correlates with the metadata on the spreadsheet and vice versa.

We begin by running a command line report on the folders containing the digital content. This provides us with a csv file which we can read in Excel. However, the data is not presented in a neat format for comparison purposes.

Screenshot of spreadsheet

Restructuring data ready for validation comparisons

For this particular task what we want is a simple list of all the digital folder names (not the full directory) and the number of TIFF images each folder contains. Open Refine enables just that, as the next image illustrates.

Screenshot of Open Refine software

Constructing the sequence that restructures this data required careful planning and good familiarity with Open Refine and the GREL expression language. But after the data had been successfully restructured once, we never have to think about how to do this again. As with other parts of the workflow, we now just have to copy and paste the sequence to repeat this transformation on new datasets in the same format.
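As a rough equivalent outside Open Refine, the same restructuring could be sketched in Python; the report file name and column name below are assumptions, not the actual EAP report format:

```python
# Sketch of the restructuring step: reduce a report of full file paths to
# one row per digital folder with its TIFF count. File and column names
# are illustrative assumptions.
import csv
from collections import Counter
from pathlib import PurePosixPath

counts = Counter()
with open("folder_report.csv", newline="") as f:
    for row in csv.DictReader(f):
        path = PurePosixPath(row["FullName"])
        if path.suffix.lower() in (".tif", ".tiff"):
            counts[path.parent.name] += 1  # count TIFFs per folder name

with open("folder_counts.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["folder", "tiff_count"])
    writer.writerows(sorted(counts.items()))
```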

Cross referencing data for validation

With the data in this neat format, we can now do a number of simple cross referencing checks. We can check that:

  1. Each digital folder has a corresponding row of metadata – if not, this indicates that the metadata is incomplete
  2. Each row of metadata has a corresponding digital folder – if not, this indicates that some digital folders containing images are missing
  3. The actual number of TIFF images in each folder exactly matches the number of images recorded by the cataloguer – if not, this may indicate that some images are missing.

For each of these checks we use Open Refine’s cell.cross expression to cross reference the digital folder report with the metadata listing.

In the screenshot below we can see the results of the first validation check. Each digital folder name should match the reference number of a record in the metadata listing. If we find a match it returns that reference number in the “CrossRef” column. If no match is found, that column is left blank. By filtering that column by blanks, we can very quickly identify all of the digital folders that do not contain a corresponding row of metadata. In this example, before applying the filter, we can already see that at least one digital folder is missing metadata. An archivist can then investigate why that is and fix the problem.

Screenshot of Open Refine software
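Outside Open Refine, the same cross-referencing check can be sketched as a simple join; this pandas version is illustrative (reusing the folder_counts.csv sketched above, with assumed file and column names) and stands in for the cell.cross expression:

```python
# Sketch of the first validation check: find digital folders with no
# corresponding row of metadata. File and column names are illustrative
# assumptions standing in for Open Refine's cell.cross.
import pandas as pd

folders = pd.read_csv("folder_counts.csv")      # one row per digital folder
metadata = pd.read_csv("metadata_listing.csv")  # one row per catalogue record

merged = folders.merge(
    metadata[["Reference"]],
    left_on="folder", right_on="Reference", how="left"
)
orphans = merged[merged["Reference"].isna()]
print(orphans["folder"].tolist())  # folders missing a row of metadata
```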

Task 4: Enhance

We enhance our metadata in a number of ways. For example, we import authority codes for languages and scripts, and we assign subject headings and authority records based on keywords and phrases found in the titles and description columns.

Named Entity Extraction

One of Open Refine’s most dynamic features is its ability to connect to other online databases, and thanks to the generous support of Dandelion API, we are able to use its service to identify entities such as people, places, organisations, and titles of works.

In just a few simple steps, Dandelion API reads our metadata and returns new linked data, which we can filter by category. For example, we can list all of the entities it has extracted and categorised as a place or all the entities categorised as people.
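For the curious, a call to Dandelion’s entity-extraction endpoint looks roughly like the following; this is a hedged sketch based on Dandelion’s public documentation, with a placeholder token and sample text, not our production workflow:

```python
# Sketch of a call to Dandelion's entity-extraction API ("datatxt nex").
# Endpoint and parameter names follow Dandelion's public documentation as
# I understand it; the token and text are placeholders.
import requests

text = "Baptism register from the parish of San Isidro, Buenos Aires."
r = requests.get(
    "https://api.dandelion.eu/datatxt/nex/v1/",
    params={"text": text, "lang": "en", "token": "YOUR_API_TOKEN"},
)
for annotation in r.json().get("annotations", []):
    print(annotation["spot"], "->", annotation["title"])
```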

Screenshot of Open Refine software

Not every named entity it finds will be accurate. In the above example “Baptism” is clearly not a place. But it is much easier for an archivist to manually validate a list of 29 phrases identified as places, than to read 10,000 scope and content descriptions looking for named entities.

Clustering inconsistencies

If there is inconsistency in the metadata, the returned entities might contain multiple variants. This can be overcome using Open Refine’s clustering feature. This identifies and collates similar phrases and offers the opportunity to merge them into one consistent spelling.

Screenshot of Open Refine software

Linked data reconciliation

Having identified and validated a list of entities, we then use other linked data services to help create authority records. For this particular task, we use the Wikidata reconciliation service. Wikidata is a structured data sister project to Wikipedia. And the Open Refine reconciliation service enables us to link an entity in our dataset to its corresponding item in Wikidata, which in turn allows us to pull in additional information from Wikidata relating to that item.

For a South American photograph project we recently catalogued, Dandelion API helped identify 335 people (including actors and performers). By subsequently reconciling these people with their corresponding records in Wikidata, we were able to pull in their job title, date of birth, date of death, unique persistent identifiers, and other details required to create a full authority record for that person.

Screenshot of Open Refine software
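A simple approximation of this lookup, using the Wikidata search API directly rather than the reconciliation service, might look like this (the name queried is purely illustrative):

```python
# Sketch of a person lookup against the Wikidata API, approximating what
# Open Refine's reconciliation service does (the real service performs
# fuzzier matching and type filtering).
import requests

def wikidata_candidates(name: str):
    r = requests.get(
        "https://www.wikidata.org/w/api.php",
        params={
            "action": "wbsearchentities",
            "search": name,
            "language": "en",
            "format": "json",
        },
    )
    # Each match carries a persistent identifier (QID) and a description.
    return [(m["id"], m.get("description", "")) for m in r.json()["search"]]

for qid, description in wikidata_candidates("Carlos Gardel"):
    print(qid, description)
```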

Creating individual authority records for 335 people would otherwise take days of work. It is a task that previously we might have deemed infeasible. But Open Refine and Wikidata drastically reduces the human effort required.

Summary

In many ways, that is the key benefit. By placing Open Refine at the heart of our workflow for processing metadata, it now takes us less time to do more. Our workflow is not perfect. We are constantly finding new ways to improve it. But we now have a semi-automated method for processing large volumes of metadata.

This blog puts just some of those methods in the spotlight. In the interest of brevity, we refrained from providing step-by-step detail. But if there is interest, we will be happy to write further blogs to help others use this as a starting point for their own metadata processing workflows.

20 April 2020

BL Labs Research Award Winner 2019 - Tim Crawford - F-Tempo

Posted by Mahendra Mahey, Manager of BL Labs, on behalf of Tim Crawford, Professorial Research Fellow in Computational Musicology at Goldsmiths, University of London, and BL Labs Research Award winner for 2019.

Introducing F-TEMPO

Early music printing

Music printing, introduced in the later 15th century, enabled the dissemination of the greatest music of the age, which until that time was the exclusive preserve of royal and aristocratic courts or the Church. A vast repertory of all kinds of music is preserved in these prints, and they became the main conduit for the spread of the reputation and influence of the great composers of the Renaissance and early Baroque periods, such as Josquin, Lassus, Palestrina, Marenzio and Monteverdi. As this music became accessible to the increasingly well-heeled merchant classes, entirely new cultural networks of taste and transmission became established and can be traced in the patterns of survival of these printed sources.

Music historians have tended to neglect the analysis of these patterns in favour of a focus on a canon of ‘great works’ by ‘great composers’, with the consequence that there is a large sub-repertory of music that has not been seriously investigated or published in modern editions. By including this ‘hidden’ musical corpus, we could explore for the first time, for example, the networks of influence, distribution and fashion, and the effects on these of political, religious and social change over time.

Online resources of music and how to read them

Vast amounts of music, mostly audio tracks, are now available using services such as Spotify, iTunes or YouTube. Music is also available online in great quantity in the form of PDF files rendering page-images of either original musical documents or modern, computer-generated music notation. These are a surrogate for paper-based books used in traditional musicology, but offer few advantages beyond convenience. What they don’t allow is full-text search, unlike the text-based online materials which are increasingly the subject of ‘distant reading’ in the digital humanities.

With good score images, Optical Music Recognition (OMR) programs can sometimes produce useful scores from printed music of simple texture; however, in general, OMR output contains errors due to misrecognised symbols. The results often amount to musical gibberish, severely limiting the usefulness of OMR for creating large digital score collections. Our OMR program is Aruspix, which is highly reliable on good images, even when they have been digitised from microfilm.

Here is a screen-shot from Aruspix, showing part of the original page-image at the top, and the program’s best effort at recognising the 16th-century music notation below. It is not hard to see that, although the program does a pretty good job on the whole, there are not a few recognition errors. The program includes a graphical interface for correcting these, but we don’t make use of that for F-TEMPO for reasons of time – even a few seconds of correction per image would slow the whole process catastrophically.

The Aruspix user-interface

Finding what we want – error-tolerant encoding

Although OMR is far from perfect, online users are generally happy to use computer methods on large collections containing noise; this is the principle behind the searches in Google Books, which are based on Optical Character Recognition (OCR).

For F-TEMPO, from the output of the Aruspix OMR program, for each page of music, we extract a ‘string’ representing the pitch-name and octave for the sequence of notes. Since certain errors (especially wrong or missing clefs or accidentals) affect all subsequent notes, we encode the intervals between notes rather than the notes themselves, so that we can match transposed versions of the sequences or parts of them. We then use a simple alphabetic code to represent the intervals in the computer.

Here is an example of a few notes from a popular French chanson, showing our encoding method.

A few notes from a Crequillon chanson, and our encoding of the intervals
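As a rough sketch of the principle (with an invented letter alphabet, not the actual F-TEMPO code table), the interval encoding can be expressed in a few lines of Python; note how a transposed melody yields an identical code string:

```python
# Sketch of F-TEMPO-style interval encoding: represent a melody by the
# intervals between successive notes so transposed versions still match.
# The letter alphabet here is an illustrative assumption, not the actual
# F-TEMPO code table, and pitches are given as MIDI note numbers.

def encode_intervals(midi_pitches):
    # Map each interval (in semitones, clipped to +/-12) to a letter:
    # index 0 is -12 semitones, index 12 (letter 'M') is a unison.
    letters = "ABCDEFGHIJKLMNOPQRSTUVWXY"  # 25 letters for -12..+12
    out = []
    for a, b in zip(midi_pitches, midi_pitches[1:]):
        step = max(-12, min(12, b - a))
        out.append(letters[step + 12])
    return "".join(out)

print(encode_intervals([60, 62, 64, 62, 60]))  # 'OOKK'
print(encode_intervals([62, 64, 66, 64, 62]))  # 'OOKK' - same tune, transposed
```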

F-TEMPO in action

F-TEMPO uses state-of-the-art, scalable retrieval methods, providing rapid searches of almost 60,000 page-images for those similar to a query-page in less than a second. It successfully recovers matches when the query page is not complete, e.g. when page-breaks are different. Also, close non-identical matches, as between voice-parts of a polyphonic work in imitative style, are highly ranked in results; similarly, different works based on the same musical content are usually well-matched.

Here is a screen-shot from the demo interface to F-TEMPO. The ‘query’ image is on the left, and searches are done by hitting the ‘Enter’ or ‘Return’ key in the normal way. The list of results appears in the middle column, with the best match (usually the query page itself) highlighted and displayed on the right. As other results are selected, their images are displayed on the right. Users can upload their own images of 16th-century music that might be in the collection to serve as queries; we have found that even photos taken with a mobile phone work well. However, don’t expect coherent results if you upload other kinds of image!

The F-TEMPO user interface

The F-TEMPO web-site can be found at: http://f-tempo.org

Click on the ‘Demo’ button to try out the program for yourself.

What more can we do with F-TEMPO?

Using the full-text search methods enabled by F-TEMPO’s API (see the sketch after this list), we might begin to ask intriguing questions, such as:

  • ‘How did certain pieces of music spread and become established favourites throughout Europe during the 16th century?’
  • ‘How well is the relative popularity of such early-modern favourites reflected in modern recordings since the 1950s?’
  • ‘How many unrecognised arrangements are there in the 16th-century repertory?’
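
As a purely hypothetical illustration of what an API-driven enquiry might look like, here is a short Python sketch. The endpoint URL, parameter names and response shape are guesses for illustration only, not F-TEMPO’s documented interface.

```python
import requests

# Hypothetical sketch: the endpoint, parameters and response shape below
# are illustrative assumptions, not F-TEMPO's documented API.
API = "http://f-tempo.org/api/query"  # assumed URL for illustration

def find_concordances(interval_code: str, top_n: int = 10):
    """Query the service with an encoded interval string; return ranked matches."""
    resp = requests.get(API, params={"q": interval_code, "limit": top_n})
    resp.raise_for_status()
    return resp.json()  # assumed: a ranked list of {page_id, score} records

# e.g. find_concordances("OLLNMOLL") might surface the same music reprinted,
# transposed, or arranged in other books across the collection.
```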

In early testing we identified an instrumental ricercar as a wordless transcription of a Latin motet, hitherto unknown to musicology. As the collection grows, we are finding more such unexpected concordances, and can sometimes identify the composers of works labelled in some printed sources as by ‘Incertus’ (Uncertain). We have also uncovered some conflicting attributions which could provoke lively scholarly discussion.

Early Music Online and F-TEMPO

From the outset, this project has been based on the Early Music Online (EMO) collection, the result of a 2011 JISC-funded Rapid Digitisation project between the British Library and Royal Holloway, University of London. This digitised about 300 books of early printed music at the BL from archival microfilms, producing black-and-white images which have served as an excellent proof of concept for the development of F-TEMPO. The c.200 books in EMO judged suitable for our early methods contain about 32,000 pages of music, and form the basis for our resource.

The current version of F-TEMPO includes just under 30,000 more pages of early printed music from the Polish National Library, Warsaw, as well as a few thousand from the Bibliothèque nationale, Paris. We will soon be incorporating a further half-million pages from the Bavarian State Library collection in Munich, as soon as we have run them through our automatic indexing system.

(This work was funded for the past year by the JISC / British Academy Digital Research in the Humanities scheme. Thanks are due to David Lewis, Golnaz Badkobeh and Ryaan Ahmed for technical help and their many suggestions.)

16 April 2020

BL Labs Community Commendation Award 2019 - Lesley Phillips - Theatre History

Exploring theatre history with British Library playbills and newspapers

Posted by Mahendra Mahey, Manager of BL Labs, on behalf of Lesley Phillips, a former Derbyshire local studies librarian in the UK and winner of a BL Labs Community Commendation Award for 2019.

Lesley explains how the British Library's digital collections of playbills and digitised newspapers enabled her to compile a detailed account of the career of the actor-manager John Faucit Saville in the East Midlands from 1843 to 1855.

John Faucit Saville was born in Norwich in 1807, the son of two actors then performing with the Norwich Company as Mr and Mrs Faucit. His parents separated when he was 14 years old and just entering on his stage career. His mother, then a leading actress at Drury Lane, moved in with the celebrated actor William Farren, and continued to perform as Mrs Faucit, while his father became a manager and changed his surname to Saville (his real name).

Oxberry's Dramatic Biography (1825) records his father's grief:

On the evening that the fatal news [of his wife's desertion] reached him [Mr John Faucit] left the theatre and walked over the beach. His lips trembled and he was severely agitated. Many persons addressed him, but he broke from them and went to the house of a particular friend. The facts were then known only to himself. Though a man of temperate habits, he drank upwards of two bottles of wine without being visibly affected. He paced the room and seemed unconscious of the presence of anyone. To his friend's inquiries he made no reply. He once said “My heart is almost broke, but you will soon know why”.

(C.E. Oxberry (ed.) Oxberry's Dramatic Biography and Histrionic Anecdotes. Vol. III (1825) pp. 33-34, Memoir of William Farren)

Despite the rift between his parents, John Faucit Saville had all the advantages that famous friends and relatives could bring in the theatrical world, but during his time as an aspiring actor it soon became clear that he would never be a great star. In 1841 he began to put his energies into becoming a manager, like his father before him. He took a lease of Brighton Theatre in his wife's home town, but struggled to make a success of it.

Like the other managers of his day he was faced with a decline in the fashion for rational amusements and the rise of 'beer and circuses'. This did not deter him from making a further attempt at establishing a theatrical circuit. For this he came to the East Midlands and South Yorkshire, where the decline of the old circuit and the retirement of Thomas Manly had laid the field bare for a new man. Saville must surely have had great confidence in his own ability to be successful here, given that the old, experienced manager had begun to struggle.

Saville took on the ailing circuit, and soon discovered that he was forced to make compromises. He was careful to please the local authorities as to the respectability of his productions, and yet managed to provide more lowbrow entertainments to bring in the audiences. Even so, after a few years he was forced to rein in his ambitions and eventually reduce his circuit, and he even went back on tour as an itinerant actor from time to time to supplement his income. Saville's career had significant implications for the survival of some of the theatres of the East Midlands, as he lived through the final disintegration of the circuit.

Over the years, John Faucit Saville's acting career had taken him to Paris, Edinburgh, and Dublin, as well as many parts of England. Without the use of digital online resources it would be almost impossible to trace a career such as his, to explore his background, and bring together the details of his life and work.

Newspaper article from 29 January 1829 detailing the benefit performance for Mr Faucit entitled 'Clandestine Marriage' at the Theatre Royal Brighton

The digitised newspapers of the British Newspaper Archive https://www.britishnewspaperarchive.co.uk enabled me to uncover the Saville family origins in Bedford, and to follow John Faucit Saville's career from the heights of the London stage, to management at Brighton and then to the Midlands.

Newspaper article detailing benefit performance for Mr JF Saville at Theatre Royal Derby on Friday May 23, 1845, play entitled 'Don Caesar de Bazan' or 'Martina the Gypsy'

The dataset of playbills available to download from the British Library web site https://data.bl.uk/playbills/pb1.html enabled me to build up a detailed picture of Saville's work, the performers and plays he used, and the way he used them. It was still necessary to visit some libraries and archives for additional information, but I could never have put together such a rich collection of information without these digital resources.

My research has been put into a self-published book, filled with newspaper reviews of Saville's productions, and stories about his company. This is not just a narrow look at regional theatre; there are also some references to figures of national importance in theatre history. John Faucit Saville's sister, Helen Faucit, was a great star of her day, and his half-brother Henry Farren made his stage debut in Derbyshire with Saville's company. John Faucit Saville's wife Marianne performed with Macready on his farewell tour and also played at Windsor for Queen Victoria. The main interest for me, however, was the way theatre history reveals how national and local events impacted on society and public behaviour, and how the theatre connected with the life of the ordinary working man and woman.

Front cover of my self-published book about John Faucit Saville

If you are interested in playbills generally, you might want to help the British Library provide more information about individual playbills through its crowdsourcing project, 'In the Spotlight'.