THE BRITISH LIBRARY

Digital scholarship blog

Enabling innovative research with British Library digital collections


22 August 2016

SherlockNet: tagging and captioning the British Library’s Flickr images

Finalists of the BL Labs Competition 2016, Karen Wang, Luda Zhao and Brian Do, update us on the progress of their SherlockNet project:


This is an update on SherlockNet, our project to use machine learning and other computational techniques to dramatically increase the discoverability of the British Library’s Flickr images dataset. Below is some of our progress on tagging, captioning, and the web interface.

Tags

When we started this project, our goal was to classify every single image in the British Library's Flickr collection into one of 12 tags -- animals, people, decorations, miniatures, landscapes, nature, architecture, objects, diagrams, text, seals, and maps. Over the course of our work, we realised the following:

  1. We were achieving incredible accuracy (>80%) in our classification using our computational methods.
  2. If our algorithm assigned two tags to an image with approximately equal probability, there was a high chance the image had elements associated with both tags.
  3. However, these tags were in no way enough to expose all the information in the images.
  4. Luckily, each image is associated with text on the corresponding page.

We thus wondered whether we could use the surrounding text of each image to help expand the “universe” of possible tags. While the text around an image may or may not be directly related to the image, this strategy isn’t without precedent: Google Images uses text as its main method of annotating images! So we decided to dig in and see how this would go.

As a first step, we took all digitised text from the three pages surrounding each image (the page before, the page of, and the page after) and extracted all noun phrases. We figured that although important information may be captured in verbs and adjectives, the main things people will be searching for are nouns. Besides, at this point it is a proof of principle that we can easily extend later to a larger set of words. We then constructed a composite set of all words from all images, and only kept words present in between 5% and 80% of documents. This was to get rid of words that were too rare (often misspellings) or too common (words like ‘the’, ‘a’, ‘me’ -- called “stop words” in the natural language processing field).
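A minimal sketch of this frequency-filtering step, using scikit-learn (illustrative rather than our production code; `page_texts` is an assumed list with one string of surrounding-page OCR text per image), might look like this:

```python
# Illustrative sketch (not our production code): keep only words that appear
# in between 5% and 80% of the surrounding-page texts.
from sklearn.feature_extraction.text import CountVectorizer

# page_texts is assumed to hold one string per image: the OCR text of the
# page before, the page of, and the page after, concatenated.
vectorizer = CountVectorizer(
    min_df=0.05,  # drop words in fewer than 5% of documents (rare words, misspellings)
    max_df=0.80,  # drop words in more than 80% of documents (stop words)
)
counts = vectorizer.fit_transform(page_texts)
vocabulary = vectorizer.get_feature_names_out()
```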

With this data we were able to use a tool called Latent Dirichlet Allocation (LDA) to find “clusters” of images in an automatic way. We chose the original 12 tags after manually going through 1,000 images on our own and deciding which categories made the most sense based on what we saw; but what if there are categories we overlooked or were unable to discern by hand? LDA addresses this by finding a small set of topics (tags), where each document is represented as a mixture of topics and each topic is represented as a distribution over words. The algorithm can’t assign meaning to a topic, so we give each topic meaning ourselves by looking at the words most strongly associated with it. We ran LDA on a sample of 10,000 images and found tag clusters for men, women, nature, and animals. Not coincidentally, these are similar to our original tags and represent a majority of our images.
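Continuing the sketch above (again illustrative, not our actual pipeline), scikit-learn’s LDA implementation can be run directly on the word-count matrix, and each discovered topic can be interpreted through its top words:

```python
# Illustrative sketch: discover topics over the `counts` matrix and
# `vocabulary` built in the previous snippet.
from sklearn.decomposition import LatentDirichletAllocation

lda = LatentDirichletAllocation(n_components=12, random_state=0)
doc_topics = lda.fit_transform(counts)  # one topic mixture per image's surrounding text

# Give each machine-found topic a human-readable meaning via its top words.
for topic_idx, topic in enumerate(lda.components_):
    top_words = [vocabulary[i] for i in topic.argsort()[-10:][::-1]]
    print(f"Topic {topic_idx}: {', '.join(top_words)}")
```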

This doesn’t complete our quest to expand our tag universe, though. One strategy we considered was simply to use the set of words from each page as the tags for its image. We quickly found, however, that most of the words around each image are irrelevant to the image, and sometimes there was no relation at all. To solve this problem, we used a voting system [1]. Using our computational algorithm, we find the 20 images most similar to the image in question, look for the words that occur most often in the pages around those 20 images, and then use these words to describe the image in question. This actually works quite well in practice! We’re now trying to combine this strategy (finding generalised tags for images) with the simpler strategy (unique words that describe individual images) to come up with tags that describe images at different “levels”.
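A rough sketch of the neighbour-voting idea (illustrative only; `features` is an assumed matrix of per-image feature vectors and `words_per_image` an assumed list of word sets from each image’s surrounding pages):

```python
# Illustrative sketch of neighbour voting: tag an image with the words that
# occur most often around its 20 most similar images.
from collections import Counter
import numpy as np

def vote_tags(target_idx, features, words_per_image, k=20, n_tags=10):
    # Euclidean distance from the target image's feature vector to all others
    dists = np.linalg.norm(features - features[target_idx], axis=1)
    neighbours = np.argsort(dists)[1:k + 1]  # skip the image itself
    votes = Counter()
    for idx in neighbours:
        votes.update(set(words_per_image[idx]))  # one vote per word per neighbour
    return [word for word, _ in votes.most_common(n_tags)]
```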

Image Captioning

We started with a very ambitious goal: given only the image as input, can we produce a machine-generated, natural-language description of the image with a reasonably high degree of accuracy and usefulness? Given the difficulty of the task and our timeframe, we didn’t expect to get perfect results, but we hoped to come up with a first prototype that demonstrates some of the recent advances and techniques that we believe will be promising for research and application in the future.

We planned to look at two approaches to this problem:

  • Similarity-based captioning. Images that are very similar to each other under a distance metric often share common objects, traits, and attributes that show up in the distribution of words in their captions. By pooling words from the captions of similar images, one can come up with a reasonable caption for the target image.
  • Learning-based captioning. By utilising a CNN similar to the one we used for tagging, we can capture higher-level features in images. We then attempt to learn the mappings between these higher-level features and their representations in words, using either another neural network or other methods (a rough sketch of the feature-extraction step follows just after this list).
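Both approaches start from per-image feature vectors. A minimal sketch of extracting them with an off-the-shelf pre-trained CNN (illustrative, in PyTorch, not necessarily the framework or network we used):

```python
# Illustrative sketch: turn each scanned illustration into a feature vector
# using a pre-trained CNN with its classification layer removed.
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
model.fc = torch.nn.Identity()  # keep the 2048-dimensional features, drop the classifier
model.eval()

preprocess = T.Compose([
    T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def image_features(path):
    # The scans are black and white, so replicate them to three channels.
    img = Image.open(path).convert("RGB")
    with torch.no_grad():
        return model(preprocess(img).unsqueeze(0)).squeeze(0)
```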

We have made some promising forays into the second technique. As a first step, we used a pre-trained CNN-RNN architecture called NeuralTalk to caption our images. As the model is trained on the Microsoft COCO dataset, which consists of photographs that differ significantly from the British Library's Flickr dataset, we expected the transfer of knowledge to be difficult. Indeed, the resulting captions for roughly 1,000 test images show that weakness, with the exclusively black-and-white nature of the British Library illustrations and the more abstract character of some of them being major roadblocks to caption quality. Many of the captions commented on the “black and white” quality of the photo or “hallucinated” objects that did not exist in the images. However, there were some promising results that came back from the model. Below are some hand-picked examples. Note that these were generated with no other metadata; only the raw image was given.

[Images: three hand-picked captioning examples]
From a rough manual pass, we estimate that around 1 in 4 captions is of usable quality: accurate, and containing interesting and useful information that would aid search, discovery, cataloguing and so on, with occasional gems (like the elephant caption!). More work will be directed towards increasing this proportion.

Web Interface

We have been working on building the web interface to expose this novel tag data to users around the world.

One thing that’s awesome about making the British Library dataset available via Flickr is that Flickr provides an excellent API for developers. Among other functions, the API exposes the site’s search logic over tags, free-text search over image titles and descriptions, and the ability to sort results by a number of factors, including relevance and “interestingness”. We’ve been using the Flickr API, along with AngularJS and Node.js, to build a wireframe site. You can check it out here.
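As a flavour of what the API offers, here is a quick sketch in Python rather than the AngularJS/Node.js stack we are using for the site (the API key and the Library’s account ID below are placeholders):

```python
# Illustrative sketch: free-text search of a Flickr account's photos,
# sorted by "interestingness", via the public REST API.
import requests

params = {
    "method": "flickr.photos.search",
    "api_key": "YOUR_API_KEY",        # placeholder: your own Flickr API key
    "user_id": "BL_ACCOUNT_NSID",     # placeholder: the British Library account's NSID
    "text": "elephant",               # free-text search over titles and descriptions
    "sort": "interestingness-desc",   # or "relevance"
    "format": "json",
    "nojsoncallback": 1,
}
response = requests.get("https://api.flickr.com/services/rest/", params=params)
for photo in response.json()["photos"]["photo"]:
    print(photo["id"], photo["title"])
```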

If you look at the demo or the British Library's Flickr album, you’ll see that each image has a relatively sparse set of tags to query from. Thus, our next step will be adding our own tags and captions to each image on Flickr. We will prepend these with a custom namespace to distinguish them from existing user-contributed and machine tags, and utilise them in queries to find better results.
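For instance, a namespaced “machine tag” could take a form like sherlocknet:tag=animals (the namespace and values here are illustrative). A hedged sketch of writing such tags back, assuming the flickrapi Python library and a key with write permission:

```python
# Illustrative sketch: push namespaced machine tags back to a Flickr photo.
# All credential and id values below are placeholders.
import flickrapi

flickr = flickrapi.FlickrAPI("YOUR_API_KEY", "YOUR_API_SECRET", format="parsed-json")
flickr.authenticate_via_browser(perms="write")  # one-off OAuth handshake

photo_id = "PHOTO_ID"  # placeholder: the Flickr id of a British Library image
machine_tags = ["sherlocknet:tag=animals", "sherlocknet:category=decorations"]
flickr.photos.addTags(photo_id=photo_id, tags=" ".join(machine_tags))
```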

Finally, we are interested in what users will use the site for. For example, we could track users’ queries and which images they click on or save. These images are presumably more relevant to those queries, so we could rank them higher in future searches. We also want to be able to track general analytics, such as the most popular queries over time. Incorporating user analytics will thus be the final step in building the web interface.

We welcome any feedback and questions you may have! Contact us at teamsherlocknet@gmail.com

References

[1] Johnson J, Ballan L, Fei-Fei L. Love Thy Neighbors: Image Annotation by Exploiting Image Metadata. arXiv (2016)

19 August 2016

BL Labs Awards (2016): enter before midnight 5th September!

The BL Labs Awards formally recognise outstanding and innovative work that has been created using the British Library’s digital collections and data.

The closing date for entering the BL Labs Awards (2016) is midnight BST on 5th September. So please submit your entry and/or help us spread the word to all interested and relevant parties over these final weeks. This will ensure we have another year of fantastic digital-based projects highlighted by the Awards competition!

This year, the BL Labs Awards is commending work in four key areas:

  • Research - A project or activity which shows the development of new knowledge, research methods, or tools.
  • Commercial - An activity that delivers or develops commercial value in the context of new products, tools, or services that build on, incorporate, or enhance the Library's digital content.
  • Artistic - An artistic or creative endeavour which inspires, stimulates, amazes and provokes.
  • Teaching / Learning - Quality learning experiences created for learners of any age and ability that use the Library's digital content.

Once the submission deadline of midnight BST on 5th September has passed, the entries will be shortlisted. Shortlisted entrants will be notified via email by midnight BST on Wednesday 21st September 2016. A prize of £500 will be awarded to the winner and £100 to the runner-up in each Awards category at the Labs Symposium on 7th November 2016 at the British Library, St Pancras, courtesy of the Andrew W. Mellon Foundation.

The talent of the BL Labs Awards winners and runners-up of 2015 led to the production of a remarkable and varied collection of innovative projects. Last year, the Awards commended work in three main categories – Research, Creative/Artistic and Entrepreneurship:


Image:
(Top-left) The Spatial Humanities research group at Lancaster University plotting mentions of disease in Victorian newspapers on a map;
(Top-right) A computer generated work of art, part of 'The Order of Things' by Mario Klingemann;
(Bottom-left) A bow tie made by Dina Malkova inspired by a digitised original manuscript of Alice in Wonderland;
(Bottom-right) Ongoing work by James Heald on geo-referencing maps discovered in a collection of digitised books at the British Library.

For any further information about BL Labs or our Awards, please contact us at labs@bl.uk.

12 August 2016

Black Abolitionist Performances and their Presence in Britain

Posted by Mahendra Mahey on behalf of Hannah-Rose Murray, finalist of the BL Labs Competition 2016

Overview of the project

The Black Abolitionist project focuses on African American lives, experiences and lectures in Britain between 1830 and 1895. It builds on my PhD research, which I am currently undertaking in the Department of American and Canadian Studies at the University of Nottingham. Working with the British Library has already proved a fortunate and enriching opportunity, and by harnessing the power of technology, we want to work together to search through thousands of newspapers to find abolitionist speeches, a process that would take years by hand. By reading black abolitionist speeches in the Nineteenth Century Newspaper Collection (and using the Flickr collection to illustrate), we can get a sense of their performances and how their lectures reached nearly every corner of Britain. Newspapers can also provide us with the locations of these meetings, and for the first time, I have mapped these locations to estimate how many lectures black abolitionists gave in Britain and to allow their hidden voices to be heard. I am updating my website to reflect this project, which can be found at www.frederickdouglassinbritain.com.

These are the maps I have so far: the first (below left) chronicles the lectures of Frederick Douglass, and the second (below right) represents the lectures given by other black abolitionists such as Josiah Henson, Sarah Remond, Moses Roper, William Wells Brown, Henry ‘Box’ Brown, Ida B. Wells, James Watkins and William and Ellen Craft (to name a few):

[Image: the two maps of abolitionist lectures described above]

African Americans visited Britain for a variety of reasons. Many came to publish slave narratives, teach Britons about slavery and look for their support in the abolitionist cause. Others came to live in Britain safely, away from the ever-watchful eyes of slave-catchers, while several wanted to raise money to purchase family members from the jaws of slavery. 

Black abolitionists made their mark in nearly every part of Great Britain, and it is no surprise to learn they had a strong impact on London too. Lectures were held in famous meeting halls, taverns, the houses of wealthy patrons, theatres, and churches across London: we inevitably and unknowingly walk past sites with a rich history of Black Britain every day.

When searching the newspapers, what we have found so far is that the OCR (Optical Character Recognition) is patchy at best. OCR is the process of turning scanned images into machine-readable text, and its quality can depend on many factors – from the quality of the scan itself, to the quality of the paper the newspaper was printed on, to whether it has been damaged or ‘muddied.’ If the OCR is unintelligible, the text will not be ‘read’ properly – hence there could be hundreds of references to Frederick Douglass that are not accessible or ‘readable’ to us through an electronic search (see the image below).

[Image: American_slavery_f_douglass]

In order to sort through the ‘muddied’ OCR and the ‘clean’ OCR, we need to teach the computer what is ‘positive text’ (i.e., language that uses words such as ‘abolitionist’, ‘black’, ‘fugitive’, ‘negro’) and what is ‘negative text’ (language that does not relate to abolition). For example, the image above shows an advert for one of Frederick Douglass’s lectures (Leamington Spa Courier, 20 February 1847). The key words in this particular advert that are likely to appear in other adverts, reports and commentaries are ‘Frederick Douglass’, ‘fugitive’, ‘slave’, ‘American’, and ‘slavery.’ I can search for this advert through the digitised database, but there are perhaps hundreds more waiting to be uncovered.

I have spent several years transcribing many of Frederick Douglass’s speeches, and most of this will act as the ‘positive’ text. ‘Negative’ text can refer to other lectures of a similar structure that do not relate to abolition specifically, for example prison reform meetings or meetings about church finances. This will help the computer learn to recognise abolitionist language. We can then test the classifier’s performance against some of the data we already have, and once the probabilities show we are on the right track, we can apply it to a larger data set.
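As a hedged illustration of this kind of positive/negative text classification (a sketch with scikit-learn, not the actual pipeline being built with the Library; `positive_texts`, `negative_texts` and `ocr_passages` are assumed inputs):

```python
# Illustrative sketch: train a simple classifier on transcribed 'positive'
# (abolitionist) and 'negative' (unrelated) texts, then score noisy OCR passages.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = positive_texts + negative_texts          # transcribed training material
labels = [1] * len(positive_texts) + [0] * len(negative_texts)

classifier = make_pipeline(TfidfVectorizer(min_df=2), LogisticRegression(max_iter=1000))
classifier.fit(texts, labels)

# High-probability passages are candidates for manual checking.
scores = classifier.predict_proba(ocr_passages)[:, 1]
for passage, score in sorted(zip(ocr_passages, scores), key=lambda p: -p[1])[:20]:
    print(f"{score:.2f}  {passage[:80]}")
```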

The prospect of uncovering hidden speeches by African Americans is incredibly exciting, and hopefully this will add to our knowledge of the black presence in Britain: we can use these extensive sources to build a more complete picture of Victorian London in particular.