Digital scholarship blog

Enabling innovative research with British Library digital collections


Tracking exciting developments at the intersection of libraries, scholarship and technology.

15 September 2014

Finding Jokes - The Victorian Meme Machine


Posted on behalf of Bob Nicholson.

The Victorian Meme Machine is a collaboration between the British Library Labs and Dr Bob Nicholson (Edge Hill University). The project will create an extensive database of Victorian jokes and then experiment with ways to recirculate them over social media. For an introduction to the project, take a look at this blog post or this video presentation.

Stage One: Finding Jokes

Whenever I tell people that I’m working with the British Library to develop an archive of nineteenth-century jokes, they often look a bit confused. “I didn’t think the Victorians had a sense of humour”, somebody told me recently. This is a common misconception. We’re all used to thinking of the Victorians as dour and humourless; as a people who were, famously, ‘not amused’. But this couldn’t be further from the truth. In fact, jokes circulated at all levels of Victorian culture. While most of them have now been lost to history, a significant number have survived in the pages of books, periodicals, newspapers, playbills, adverts, diaries, songbooks, and other pieces of printed ephemera. There are probably millions of Victorian jokes sitting in libraries and archives just waiting to be rediscovered – the challenge lies in finding them.   

In truth, we don’t know how many Victorian gags have been preserved in the British Library’s digital collections. Type the word ‘jokes’ into the British Newspaper Archive or the JISC Historical Texts collection and you’ll find a handful of them fairly quickly. But this is just the tip of the iceberg. There are many more jests hidden deeper in these archives. Unfortunately, they aren’t easy to uncover. Some appear under peculiar titles, others are scattered around as unmarked column fillers, and many have aged so poorly that they no longer look like jokes at all. Figuring out an effective way to find and isolate these scattered fragments of Victorian humour is one of the main aims of our project. Here’s how we’re approaching it.

Firstly, we’ve decided to focus our attention on two main sources: books and newspapers. While it’s certainly possible to find jokes elsewhere, these sources provide the largest concentrations of material. A dedicated joke book, such as this Book of Humour, Wit and Wisdom, contains hundreds of viable jokes in a single package. Similarly, many Victorian newspapers carried weekly joke columns containing around 30 gags at a time – over the course of a year, a regularly printed column yields more than 1,500 jests. If we can develop an efficient way to extract jokes from these texts then we’ll have a good chance of meeting our target of 1 million gags.
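As a rough back-of-the-envelope check on those figures, the arithmetic can be sketched as follows. The 30 jokes per column and the target of 1 million come from the paragraph above; the assumptions that a column appears in all 52 weeks of the year and that no joke is ever repeated are simplifications added purely for illustration.

```python
# Figures taken from the post.
JOKES_PER_COLUMN = 30   # "around 30 gags at a time"
TARGET = 1_000_000      # the project's stated target

# Assumption for illustration: the column runs every week, with no repeats.
WEEKS_PER_YEAR = 52

jokes_per_column_year = JOKES_PER_COLUMN * WEEKS_PER_YEAR


def column_years_needed(target, per_year):
    """Return the number of full column-years needed to reach the target."""
    # Ceiling division without importing math.
    return -(-target // per_year)


print(jokes_per_column_year)
print(column_years_needed(TARGET, jokes_per_column_year))
```

On these assumptions, one weekly column yields 1,560 jokes a year, so reaching a million would take something like 642 years' worth of columns spread across different papers. In practice the real figure will differ, since columns were irregular and jokes were frequently reprinted.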


Our initial searches have focused on two digital collections:

1) The 19th Century British Library Newspapers Database.

2) A collection of nineteenth-century books digitised by Microsoft.

In order to interrogate these databases we’ve compiled a continually-expanding list of search terms. Obvious keywords like ‘jokes’ and ‘jests’ have proven to be effective, but we’ve also found material using words like ‘quips’, ‘cranks’, ‘wit’, ‘fun’, ‘jingles’, ‘humour’, ‘laugh’, ‘comic’, ‘snaps’, and ‘siftings’. However, while these general search terms are useful, they don’t catch everything. Consider these peculiarly-named columns from the Hampshire Telegraph:


At first glance, they look like recipes for buckwheat cakes – in fact, they’re columns of imported American jokes named after what was evidently considered to be a characteristically Yankee delicacy. I would never have found these columns using conventional keyword searches. Uncovering material like this is much more laborious, and requires us to manually look for peculiarly-named books and joke columns.

In the case of newspapers, this requires a bit of educated guesswork. Most joke columns appeared in popular weekly papers, or in the weekend editions of mass-market dailies. So, weighty morning broadsheets like the London Times are unlikely to yield many gags. Similarly, while the placement of joke columns varied from paper to paper (and sometimes from issue to issue), they were typically placed at the back of the paper alongside children’s columns, fashion advice, recipes, and other miscellaneous tit-bits of entertainment. Finally, once a newspaper has been proven to contain one set of joke columns, the likelihood is that more will be found under other names. For example, initial keyword searches seem to suggest that the Newcastle Weekly Courant discontinued its long-running ‘American Humour’ column in 1888. In fact, the column was simply renamed ‘Yankee Snacks’ and continued to appear under this title for another eight years.

Tracking a single change of identity like this is fairly straightforward; once the new title has been identified we simply need to add it to our list of search terms. Unfortunately, the editorial whims of some newspapers are harder to follow. For example, the Hampshire Telegraph often scattered multiple joke columns throughout a single issue. To make things even more complicated, it tended to rename and reposition these columns every couple of weeks. Here’s a sample of the paper’s American humour columns, all drawn from the first six months of 1892:

For papers like this, the only option is to manually locate joke columns one at a time. In other words, while our initial set of core keywords should enable us to find and extract thousands of joke columns fairly quickly, more nuanced (and more laborious) methods will be required in order to get the rest.

It’s important to stress that jokes were not always printed in organised collections. Some newspapers mixed humour with other pieces of entertaining miscellany under titles such as ‘Varieties’ or ‘Our Carpet Bag’. The same is true of books, which often combined jokes with short stories, comic songs, and material for parlour games. While it’s fairly easy to find these collections, recognising and filtering out the jokes is more problematic. As our project develops, we’d like to experiment with some kind of joke-detection tool that picks out content with similar formatting and linguistic characteristics to the jokes we’ve already found. For example, conversational jokes usually have capitalised names (or pronouns) followed by a colon and, in some cases, include a descriptive phrase enclosed in brackets. So, if a text includes strings of characters like “Jack (…):” or “She (…):” then there’s a good chance that it might be a joke. Similarly, many jokes begin with a capitalised title followed by a full-stop and a hyphen, and end with an italicised attribution. Here’s a characteristic example of all three trends in action:


Unfortunately, conventional search interfaces aren’t designed to recognise nuances in punctuation, so we’ll need to build something ourselves. For now, we’ve chosen to focus our efforts on harvesting the low-hanging fruit found in clearly defined collections of jokes.
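To make the idea concrete, here is a minimal sketch of how the formatting cues described above might be turned into a detection rule. It is not the project's tool, which has not been built yet. The function names, the regular expressions, the 60-character limit on titles and the sample joke are all made up for illustration. The third cue, the italicised attribution, is left out on the assumption that plain OCR text will not preserve italics.

```python
import re

# Cue 1: a capitalised name or pronoun, an optional bracketed description,
# then a colon, e.g. "Jack (angrily):" or "She:".
SPEAKER = re.compile(r"\b[A-Z][A-Za-z'\u2019]*(?: \([^)\n]*\))?:")

# Cue 2: a line opening with a capitalised title, then a full stop and a
# hyphen or dash. The 60-character cap on the title is an arbitrary guess.
TITLE = re.compile(r"^[A-Z][^.\n]{0,60}\.\s?[-\u2013\u2014]", re.MULTILINE)


def joke_cues(text):
    """Count how often each formatting cue appears in the text."""
    return {
        "speaker_cues": len(SPEAKER.findall(text)),
        "title_openers": len(TITLE.findall(text)),
    }


def looks_like_joke(text, min_cues=1):
    """Flag text as a possible joke if it shows at least min_cues cues."""
    return sum(joke_cues(text).values()) >= min_cues


# An invented example in the style described above, not a real Victorian joke.
sample = (
    "A FAIR WARNING.-Jack (nervously): Will your father object?\n"
    "She (sweetly): Not if you leave before he comes home."
)
print(joke_cues(sample))
print(looks_like_joke(sample))
```

A rule this crude would flag plenty of non-jokes (any dialogue or headed paragraph, for instance) and would miss jokes mangled by OCR errors, so its output would still need checking by a human reader.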

The project is still in the pilot stage, but we’ve already identified the locations of more than 100,000 jokes. This is more than enough for our current purposes, but I hope we’ll be able to push onwards towards a million as the project expands. The most effective way to do this may well be to harness the power of crowdsourcing and invite users of the database to help us uncover new sources. It’s clear from our initial efforts that a fully-automated approach won’t be effective. Finding and extracting large quantities of jokes – or, indeed, any specific type of content – from among the millions of pages of books and newspapers held in the library’s collection requires a combination of computer-based searching and human intervention. If we can bring more people on board we’ll be able to find and process the jokes much faster.

Finding gags is just the first step. In the next blog post I’ll explain how we’re extracting joke columns from the library’s digital collections, importing them into our own database, and transcribing their contents. Stay tuned!


01 September 2014

Wikimania and UK Wikimedian of the Year 2014 Awards


It was very exciting that this year’s Wikimania, the official annual conference of the Wikimedia Foundation, was held in the UK for the first time. It was a wonderful opportunity to catch up with old friends, such as the Library’s previous Wikipedian-in-Residence Andrew Gray. I also met interesting new people, many from other libraries and cultural heritage institutions around the world, as a whole strand of the programme was devoted to the GLAM sector.

Wikimedia UK used the conference closing ceremony for Jimmy Wales to announce the winners of the UK Wikimedian of the Year 2014 awards. The main award went to Ed Saperia for his hard work in organising Wikimania 2014. GLAM of the Year went to our friends (and my old colleagues) at the National Library of Scotland, and Educational Institution of the Year went to the University of Portsmouth, with Professor Humphrey Southall, with whom the British Library collaborates on the successful Georeferencer project, collecting the award. It was also very pleasing that Honourable Mentions were given to Andy Mabbett, also known as Pigsonthewing, who started the Voice Intro Project, and, last but definitely not least, to the British Library for the Mechanical Curator and Flickr Commons image release. Ben O’Steen from British Library Labs, who created the Mechanical Curator, received the award on behalf of the Library and had the privilege of shaking Jimmy Wales’ hand on stage.


Wikimedia UK 2014 Award Winners, including Ben O’Steen from the British Library

Plans are now under way for next year’s Wikimania in Mexico City, which will take place on 15-19 July 2015 and, for the first time, in a library: la Biblioteca Vasconcelos (Vasconcelos Library), also known as the Megabiblioteca ("megalibrary"). From looking at photos I can see why it has this nickname! It also includes a huge whale sculpture by Gabriel Orozco in the centre of the building; for more information on how this was created and assembled, check out this Tate blog post.


Stella Wisdom

Curator, Digital Research


27 August 2014

The British Library Meets Burning Man


Posted on behalf of David Normal (edited by Sophie McIvor and Mahendra Mahey)

The British Library meets Burning Man…

In December 2013 the British Library uploaded over a million images from our 19th century digitised books onto Flickr Commons, with the invitation for anyone to remix, re-use and re-purpose the content as they wish.

The response from the online community was outstanding, but by far the most unexpected use of the British Library’s Flickr Commons images is happening this week - the collection has inspired four large-scale artworks on display at this year’s Burning Man festival in the Nevada desert, created by David Normal, a California-based artist with a special interest in 19th century illustration.

One of David’s four paintings being installed at Burning Man 2014
(photographed by Andrew Spalding)

A video showing the process of one of the lightboxes being installed at Burning Man 2014 
(Courtesy of David Normal)

Before he headed off to the desert to install his ‘Crossroads of Curiosity’ artworks at the festival, we spoke to David about how this came about, and how he used the image collection:

What first attracted you to the idea of using 19th Century illustrations in your art?

Beginning as a teenager I was interested in making “seamless” collages, in which the elements go together so smoothly that it looks as though it were all one illustration. I love Max Ernst’s collage novel, “Une Semaine de Bonté”, which took this seamless collage aesthetic to its zenith using 19th century illustration. Recently, I began painting over digital collage prints, and this process opened up a lot of possibilities, to the point where I felt that I could use the 19th century in a fresh way that is not derivative of Ernst’s work.

How did you come across the British Library’s Flickr Commons collection?

The guitarist of the punk band “Flipper” mentioned something about it. At the time I had already initiated the plan to create paintings based on 19th Century images for Burning Man, so learning of this vast online collection was thrilling and truly fortuitous, since it was exactly what I was looking for.

How has the Library’s collection informed your artwork?

After being introduced to the collection I realized that everything I needed was there.  I decided to use the collection exclusively, and make that one of the hallmarks of the project. Indeed, I feel that the “Crossroads of Curiosity” celebrates this amazing collection.

One of the most striking aspects of the collection is its colossal size.  Having a lot of material to choose from is important in collage making, since out of excess come the chance juxtapositions that are so magical.

Another thing that was very helpful to me was the randomness.  The majority of the images are in no particular order in the photostream, and viewing the images in succession was like taking a journey through a landscape of illustrated symbols. 

How did you identify which images you wanted to use?

Certain images have some symbolic power or strangeness that intrigues me and those are the images I am drawn to. This has to do with thematic preoccupations that percolate up from my subconscious on the one hand, and with my taste in things on the other, and also with the specific theme I am working with on the Burning Man project, which is “Caravansary - The Silk Road”. I have favorited nearly 3000 images on my own Flickr page.

What happens next?

I start with selecting several images that I think will go together well.  I bring them into Photoshop and then begin to arrange and play with them.  As the composition develops the images are increasingly cleaned up, edited, and composed together. 

These images below outline the development of the collage painting, “Conflamingulation”, one of four which will be featured on 8’x20’ lightpanels at Burning Man:

The chance conjunction of the machine gunner and the skunk suggests an idea for a collage.

A rough collage is made.

Different arrangements are experimented with.

A final version is arrived at that is the basis of the painting.

Finished painting:
“Conflamingulation”, Acrylic on polypropylene film, lightpanel, 35” x 96”, 2014

Which is your favourite of all the images you’ve discovered on the Flickr Commons collection?

I think I have not viewed more than 10% of the collection altogether, so I can’t say that I have enough familiarity to choose a favourite fairly.  However, if I had to select a single image then perhaps I would choose this skunk because of his great versatility as a piece of clip art.


Image available at the British Library Flickr Commons page
Taken from page 42 of the book, OUR EARTH AND ITS STORY, A Popular Treatise on Physical Geography, Edited by Robert Brown, Published by Cassell and Company Limited

What is special about a collection like this?

Being able to use illustrations as a way of approaching books is interesting - typically the reverse is the case;  reading a book you find the illustrations and not vice versa.

What do you hope that people at Burning Man will take from the finished pieces?

Larry Harvey, the director of Burning Man, has said that he hopes the pieces will evoke a feeling of “romance”, in the sense of the romanticism of myths and fairytales such as the Arabian Nights. I will concur with that. The pieces are meant to show the intersections of distant times, places, peoples and things in humorous and thought-provoking ways. It is a cabinet of curiosities that has opened up to encompass the world in a series of dramatic tableaux. I hope the Crossroads of Curiosity fills the viewer with wonder, and arouses their own curiosity.

David Normal’s ‘Crossroads of Curiosity’ artworks are on display at the Burning Man Festival from 25 August – 1 September.

Here is one of his illuminated panels from Burning Man 2014:

One of David Normal's illuminated panels for Burning Man 2014.

You can discover more about his work at