Finding Jokes - The Victorian Meme Machine
Posted on behalf of Bob Nicholson.
The Victorian Meme Machine is a collaboration between the British Library Labs and Dr Bob Nicholson (Edge Hill University). The project will create an extensive database of Victorian jokes and then experiment with ways to recirculate them out over social media. For an introduction to the project, take a look at this blog post or this video presentation.
Whenever I tell people that Iâm working with the British Library to develop an archive of nineteenth-century jokes, they often look a bit confused. âI didnât think the Victorians had a sense of humourâ, somebody told me recently. This is a common misconception. Weâre all used to thinking of the Victorians as dour and humourless; as a people who were, famously, ânot amusedâ. But this couldnât be further from the truth. In fact, jokes circulated at all levels of Victorian culture. While most of them have now been lost to history, a significant number have survived in the pages of books, periodicals, newspapers, playbills, adverts, diaries, songbooks, and other pieces of printed ephemera. There are probably millions of Victorian jokes sitting in libraries and archives just waiting to be rediscovered â the challenge lies in finding them.
In truth, we donât know how many Victorian gags have been preserved in the British Libraryâs digital collections. Type the word âjokesâ into the British Newspaper Archive or the JISC Historical Texts collection and youâll find a handful of them fairly quickly. But this is just the tip of the iceberg. There are many more jests hidden deeper in these archives. Unfortunately, they arenât easy to uncover. Some appear under peculiar titles, others are scattered around as unmarked column fillers, and many have aged so poorly that they no longer look like jokes at all. Figuring out an effective way to find and isolate these scattered fragments of Victorian humour is one of the main aims of our project. Hereâs how weâre approaching it.
Firstly, weâve decided to focus our attention on two main sources: books and newspapers. While itâs certainly possible to find jokes elsewhere, these sources provide the largest concentrations of material. A dedicated joke book, such as this Book of Humour, Wit and Wisdom, contains hundreds of viable jokes in a single package. Similarly, many Victorian newspapers carried weekly joke columns containing around 30 gags at a time â over the course of a year, a regularly printed column yields more than 1,500 jests. If we can develop an efficient way to extract jokes from these texts then weâll have a good chance of meeting our target of 1 million gags.
Our initial searches have focused on two digital collections:
2) A collection of nineteenth-century books digitised by Microsoft.
In order to interrogate these databases weâve compiled a continually-expanding list of search terms. Obvious keywords like âjokesâ and âjestsâ have proven to be effective, but weâve also found material using words like âquipsâ, âcranksâ, âwitâ, âfunâ, âjinglesâ, âhumourâ, âlaughâ, âcomicâ, âsnapsâ, and âsiftingsâ. However, while these general search terms are useful, they donât catch everything. Consider these peculiarly-named columns from the Hampshire Telegraph:
At first glance, they look like recipes for buckwheat cakes â in fact, theyâre columns of imported American jokes named after what was evidently considered to be a characteristically Yankee delicacy. I would never have found these columns using conventional keyword searches. Uncovering material like this is much more laborious, and requires us to manually look for peculiarly-named books and joke columns.
In the case of newspapers, this requires a bit of educated guesswork. Most joke columns appeared in popular weekly papers, or in the weekend editions of mass-market dailies. So, weighty, morning broadsheets like the London Times are unlikely to yield many gags. Similarly, while the placement of jokes columns varied from paper to paper (and sometimes from issue to issue), they were typically placed at the back of the paper alongside childrenâs columns, fashion advice, recipes, and other miscellaneous tit-bits of entertainment. Finally, once a newspaper has been proven to contain one set of joke columns, the likelihood is that more will be found under other names. For example, initial keyword searches seem to suggest that the Newcastle Weekly Courant discontinued its long-running âAmerican Humourâ column in 1888. In fact, the column was simply renamed âYankee Snacksâ and continued to appear under this title for another 8 years.
Tracking a single change of identity like this is fairly straightforward; once the new title has been identified we simply need to add it to our list of search terms. Unfortunately, the editorial whims of some newspapers are harder to follow. For example, the Hampshire Telegraph often scattered multiple joke columns throughout a single issue. To make things even more complicated, they tended to rename and reposition these columns every couple of weeks. Hereâs a sample of the paperâs American humour columns, all drawn from the first 6 months of 1892:
For papers like this, the only option is to manually locate jokes columns one at a time. In other words, while our initial set of core keywords should enable us to find and extract thousands of joke columns fairly quickly, more nuanced (and more laborious) methods will be required in order to get the rest.
Itâs important to stress that jokes were not always printed in organised collections. Some newspapers mixed humour with other pieces of entertaining miscellany under titles such as âVarietiesâ or âOur Carpet Bagâ. The same is true of books, which often combined jokes with short stories, comic songs, and material for parlour games. While itâs fairly easy to find these collections, recognising and filtering out the jokes is more problematic. As our project develops, weâd like to experiment with some kind of joke-detection tool that pick out content with similar formatting and linguistic characteristics to the jokes weâve already found. For example, conversational jokes usually have capitalised names (or pronouns) followed by a colon and, in some cases, include a descriptive phrase enclosed in brackets. So, if a text includes strings of characters like âJack (âŠ):â or âShe (âŠ):â then thereâs a good chance that it might be a joke. Similarly, many jokes begin with a capitalised title followed by a full-stop and a hyphen, and end with an italicised attribution. Hereâs a characteristic example of all three trends in action:
Unfortunately, conventional search interfaces arenât designed to recognise nuances in punctuation, so weâll need to build something ourselves. For now, weâve chosen to focus our efforts on harvesting the low-hanging fruit found in clearly defined collections of jokes.
The project is still in the pilot stage, but weâve already identified the locations of more than 100,000 jokes. This is more than enough for our current purposes, but I hope weâll be able to push onwards towards a million as the project expands. The most effective way to do this may well to be harness the power of crowdsourcing and invite users of the database to help us uncover new sources. Itâs clear from our initial efforts that a fully-automated approach wonât be effective. Finding and extracting large quantities of jokes â or, indeed, any specific type of content â from among the millions of pages of books and newspapers held in the libraryâs collection requires a combination of computer-based searching and human intervention. If we can bring more people on board weâll be able to find and process the jokes much faster.
Finding gags is just the first step. In the next blog post Iâll explain how weâre extracting joke columns from the libraryâs digital collections, importing them into our own database, and transcribing their contents. Stay tuned!