Digital scholarship blog

7 posts from October 2014

31 October 2014

2014 Off the Map Competition Winners Announced at GameCity9 Festival

Last night was the award ceremony at Nottingham Contemporary art gallery for the Off the Map 2014 competition, a partnership project with GameCity and Crytek. Now in its second year, Off the Map challenges UK Higher Education students to make videogames based on British Library collection items using Crytek's CRYENGINE software. Furthermore, for 2014, the competition had a gothic theme to accompany the British Library's current exhibition Terror and Wonder: The Gothic Imagination, which is open until Tuesday 20 January 2015 and is well worth a visit.

I've created a video, which you can see below, showing flythrough footage of last year's winning entry from Pudding Lane Productions, De Montfort University, Leicester. It also gives details of the 2014 gothic sub-themes and shows flythrough clips from this year's shortlisted entries.



The jury were impressed by the quality and creativity of the submitted entries, so there was passionate debate when deciding the 2014 shortlist! The third winning entry came from Team Shady Agents, from the University of South Wales in Newport, with their Edgar Allan Poe inspired game Crimson Moon. The second winning entry was Team Flying Buttress from De Montfort University, who created a visually rich interpretation of Dracula's Whitby.

I was delighted that British Library Chief Executive Roly Keating announced the winning entry: Nix, created by Jackson Rolls-Gray, Sebastian Filby and Faye Allen from the University of South Wales. Using Oculus Rift, a revolutionary virtual reality headset for 3D gaming, the game challenges players to reconstruct Fonthill Abbey by collecting hidden, moving glowing orbs in a spooky underwater world. You can see a flythrough of their game below:



My colleague Tim Pye, curator of Terror and Wonder and a member of this year's Off the Map jury, said: “The original architectural model of Fonthill Abbey is currently on display in Terror and Wonder.  What is so impressive about the Nix game is the way in which it takes the stunning architecture of the Abbey, combines it with elements from its troubled history and infuses it all with a very ghostly air.  The game succeeds in transforming William Beckford’s stupendously Gothic building into a magical, mysterious place reminiscent of the best Gothic novels.”

Nix also impressed fellow jury member Scott Fitzgerald, Crytek's CRYENGINE Sandbox Product Manager, who said: “With the theme of Fonthill Abbey, the winning team took the fantasy route and twisted the story into something fresh and completely different.  The mechanics used to progress through the game and the switching between the two realities make a very interesting experience for the player.”

I'd like to thank this year's jury members: Tim Pye, Tom Harper, Kim Blake and Scott Fitzgerald. I also want to thank all of this year's Off the Map participating teams; far from being a terror, it has been a delight to follow the students' work via their blogs and YouTube channels.

Plans are currently underway for the third competition: "Alice's Adventures Off the Map", which will be launched at the British Library on Monday 8 December 2014, at one of the Digital Research team's Digital Conversation events. If you would like to come along to find out more, book here.


Stella Wisdom

Curator, Digital Research


30 October 2014

British Library Digital Scholarship Training Programme: a round-up of resources you can use

The British Library Digital Scholarship Training Programme provides hands-on practical training for British Library staff, delivered as one-day on-site workshops covering topics from communicating collections and cleaning up data to command line programming and geo-referencing. Since launching in November 2012, over 250 individual members of staff have attended one or more sessions, with over 60 course days delivered.

We've blogged about the programme before (see '50th Anniversary!'), and the more we go around talking about it (most recently at Digital Humanities 2014 and Data Driven: DH in the Library) the more we hear from librarians, curators, academics, and other professionals in the cultural sector looking to build similar programmes and looking to learn from our model.

Although the British Library Digital Scholarship Training Programme is an internal programme, we've made efforts over the last year to release bits of the programme externally. In lieu of having a central home for these outputs, this post collates all those bits of the programme that have floated out onto the open web, usually under generous licences.

Crowdsourcing in Libraries, Museums and Cultural Heritage Institutions

Mia Ridge leads this course for us. Notes, links, and references relating to the course are on her blog.

 Data visualisation for analysis in scholarly research

Again, Mia Ridge leads this course for us. Notes, links, and references relating to the course are on her blog.

Information Integration: Mash-ups, APIs and the Semantic Web

Owen Stephens leads this course for us. Both his slides and the hands-on exercise he developed for the course are available on his blog and licensed under a Creative Commons Attribution 4.0 International License.

Programming in Libraries

There is a great deal of cross-over between this course and two lessons I wrote for the Programming Historian with Ian Milligan: Introduction to the Bash Command Line and Counting and mining research data with Unix. Both lessons are licensed under a Creative Commons Attribution 2.0 Generic License.

Managing Personal Digital Research Information

This course, led by Sharon Howard, largely covers Zotero. Sharon developed a wiki resource for course attendees to work through, which was subsequently released under a Creative Commons Attribution-ShareAlike 3.0 Unported License as A Zotero Guide.

[update 21/11/14] Cleaning up Data

This course is led by Owen Stephens. Both his slides and the hands-on exercise he developed for the course are available on his blog and licensed under a Creative Commons Attribution 4.0 International License.

[update 11/03/15] Mapping your Data

I led this course in June 2014. Intro, exercises, hand out, and data are available on Figshare (DOI: 10.6084/m9.figshare.1332408).


James Baker

Curator, Digital Research


Creative Commons License

This post is licensed under a Creative Commons Attribution 4.0 International License.

22 October 2014

Victorian Meme Machine - Extracting and Converting Jokes

Posted on behalf of Bob Nicholson.

The Victorian Meme Machine is a collaboration between the British Library Labs and Dr Bob Nicholson (Edge Hill University). The project will create an extensive database of Victorian jokes and then experiment with ways to recirculate them over social media. For an introduction to the project, take a look at this blog post or this video presentation.


In my previous blog post I wrote about the challenge of finding jokes in nineteenth century books and newspapers. There’s still a lot of work to be done before we have a truly comprehensive strategy for identifying gags in digital archives, but our initial searches scooped up a lot of low-hanging fruit. Using a range of keywords and manual browsing methods we quickly managed to identify the locations of more than 100,000 gags. In truth, this was always going to be the easy bit. The real challenge lies in automatically extracting these jokes from their home-archives, importing them into our own database, and then converting them into a format that we can broadcast over social media.

Extracting joke columns from the 19th Century British Library Newspaper Archive – the primary source of our material – presents a range of technical and legal obstacles. On the plus side, the underlying structure of the archive is well-suited to our purposes. Newspaper pages have already been broken up into individual articles and columns, and the XML for each of these articles includes an ‘Article Title’ field. As a result, it should theoretically be possible to isolate every article with the title “Jokes of the Day” and then extract them from the rest of the database. When I pitched this project to the BL Labs, I naïvely thought that we’d be able to perform these extractions in a matter of minutes – unfortunately, it’s not that easy.
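To sketch that idea: if the article-level XML were available, selecting joke columns by title might look something like the following. The element names, attributes and sample record here are assumptions for illustration, not the archive's actual schema.

```python
import xml.etree.ElementTree as ET

def find_joke_articles(xml_string, title_phrase="Jokes of the Day"):
    """Return the ids of articles whose title contains the given phrase."""
    root = ET.fromstring(xml_string)
    matches = []
    for article in root.iter("article"):
        title = article.findtext("title", default="")
        if title_phrase.lower() in title.lower():
            matches.append(article.get("id"))
    return matches

# Hypothetical page-level XML, loosely modelled on the structure described above
sample = """
<page>
  <article id="a1"><title>Jokes of the Day</title></article>
  <article id="a2"><title>Shipping News</title></article>
</page>
"""
print(find_joke_articles(sample))  # → ['a1']
```

In practice the real schema would need inspecting first, but the filtering step itself is this simple once the XML is in hand.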

Marking up a joke with tags

The archive’s public-facing platform is owned and operated by the commercial publisher Gale Cengage, which sells subscriptions to universities and libraries around the world (UK universities currently get free access via JISC). Consequently, access to the archive’s underlying content is restricted when using this interface. While it’s easy to identify thousands of joke columns using the archive’s search tools, it isn’t possible to automatically extract all of the results. The interface does not provide access to the underlying XML files, and images can only be downloaded one-by-one using a web browser’s ‘save image as’ button. In other words, we can’t use the commercial interface to instantly grab the XML and TIFF files for every article with the phrase “Jokes of the Week” in its title.

The British Library keeps its own copies of these files, but they are currently housed in a form of digital deep storage that researchers cannot directly access and within which discovering content is extremely cumbersome. In order to move forward with the automatic extraction of jokes, we will need to secure access to this data, transfer it onto a more accessible internal server, and custom-build an index that allows us to search the full text of the articles and titles, so that we can extract all of the relevant text files along with the image files showing the areas of the newspaper scans from which the text was derived.
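For illustration, the kind of full-text index described above could, in its simplest form, be an inverted index mapping each word to the articles that contain it. This is a toy sketch of the idea, not the Library's actual infrastructure:

```python
from collections import defaultdict

def build_index(articles):
    """Map each lower-cased word to the set of article ids containing it."""
    index = defaultdict(set)
    for art_id, text in articles.items():
        for word in text.lower().split():
            index[word].add(art_id)
    return index

# Invented sample records standing in for extracted article text
articles = {"a1": "Jokes of the Day", "a2": "Shipping intelligence"}
index = build_index(articles)
print(sorted(index["jokes"]))  # → ['a1']
```

A production index would also handle punctuation, stemming and OCR noise, but the core lookup structure is the same.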

All of this is technically possible, and I’m hopeful that we’ll find a way to do it in the next stage of the project. However, given the limited time available to us we decided to press ahead with a small sample of manually extracted columns and focus our attention on the next stages of the project. This manually created sample will be of great use in future, as we and other research groups can use it to train computer models, which should enable us to automatically classify text from other corpora as potentially containing jokes that we would not have been able to find otherwise.

For our sample we manually downloaded all of the ‘Jokes of the Day’ columns published by Lloyd’s Weekly News in 1891. Here’s a typical example:

A typical ‘Jokes of the Day’ column

These columns contain a mixture of joke formats – puns, conversations, comic stories, etc – and are formatted in a way that makes them broadly representative of the material found elsewhere in the database. If we can find a way to process 1,000 jokes from this source, we shouldn’t have too much difficulty scaling things up to deal with 100,000 similar gags from other newspapers.    

Our sample of joke columns was downloaded as a set of JPEG images. In order to make them keyword searchable, transform them into ‘memes’, and send them out over social media, we first need to convert them into accurate, machine-readable text. We don’t have access to the existing OCR data, but even if this were available it wouldn’t be accurate enough for our purposes. Here’s an example of how one joke has been interpreted by OCR software:

Comparison of a joke with its OCR transcription
Some gags have been rendered more successfully than this, but many are substantially worse. Joke columns often appeared at the edge of a page, which makes them susceptible to fading and page bending. They also make use of unusual punctuation, which tends to confuse the scanning software. Unlike newspaper archives, which remain functional even with relatively low-quality OCR, our project requires 100% accuracy (or something very close) in order to republish the jokes in new formats.
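As a rough illustration of what 'close to 100% accuracy' means in practice, OCR quality is often summarised as a character error rate. A crude approximation of it against a hand-made transcription, using only Python's standard library (this is not part of the project's actual pipeline), is:

```python
import difflib

def char_error_rate(ocr_text, ground_truth):
    """Approximate character error rate as 1 minus the similarity ratio."""
    matcher = difflib.SequenceMatcher(None, ocr_text, ground_truth)
    return 1.0 - matcher.ratio()

# Invented example of typical OCR confusions (rn/m, 0/o, b/h)
ocr = "Tbe  qnick brown f0x"
truth = "The quick brown fox"
print(round(char_error_rate(ocr, truth), 2))
```

A proper evaluation would use edit distance over aligned lines, but even this crude measure makes it easy to see why faded, oddly punctuated joke columns fall well short of republishable quality.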

So, even if we had access to OCR data we’d need to correct and improve it manually. We experimented with this process using OCR data taken from the British Newspaper Archive, but the time it took to identify and correct errors turned out to be longer than transcribing the jokes from scratch. Our volunteers reported that the correction process required them to keep looking back and forth between the image and the OCR in order to correct errors one-by-one, whereas typing up a fresh transcription was apparently quick and straightforward. It seems a shame to abandon the OCR, and I’m hopeful that we’ll eventually find a way to make it usable. The imperfect data might work as a stop-gap to make jokes searchable before they are manually corrected. We may be able to improve it using new OCR software, or speed up the correction process by making use of interface improvements like TILT. However, for now, the most effective way to convert the jokes into an accurate, machine-readable format is simply to transcribe directly from the image.
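One way the imperfect OCR could serve as that stop-gap is approximate matching, which can find a search phrase even when individual characters are garbled. A minimal sketch of the idea using only the standard library (the joke text is invented for illustration):

```python
import difflib

def fuzzy_contains(text, phrase, threshold=0.8):
    """Slide a phrase-length window over text; report whether any window
    is similar enough to the phrase to count as a match."""
    n = len(phrase)
    best = 0.0
    for i in range(max(1, len(text) - n + 1)):
        window = text[i:i + n]
        score = difflib.SequenceMatcher(None, window.lower(), phrase.lower()).ratio()
        best = max(best, score)
    return best >= threshold

# Typical OCR confusions: 'n' misread as 'u'
noisy = "A mau went iuto a shop aud asked for a pouud of tea."
print(fuzzy_contains(noisy, "into a shop"))  # → True
```

This brute-force scan would be far too slow over 100,000 jokes, but it shows why uncorrected OCR could still support search while the transcriptions catch up.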

16 October 2014

Curious Roads to Cross: British Library, Burning Man and the art of David Normal

California-based artist David Normal will be talking about how he used images from the British Library's Flickr Commons one million image release as inspiration for artwork he created for the Burning Man Festival. The talk takes place tomorrow, Friday 17 October, between 1500 and 1600, in the Chaucer Suite, British Library Conference Centre, London (places are very limited; if you are interested in attending, see below for booking information).

Crossroads of Curiosity at Burning Man, Nevada, 25 August to 1 September 2014

With a special interest in 19th century illustration, David created the ‘Crossroads of Curiosity’, which was on display from 25 August – 1 September 2014 at the festival in Nevada.

David recently blogged about his work on the British Library's Digital Scholarship blog.

David will bring large prints of his work, talk about each painting and focus on specific details. He will also explain the production process right through to the work's de-installation at Burning Man.

Booking information

Don't miss out on this fantastic opportunity to see how the British Library's digital content is being used to inspire artists. If you are interested in attending, please email with the subject 'David Normal: BL and Burning Man' no later than 1100 on Friday 17 October 2014.

10 October 2014

Introducing Paper Machines

In the welcome surroundings of the refurbished Institute of Historical Research, Jo Guldi (Brown University) kicked off the 2014 Autumn Term programme of the IHR Digital History Seminar. In town to discuss The History Manifesto, her new open access book co-authored with David Armitage, Guldi's talk ranged from the public role of historians, the Digital Humanities and new models of publishing to impending environmental catastrophe, the need for deep history, and data processing tools that can help citizens and scholars alike overcome the problems of modern bureaucracy. To see how Guldi wove all these threads together, you'll need to watch the video below. Here I just want to tease out, in no particular order, a few of the threads that stuck in my mind, threads that pertain to most, if not all, digital history projects that pass through the seminar.

Tools as provocations: Paper Machines is a research tool. But it is also a provocation, an experiment in using large swathes of information to inform historical research in the longue durée, a vantage point that, the tool's makers argue, historians do not take often enough. The tool, in short, is the argument.

What we need now: As we sit on the precipice of environmental catastrophe, does it not behove us to think about what digital projects we need? Do we want digital projects that analyse art for art's sake, that recapitulate old research paradigms and do not address problems of a wider, public relevance?

Hypothesis generation: At the heart of Paper Machines is hypothesis generation. It allows the scholar to take a vast paper archive and facet that archive, make visualisations, select where to read closely. How that macro to micro scaling changes the history that is written, how scholarly debates mature to integrate the inevitable discrepancies between interpretations made at these scales is the challenge historians must re-engage with.

Being bold about method: Works that change the focus of disciplines usually open their accounts by stating 'you missed this because your method was wrong'. Digital history can and should do the same: it can and should be bold about how it comes to the conclusions it does, rather than hide the methods, ways, and means that underpin its particular take on historical phenomena.

My partial, incomplete, CC BY notes on the seminar are available on GitHub Gist.

The next Digital History seminar, 'Interrogating the archived UK web: Historians and Social Scientists Research Experiences', will take place on 4 November and a full listing of Autumn Term seminars is available on the IHR Website.

James Baker

Curator, Digital Research


This was originally posted on the IHR Digital History Seminar blog.


Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 International License.

08 October 2014

British Library Labs Symposium 2014

Tim Hitchcock will launch the second annual British Library Labs Symposium with a keynote focusing on “Big and small data in the humanities”. Roly Keating, the Library’s Chief Executive, will then present awards to the 2014 Labs competition winners.

The event will take place on Monday 3 November, between 0930 and 1730, at the British Library Conference Centre.

The 2014 competition’s winners will present their work, followed by presentations from researchers who are actively using the Library's digital content: text mining scanned books to help build a cityscape of Edinburgh, analysing the performance characteristics of some of our digitised live music recordings, and exploring how it might be possible to measure the impact of releasing digital content and data into the public domain.

The Labs Team will talk about some of their surprising findings and the impact of the project: for instance, the tremendous appetite for openly licensed digital content shown when the team released over 1 million images on Flickr. The collection has had over 200 million views in less than a year and has stirred great interest from artists, researchers and the general public.

Adam Farquhar will highlight key lessons that we’ve learned working closely with researchers since the Labs were formed and discuss how the Labs is transforming our approach to providing services to support digital research and scholarship. He will also outline plans for the Labs over the coming years and give further details of the 2015 Competition.

The event will end with a reception to celebrate the end of the first phase of the project, featuring an entertaining talk and live music performance that tries to connect the 'old' with the new digital world. Sarah Angliss, composer, automatist and sound historian, will explore some unlikely affinities between sound making today and in the Victorian era through presentation and musical performance, and she will facilitate a D.I.Y. session where you can record your own voice on a wax cylinder. The Labs team will be around to talk about their Data Giveaway if you are interested in getting hold of large amounts of open and free digital content for your institution or organisation on portable devices.

For more information about the event and to register, please visit the Labs Symposium page.

01 October 2014

The Art of Data

Last month the Digital Research team organized another successful Digital Conversations event. The evening, chaired by Anthony Lilley, brought together artists, researchers and art critics to reflect on projects and ideas around the use of digital data in contemporary artistic expression. Ernest Edmonds started the discussion by defining data as something that is constantly moving around, and noting how this movement is, in its turn, constantly transforming data. Take for example communication. When we communicate our ideas and conceptions of the world, we are producing some sort of data that is transmitted from us to be perceived and given a meaning by others. This interactive process implies that data is transformed over time, as different people can interpret the same data in different ways. Art, conceived as a mode of expression, follows the same logic: a single work can trigger different interpretations even by the same person. When we see a work of art for the first time we have a reaction that might change when we are exposed to it on a second occasion. What digital data offers to artists is the possibility to explore new ways of representing the world, highlighting how data is constantly changing our perceptions. Digital data, according to Edmonds, is the ‘new canvas’ that artists use to make us aware of the transformation of data over time.


Michael Takeo Magruder continued the discussion by presenting some of his own artistic projects, arguing that the adoption of real time data by artists emphasizes the work of art as something in constant change, that is, something which is never really finished. Michael’s Data_Plex (economy) project was used as an example to illustrate this argument. The project was based on live data produced by the Dow Jones Industrial Average (DJI) index, represented in a simulated urban environment in the form of skyscrapers. The virtual buildings were erected or destroyed according to the fluctuations of the stock market. The audience was not only able to visualise a complex data structure in a more intelligible representation system but, more importantly, it became aware of how unstable the whole economic market in the USA was, as buildings were constantly changing in size, colour and shape to represent the variations of the market. During times of financial crisis, the audience could see the virtual buildings falling down as the stocks crashed, revealing in this way how some specific industries in the financial market are more prone to being affected by financial crises than others, which, in the virtual urban space created by Michael, were kept intact. On a more metaphorical level, the artwork produced a criticism of capitalism as an unstable economic system that has the power to build up as well as to destroy what it has constructed.

Julia Freeman spoke of data as something that involves complexity, as we all consume data in very different ways. Data is a broad and overused term, and therefore we need to think of a ‘taxonomy of data’ in order to understand a little more about what makes data important to us. In the digital world there is a whole movement advocating for data to become open for anyone to use, but there is still little understanding of how this data will be used and how it can transform society. Talking about her own work, Julia explained her interest in using live data – data that comes from biological systems – to explore new ways in which we can connect with our environment beyond sensorial perception. In one of her projects, The Lake, Julia tagged 16 fish from different species in a particular lake containing a population of circa 3,000 other fish. The idea of the project was to track the movements of the different schools by translating them into visual and acoustic data. The result was a complicated network of sounds and images that created different patterns by showing the levels of activity between fish species according to different times of the day. In a period of six weeks the project generated more than 5 million data points, producing interesting colour patterns and sound compositions.


The Lake, © Julia Freeman

The last speaker of the evening, Kevin Walker, started his presentation by raising a controversial point, arguing that we are living in an age of digital data terror. Digital data is a buzzword, and most of us who deal with it rarely question what sources produce the data, what it means, or even what sort of stories lie behind it. The role of the artist, in this context, is to interrogate technology and the data it produces. When experimenting with digital data, artists often arrive at unusual and surprising results, transforming information into experiences through design. Kevin illustrated his ideas by presenting some of the work done by his students at the Royal College of Art, who are transforming data into perceptual experiences, normally by representing this data through sounds and graphic images. Students enrolled in the Information Experience Design programme run by the RCA are encouraged to integrate digital data from various sources into visual displays that translate the data into meaningful information for the audience, in the same way described by the other speakers in the evening. These works emphasize digital data as a re-usable source that can be recycled and transformed into art. The question for the future, as Kevin points out, is how artists will move from working with digital data to dealing with quantum data. This question remains open, so we should watch this space…

The audience participated eagerly in the discussions by posing interesting questions to the panel, many of them around the interactive nature of contemporary art in the digital environment. As explained in the presentations, art can add an essential meaning to real time data by turning it into visual and acoustic representations for the audience. As this data changes constantly, so does the work of art. This also suggests another interesting point, relating to the difficult task of preserving these works for future generations. Since transformation in real time permeates the aesthetic concept of digital data in contemporary artistic expression, it would probably be a good time for us to rethink our concept of preservation in a world of constant change.

You can watch the event here and below. Special thanks to Susan Ferreira for recording the video!

Digital Conversations #5: Digital data and artistic expression from Aquiles Baryner on Vimeo.


By Aquiles Alencar-Brayner

Curator, Digital Research