Digital scholarship blog

Enabling innovative research with British Library digital collections


Tracking exciting developments at the intersection of libraries, scholarship and technology.

07 March 2018

Breathe, A Digital Ghost Story

Recently I posted about The Cartographer's Confession, an immersive digital story based in London, where readers interact with the app on location. However, the ‘beast from the east’ then arrived in the UK and made walking in London rather a bracing and slippy experience last week!

So if during the cold weather you prefer seeking chills of a different kind, you may like to read Breathe, a digital ghost story, from the comfort and warmth of your sofa or bed. The story takes about fifteen minutes to read, is designed for mobile devices and it is available for free. To start reading, go to


Written by Kate Pullinger, the work is a collaboration with Editions at Play, which is itself a collaboration between Google Creative Labs Sydney and London-based publisher Visual Editions. The result is a literary experience delivered using Application Programming Interfaces (APIs) and context recognition technology. The app uses place, time, context and environment to place the reader in the story, making the experience individual and personal to each reader. Kate has blogged about creating Breathe on The Writing Platform.

As with the other two Ambient Literature commissioned literary works, the research project team are looking for participants to try out Breathe and talk to them about their reading experience. If you are interested in assisting, please fill out this form and one of the researchers will be in touch via email to arrange a time to talk. If you have any questions about this process, please contact Dr Michael Marcinkowski.

This post is by Digital Curator Stella Wisdom, on Twitter as @miss_wisdom and a member of the Ambient Literature Advisory Board.

28 February 2018

Announcing the BL Labs roadshows locations and dates for 2018!

The @BL_Labs Roadshows: dates and locations for 2018

Do you want to learn more about the British Library’s digital collections? Are you interested in discovering how other researchers have used our digitised material in creative and innovative ways? Would you like to give us feedback on the kinds of services we are providing and would like to provide for digital scholars? Come and meet Library staff and gain an insight into some of the opportunities and challenges of working with our digital content. Get advice, pick up tips, and consider entering the digital project you have been working on for one of the BL Labs Awards (deadline Thursday 11th October 2018).

Our @BL_Labs Roadshows will be held at university departments across the UK between March and June 2018. Events will include presentations from the British Library and host institutions, practical hands-on workshops, a chance to explore and discuss what you may do with some of the Library's data and for you to speak to and get feedback from experts. We’re also keen to hear your views on some of the long-term services the British Library is hoping to develop for those who want to work with our digital collections and data.

Register for one of the roadshows! They are FREE to attend and OPEN TO ALL (unless otherwise stated). For further details about locations we are visiting this year, see below: 

BL Labs Roadshow locations for 2018


  • Monday 26 March 2018 (10:00 – 13:00) - BL Labs Roadshow at CityLIS (City University of London Department of Library and Information Science), London (internal event)



  • Wednesday 2 May 2018 (09:30 – 13:00) - BL Labs Roadshow at the University of Edinburgh, Edinburgh
  • Tuesday 15 May 2018 (09:00 – 13:00) - BL Labs Roadshow at the University of Wolverhampton, Wolverhampton
  • Wednesday 16 May 2018 (12:30 – 16:00) - BL Labs Roadshow at the University of Lincoln, Lincoln


  • Tuesday 5 June 2018 (12:00 – 16:00) - BL Labs Roadshow at the University of Leeds, Leeds
  • Wednesday 27 June 2018 (09:00 – 13:00) - BL Labs Roadshow at the University of Birmingham, Birmingham

You will be able to view the full programme details for each of the roadshows, and book your place via Eventbrite. Links will be live shortly or visit our events page.

For any further questions, please contact us at

The British Library Labs project is funded by the Andrew W. Mellon Foundation and the British Library.

Posted by BL Labs

23 February 2018

The Cartographer's Confession

Last summer I posted about the Ambient Literature project, which is researching if and how digital media can create bridges between story and place. Forming the heart of this project, three authors, Kate Pullinger, James Attlee and Duncan Speakman, have each created new experimental works that respond to the presence of a reader, and these aim to show how we can redefine the rules of the reading experience through innovative use of technology.

I’m pleased to report that one of the Ambient Literature commissioned works, The Cartographer's Confession by James Attlee, is the winner of the 8th annual if:book award for New Media Writing, which was presented at Bournemouth University recently.

if:book award winner James Attlee, with (left to right) Chris Meade, Justine Solomons, Jim Pope, Andy Campbell, Stella Wisdom and Emma Whittaker

The Cartographer’s Confession is an immersive story based in London, where readers interact with the app on location, to discover the long-hidden secrets of ‘The Cartographer’.  Containing visual material, as well as having an original musical soundtrack, this is a ‘mixed reality’ experience. Accepting his award from if:book director Chris Meade, Attlee confessed that this blending of sound, video and story is something he had wanted to do previously alongside his print-based works, but he wasn’t able to make it happen until collaborating with digital producer Emma Whittaker.

I very much enjoyed this work, especially the music, and I also encourage you to try the experience. All you will need is a smartphone, a set of headphones, and the ability to visit a number of locations in London, through which the story unfolds (though there is also an ‘armchair mode’ if you are unable to get to London). You can download the app for free and it is available on iOS and Android.

Furthermore, as Ambient Literature is a research project, the team are very keen to speak with participants to learn about their reactions to the work. So if you have completed The Cartographer's Confession (or are close to finishing it) and are willing to be interviewed about your experience for about 15 minutes (either in person or over Skype, Facetime etc.), please fill out this form and one of the project researchers will be in touch via email to arrange a time to talk. If you have any questions about this process, please contact Dr Michael Marcinkowski.

This post is by Digital Curator Stella Wisdom, on Twitter as @miss_wisdom and a member of the Ambient Literature Advisory Board.

22 February 2018

BL Labs 2017 Symposium: Picturing Canada and Interactive Map (Staff Award Runner Up)

Putting collection metadata on the map: Picturing Canada

The Picturing Canada project began in 2012 as a British Library, Eccles Centre and Wikimedia UK collaboration to digitise a collection and experiment with releasing high quality reproductions of collection items into the public domain. At its heart the project sought to open up an under-used collection of photographs, connecting them with new audiences and uses outside of the walls of the British Library. It also provided a template for the Library’s subsequent public domain releases and has provided many with an insight into the depth of the Library’s Canadian collections.

Before the collection could be released it needed to be digitised and robust metadata created. Fortunately the Library had a good working batch of metadata created off the back of work done by researchers from Dalhousie University in the 1980s. The initial use of this to the project was clear but in digitising the images and putting them and the metadata online something became apparent; most images had some sort of information (be it a title or a photographer’s studio address) that could be used to determine a geographical location for the images.

At the time, this realisation was parked for future investigation but the 2015 exhibition, ‘Canada Through the Lens’, drawing off the same digitised collection, opened up an opportunity to try and use this information to map the collection and generate new insights into its contents. Much of the coordinate determination and mapping was done by Joan Francis, co-awardee of the BL Labs runner-up prize, who worked to find and add coordinates for the photographs. This was a relatively simple but time-consuming process involving finding locations in the metadata image title or, in the case of a photographer’s studio address, on the photograph itself. These text-based locations were then converted into co-ordinates compatible with Google Fusion Tables (there’s an excellent tutorial here) and added to records for each image.


The result of this is the map that you see above, a series of points which can be clicked on to see a partial metadata record for the item as well as a link to the photograph itself on Wikimedia Commons. As the work is time-consuming and fraught with potential error, we have so far produced a robust mapping of only about four fifths of the collection, and this is the work you see here. Interestingly, the map is not just a useful finding aid – although it performs this function very well.

Mapping the collection also provides insight into the geography of photographic production in Canada during the period this collection was created (1895 – 1923). It is clear, for instance, how significant the eastern metropolitan areas of Toronto, Montreal and Quebec are to Canada’s photographic production in this period. Similarly, the corridors of production seen running close to the Canada-US border and occasionally spurring north also suggest the significance of the railroad to Canada’s photographic economy. So the map helps users to find images but also offers more questions; an exciting prospect for continued work.
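The coordinate step described earlier (finding a place name in an image title or studio address, then attaching coordinates to the record) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the gazetteer, the field names `title` and `studio_address`, and the single `lat,lng` output string are all hypothetical, the coordinates are approximate city centres, and the real work was done by hand rather than by a script.

```python
# Hypothetical gazetteer: place name -> approximate (latitude, longitude).
GAZETTEER = {
    "toronto": (43.6532, -79.3832),
    "montreal": (45.5017, -73.5673),
    "quebec": (46.8139, -71.2080),
}


def find_location(text, gazetteer=GAZETTEER):
    """Return 'lat,lng' for the first gazetteer place named in text, else ''."""
    lowered = text.lower()
    for place, (lat, lng) in gazetteer.items():
        if place in lowered:
            return f"{lat},{lng}"
    return ""


def add_coordinates(records):
    """Copy each record, adding a 'location' field.

    The title is checked first, then the studio address. Records with
    no match keep an empty location so they can be reviewed by hand.
    """
    enriched = []
    for record in records:
        row = dict(record)
        row["location"] = (
            find_location(record.get("title", ""))
            or find_location(record.get("studio_address", ""))
        )
        enriched.append(row)
    return enriched
```

Leaving unmatched records blank, rather than guessing, reflects the point made above that the process is error-prone and needs human checking.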

Posted by BL Labs on behalf of Philip Hatfield and Joan Francis

Submit a project for one of the BL Labs 2018 Awards! Join us on 12 November 2018 for the BL Labs annual Symposium at the British Library.

21 February 2018

BL Labs 2017 Symposium: Opening up the British Library’s Early Indian Printed Books Collection (Staff Award Winner)

Making the British Library’s valuable collection of early Bengali books more accessible to researchers and the general public around the world rests heavily on the collaborative work undertaken across different teams of the library and partners in the UK and abroad. The commitment and passion of the project team has relied on the contribution and expertise of collaborators, as well as the forward thinking vision of the library, partners and fundraisers.

Receiving the BL Labs Staff Award 2017 is a great opportunity to thank everyone involved. 

Members of the Two Centuries of Indian Print team receiving the British Library Labs award at the Symposium on 30th October 2017
Tom Derrick (Digital Curator) was in India at the same time the team received their Award.

The Two Centuries of Indian Print project is a partnership between the British Library, the School of Cultural Texts and Records (SCTR) at Jadavpur University, Srishti Institute of Art, Design and Technology, and the Library at SOAS University of London, among others. It has also involved collaborations with the National Library of India, and other institutions in India.

The AHRC Newton-Bhabha Fund and the Department for Business, Energy and Industrial Strategy have generously funded the work undertaken so far by the project, focusing on early printed Bengali books. Many are unavailable in other library collections or are extremely difficult to locate and access. The project has undertaken a variety of initiatives from the digitisation of books and enhancement of the catalogue records in English and Bengali, to stimulating the use of digital humanities tools and techniques, running a programme of digital skills sharing and capacity building workshops, and hosting the South Asia Series seminars. All of these initiatives greatly contribute to the discovery and study of the collection. The project is also conducting groundbreaking work in finding a solution to Optical Character Recognition (OCR) in Bangla script. OCR is not available for South Asian languages currently and harnessing viable Optical Character Recognition technology would enable full-text search of the books, paving the way for researchers to use natural language processing techniques to perform large-scale analysis across a large corpus of text covering a diverse range of topics relating to Indian society, religion, and politics to name but a few. Doing so will increase the possibilities for new discoveries in this academic field.

However, despite its status as one of the most widely spoken languages in the world, Bangla script has been greatly underserved by providers of OCR solutions. This is due in part to the orthographical and typographical variances that have taken place in recent centuries that make building a dictionary and character ‘classifier’ more challenging. Due to the wide date range of the books we are digitising, these issues affect the quality of OCR. The physical condition of our historical books, including faded text, presents additional difficulties for creating machine readable versions of the books. 

To overcome these obstacles, the project team has been advancing the development of OCR for Bangla through the organisation of an international competition which reviewed the state-of-the-art in commercial and open source text recognition tools. The results of the competition will be announced at the ICDAR 2017 conference in Kyoto later this month. Watch this space! The competition dataset has been made openly available for download and reuse for any researchers or institutions who would like to experiment with OCR for Bengali.

A page from the Animal Biographies, VT 1712 showing its transcription produced for the ICDAR 2017 competition

The project has organised two Skills Exchange Programmes, hosting mid-career Library professionals from the National Library of India at the British Library for a week, providing a packed programme of tours and talks from all areas of the Library. The project has also conducted digital skills sharing and capacity building workshops for library professionals and archivists from cultural heritage institutions in India. The first workshop took place at Jadavpur University, Kolkata, in December 2016. Library and information professionals from cultural heritage institutions in Bengal took part in a one-day event to learn more about how information technology is transforming humanities research today and in turn Library services, as well as the methods for interrogating humanities-related datasets.

After the success of this first workshop another event was held in July 2017, at which more than 30 library professionals discussed OCR developments for Bangla, trying out different tools and discussing digital scholarship techniques and projects. Most recently, the project’s digital curator facilitated a workshop around Digitisation Standards at the International Conference of Asian Libraries in Delhi. The workshops continue in earnest in the new year with another digital humanities skills workshop planned for January 2018 to be held in partnership with the Srishti Institute of Art, Design, and Technology.

Attendees of the workshop held at Jadavpur University in December 2016 taking part in a group activity to discuss the application of digital humanities methods to library collections

The Project Team also held a two-day Academic Symposium on South Asian book history at Jadavpur University in the summer, with 17 speakers from India, wider South Asia, and the UK. Attendance was between 50 and 70 people a day and feedback was very good. We plan to have a publication arising from this Symposium, and to upload a video to our project webspace. The project also hosts a popular series of talks based around the Two Centuries of Indian Print project and the British Library’s South Asia collections. The seminars take place fortnightly at the British Library. So far we have hosted a range of academics and researchers, from PhD students to senior academics from the UK and abroad, who share cutting-edge research with discussion chaired by curators and specialists in the field. The seminars have been a great success, attracting large attendances and speakers from around the world. We also host a number of show and tells of our material to raise awareness of our collection and to engage in community outreach.

Everyone on the project is thrilled to have won this award and we will be working hard in 2018 to continue bringing the Two Centuries of Indian Print project to the attention and use of researchers and the general public.

Submit a project for one of the BL Labs 2018 Awards! Join us on 12 November 2018 for the BL Labs annual Symposium at the British Library.

Posted by BL Labs on behalf of The Two Centuries of Indian Print team.

15 February 2018

BL Labs 2017 Symposium: Git Lit, Learning & Teaching Award Runner Up

Applications of Distributed Version Control Technologies Toward the Creation of 50,000 Digital Scholarly Editions

The British Library maintains a collection of roughly 50,000 digital texts, scanned from public-domain books, most of which were originally published in the 19th century. As scanned books, their text format is Analyzed Layout and Text Object (ALTO) Extensible Markup Language (XML), a verbose markup format created by Optical Character Recognition (OCR) software, and one which is only marginally human-readable. Our project, Git-Lit, converts each text to the plain text format Markdown, creates version-controlled repositories for each using the distributed version control system Git, and posts the repositories to the project management platform GitHub, where they can be edited by anyone. Along the way, websites for each text, optimized for readability, are automatically generated via GitHub Pages. These websites integrate with the annotation platform, enabling them to be annotated. In this way, Git-Lit aims to make this collection of British Library electronic texts discoverable, readable, editable, annotatable, and downloadable.
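To make the conversion step more concrete, here is a minimal sketch of extracting plain text from ALTO XML using Python's standard library. It is not the Git-Lit code: the function names are hypothetical, it assumes the common ALTO layout of `TextBlock` > `TextLine` > `String` elements carrying a `CONTENT` attribute, it treats each text block as one paragraph, and it ignores hyphenation markers and all layout coordinates.

```python
import xml.etree.ElementTree as ET

# A tiny hand-written ALTO fragment, used only for illustration.
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout><Page><PrintSpace>
    <TextBlock>
      <TextLine><String CONTENT="It"/><SP/><String CONTENT="was"/></TextLine>
      <TextLine><String CONTENT="a"/><SP/><String CONTENT="dark"/><SP/><String CONTENT="night."/></TextLine>
    </TextBlock>
    <TextBlock>
      <TextLine><String CONTENT="Chapter"/><SP/><String CONTENT="II"/></TextLine>
    </TextBlock>
  </PrintSpace></Page></Layout>
</alto>"""


def local_name(tag):
    """Strip any XML namespace, since ALTO versions use different ones."""
    return tag.rsplit("}", 1)[-1]


def alto_to_markdown(alto_xml, title=None):
    """Return the text of an ALTO document as Markdown paragraphs."""
    root = ET.fromstring(alto_xml)
    paragraphs = []
    for block in root.iter():
        if local_name(block.tag) != "TextBlock":
            continue
        lines = []
        for line in block:
            if local_name(line.tag) != "TextLine":
                continue
            # Each String element holds one OCR'd word in CONTENT.
            words = [
                el.get("CONTENT", "")
                for el in line
                if local_name(el.tag) == "String"
            ]
            if words:
                lines.append(" ".join(words))
        if lines:
            paragraphs.append(" ".join(lines))
    body = "\n\n".join(paragraphs)
    return f"# {title}\n\n{body}" if title else body
```

Calling `alto_to_markdown(SAMPLE, title="Sample")` yields a heading followed by two paragraphs. The output is an ordinary text file, so it can be committed to a Git repository like any other source.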

A Screenshot of the Website Automatically Generated from the British Library Electronic Text

The biggest advantage of using a distributed version control system like Git is that it leverages the kinds of decentralized collaboration workflows that have long been in use in software development. Open-source software and web development, for which Git and GitHub were originally designed, is a much-studied methodology, long proven to be more effective than closed-source methods. Rather than maintain a central silo for serving code and electronic texts, the decentralized approach ensures a plurality of textual versions. Since anyone may copy ("fork") a project, modify it, and create their own version, there is no one central, canonical text, but many. Each version may freely borrow ("pull") from others, request that others integrate their changes ("pull request"), and discuss potential changes ("issues") using the project management subsystems of GitHub. This workflow streamlines collaboration, and encourages external contributions. Furthermore, since each change ("commit") requires a description of the commit, and reasons for it, the Git platform enforces the kind of editorial documentation necessary for scholarly editing. We like to think of git-based editing, therefore, as scholarly editing, and GitHub-based collaboration as a democratization of scholarly editing.

Furthermore, since GitHub allows instant editing of texts in the web browser, it is a simple and intuitive method of crowdsourcing the text cleanup process. Since OCR'd texts are often full of errors, GitHub allows any reader to correct an obvious OCR error she or he finds. The analogous process of reporting errors to centralized text repositories like Project Gutenberg has been known to take several years. On GitHub, however, it is instantaneous.

Not the least advantage of this setup is the automated creation of websites from the plain text sources. Not only does this transform the Markdown into a clean, readable edition of the text, but it also provides integration with the annotation platform, which allows for social annotation of a text, making it ideal for classroom use. Professors may assign a British Library text as a course reading, and may require that their students annotate it, an activity which can generate discussions in the limitless virtual margins of this electronic textual space.

The Git-Lit project has so far posted around 50 texts to GitHub, as prototypes, with the full corpus of roughly 50,000 texts soon to come. After the full corpus is processed in this way, we'll begin enhancing some of the metadata. So far, we have developed techniques for probabilistically inferring the language of each text, and using Ben Schmidt's document vectorization method, Stable Random Projection, we have been able to probabilistically infer Library of Congress classifications, as well. This enables the automatic generation of sub-corpora like PR (British Literature), or PZ (American Literature).
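For readers unfamiliar with document vectorization by random projection, the general idea can be sketched as follows. This is a simplified illustration of hashing-based projection, not the reference implementation of Stable Random Projection and not the Git-Lit code: the whitespace tokenisation, the log weighting, the 64-dimension default and all function names are assumptions made for the example. Each word is hashed to a fixed pattern of +1/-1 values, and a document's vector is the weighted sum of its words' patterns, so the same text always maps to the same vector without any trained vocabulary.

```python
import hashlib
import math
from collections import Counter


def word_signs(word, dims=64):
    """Deterministically map a word to a list of +1/-1 values via SHA-1."""
    signs = []
    counter = 0
    while len(signs) < dims:
        # Re-hash with a counter to get more bits than one digest provides.
        digest = hashlib.sha1(f"{word}_{counter}".encode("utf-8")).digest()
        for byte in digest:
            for bit in range(8):
                signs.append(1 if (byte >> bit) & 1 else -1)
        counter += 1
    return signs[:dims]


def project(text, dims=64):
    """Project a text into a fixed-size vector of floats."""
    counts = Counter(text.lower().split())
    vector = [0.0] * dims
    for word, count in counts.items():
        weight = math.log(1 + count)  # dampen very frequent words
        for i, sign in enumerate(word_signs(word, dims)):
            vector[i] += weight * sign
    return vector


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

Vectors like these could then be fed to any standard classifier trained on texts whose classification is already known; that step is not shown here.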

In the coming year, we hope to integrate the Git-Lit transformed British Library texts into a structured database, further enhancing the discoverability of its texts. We have just received a micro-grant from NYC-DH to help launch Corpus-DB, a project also aiming to produce textual corpora, and through Corpus-DB, we will soon create a SQL database containing the metadata, our enhanced and inferred metadata, and other aggregated book data gleaned from public APIs. This will soon allow readers and computational text analysts the ability to download groups of British Library electronic texts. Users interested in downloading, say, all novels set in London, will be able to get a complete full-text dump of all public-domain novels in this category by visiting a URL such as We expect that this will greatly streamline the corpus creation process that takes up so much of the time of a computational text analysis.
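As a rough idea of what querying such a database might look like, here is a small sketch using Python's built-in `sqlite3` module. Everything in it is hypothetical: the `texts` table, its columns, the placeholder rows and the function names are invented for illustration and do not describe the actual Corpus-DB schema or API.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a few placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE texts (
               id INTEGER PRIMARY KEY,
               title TEXT,
               genre TEXT,
               setting TEXT,
               full_text TEXT
           )"""
    )
    conn.executemany(
        "INSERT INTO texts (title, genre, setting, full_text) VALUES (?, ?, ?, ?)",
        [
            ("Example Novel B", "novel", "London", "Text of novel B..."),
            ("Example Novel A", "novel", "London", "Text of novel A..."),
            ("Example Poem", "poetry", "London", "Text of poem..."),
            ("Example Novel C", "novel", "Paris", "Text of novel C..."),
        ],
    )
    conn.commit()
    return conn


def dump_texts(conn, genre, setting):
    """Return (title, full_text) pairs for one genre and setting."""
    rows = conn.execute(
        "SELECT title, full_text FROM texts "
        "WHERE genre = ? AND setting = ? ORDER BY title",
        (genre, setting),
    )
    return [(title, text) for title, text in rows]
```

With this demo data, `dump_texts(build_demo_db(), "novel", "London")` returns the two London novels in title order. A web endpoint serving a sub-corpus would wrap a query of this kind.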

Both Git-Lit and Corpus-DB are open-source projects, open to contributions from anyone, regardless of skill. If you'd like to contribute to our project in some way, get in contact with us, and we'll tell you how you can help.

Jonathan Reeve

Jonathan Reeve is a third-year graduate student in the Department of English and Comparative Literature at Columbia University, where he specializes in computational literary analysis. Find his recent experiments at

If this blog post has stimulated your interest in working with the British Library's digital collections, start a project and enter it for one of the BL Labs 2018 Awards! Join us on 12 November 2018 for the BL Labs annual Symposium at the British Library to find out who wins.

Posted by BL Labs on behalf of Jonathan Reeve

14 February 2018

BL Labs 2017 Symposium: Movable Type, Commercial Award Winner

Movable Type is a tabletop word game, and something of a love letter to classic books and authors, made completely of custom playing cards. While the game’s appearance might remind you of Scrabble, it has some tricks up its sleeve to give it a much more modern and dynamic feel.

Figure 1: Movable Type Card Game

The initial idea for Movable Type was born around two years ago. I had been making games for some years and knew I wanted to do something with a word game. My main objective was to have a game that was very interactive, easy to grasp, and tactical, while also being very quick to play. As much as I love word games, some of them have a tendency to outstay their welcome. 

The central mechanism in Movable Type is called card-drafting. This method allows players to pick their letter cards each round – this does away with the large amounts of luck you find in many classic word games, and instead shifts attention onto the tactical decisions of the players. It also means that rules are kept very simple and that players can take their turns simultaneously, creating a much more dynamic play environment.

Movable Type - The Cards

The prototype for Movable Type was only a few weeks old when I settled on the art style I was going to use in the final product. I’ve been a long-time fan of the British Library Flickr account, which lets users browse through images from public domain books. Once I had spotted the large collection of initial capitals, I was sold!

Movable Type - Illustrated Letters

My wife, Tiffany Moon, is a graphic designer by trade. She helped clean up the images and present these beautiful pieces of art in a colourful new fashion, appropriate for a retail product. I also wanted portraits of some of my favourite authors to be in the game, so I commissioned Alisdair Wood to create woodcut-style images of ten classic, influential and diverse literary figures. Without those initial capital images taken from the British Library collection and used to direct the game’s overall style, Movable Type would likely not look half as impressive and definitely wouldn’t resonate with me and many players like the current style does.

Movable Type - Illustrated Famous People

I launched Movable Type on the crowdfunding platform, Kickstarter, last year. Upon its release, it won the Imirt Irish Game Award for Best Analog Game and second runner-up for Game of the Year. It received good reception at several public events and sold out of its initial print run, so I decided that a second edition of the game was in order. That bigger and better second edition is funding on Kickstarter and should be in some select retail stores by mid to late 2018 (fingers crossed!).

Movable Type receiving the Commercial Category BL Labs Award was a huge boon for the reputation of the game and for my own reputation as a game designer. Furthermore, it was a genuine honour to be at the British Library for this event and able to share my product with the audience there.

If this blog post has stimulated your interest in working with the British Library's digital collections, start a project and enter it for one of the BL Labs 2018 Awards! Join us on 12 November 2018 for the BL Labs annual Symposium at the British Library.

Posted by BL Labs on behalf of Robin David O’Keeffe

13 February 2018

BL Labs 2017 Symposium: Samtla, Research Award Runner Up

Samtla (Search And Mining Tools for Labelling Archives) was developed to address a need in the humanities for research tools that help to search, browse, compare, and annotate documents stored in digital archives. The system was designed in collaboration with researchers at Southampton University, whose research involved locating shared vocabulary and phrases across an archive of Aramaic Magic Texts from Late Antiquity. The archive contained texts written in Aramaic, Mandaic, Syriac, and Hebrew languages. Due to the morphological complexity of these languages, where morphemes are attached to a root morpheme to mark gender and number, standard approaches and off-the-shelf software were not flexible enough for the task, as they tended to be designed to work with a specific archive or user group. 

Figure 1: Samtla supports tolerant search allowing queries to be matched exactly and approximately.

Samtla is designed to extract the same or similar information that may be expressed by authors in different ways, whether it is in the choice of vocabulary or the grammar. Traditionally search and text mining tools have been based on words, which limits their use to corpora containing languages where 'words' can be easily identified and extracted from text, e.g. languages with a whitespace character like English, French, German, etc. Word models tend to fail when the language is morphologically complex, like Aramaic and Hebrew. Samtla addresses these issues by adopting a character-level approach stored in a statistical language model. This means that rather than extracting words, we extract character-sequences representing the morphology of the language, which we then use to match the search terms of the query and rank the documents according to the statistics of the language. Character-based models are language independent as there is no need to preprocess the document, and we can locate words and phrases with a lot of flexibility. As a result Samtla compensates for the variability in language use, spelling errors made by users when they search, and errors in the document as a result of the digitisation process (e.g. OCR errors).
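The benefit of matching on character sequences rather than whole words can be shown with a small sketch. This is not Samtla's implementation, which uses a statistical language model; it is a much simpler overlap score over character trigrams, with hypothetical function names, intended only to show why a misspelt or variant query can still find the right document.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """Count every overlapping sequence of n characters in text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def score(query, document, n=3):
    """Fraction of the query's character n-grams found in the document."""
    query_grams = char_ngrams(query, n)
    doc_grams = char_ngrams(document, n)
    total = sum(query_grams.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, doc_grams[gram]) for gram, count in query_grams.items())
    return matched / total


def search(query, documents, n=3):
    """Rank documents (a name -> text mapping) by score, best first."""
    ranked = sorted(
        ((score(query, text, n), name) for name, text in documents.items()),
        reverse=True,
    )
    return [(name, s) for s, name in ranked if s > 0]
```

For example, the query "digitization" shares seven of its ten trigrams with a document containing "digitisation", so the document is still retrieved, whereas an exact word match would return nothing. No tokenisation or language-specific preprocessing is needed, which is the property the paragraph above describes.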

Figure 2: Samtla's document comparison tool displaying a semantically similar passage between two Bibles from different periods.

The British Library have been very supportive of the work by openly providing access to their digital archives. The archives ranged in domain, topic, language, and scale, which enabled us to test Samtla’s flexibility to its limits. One of the biggest challenges we faced was indexing larger-scale archives of several gigabytes. Some archives also contained a scan of the original document together with metadata about the structure of the text. This provided a basis for developing new tools that brought researchers closer to the original object, which included highlighting the named entities over both the raw text and the scanned image.

Currently we are focusing on developing approaches for leveraging the semantics underlying text data in order to help researchers find semantically related information. Semantic annotation is also useful for labelling text data with named entities, and sentiments. Our current aim is to develop approaches for annotating text data in any language or domain, which is challenging due to the fact that languages encode the semantics of a text in different ways.

As a first step we are offering labelled data to researchers, as part of a trial service, in order to help speed up the research process, or provide tagged data for machine learning approaches. If you are interested in participating in this trial, then more information can be found at

Figure 3: Samtla's annotation tools label the texts with named entities to provide faceted browsing and data layers over the original image.

If this blog post has stimulated your interest in working with the British Library's digital collections, start a project and enter it for one of the BL Labs 2018 Awards! Join us on 12 November 2018 for the BL Labs annual Symposium at the British Library.

Posted by BL Labs on behalf of Dr Martyn Harris, Prof Dan Levene, Prof Mark Levene and Dr Dell Zhang.