THE BRITISH LIBRARY

Digital scholarship blog

Enabling innovative research with British Library digital collections

Introduction

Tracking exciting developments at the intersection of libraries, scholarship and technology.

22 March 2017

British Library Launches OCR Competition for Rare Indian Books

Calling all transcription enthusiasts! We’ve launched a competition to find an accurate and automated transcription solution for our rare Indian books and printed catalogue records, currently being digitised through the Two Centuries of Indian Print project. 

The competition, in partnership with the University of Salford’s PRIMA Research Lab, is part of the International Conference on Document Analysis and Recognition, taking place in Kyoto, Japan this November. The winners will be announced at a special event during the conference.

Digitised images of the books will be made openly available through the library’s website and we hope this competition will produce transcriptions that enable full text search and discovery of this rich material. Sharing XML transcriptions will also give researchers the foundation to apply computational tools and methods such as text mining that may lead to new insights into book and publishing history in India.   

The competition is split into two challenges, and participants can enter either or both.

The first challenge is to find an automated transcription solution for the 19th-century printed books written in Bengali script. Optical Character Recognition of many non-Latin scripts is a developing area, and it still presents a considerable barrier for libraries and other cultural institutions hoping to open up their material for scholarly research.

Above: A page from 'Animal Biography', one of the Bengali books being digitised as part of Two Centuries of Indian Print (VT 1712)

 

Challenge number two involves our printed catalogue records, known as ‘Quarterly Lists’. These describe books published in India between 1867 and 1967. The lists are arranged in tables, so accurately representing the layout of the data is important if researchers are to use computational methods to identify chunks of information such as the place of publication and the cost of each book.

 Above: A typical double page from the Quarterly Lists (SV 412/8)

 

With the competition now open, we’ve already gone some way to helping participants by manually transcribing a few pages to create ‘ground truth’ using PRIMA's editing tool, Aletheia.  So if you or anyone you know would like to enter, do please register and you could be contributing to this landmark project, and picking up an award for your troubles!   
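This post doesn't describe how entries will be scored, but for anyone curious about what comparing an automated transcription with ground truth can look like, here is a minimal Python sketch of one widely used measure, the character error rate: the edit distance between the two texts divided by the length of the ground truth. The function names and sample strings are illustrative assumptions, not part of the competition's tooling.

```python
import unicodedata


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def character_error_rate(ground_truth: str, ocr_output: str) -> float:
    """Edit distance divided by ground-truth length (illustrative metric only).

    Both strings are normalised to NFC first so that equivalent Unicode
    sequences compare as equal. Note that Python strings are compared code
    point by code point, so a Bengali letter plus its vowel sign counts as
    two characters here.
    """
    truth = unicodedata.normalize("NFC", ground_truth)
    output = unicodedata.normalize("NFC", ocr_output)
    if not truth:
        raise ValueError("ground truth must not be empty")
    return levenshtein(truth, output) / len(truth)


if __name__ == "__main__":
    # Made-up sample strings: one character has been dropped by the "OCR".
    print(character_error_rate("Calcutta", "Calcuta"))  # 1 edit / 8 characters = 0.125
```

A lower score means the automated transcription is closer to the manual one. A score above 1.0 is possible if the OCR output is much longer than the ground truth.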

21 March 2017

Poetic Places and World Poetry Day 2017

This post is by Digital Curator Stella Wisdom, on Twitter as @miss_wisdom.

Happy World Poetry Day!

The Digital Scholarship team are marking the day with an event exploring how poetry, history and literature can be discovered and experienced via digital technologies. Creative Entrepreneur-in-Residence Sarah Cole is talking about the development of Poetic Places, a free app for iOS and Android devices that creates digital encounters with poems and literature in the locations described, accompanied by sounds and illustrations from cultural heritage collections, including the British Library's images on Flickr.

Being a creative type, Sarah has also been using the Flickr collection in her new enterprise Badgical Kingdom, which takes images from galleries, libraries, archives, and museums released under Creative Commons licenses and turns them into badges. Sarah hopes to bring forgotten works out into the everyday world where they can be re-admired. Furthermore, every piece is sent with a card detailing a little of the design’s history and naming the institution which has made the work available. These include the Rijksmuseum, whose collections have inspired these flower brooches, which could make perfect Mother's Day presents in my opinion.

Images of Billycock Cat Pin, copyright Sarah Cole.

Also speaking at the event are:

  • Dr Jennifer Batt, a lecturer in English, University of Bristol, who has been working with British Library Labs on an innovative project to data mine 18th-century newspapers for verse.
  • Dr Duncan Hay, from the Bartlett Centre for Advanced Spatial Analysis, who works on the Survey of London (check out their map). It is also worth noting that Duncan is a colleague of Martin Zaltz Austwick, who did GPS mapping of a walk based around the first section of William Gull's coach ride in Alan Moore's From Hell. There is a short video of this here.

For those of you unable to join us this evening, and also those of you who are, please check out the British Library's drama and literature recordings on SoundCloud. These include excellent poems from winners and shortlisted entries of The Michael Marks Awards for Poetry Pamphlets, and readings from other British Library events. Enjoy ...

 Recording of Richard Scott reading from his pamphlet ‘Wound’, published by The Rialto

09 March 2017

Archaeologies of reading: guest post from Matthew Symonds, Centre for Editing Lives and Letters

Digital Curator Mia Ridge: today we have a guest post by Matthew Symonds from the Centre for Editing Lives and Letters on the Archaeologies of reading project, based on a talk he did for our internal '21st century curatorship' seminar series. Over to Matt...

Some people get really itchy about the idea of making notes in books, and dare not defile the pristine printed page. Others leave their books a riot of exclamation marks, sarcastic incredulity and highlighter pen.

Historians – even historians disciplined by spending years in the BL’s Rare Books and Manuscripts rooms – would much prefer it if people did mark books, preferably in sentences like “I, Famous Historical Personage, have read this book and think the following having read it…”. It makes it that much easier to investigate how people engaged with the ideas and information they read.

Brilliantly for us historians, rare books collections are filled with this sort of material. The problem is it’s also difficult to catalogue and make discoverable (nota bene – it’s hard because no institutions could afford to employ and train sufficient cataloguers, not because librarians don’t realise this is an issue).

The Archaeology of Reading in Early Modern Europe (AOR) takes digital images of books owned and annotated by two renaissance readers, the professional reader Gabriel Harvey and the extraordinary polymath John Dee. It transcribes and translates all the comments in the margin, marks up all traces of a reader’s intervention with the printed book, and puts the whole thing on the Internet in a way designed to be useful and accessible to researchers and the general public alike.

Screenshot, The Archaeology of Reading in Early Modern Europe

AOR is a digital humanities collaboration between the Centre for Editing Lives and Letters (CELL) at University College London, Johns Hopkins University and Princeton University, and generously funded by the Andrew W. Mellon Foundation.

More importantly, it’s also a collaboration between academic researchers, librarians and software engineers. An absolutely vital consideration in how we planned AOR, how we work on it and how we’re planning to expand it was to identify a project that could offer common ground between these three interests, where each party would have something to gain from it.

As one of the researchers, it was really important to me to avoid forming some sort of “client-provider” relationship with the librarians who curate and know so much about my sources, and the software engineers who build the digital infrastructure.

But we do use an academic problem as a means of giving our project a focus. In 1990, Anthony Grafton and the late Lisa Jardine published their seminal article ‘“Studied for Action”: how Gabriel Harvey read his Livy’ in the journal Past & Present.

One major insight of the article is that people read books in conjunction with one another, often for specific, pragmatic purposes. People didn’t pick up a book from their shelves, open at page one and proceed through to the finis, marking up as they went. They put other books next to them, books that explained, clarified, argued with one another.

By studying the marginalia, it’s possible to reconstruct these pathways across a library, recreating the strategies people used to manage the vast quantities of information they had at their disposal.

In order to produce this archaeology of reading, we’ve built a “digital bookwheel”, an attempt to recreate the revolving reading desk of the renaissance period which allowed the lucky owner to manoeuvre back and forth between their books. From here, the user can call up the books we’ve digitised, read the transcriptions, and search for particular words and concepts.

Screenshot, The Archaeology of Reading in Early Modern Europe


It’s built out of open source materials, leveraging the International Image Interoperability Framework (IIIF) and the IIIF-compliant Mirador 2 Viewer. Interested parties can download the XML files of our transcriptions, as well as the data produced in the process.
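For readers who haven't met IIIF before: its Image API lets a viewer request a whole page image, or just a region of one, through a structured URL of the form base/identifier/region/size/rotation/quality.format. Here is a minimal Python sketch of building such a URL, assuming version 2 of the Image API. The server address and page identifiers are hypothetical placeholders, not AOR's real endpoints.

```python
from urllib.parse import quote


def iiif_image_url(base_url: str, identifier: str, region: str = "full",
                   size: str = "full", rotation: str = "0",
                   quality: str = "default", fmt: str = "jpg") -> str:
    """Build a IIIF Image API (version 2 syntax) request URL.

    The identifier is percent-encoded so that any slashes or spaces in it
    cannot be mistaken for URL path separators.
    """
    encoded_id = quote(identifier, safe="")
    return (f"{base_url.rstrip('/')}/{encoded_id}/"
            f"{region}/{size}/{rotation}/{quality}.{fmt}")


if __name__ == "__main__":
    base = "https://example.org/iiif"  # hypothetical image server

    # The whole page at full size.
    print(iiif_image_url(base, "book1/page 7"))

    # The left half of the page, scaled to 600 pixels wide: the sort of
    # request a viewer might make when zooming in on a margin.
    print(iiif_image_url(base, "page-001", region="pct:0,0,50,100", size="600,"))
```

Because every compliant image server understands the same URL pattern, a viewer such as Mirador can display images held by different institutions side by side.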

The exciting thing for us is that all the work on creating this digital infrastructure – which is very much a work in progress – has provided us with the raw materials for asking new research questions, questions that can only be asked by getting away from our computers and returning to the rare books room.