Digital scholarship blog

Enabling innovative research with British Library digital collections

59 posts categorized "Printed books"

05 April 2024

Curious about using 'public domain' British Library Flickr images?

We regularly get questions from people who want to re-use the images we've published on the British Library's Flickr account. They might be listed as 'No known copyright restrictions' or 'public domain'.

We're always pleased to hear that people want to use images from our Flickr collection, and appreciate folk who get in touch to check if their proposed use - particularly commercial use - is ok.

So, yes, our Public Domain images are out of copyright and available for re-use without restriction, including commercial re-use. You can find out more about our images at https://web.archive.org/web/20230325130540/https://www.bl.uk/about-us/terms-and-conditions/content-on-flickr-and-wikimedia-commons

You don't have to credit the Library when using our public domain images, but we always appreciate credit where possible as a way of celebrating their re-use and to help other people find the collection.

If you'd like to credit us, you can say something like 'Images courtesy of the British Library’s Flickr Collection'.

We also love hearing how people have used our images, so please do let us know ([email protected]) about the results if you do use them.

By Digital Curator Mia Ridge for the British Library's Digital Research team

12 September 2023

Convert-a-Card: Past, Present and Future of Catalogue Cards Retroconversion

This blog post is by Dr Adi Keinan-Schoonbaert, Digital Curator for Asian and African Collections, British Library. She's on Mastodon as @[email protected].

 

It has been more than eight years since the British Library launched its crowdsourcing platform, LibCrowds, in June 2015 with the aim of enhancing access to our collections. The first project series on LibCrowds was called Convert-a-Card, followed by the ever-so-popular In the Spotlight project. The aim of Convert-a-Card was to convert print card catalogues from the Library’s Asian and African Collections into electronic records, for inclusion in our online catalogue Explore.

A significant portion of the Library's extensive historical collections was acquired well before the advent of standard computer-based cataloguing. Consequently, even though the Library's online catalogue offers public access to tens of millions of records, numerous crucial research materials remain discoverable solely through searching the traditional physical card catalogues. The physical cards provide essential information for each book, such as title, author, physical description (dimensions, number of pages, images, etc.), subject and a “shelfmark” – a reference to the item’s location. This information still constitutes the basic set of data to produce e-records in libraries and archives.

Card Catalogue Cabinets in the British Library’s Asian & African Studies Reading Room © Jon Ellis

 

The initial focus of Convert-a-Card was the Library’s card catalogues for Chinese, Indonesian and Urdu books – you can read more about this here and here. Scanned catalogue cards were uploaded to Flickr (and later to our Research Repository), grouped by the physical drawer in which they were originally located. Several of these digitised drawers became projects on LibCrowds.

 

Crowdsourcing Retroconversion

Convert-a-Card on LibCrowds included two tasks:

  1. Task 1 – Search for a WorldCat record match: contributors were asked to look at a digitised card and search the OCLC WorldCat database based on some of the metadata elements printed on it (e.g. title, author, publication date), to see if a record for the book already existed in some form online. If they found one, they selected the matching record.
  2. Task 2 – Transcribe the shelfmark: if a match was found, contributors then transcribed the Library's unique shelfmark as printed on the card.

Online volunteers worked on Pinyin (Chinese), Indonesian and Urdu records, mainly between 2015 and 2019. Their valuable contributions resulted in lists of new records which were then ingested into the Library's Explore catalogue – making these items so much more discoverable to our users. For cards only partially matched with online records, curators and cataloguers had a special area on the LibCrowds platform through which they could address some of the discrepancies in partial matches and resolve them.

An example of an Urdu catalogue card

 

After much consideration, we have decided to sunset LibCrowds. However, you can see a good snapshot of it thanks to the UK Web Archive (with thanks to Mia Ridge and Filipe Bento for archiving it), or access its GitHub pages – originally set up and maintained by LibCrowds creator Alex Mendes. We have been using mainly Zooniverse for crowdsourcing projects (see for example Living with Machines projects), and you can see here some references to these and other crowdsourcing initiatives. Sunsetting LibCrowds provided us with the opportunity to rethink Convert-a-Card and consider alternative, innovative ways to automate or semi-automate the retroconversion of these valuable catalogue cards.

 

Text Recognition

As a first step, we were looking to automate the retrieval of text from the digitised cards using OCR/Machine Learning. As mentioned, this text includes shelfmark, title, author, place and date of publication, and other information. If extracted accurately enough, this text could be used for WorldCat lookup, as well as for enhancement of existing records. In most cases, the text was typewritten in English, often with additional information, or translation, handwritten in other languages. To start with, we’ve decided to focus only on the typewritten English – with the aspiration to address other scripts and languages in the future.

Last year, we ran some comparative testing with ABBYY FineReader Server (the software generally used for in-house OCR) and Transkribus, to see how accurately they perform this task. We trialled a set of cards with two different versions of ABBYY, and three different models for typewritten Latin scripts in Transkribus (Model IDs 29418, 36202, and 25849). Assessment was done by visually comparing the original text with the OCRed text, examining mainly the key areas of text which are important for this initiative, i.e. the shelfmark, author’s name and book title. For the purpose of automatically recognising the typewritten English on the catalogue cards, Transkribus Model 29418 performed better than the others – and more accurately than ABBYY’s recognition.

An example of a Pinyin card in Transkribus, showing segmentation and transcription

 

Using that as a base model, we incrementally trained a bespoke model to recognise the text on our Pinyin cards. We’ve also normalised the resulting text, for example removing spaces in the shelfmark, or excluding unnecessary bits of data. This model currently extracts the English text only, with a Character Error Rate (CER) of 1.8%. With more training data, we plan on extending this model to other types of catalogue cards – but for now we are testing this workflow with our Chinese cards.
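The two ideas in this paragraph, normalising a shelfmark and measuring a Character Error Rate, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the sample shelfmark are made up, and the CER here is the plain Levenshtein edit distance divided by the length of the reference text, which may differ in detail from how Transkribus reports it.

```python
import re


def normalise_shelfmark(raw: str) -> str:
    """Remove all whitespace from a transcribed shelfmark (hypothetical helper)."""
    return re.sub(r"\s+", "", raw)


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance between OCR output and ground truth, per reference character."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


# Made-up shelfmark, used only to show the normalisation step.
print(normalise_shelfmark("15298 . a . 12"))
```

On this definition, a CER of 1.8% would correspond to roughly two character edits for every hundred characters of ground-truth text.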

 

Entities Extraction

Extracting meaningful entities from the OCRed text is our next step, and there are different ways to do that. One such method – if already using Transkribus for text extraction – is training and applying a bespoke P2PaLA layout analysis model. Such a model could identify text regions, improve automated segmentation of the cards, and help retrieve specific regions for further tasks. Former colleague Giorgia Tolfo tested this with our Urdu cards, with good results. Trying to replicate this for our Chinese cards was not as successful – perhaps because they are less consistent in structure.

Another possible method is by using regular expressions in a programming language. Research Software Engineer (RSE) Harry Lloyd created a Jupyter notebook with Python code to do just that: take the PAGE XML files produced by Transkribus, parse the XML, and extract the title, author and shelfmark from the text. This works exceptionally well, and in the future we’ll expand entity recognition and extraction to other types of data appearing on the cards. But for now, this information suffices to query OCLC WorldCat and see if a matching record exists.
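To give a flavour of the regular-expression approach, here is a minimal sketch using only the Python standard library. It is not Harry's notebook: the sample PAGE XML, the shelfmark pattern and the assumption that the author and title are the first two non-shelfmark lines are all invented for illustration, and real cards are messier.

```python
import re
import xml.etree.ElementTree as ET

# Invented sample in the PAGE XML shape (TextLine > TextEquiv > Unicode).
SAMPLE_PAGE_XML = """
<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageFilename="card.jpg" imageWidth="100" imageHeight="100">
    <TextRegion id="r1">
      <TextLine id="l1"><TextEquiv><Unicode>15298 . a . 12</Unicode></TextEquiv></TextLine>
      <TextLine id="l2"><TextEquiv><Unicode>AUTHOR, Example.</Unicode></TextEquiv></TextLine>
      <TextLine id="l3"><TextEquiv><Unicode>Example title. Place, 1900.</Unicode></TextEquiv></TextLine>
    </TextRegion>
  </Page>
</PcGts>
"""

# Hypothetical shelfmark shape: digits followed by dot-separated parts.
SHELFMARK_RE = re.compile(r"^\d{3,}(?:\s*\.\s*[A-Za-z0-9()]+)+$")


def local_name(tag: str) -> str:
    """Strip the XML namespace so the code works across PAGE schema versions."""
    return tag.rsplit("}", 1)[-1]


def page_xml_lines(xml_text: str) -> list:
    """Return the transcribed text of every TextLine, in document order."""
    lines = []
    for element in ET.fromstring(xml_text).iter():
        if local_name(element.tag) != "TextLine":
            continue
        for child in element:
            if local_name(child.tag) != "TextEquiv":
                continue
            for grandchild in child:
                if local_name(grandchild.tag) == "Unicode" and grandchild.text:
                    lines.append(grandchild.text.strip())
    return lines


def extract_entities(lines: list) -> dict:
    """Pick out shelfmark, author and title under the layout assumption above."""
    shelfmark = None
    remaining = []
    for line in lines:
        if shelfmark is None and SHELFMARK_RE.match(line):
            shelfmark = re.sub(r"\s+", "", line)  # normalise: drop spaces
        else:
            remaining.append(line)
    return {
        "shelfmark": shelfmark,
        "author": remaining[0] if remaining else None,
        "title": remaining[1] if len(remaining) > 1 else None,
    }


print(extract_entities(page_xml_lines(SAMPLE_PAGE_XML)))
```

Matching on local tag names rather than a fixed namespace is a design choice that keeps the sketch independent of the exact PAGE schema version in an export.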

One of the 26 drawers of Chinese (Pinyin) card catalogues © Jon Ellis

 

Matching Cards to WorldCat Records

Entities extracted from the catalogue cards can now be used to search and retrieve potentially matching records from the OCLC WorldCat database. Pulling out WorldCat records matched with our card records would help us create new records to go into our cataloguing system Aleph, as well as enrich existing Aleph records with additional information. This matching was previously done by volunteers; we aim to automate it as much as possible.

Querying WorldCat was initially done using the Z39.50 protocol – the same one originally used in LibCrowds. This is a client-server communications protocol designed to support the search and retrieval of information in a distributed network environment. Victoria Morris and Giorgia Tolfo made an excellent start by developing a prototype that uses PyZ3950 and PyMARC to query WorldCat; Harry built upon this, refined the code, and tested it successfully for data search and retrieval. Moving forward, we are likely to use the OCLC API for this – which should be a lot more straightforward!
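Whichever protocol returns the candidate records, they then need to be compared with what was read from the card. The post does not say how candidates are scored, so the following is purely an illustrative sketch: it ranks candidates by fuzzy string similarity using Python's standard difflib module, and the dictionary keys, the weighting and the function names are all assumptions.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity between two strings, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.casefold().strip(), b.casefold().strip()).ratio()


def rank_candidates(card: dict, candidates: list, title_weight: float = 0.7) -> list:
    """Return (score, candidate) pairs, best match first.

    `card` and each candidate are plain dicts with 'title' and 'author' keys
    (a hypothetical shape; real records would come from MARC data).
    """
    scored = []
    for candidate in candidates:
        score = (title_weight * similarity(card["title"], candidate["title"])
                 + (1 - title_weight) * similarity(card["author"], candidate["author"]))
        scored.append((score, candidate))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


card = {"title": "Example title", "author": "Author, Example"}
candidates = [
    {"id": "far", "title": "Completely different work", "author": "Someone Else"},
    {"id": "near", "title": "Example title.", "author": "Author, Example"},
]
for score, candidate in rank_candidates(card, candidates):
    print(candidate["id"], round(score, 2))
```

A ranked list like this is only a suggestion; a curator or cataloguer would still confirm the final choice.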

 

Curator/Cataloguer Disambiguation

Getting potential matches from WorldCat is brilliant, but we would like to have an easy way for curators and cataloguers to make the final decision on the ideal match – which WorldCat record would be the best one as a basis to create a new catalogue record on our system. For this purpose, Harry is currently working on a web application based on Streamlit – an open source Python library that enables the building and sharing of web apps. Staff members will be able to use this app by viewing suggested matches, and selecting the most suitable ones.

I’ll leave it up to Harry to tell you about this work – so stay tuned for a follow-up blog post very soon!

 

03 August 2023

My AHRC-RLUK Professional Practice Fellowship: A year on

A year ago I started work on my RLUK Professional Practice Fellowship project to analyse computationally the descriptions in the Library’s incunabula printed catalogue. As the project comes to a close this week, I would like to update on the work from the last few months leading to the publication of the incunabula printed catalogue data, a featured collection on the British Library’s Research Repository. In a separate blogpost I will discuss the findings from the text analysis and next steps, as well as share my reflections on the fellowship experience.

Since Isaac’s blogpost about the automated detection of the catalogue entries in the OCR files, a lot of effort has gone into improving the code and outputting the descriptions in the format required for the text analysis and as open datasets. With the invaluable help of Harry Lloyd, who had joined the Library’s Digital Research team as Research Software Engineer, we verified the results and identified new rules for detecting sub-entries signalled by ‘Another Copy’ rather than a main entry heading. We also reassembled and parsed the XML files, originally split in two sets per volume for the purpose of generating the OCR, so that the entries are listed in the order in which they appear in the printed volume. We prepared new text files containing all the entries from each volume, with each entry represented as a single line of text, which I could use for the corpus linguistics analysis with AntConc. In consultation with the Curator, Karen Limper-Herz, and colleagues in Collection Metadata we agreed how best to store the data for evaluation and in preparation to update the Library’s online catalogue.

Two women looking at the poster illustrating the text analysis with the incunabula catalogue data
Poster session at Digital Humanities Conference 2023

Whilst all this work was taking place, I started the computational analysis of the English text from the descriptions. The reason for using these partial descriptions was to separate what was merely transcribed from the incunabula from the language used by the cataloguer in their own ‘voice’. I have recorded my initial observations in the poster I presented at the Digital Humanities Conference 2023. Discussing my fellowship project with the conference attendees was extremely rewarding; there was much interest in the way I had used Transkribus to derive the OCR data, some questions about how the project methodology applies to other data and an agreement on the need to contextualise collections descriptions and reflect on any bias in the transmission of knowledge. In the poster I also highlight the importance of the cross-disciplinary collaboration required for this type of work, which resonated well with the conference theme of Collaboration as Opportunity.

I have started disseminating the knowledge gained from the project with members of the GLAM community. At the British Library Harry, Karen and I ran an informal ‘Hack & Yack’ training session showcasing the project aims and methodology through the use of Jupyter notebooks. I also enjoyed the opportunity to discuss my research at a recent Research Libraries UK Digital Scholarship Network workshop and look forward to further conversations on this topic with colleagues in the wider GLAM community. 

We intend to continue to enrich the datasets to enable better access to the collection, the development of new resources for incunabula research and digital scholarship projects. I would like to end by adding my thanks to Graham Jevon, for assisting with the timely publication of the project datasets, and above all to James, Karen and Harry for supporting me throughout this project.

This blogpost is by Dr Rossitza Atanassova, Digital Curator, British Library. She is on Twitter @RossiAtanassova  and Mastodon @[email protected]

 

29 November 2022

My AHRC-RLUK Professional Practice Fellowship: Four months on

In August 2022 I started work on a project to investigate the legacies of curatorial voice in the descriptions of incunabula collections at the British Library and their future reuse. My research is funded by the collaborative AHRC-RLUK Professional Practice Fellowship Scheme for academic and research libraries which launched in 2021. As part of the first cohort of ten Fellows I embraced this opportunity to engage in practitioner research that benefits my institution and the wider sector, and to promote the role of library professionals as important research partners.

The overall aim of my Fellowship is to demonstrate new ways of working with digitised catalogues that would also improve the discoverability and usability of the collections they describe. The focus of my research is the Catalogue of books printed in the 15th century now at the British Museum (or BMC) published between 1908 and 2007 which describes over 12,700 volumes from the British Library incunabula collection. By using computational approaches and tools with the data derived from the catalogue I will gain new insights into and interpretations of this valuable resource and enable its reuse in contemporary online resources. 

Titlepage to volume 2 of the Catalogue of books printed in the fifteenth century now in the British Museum, part 2, Germany, Eltvil-Trier
BMC volume 2 titlepage


This research idea was inspired by a recent collaboration with Dr James Baker, who is also my mentor for this Fellowship, and was further developed in conversations with Dr Karen Limper-Herz, Lead Curator for Incunabula, Adrian Edwards, Head of Printed Heritage Collections, and Alan Danskin, Collections Metadata Standards Manager, who support my research at the Library.

My Fellowship runs until July 2023 with Fridays being my main research days. I began by studying the history of the catalogue, its arrangement and the structure of the item descriptions and their relationship with different online resources. Overall, the main focus of this first phase has been on generating the text data required for the computational analysis and investigations into curatorial and cataloguing practice. This work involved new digitisation of the catalogue and a lot of experimentation using the Transkribus AI-empowered platform that proved best-suited for improving the layout and text recognition for the digitised images. During the last two months I have hugely benefited from the expertise of my colleague Tom Derrick, as we worked together on creating the training data and building structure models for the incunabula catalogue images.

An image from Transkribus Lite showing a page from the catalogue with separate regions drawn around columns 1 and 2, and the text baselines highlighted in purple
Layout recognition output for pages with only two columns, including text baselines, viewed on Transkribus Lite

 

An image from Transkribus Lite showing a page from the catalogue alongside the text lines
Text recognition output after applying the model trained with annotations for 2 columns on the page, viewed on Transkribus Lite

 

An image from Transkribus Lite showing a page from the catalogue with separate regions drawn around 4 columns of text separated by a single text block
Layout recognition output for pages with mixed layout of single text block and text in columns, viewed on Transkribus Lite

Whilst the data preparation phase has taken longer than I had planned due to the varied layout of the catalogue, this has been an important part of the process as the project outcomes are dependent on using the best quality text data for the incunabula descriptions. The next phase of the research will involve the segmentation of the records and extraction of relevant information to use with a range of computational tools. I will report on the progress with this work and the next steps early next year. Watch this space and do get in touch if you would like to learn more about my research.

This blogpost is by Dr Rossitza Atanassova, Digital Curator for Digitisation, British Library. She is on Twitter @RossiAtanassova  and Mastodon @[email protected]

12 April 2022

Making British Library collections (even) more accessible

Daniel van Strien, Digital Curator, Living with Machines, writes:

The British Library’s digital scholarship department has made many digitised materials available to researchers. This includes a collection of books digitised by the British Library in partnership with Microsoft and processed using Optical Character Recognition (OCR) software to make the text machine-readable. There is also a collection of books digitised in partnership with Google. 

Since being digitised, this collection has been used for many different projects. This includes recent work to try and augment this dataset with genre metadata and a project using machine learning to tag images extracted from the books. The books have also served as training data for a historic language model.

This blog post will focus on two challenges of working with this dataset: size and documentation, and discuss how we’ve experimented with one potential approach to addressing these challenges. 

One of the challenges of working with this collection is its size. The OCR output is over 20GB. This poses some challenges for researchers and other interested users wanting to work with these collections. Projects like Living with Machines are one avenue in which the British Library seeks to develop new methods for working at scale. For an individual researcher, one of the possible barriers to working with a collection like this is the computational resources required to process it. 

Recently we have been experimenting with a Python library, datasets, to see if this can help make this collection easier to work with. The datasets library is part of the Hugging Face ecosystem. If you have been following developments in machine learning, you have probably heard of Hugging Face already. If not, Hugging Face is a delightfully named company focusing on developing open-source tools aimed at democratising machine learning. 

The datasets library is a tool aiming to make it easier for researchers to share and process large datasets for machine learning efficiently. Whilst this was the library’s original focus, there may also be other use cases for which the datasets library may help make datasets held by the British Library more accessible. 

Some features of the datasets library:

  • Tools for efficiently processing large datasets 
  • Support for easily sharing datasets via a ‘dataset hub’ 
  • Support for documenting datasets hosted on the hub (more on this later). 

As a result of these and other features, we have recently worked on adding the British Library books dataset to the Hugging Face hub. Making the dataset available via the datasets library has now made the dataset more accessible in a few different ways.

Firstly, it is now possible to download the dataset in two lines of Python code: 

    from datasets import load_dataset
    ds = load_dataset('blbooks', '1700_1799')

We can also use the Hugging Face library to process large datasets. For example, we might only want to include data with a high OCR confidence score (this partially helps filter out text with many OCR errors): 

    ds.filter(lambda example: example['mean_wc_ocr'] > 0.9)

One of the particularly nice features here is that the library uses memory mapping to store the dataset under the hood. This means that you can process data that is larger than the RAM you have available on your machine. This can make the process of working with large datasets more accessible. We could also use this as a first step in processing data before getting back to more familiar tools like pandas. 

    dogs_data = ds['train'].filter(lambda example: "dog" in example['text'].lower())
    df = dogs_data.to_pandas()
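For readers curious about what memory mapping means in practice, here is a small sketch using Python's standard mmap module. It is only an illustration of the general idea: the datasets library builds on memory-mapped Apache Arrow files rather than on code like this, and the function name and sample file are made up.

```python
import mmap
import os
import tempfile


def count_occurrences(path: str, needle: bytes) -> int:
    """Count non-overlapping occurrences of `needle` in a file via a memory map.

    The operating system pages parts of the file in and out as needed, so the
    whole file never has to be loaded into RAM at once.
    """
    if not needle:
        raise ValueError("needle must not be empty")
    with open(path, "rb") as handle:
        with mmap.mmap(handle.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            count = 0
            position = mapped.find(needle)
            while position != -1:
                count += 1
                position = mapped.find(needle, position + len(needle))
            return count


if __name__ == "__main__":
    # Tiny stand-in for a large OCR text file.
    with tempfile.TemporaryDirectory() as folder:
        sample_path = os.path.join(folder, "sample.txt")
        with open(sample_path, "wb") as sample:
            sample.write(b"a dog and a cat and a dog\n" * 1000)
        print(count_occurrences(sample_path, b"dog"))
```

The same code works unchanged whether the file is a few kilobytes or many gigabytes, which is the property that makes this approach attractive for a collection of this size.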

In a follow-on blog post, we’ll dig into the technical details of datasets. Whilst making the technical processing of datasets more accessible is one part of the puzzle, there are also non-technical challenges to making a dataset more usable. 

 

Documenting datasets 

One of the challenges of sharing large datasets is documenting the data effectively. Traditionally libraries have mainly focused on describing material at the ‘item level,’ i.e. documenting one dataset at a time. However, there is a difference between documenting one book and 100,000 books. There are no easy answers to this, but libraries could explore one possible avenue by using Datasheets. Timnit Gebru et al. proposed the idea of Datasheets in ‘Datasheets for Datasets’. A datasheet aims to provide a structured format for describing a dataset. This includes questions like how and why it was constructed, what the data consists of, and how it could potentially be used. Crucially, datasheets also encourage a discussion of the bias and limitations of a dataset. Whilst you can identify some of these limitations by working with the data, there is also a crucial amount of information known by curators of the data that might not be obvious to end-users of the data. Datasheets offer one possible way for libraries to begin more systematically communicating this information. 
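To make the idea of a 'structured format' a little more concrete, a datasheet can be thought of as a record with a fixed set of sections that must all be filled in. The sketch below is an assumption-laden illustration: the class and field names are hypothetical, loosely following section headings from 'Datasheets for Datasets' (motivation, composition, collection process, uses), and it is not the format used by the Hugging Face hub.

```python
from dataclasses import dataclass, field, fields


@dataclass
class Datasheet:
    """A hypothetical, minimal datasheet record for a dataset."""
    name: str
    motivation: str = ""          # why was the dataset created?
    composition: str = ""         # what does the data consist of?
    collection_process: str = ""  # how was it constructed?
    uses: str = ""                # how could it be used?
    limitations: list = field(default_factory=list)  # known biases and limits

    def missing_sections(self) -> list:
        """Names of sections that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


sheet = Datasheet(
    name="British Library books",
    motivation="Make digitised books available to researchers.",
    composition="OCR text of digitised books.",
    collection_process="Digitised in partnership with Microsoft, then OCRed.",
    uses="Genre classification, image tagging, language model training.",
)
print(sheet.missing_sections())  # the limitations section has not been written yet
```

A check like `missing_sections` is one simple way to make sure the discussion of bias and limitations is not quietly skipped.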

The dataset hub adopts the practice of writing datasheets and encourages users of the hub to write a datasheet for their dataset. For the British Library books, we have attempted to write one of these datasheets. Whilst it is certainly not perfect, it hopefully begins to outline some of the challenges of this dataset and gives end-users a better sense of how they should approach a dataset. 

29 September 2021

Sailing Away To A Distant Land - Mahendra Mahey, Manager of BL Labs - final post

Posted by Mahendra Mahey, former Manager of British Library Labs or "BL Labs" for short

[estimated reading time of around 15 minutes]

This is my last day working as manager of BL Labs, and also my final posting on the Digital Scholarship blog. I thought I would take this chance to reflect on my journey of almost 9 years in helping to set up and maintain BL Labs, and enabling it to become a permanent fixture at the British Library (BL).

BL Labs was the first digital Lab in a national library, anywhere in the world, that gets people to experiment with its cultural heritage digital collections and data. There are now several Gallery, Library, Archive and Museum Labs or 'GLAM Labs' for short around the world, with an active community which I helped build, from 2018.

I am really proud I was there from the beginning to implement the original proposal which was written by several colleagues, but especially Adam Farquhar, former head of Digital Scholarship at the British Library (BL). The project was at first generously funded by the Andrew W. Mellon foundation through four rounds of funding as well as support from the BL. In April 2021, the project became a permanently funded fixture, helped very much by my new manager Maja Maricevic, Head of Higher Education and Science.

The great news is that BL Labs is going to stay after I have left. The position of leading the Lab will soon be advertised. Hopefully, someone will get a chance to work with my helpful and supportive colleague Technical Lead of Labs, Dr Filipe Bento, bright, talented and very hard working Maja and other great colleagues in Digital Research and wider at the BL.

The beginnings, the BL and me!

I met Adam Farquhar and Aly Conteh (Former Head of Digital Research at the BL) in December 2012. They must have liked something about me because I started working on the project in January 2013, though I officially started in March 2013 to launch BL Labs.

I must admit, I had always felt a bit intimidated by the BL. My first visit was as a Psychology student in the early 1980s, before the St Pancras site was opened (in 1997). I remember coming up from Wolverhampton on the train to get a research paper about "Serotonin Pathways in Rats when sleeping" by Lidov, feeling nervous and excited at the same time. It felt like a place for 'really intelligent educated people' and for those who were one of the intellectual elites in society. It also felt for me a bit like it represented the British empire and its troubled history of colonialism, especially some of the collections which made me feel uncomfortable as to why they were there in the first place.

I remember thinking that the BL probably wasn't a place for someone like me, a child of Indian Punjabi immigrants from humble beginnings who came to England in the 1960s. Actually, I felt like an imposter and not worthy of being there.

Nearly 9 years later, I can say I learned to respect and even cherish what was inside it, especially the incredible collections, though I also became more confident about expressing stronger views about the decolonisation of some of these.  I became very fond of some of the people who work in or use it; there are some really good kind-hearted souls at the BL. However, I never completely lost that feeling of being an imposter and an outsider.

What I remember at that time, going for my interview, was wondering what would happen if I got the position, and asking myself 'What would be the one thing I would try and change?'. It came easily to me, namely that I would try and get more new people through the doors, literally or virtually, by connecting them to the BL's collections (especially the digital). New people like me, who may never have set foot in the building, or been motivated to step into it, before. This has been one of the most important reasons for me to get up in the morning and go to work at BL Labs.

So what have been my highlights? Let's have a very quick pass through!

BL Labs Launch and Advisory Board

I launched BL Labs in March 2013, one week after I had started. It was at the launch event organised by my wonderfully supportive and innovative colleague, Digital Curator Stella Wisdom. I distinctly remember in the afternoon session (which I did alone), I had to present my 'ideas' of how I might launch the first BL Labs competition where we would be trying to get pioneering researchers to work with the BL's digital collections.

God it was a tough crowd! They asked pretty difficult questions, questions I was asking myself too and to which I still didn't know the answers.

I remember Professors Tim Hitchcock (now at Sussex University, who eventually sat, and is still sitting, on the BL Labs Advisory Board) and Laurel Brake (now Professor Emerita of Literature and Print Culture, Birkbeck, University of London) being in the audience together with staff from the Royal Library of the Netherlands, who 6 months later launched their own brilliant KB Lab. Subsequently, I became good colleagues with Lotte Wilms who led their Lab for many years and is now Head of Research support at Tilburg University.

My first gut feeling overall after the event was, this is going to be hard work. This feeling and reality remained a constant throughout my time at BL Labs.

In early May 2013, we launched the competition, which was a really quick and stressful turnaround as I had only officially started in mid March (one and a half months earlier). I remember worrying as to whether anyone would even enter!  All the final entries were pretty much submitted a few minutes before the deadline. I remember being alone that evening on deadline day, near to midnight, waiting by my laptop and thinking: what happens if no one enters? It's going to be a disaster and I will lose my job. Luckily that didn't happen; in the end, we received 26 entries.

I am a firm believer that we can help make our own luck, but sometimes luck can be quite random! Perhaps BL Labs had a bit of both!

After that, I never really looked back! BL Labs developed its own kind of pattern and momentum each year:

  • hunting around the BL for digital collections to make into datasets and make available
  • helping to make more digital collections openly licensed
  • having hundreds of conversations with people interested in connecting with the BL's digital collections in the BL and outside
  • working with some people more intensively to carry out experiments
  • developing ideas further into prototype projects
  • telling the world of successes and failures in person, meetings, events and social media
  • launching a competition and awards in April or May
  • roadshows before and after with invitations to speak at events around the world
  • the summer working with competition winners
  • late October/November the international symposium showcased things from the year
  • working on special projects
  • repeat!

The winners were announced in July 2013, and then we worked with them on their entries showcasing them at our annual BL Labs Symposium in November, around 4 months later.

'Nothing interesting happens in the office' - Roadshows, Presentations, Workshops and Symposia!

One of the highlights of BL Labs was to go out to universities and other places to explain what the BL is and what BL Labs does.  This ended up with me pretty much seeing the world (North America, Europe, Asia, Australia, and giving virtual talks in South America and Africa).

My greatest challenge in BL Labs was always to get people to truly and passionately 'connect' with the BL's digital collections and data in order to come up with cool ideas of what to actually do with them. What I learned from my very first trip was that telling people what you have is great, they definitely need to know what you have! However, once you do that, the hard work really begins as you often need to guide and inspire many of them, help and support them to use the collections creatively and meaningfully. It was also important to understand the back story of the digital collection and learn about the institutional culture of the BL if people also wanted to work with BL colleagues.  For me and the researchers involved, inspirational engagement with digital collections required a lot of intellectual effort and emotional intelligence. Often this means asking the uncomfortable questions about research such as 'Why are we doing this?', 'What is the benefit to society in doing this?', 'Who cares?', 'How can computation help?' and 'Why is it necessary to even use computation?'.

Making those connections between people and data does feel like magic when it really works. It's incredibly exciting: suddenly everyone has goose bumps and is energised. I will take this feeling away with me; it's the essence of my work at BL Labs!

A full list of over 200 presentations, roadshows, events and 9 annual symposia can be found here.

Competitions, Awards and Projects

Another significant way BL Labs has tried to connect people with data has been through Competitions (tell us what you would like to do, and we will choose an idea and work collaboratively with you on it to make it a reality), Awards (show us what you have already done) and Projects (collaborative working).

At the last count, we have supported and / or highlighted over 450 projects in research, artistic, entrepreneurial, educational, community based, activist and public categories, most through competitions, awards and project collaborations.

We also set up awards for British Library staff, which have been a wonderful way to highlight the fantastic work our staff do with digital collections and give them the recognition they deserve. I have noticed over the years that the number of staff working on digital projects has increased significantly. Sometimes this was with the help of BL Labs, but often it was because of the significant Digital Scholarship Training Programme, run by my Digital Curator colleagues in Digital Research, which helps staff understand that the BL isn't just about physical things but digital items too.

Browse through our project archive to get inspiration from the various projects BL Labs has been involved in or has highlighted.

Putting the digital collections 'where the light is' - British Library platforms and others

When I started at BL Labs it was clear that we needed to make a fundamental decision about how we saw digital collections. Quite early on, we decided we should treat collections as data to harness the power of computational tools to work with each collection, especially for research purposes. Each collection should have a unique Digital Object Identifier (DOI) so researchers can cite them in publications. Any new datasets generated from them will also have DOIs, allowing us to trace, through those DOIs, what happens to data once you get it out there for people to use.

In 2014, https://data.bl.uk was born and today, all our 153 datasets (as of 29/09/2021) are available through the British Library's research repository.

However, BL Labs has not stopped there! We always believed that it's important to put our digital collections where others are likely to discover them (we can't assume that researchers will want to come to BL platforms), 'where the light is' so to speak.  We were very open and able to put them on other platforms such as Flickr and Wikimedia Commons, not forgetting that we still needed to do the hard work to connect data to people after they have discovered them, if they needed that support.

Our greatest success by far was placing 1 million largely undescribed images that were digitally snipped from 65,000 digitised public domain books from the 19th Century on Flickr Commons in 2013. The number of images on the platform has grown since then by another 50 to 60 thousand from collections elsewhere in the BL. There has been significant interaction from the public to generate crowdsourced tags that make it easier to find specific images. The number of views has reached a staggering figure of over 2 billion in that time. There has also been an incredible array of projects which have used the images, from artistic use to using machine learning and artificial intelligence to identify them. It's my favourite collection, probably because there are no restrictions on using it.

Read the most popular blog post the BL has ever published by my former BL Labs colleague, the brilliant and inspirational Ben O'Steen, a million first steps and the 'Mechanical Curator' which describes how we told the world why and how we had put 1 million images online for anyone to use freely.

It is wonderful to know that George Oates, the founder of Flickr Commons and still a BL Labs Advisory Board member, has been involved in the creation of the Flickr Foundation which was announced a few days ago! Long live Flickr Commons! We loved it because it also offered a computational way to access the collections, critical for powerful and efficient computational experiments, through its Application Programming Interface (API).
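As a rough illustration of what computational access through an API looks like, here is a minimal sketch that only builds a request URL for Flickr's public `flickr.photos.search` REST method; it does not send the request. The helper name `build_search_url`, the placeholder API key, the placeholder account identifier and the example tag are all assumptions for illustration, not details taken from BL Labs' own tooling.

```python
from urllib.parse import urlencode

# Flickr's REST endpoint; the method is selected with the "method" parameter.
FLICKR_REST_ENDPOINT = "https://api.flickr.com/services/rest/"


def build_search_url(api_key: str, user_id: str, tag: str, per_page: int = 10) -> str:
    """Build (but do not send) a flickr.photos.search request URL."""
    params = {
        "method": "flickr.photos.search",
        "api_key": api_key,      # placeholder: you need your own Flickr API key
        "user_id": user_id,      # placeholder: the account whose photos to search
        "tags": tag,             # e.g. a crowdsourced tag
        "per_page": per_page,
        "format": "json",
        "nojsoncallback": 1,     # ask for plain JSON rather than JSONP
    }
    return f"{FLICKR_REST_ENDPOINT}?{urlencode(params)}"


# Hypothetical values, for illustration only.
url = build_search_url("YOUR_API_KEY", "ACCOUNT_NSID", "map")
print(url)
```

Fetching that URL with a valid key would return a page of matching photo records as JSON, which a script could then loop over instead of browsing the images by hand.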

More recently, we have experimented with browser based programming / computational environments - Jupyter Notebooks. We are huge fans of Tim Sherratt who was a pioneer and brilliant advocate of OPEN GLAM in using them, especially through his GLAM Workbench. He is a one person Lab in his own right, and it was an honour to recognise his monumental efforts by giving him the BL Labs Research Award 2020 last year. You can also explore the fantastic work of Gustavo Candela and colleagues on Jupyter Notebooks and the ones my colleague Filipe Bento created.

Art Exhibitions, Creativity and Education

I am extremely proud to have been involved in enabling two major art exhibitions to happen at the BL, namely:

Crossroads of Curiosity by David Normal

Imaginary Cities by Michael Takeo Magruder

I loved working with artists, it's my passion! They are so creative and often not restricted by academic thinking; see the work of Mario Klingemann for example! You can browse through our archives for various artistic projects that used the BL's digital collections. It's inspiring.

I was also involved in the first British Library Fashion Student Competition, held at the BL and won by Alanna Hilton, which used the BL's Flickr Commons collection as inspiration for the students to design new fashion ranges. It was organised by my colleague Maja Maricevic, the British Fashion Colleges Council and Teatum Jones, who were great fun to work with. I am really pleased to say that Maja has gone from strength to strength working with the fashion industry and continues to run the competition to this day.

We also had some interesting projects working with younger people, such as Vittoria's world of stories and the fantastic work of Terhi Nurmikko-Fuller at the Australian National University. This is something I am very much interested in exploring further in the future, especially around ideas of computational thinking, and I have been trying out a few things.

GLAM Labs community and Booksprint

I am really proud of helping to create the international GLAM Labs community with over 250 members, established in 2018 and still active today. I affectionately call them the GLAM Labbers, and I often ask people to explore their inner 'Labber' when I give presentations. What is a Labber? It's the experimental and playful part of us we all had as children and unfortunately many have lost on becoming adults. It's the ability to be fearless, having the audacity and perhaps even naivety to try crazy things even if they are likely to fail! Unfortunately society values success more than it does failure. In my opinion, we need to recognise, respect and revere those who have the courage to try but fail. That courage to experiment should be honoured and embraced and should become the bedrock of our educational systems from the very outset.

Two years ago, many of us Labbers 'ate our own dog food' or 'practised what we preached' when 15 colleagues and I came together for 5 days to produce a book through a booksprint, probably the most rewarding professional experience of my life. The book is about how to set up, maintain, sustain and even close a GLAM Lab and is called 'Open a GLAM Lab'. It is available as public domain content and I encourage you to read it.

Online drop-in goodbye - today!

I organised a 30 minute ‘online farewell drop-in’ on Wednesday 29 September 2021, 1330 BST (London), 1430 (Paris, Amsterdam), 2200 (Adelaide), 0830 (New York) on my very last day at the British Library. It was heart-warming that the session was 'maxed out' at one point with participants from all over the world. I honestly didn't expect over 100 colleagues to show up. I guess when you leave an organisation you get to find out who you actually made an impact on, who shows up, and who tells you, otherwise you may never know.

Those who know me well know that I would have much rather had a farewell do ‘in person’, over a pint and praying for the ‘chip god’ to deliver a huge portion of chips with salt/vinegar and tomato sauce magically and mysteriously to the table. The pub would have been McGlynn's (http://www.mcglynnsfreehouse.com/) near the British Library in London. I wonder who the chip god was? I never found out ;)

The answer to who the chip god was is in the white-on-white text following this sentence... you will be very shocked to know who it was!

Spoiler alert: it was me after all, my alter ego!

Mahendra's online farewell to BL Labs, Wednesday 29 September, 1330 BST, 2021.
Left: Flowers and wine from the GLAM Labbers arrived in Tallinn, 20 mins before the meeting!
Right: Some of the participants of the online farewell

Leave a message of good will to see me off on my voyage!

It would be wonderful if you would like to leave me your good wishes, comments, memories, thoughts, scans of handwritten messages, pictures, photographs etc. on the following Google doc:

http://tiny.cc/mahendramahey

I will leave it open for a week or so after I have left. Reading positive, sincere, heartfelt messages from colleagues and collaborators over the years has already lifted my spirits. For me it provides evidence that you perhaps did actually make a difference to someone's life. I will definitely be re-reading them during the cold dark Baltic nights in Tallinn.

I would love to hear from you and find out what you are doing, or if you prefer, you can email me, the details are at the end of this post.

BL Labs Sailor and Captain Signing Off!

It's been a blast and lots of fun! Of course there is a tinge of sadness in leaving! For me, it's also been intellectually and emotionally challenging as well as exhausting, with many ‘highs’ and a few ‘lows’ or choppy waters, some professional and others personal.

I have learned so much about myself and there are so many things I am really really proud of. There are other things of course I wish I had done better. Most of all, I learned to embrace failure, my best teacher!

I think I did meet my original wish of wanting to help to open up the BL to as many new people as possible who perhaps would never have engaged with the Library before. That was either by using digital collections and data for cool projects and/or simply walking through the doors of the BL in London or Boston Spa and having a look around and being inspired to do something because of it.

I wish the person who takes over my position lots of success! My only piece of advice is if you care, you will be fine!

Anyhow, what a time this has been for us all on this planet! I have definitely struggled at times. I, like many others, have lost loved ones and thought deeply about life and its true meaning. I have also managed to find the courage to know what’s important and act accordingly, even if that has been a bit terrifying and difficult at times. Leaving the BL for example was not an easy decision for me, and I wish perhaps things had turned out differently, but I know I am doing the right thing for me, my future and my loved ones.

Though there have been a few dark times for me both professionally and personally, I hope you will be happy to know that I have also found peace and happiness too. I am in a really good place.

I would like to thank the alumni of BL Labs: Ben O'Steen, Technical Lead for BL Labs from 2013 to 2018, and Hana Lewis (2016-2018) and Eleanor Cooper (2018-2019), both BL Labs Project Officers, as well as the many other people I worked with through BL Labs, more widely in the Library and outside it on my journey.

Where am I off to and what am I doing?

My professional plans are 'evolving', but one thing is certain, I will be moving country!

To Estonia to be precise!

I plan to live, settle down with my family and work there. I was never a fan of Brexit, and this way I get to stay a European.

I would like to finish with this final sweet video created by writer and filmmaker Ling Low and her team in 2016, entitled 'Hey there Young Sailor', which they all made as volunteers for the Malaysian band, the 'Impatient Sisters'. It won the BL Labs Artistic Award in 2016. I had the pleasure and honour of meeting Ling over a lovely lunch in Kuala Lumpur, Malaysia, where I had also given a talk at the National Library about my work and looked for remnants of my grandfather who had settled there many years ago.

I wish all of you well, and if you are interested in keeping in touch with me, working with me or just saying hello, you can contact me via my personal email address: [email protected] or follow my progress on my personal website.

Happy journeys through this short life to all of you!

Mahendra Mahey, former BL Labs Manager / Captain / Sailor signing off!

24 June 2021

My placement: Using Transkribus to OCR Two Centuries of Indian Print

I began a work placement with the Two Centuries of Indian Print project at the British Library, working with my supervisor, Digital Curator Tom Derrick, to automatically transcribe the Library’s Bengali books digitised and catalogued as part of the project. The OCR application we use for transcription is Transkribus, a leading text recognition application for historical documents. We also use a Google Sheet to instantly update each book’s basic information and job status.

In the first two days, I received training in how to use the Transkribus application through a face-to-face (virtual) demonstration from my supervisor, since I didn't know how to use OCR. He also provided a manual for me to refer to in my practice. There are three main steps to complete a book transcription: uploading books, running layout analysis, and running text recognition. We upload books from the British Library’s IIIF image viewer to Transkribus. I needed to first confirm the name and digital system number of a book from our team’s shared Google Sheet so that I could find the digital content of this book within the BL online catalogue. I would record the number of pages the book has in the Google Sheet at the same time. Then I copied the URL of the IIIF manifest and imported the book into the collection of our project in Transkribus. After that, I would run layout analysis in Transkribus. It usually takes several minutes to run, and the more pages there are the more time it will take. Perfect layout analysis is where there is one baseline for each line of text on a page.
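The page count recorded in the Google Sheet could in principle be read from the IIIF manifest itself. The following is a minimal sketch, assuming the manifest follows the IIIF Presentation API (version 2 lists pages as canvases under the first sequence; version 3 lists them as top-level `items`). The function name `count_pages` and the sample manifest are hypothetical and are not part of the project's workflow or of Transkribus.

```python
import json


def count_pages(manifest: dict) -> int:
    """Return the number of canvases (page images) in a IIIF manifest."""
    # IIIF Presentation API 2.x: canvases sit inside the first sequence.
    sequences = manifest.get("sequences")
    if sequences:
        return len(sequences[0].get("canvases", []))
    # IIIF Presentation API 3.0: canvases are top-level "items".
    return sum(1 for item in manifest.get("items", []) if item.get("type") == "Canvas")


# Hypothetical two-page manifest, trimmed to the fields used here.
sample_manifest = json.loads("""
{
  "@type": "sc:Manifest",
  "label": "Example book",
  "sequences": [
    {"canvases": [{"@id": "https://example.org/canvas/1"},
                  {"@id": "https://example.org/canvas/2"}]}
  ]
}
""")

print(count_pages(sample_manifest))
```

In practice the manifest would be downloaded from the URL copied out of the image viewer; the sketch works on an already-parsed dictionary so that it runs without network access.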

Although Transkribus is trained on 100+ pages, it still makes mistakes for a number of reasons. Title or chapter headers whose font size differs significantly from other text are sometimes missed; patterned dividers and borders on the title page are easily misidentified as text; and sometimes the color of the paper is too dark, making it difficult to recognize the text. In these cases, the user needs to manually revise the recognition result. After checking the quality of the layout analysis, I could then run text recognition. The final step is to check the results of the text recognition and update the Google Sheet.

Above: A view of a book in the Transkribus application, showing the page images and transcription underneath

During the three weeks of the placement, I handled a total of twelve books. In addition to the regular progression patterns described earlier, I was fortunate to come across several books that required special handling and used them to learn how to handle various situations. For example, the image above shows the result of text recognition for a page of the first book I dealt with in Transkribus, Dhārāpāta: prathama bhāg. Pāṭhaśālastha śiśu digera śikshārtha/ Cintāmani Pāl. Every word in this book is very short and widely spaced, making it very difficult for Transkribus to identify the layout. Because the book is only 28 pages long, I manually labeled all the layouts.

In addition to my work, I have had the pleasure of interacting with many British Library curators and investigators who are engaged in digitization. I attended a regular meeting of our project and learnt about the division of labor among the digital project members. My supervisor Tom also contacted some colleagues whose work relates to the digitization of Chinese collections and gave me the opportunity to meet them, which has benefited me a lot.

The Principal Investigator for our 2CIP project, Adi, who also has been involved with research and development of Chinese OCR/HTR at the British Library, shared with me the challenges of Chinese OCR/HTR and the progress of current research at the British Library.

Curator for the International Dunhuang Project, Melodie, and a project manager, Tan, presented the research content and outcomes of the project. This project has many partner institutions in different countries that have collections related to the Silk Road. It is a very meaningful digitization project and I admire the development of this project.

The lead Curator for the British Library’s Chinese collections, Sara, introduced different types of Chinese collections and some representative collections in the British Library to me. She also shared with me the objective problems they would encounter when digitizing collections.

Three weeks passed quickly and I gained a lot from my experience at the British Library. In addition to the specifics of how to use Transkribus for text recognition, I have learned about the achievements and problems faced in digitizing Chinese collections from a variety of perspectives.

This is a guest post by UCL Digital Humanities MSc student Xinran Gu.

05 May 2021

Games in the Library and Games in the Woods

Congratulations to the winner, runners up and everyone who made a game last month for Leeds Libraries Games Jam on Novels That Shaped Our World, which invited jammers to create playful interactive adaptations of books in the BBC’s Novels that Shaped Our World list. To accompany this jam, they programmed a fantastic series of events, which if you missed seeing live, or want to re-watch, can be found in this YouTube playlist.

I absolutely love the premise of the winning submission Frankenstein's Double Wedding, Or, The Modern P…romeo…ethius by WretchedBees (Will Binns). You need a deck of cards to play this solo or cooperative game. Playing as Dr. Frankenstein, with the help of both your monster and betrothed, the game’s aim is to organise a double wedding, arranging catering, a florist, a venue and inviting wedding guests. Not forgetting that you also need to create a spouse for your monster before you can both get wed.

A silhouette profile of a face looking to the left with a bolt of lightning in the face. There are also brains in lightbulbs and the spade, club, diamond and heart symbols from playing cards
Frankenstein's Double Wedding, Or, The Modern P…romeo…ethius by WretchedBees

Well-deserved recognition also goes to the two runners-up. The first is The Open Wizarding Challenge by Suzini56, in which players navigate the rooms and corridors of their wizarding school, dodging moving staircases and obstacles, aiming to be the first to reach the exit with the bag of items they have collected on the way. The second is Fortune of War: A game of Napoleonic era Naval Life by webcowgirl, which is based on Patrick O'Brian's Master and Commander books. Writing about her submission she says “this game tries to capture the flavor of the books, with its humor and humanity. Winning isn't just about money, it is ultimately also about pride, honor, and dignity.” Something we would all do well to remember.

A boardgame on a table with a paper ship at the centre of the board, and pot plants behind it
Fortune of War: A game of Napoleonic era Naval Life by webcowgirl

Other #NTSOWgamesjam submissions re-worked Pride and Prejudice, Nineteen Eighty-Four and Herman Melville's Bartleby, the Scrivener. You can check these out on the jam’s itch.io entries page. Being a Sandman graphic novels fan, I enjoyed looking at Of You by DarrenLEdwards, which has been structured so this tabletop roleplaying game could also be based on many other fantastical worlds such as Alice’s Adventures in Wonderland, The Neverending Story, The Wizard of Oz, Peter Pan, The Chronicles of Narnia, His Dark Materials etc.

If exploring fantasy worlds and playing games has inspired you to want to make a game, or if you are a seasoned game maker, then you may want to take part in our Games In the Woods jam this month, which I am running with Ash Green, Marion Tessier from Story Circles and Kingston Upon Thames Libraries, and Cheryl Tipp. This is an online tree themed game jam for all ages, which will run throughout the duration of the Urban Tree Festival. There will be an online launch event on Saturday 15th May with inspiring examples of interactive digital experiences featuring trees and a virtual “show & tell” event on Sunday 23rd May for jammers to celebrate their creations.

Before and during the Urban Tree Festival, game jammers can meet and chat with organisers and each other on our Discord Server: https://discord.gg/qWXH8NcjHE, so please join and say hello on there and use #gamesinthewoods on social media to share images and details of your work in progress.

A wood with a deer standing to the left and a fox standing on the right
Games in the Woods game jam

You are welcome to join alone or in a team to create digital and analogue games, interactive fiction, web comics, board games, escape games, card games – anything you want! The only constraints are time, the theme and your imagination. We especially encourage creative re-use of images from the British Library’s Flickr collection of digitised 19th century books; do check out these online Flora and Fauna galleries. There is also a fantastic curated selection of wildlife and environmental sound recordings picked by my colleague Cheryl Tipp, which you can use in your creations. These are available via this SoundCloud playlist.

Portrait photographs of Sue Thomas, Irini Papadimitriou and Cheryl Tipp
Sue Thomas, Irini Papadimitriou and Cheryl Tipp

Cheryl is also speaking at a free Digital Nature online event next Monday, 10th May, 19:30 - 20:30. Chaired by Irini Papadimitriou, Creative Director at Future Everything, this event also features Ben Eaton from Invisible Flock (read more about their woodland work Faint Signals here), and author of books on nature and technology Sue Thomas. This is part of the British Library’s springtime season of events The Natural Word, which explores nature writing and reflects on our need to reimagine our relationship with the environment. Hope to see you there.

This post is by Digital Curator Stella Wisdom (@miss_wisdom)
