Digital scholarship blog

Enabling innovative research with British Library digital collections

Introduction

Tracking exciting developments at the intersection of libraries, scholarship and technology.

22 July 2021

Building the New Media Writing Prize Special Collection

The New Media Writing Prize is awarded annually to interactive works that use technology and digital tools in exciting and innovative ways. Organised by Bournemouth University, the prize is now in its 12th year and open for entries until 26th November 2021.

Banner reading "Innovative, Immersive, Interactive. The 2021 New Media Writing Prize is open for entries. Find out more."
The homepage banner on the New Media Writing Prize website

In 2019 the British Library hosted a Digital Conversations event to celebrate the 10th anniversary of the prize. As part of our work on collecting and preserving emerging formats, last year we started building a special collection in the UK Web Archive to archive all shortlisted and winning entries to the prize. At the time of writing, the collection stands at 226 websites. These include not only all the works that were web-based and live at the time of collection, but also blog posts, press kits, online reviews and authors’ websites. This kind of contextual information (like the data recorded on the ELMCIP Knowledge Base website) is especially valuable where the work itself couldn’t be captured, either because of the limitations of web archiving tools or because it had already disappeared from the Internet. More information on how the collection was conceived and developed is available in the Collection Scoping Document on the British Library Research Repository.

To improve access to the collection and assure the quality of the websites we captured, a PhD placement project began at the start of June. Tegan Pyke, from Cardiff Metropolitan University, is working on the collection to identify the best capture of each work and is also developing a creative response to the collection.

Tegan writes:

From the New Media Writing Prize shortlists, a total of 78 works have been captured, with each work averaging 13 instances to compare and contrast. Each instance represents a web crawl undertaken by the team from the Emerging Formats project.

A screenshot showing the instances collected for Serge Bouchardon’s 2011 Main Prize winning piece, "Loss of Grasp".

One of the most difficult aspects of this work has been deciding what, exactly, constitutes an ‘acceptable’ capture. By nature, digital works are highly complex: they feature audio, visual, and kinetic assets, and use bespoke platforms, formats, and code. These difficulties are compounded by the speed at which technology changes; what was acceptable a decade ago may be entirely defunct today, as Adobe’s withdrawal of support for Flash Player has shown.

After an initial overview of the collection, I concluded that a strict set of criteria wouldn’t be appropriate. Nor would it be realistic to expect every aspect of a work to be captured, as many works, such as Amira Hanafi’s What I’m Wearing and J R Carpenter’s The Gathering Cloud, make use of external links or externally hosted image and video files. If these lie outside the scope of UK Legal Deposit, capturing them in their entirety becomes more difficult and sometimes impossible.

Instead, I decided to focus on narrative, asking three questions as I approached each instance: 

  • Can viewers complete the narrative? 
  • Does the theme remain understandable?
  • Is the atmosphere (the overall mood of the piece) intact?

If an instance meets all three criteria, it’s acceptable, and the most complete of the acceptable captures is identified as suitable for display in the archive.

At this point, I’m halfway through comparing instances for the collection. Of the pieces captured, just under half meet the criteria above. Of these, most can be improved by additional crawls that capture the missing assets. Those that cannot be improved have, for the most part, been affected by software deprecation or end-of-life (EOL), where support has been completely removed.

I’m aiming to finish my review of the collection over the next couple of months, at which point I hope to provide further insight into the process. I’ve also started a collaboration with the BL’s Wikimedian-in-Residence, Lucy Hinnie, to plan a Wikidata project related to the collection, making use of contextual data points gathered during its creation. I’m sure you’ll read about this work here soon!

This post is by Giulia Carla Rossi, Curator of Digital Publications (@giugimonogatari on Twitter), and Tegan Pyke, a PhD student at Cardiff Metropolitan University currently undertaking a placement in Contemporary British Published Collections at the British Library.

09 July 2021

Subjects Wanted for Soothing Sounds Psychology Studies

Can you help University of Reading researchers with their studies examining the potential therapeutic effects of looking at ‘soothing’ images and listening to natural sounds on mental health and wellbeing?

Sound recordings for this research have been provided by Cheryl Tipp, Curator of Wildlife & Environmental Sounds, from the British Library Sound Archive.

One study focuses on young people: 13–17-year-olds are wanted for an easy online survey. Psychology Masters student Jasmiina Ryynanen from the University of Reading is asking young people to view and listen to 25 images and sounds, rating their mood before and after. Access the survey for 13–17-year-olds here: https://henley.eu.qualtrics.com/jfe/form/SV_eKaQjEf2H3Vqw9U.

Poster with details of Soothing Sounds student study for young people

There is also an online survey for adults, managed by Emily Witten, so if you are over 18 please take part in this study: https://henley.eu.qualtrics.com/jfe/form/SV_cBa6tNtkN3fgkCO.

Poster about Soothing Sounds student study for adults

Both surveys are completely randomised: some participants will be asked to look at images only, others to listen to sounds only, and a final group to look at images while listening to the sounds. These research projects have been fully approved by the University of Reading’s ethical standards board. If you have any questions about these surveys, please email Jasmiina Ryynanen (j.ryynanen(at)student.reading.ac.uk) or Emily Witten (e.i.c.witten(at)student.reading.ac.uk).

We hope you enjoy participating in these surveys and feel suitably soothed from the experience! 

This post is by Digital Curator Stella Wisdom (@miss_wisdom).

24 June 2021

My placement: Using Transkribus to OCR Two Centuries of Indian Print

I began a work placement with the British Library’s Two Centuries of Indian Print project, working with my supervisor, Digital Curator Tom Derrick, to automatically transcribe the Library’s Bengali books that had been digitised and catalogued as part of the project. The OCR application we use for transcription is Transkribus, a leading text recognition platform for historical documents. We also use a shared Google Sheet to keep each book’s basic information and job status up to date.

During my first two days, my supervisor trained me to use Transkribus through a live (virtual) demonstration, since I had no previous experience of OCR. He also gave me a manual to refer to as I practised. There are three main steps to transcribing a book: uploading it, running layout analysis, and running text recognition. We upload books to Transkribus from the British Library’s IIIF image viewer. First, I confirmed the title and digital system number of a book in our team’s shared Google Sheet so that I could find its digital content in the BL online catalogue, and at the same time I recorded the book’s number of pages in the Google Sheet. Then I copied the URL of the IIIF manifest and imported the book into our project’s collection in Transkribus. After that, I ran layout analysis in Transkribus. This usually takes several minutes, and the more pages a book has, the longer it takes. A perfect layout analysis produces exactly one baseline for each line of text on a page.
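The post doesn’t say how the page count was obtained. One possible approach: a IIIF Presentation manifest lists each page image as a canvas, so counting canvases in the manifest JSON gives the page count. The sketch below is illustrative only. The `count_pages` helper and the example manifest are hypothetical, it assumes the manifest has already been downloaded, and it is not the project’s actual tooling.

```python
import json


def count_pages(manifest: dict) -> int:
    """Count the canvases (page images) in a parsed IIIF Presentation manifest.

    Supports Presentation API 3.0 (canvases in the top-level "items" list)
    and 2.x (canvases nested in "sequences"). Returns 0 if neither is present.
    """
    # Presentation 3.0: the manifest's "items" are Canvas objects.
    if "items" in manifest:
        return sum(1 for item in manifest["items"] if item.get("type") == "Canvas")
    # Presentation 2.x: canvases sit inside the (usually single) default sequence.
    sequences = manifest.get("sequences", [])
    if sequences:
        return len(sequences[0].get("canvases", []))
    return 0


# Hypothetical, heavily trimmed v3 manifest; a real one would be fetched
# from the manifest URL copied out of the image viewer.
example_json = """
{
  "type": "Manifest",
  "items": [
    {"id": "https://example.org/canvas/1", "type": "Canvas"},
    {"id": "https://example.org/canvas/2", "type": "Canvas"}
  ]
}
"""

if __name__ == "__main__":
    manifest = json.loads(example_json)
    print(count_pages(manifest))  # 2
```

Checking both manifest versions is a defensive choice, because older and newer IIIF endpoints structure their canvases differently.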

Although Transkribus has been trained on more than 100 pages, it still makes mistakes for several reasons. Title or chapter headings whose font size differs significantly from the rest of the text are sometimes missed; patterned dividers and borders on the title page are easily misidentified as text; and sometimes the paper is so dark that the text is hard to recognise. In these cases, the user needs to correct the result manually. After checking the quality of the layout analysis, I could then run text recognition. The final step is to check the results of the text recognition and update the Google Sheet.


Above: A view of a book in the Transkribus application, showing the page images and transcription underneath

Over the three weeks of the placement, I handled a total of twelve books. In addition to the standard workflow described above, I was fortunate to come across several books that required special handling, which taught me how to deal with a range of situations. For example, the image above shows the result of text recognition for a page of the first book I worked on in Transkribus, Dhārāpāta: prathama bhāg. Pāṭhaśālastha śiśu digera śikshārtha/ Cintāmani Pāl. Every word in this book is very short and widely spaced, making it very difficult for Transkribus to identify the layout. Because the book is only 28 pages long, I marked up the layout of every page manually.

Beyond my own tasks, I have had the pleasure of meeting many British Library curators and researchers involved in digitisation. I attended one of our project’s regular meetings and learned how work is divided among the members of the digital project team. My supervisor Tom also put me in touch with colleagues working on the digitisation of the Library’s Chinese collections and gave me the opportunity to meet them, which I benefited from greatly.

Adi, the Principal Investigator for our Two Centuries of Indian Print (2CIP) project, who has also been involved in research and development of Chinese OCR/HTR at the British Library, told me about the challenges of Chinese OCR/HTR and the progress of current research at the Library.

Melodie, Curator for the International Dunhuang Project, and Tan, a project manager, presented the project’s research and outcomes. The project has many partner institutions in different countries with collections related to the Silk Road. It is a very meaningful digitisation project and I admire how it has developed.

Sara, lead Curator for the British Library’s Chinese collections, introduced me to the different types of Chinese collections at the Library and some representative items. She also explained the practical problems the team encounters when digitising collections.

Three weeks passed quickly and I gained a lot from my experience at the British Library. In addition to the specifics of how to use Transkribus for text recognition, I have learned about the achievements and problems faced in digitizing Chinese collections from a variety of perspectives.

This is a guest post by UCL Digital Humanities MSc student Xinran Gu.