Digital scholarship blog

Enabling innovative research with British Library digital collections

Introduction

Tracking exciting developments at the intersection of libraries, scholarship and technology.

24 August 2021

Important information for email subscribers of the British Library's Digital Scholarship blog

Unfortunately, the third-party platform that the British Library uses for email notifications for our blogs is making changes to its infrastructure. This means that, from August 2021, we anticipate that email notifications will no longer be sent to subscribers (although the provider has been unable to specify when exactly these will cease).

To find out when new blog posts are published, we recommend following us on Twitter @BL_DigiSchol or checking this page on the British Library website where all our blogs are listed.

We want to assure you that we are actively looking into this issue and working to implement a solution that will continue your email notifications. However, we do not know whether you will continue to receive notifications about new posts before we are able to implement this. We promise to update the blog with further information as soon as we have it. Thank you for your patience and understanding while we resolve this.

We appreciate this is inconvenient and know many people are not on social media and have no intention of being so. Many rely on email notifications and may miss out without them. As soon as we have been able to implement a new solution we will post about it here. Thanks for bearing with us.

12 August 2021

Dates to discuss Wikidata at Wikimania 2021

Wikimania is often the highlight of any Wikimedian’s calendar. Hosted by the Wikimedia Foundation, Wikimania is a conference like no other. A large number of participants take part in the annual celebration of open knowledge and Wikimedia projects. Previous events have taken place in Stockholm (2019), Cape Town (2018), Montreal (2017) and Italy (2016). Due to the ongoing global pandemic, this year's conference, being held 13-17 August 2021, is taking place entirely online, something Wikimania is ideally suited for!

Logo for Wikimania 2021: 4 squares, the 1st with a drawing of 12 people's faces as if they are in a video call, the 2nd of 2 jigsaw puzzle pieces, the 3rd of paper confetti and the 4th showing 2 people sitting at a table talking

In addition to more traditional conference sessions, Wikimania will be running an Unconference, a Community Village, and a community Hackathon. Communication is encouraged through a variety of channels including Telegram, IRC and Wiki talk pages.

Telegram machine
A photograph of an old telegraph key by Sandra Tan on Unsplash

Looking at the programme, so many interesting topics are on the table for presentation and discussion: from copyright reform, to innovation and community development, there’s a wide spectrum of material to interest all Wikimedians of every level. Handily, events are rated in terms of their suitability for beginners, to make things as welcoming as possible. There is a whole strand of presentations devoted to Wikidata, which you can view here.

I am very excited to be presenting remotely at this conference on behalf of the British Library. I will be introducing the work of Tom Derrick on the Bengali Books Wikisource Competition, and Dominic Kane (UCL) on the India Office Records project. We have shaped our panel to show what GLAM institutions can do to promote and effectively utilise Wiki platforms for public engagement with library and archive collections. Our panel will run on Sunday 15th of August at 8.15pm (7.15pm UTC).

Wikimania is free to attend online, 13-17 August 2021; registration is open until midnight on Thursday 12th August. We hope to see you there!

This post is by Wikimedian in Residence Lucy Hinnie (@BL_Wikimedian)

03 August 2021

Automating the Recognition of Chinese Manuscripts: New Chevening British Library Fellowship

 

The Chevening Fellowship Programme is the UK government’s international awards scheme aimed at fostering knowledge exchange and collaboration, and developing global leaders. In 2015, the Foreign, Commonwealth & Development Office (FCDO) partnered with the British Library to offer professionals two new fellowships every year, and the two organisations recently announced the renewal of their partnership until 2024/25.

Chevening logo and the British Library logo

These fellowships are unique opportunities for one-year placements at the Library, working with exceptional collections under the Library’s custodianship. The Library has hosted international fellows through this scheme since 2016, with each fellowship framing a distinct project inspired by Library collections. Past and present Chevening Fellows at the Library have focused on geographically diverse collections, from Latin America through Africa to South Asia, with themes such as archival material from Latin America and the Caribbean; African-language printed books; Nationalism, Independence, and Partition in South Asia; and Big Data and Libraries.

We are thrilled to (re-)announce that one of the two placements available for the 2022/2023 academic year will focus on automating the recognition of historical Chinese handwritten texts. This fellowship, originally announced two years ago, had to be postponed due to the pandemic – and we are excited to be able to offer it again. This is a special opportunity to work in the Library’s Digital Research Team, and engage with unique historical collections digitised as part of the International Dunhuang Project and the Lotus Sutra Manuscripts Digitisation Project. Focusing on material from Dunhuang (China), part of the Stein collection, this fellowship will engage with new digital tools and techniques in order to explore possible solutions to automate the transcription of these handwritten texts.

End piece of a Chinese Lotus Sutra Scroll (shelfmark: Or.8210/S.1606). Digitised as part of the Lotus Sutra Manuscripts Digitisation Project.

 

The context for this fellowship is the Library’s efforts towards making its collection items available in machine-readable format, to enable full-text search and analysis. The Library has been digitising its collections at scale for over two decades, with digitisation opening up access to diversely rich collections. However, it is important for us to further support discovery and digital research by unlocking the huge potential in automatically transcribing our collections. Until recently, Western languages print collections have been the main focus, especially newspaper collections. A flagship collaboration with the Alan Turing Institute, the Living with Machines project, has been applying Optical Character Recognition (OCR) technology to UK newspapers, designing and implementing new methods in data science and artificial intelligence, and analysing these materials at scale.

Taking a broader perspective on Library collections, we have been exploring opportunities with non-Western collections too. Library staff have been engaging closely with the exploration of OCR and Handwritten Text Recognition (HTR) systems for English, Bangla and Arabic. Digital Curators Tom Derrick, Nora McGregor and Adi Keinan-Schoonbaert teamed up with PRImA Research Lab and the Alan Turing Institute to run four competitions in 2017-2019, inviting providers of text recognition methods to try them out on our historical material. We have been working with Transkribus as well; for example, Alex Hailey, Curator for Modern Archives and Manuscripts, used the software to automatically transcribe 19th century botanical records from the India Office Records. Ongoing work led by Tom Derrick aims to OCR our collection of Bengali printed texts, digitised as part of the Two Centuries of Indian Print project.

 

Regions, text lines and illustrations demarcated as ground truth, as shown in Transkribus (Shelfmark: Or 3366). Digitised and available on Qatar Digital Library.

Another screenshot from Transkribus, showing automatically transcribed Bengali printed text (Shelfmark: VT 1914 d). Digitised as part of the Two Centuries of Indian Print project.

 

The Chevening Fellow will contribute to our efforts to identify OCR/HTR systems that can tackle digitised historical collections. They will explore the current landscape of Chinese handwritten text recognition, look into methods, challenges, tools and software, use them to test our material, and demonstrate digital research opportunities arising from the availability of these texts in machine-readable format.

This fellowship programme will start in September 2022 for a 12-month period of project-based activity at the British Library. The successful candidate will receive support and supervision from Library staff, and will benefit from professional development opportunities, networking and stakeholder engagement, gaining access to a range of organisational training and development opportunities (such as the Digital Scholarship Training Programme), as well as staff-level access to unique British Library collections and research resources.

For more information and to apply, please visit the Chevening British Library Fellowship page: https://www.chevening.org/fellowship/british-library/, and the “Automating the recognition of historical Chinese handwritten texts” fellowship page: https://www.chevening.org/fellowship/british-library-historical-chinese-texts/.

Applications open on 3 August, 12:00 (midday) BST and close on 2 November, 12:00 (midday) GMT.

Good Luck!

This post is by Dr Adi Keinan-Schoonbaert, Digital Curator for Asian and African Collections, British Library. She is on Twitter as @BL_AdiKS

 

22 July 2021

Building the New Media Writing Prize Special Collection

The New Media Writing Prize is awarded annually to interactive works that use technology and digital tools in exciting and innovative ways. Organised by Bournemouth University, the prize is now in its 12th year and open for entries until 26th November 2021.

Banner saying "Innovative, Immersive, Interactive. The 2021 New Media Writing Prize is open for entries. Find out more."
The homepage banner on the New Media Writing Prize website

The British Library hosted a Digital Conversations event to celebrate the 10th anniversary of the prize in 2019 and, as part of our work on collecting and preserving emerging formats, last year we started building a special collection to archive all shortlisted and winning entries to the prize in the UK Web Archive. Thanks go to Joan Francis for her valued support in adding targets and metadata to the Annotation and Curation Tool. At the time of writing, the collection stands at 226 websites, including not only all the works that were web-based and live at the moment of collection, but also blog posts, press kits, online reviews and authors’ websites. This kind of contextual information (like the data recorded on the ELMCIP Knowledge Base website) is especially valuable in those instances where the work itself couldn’t be captured, due to the limitations of web archiving tools, or the fact that it had already disappeared from the Internet. More information on how the collection was conceived and developed is available in the Collection Scoping Document on the British Library Research Repository.

In order to improve access to the collection and assure quality for the websites we captured, a PhD placement project started at the beginning of this June. Tegan Pyke, from Cardiff Metropolitan University, is working on the collection to identify best captures for each of these works and is also developing a creative response to the collection.

Tegan writes:

From the New Media Writing Prize shortlists, a total of 78 works have been captured, with each work averaging 13 instances to compare and contrast. Each instance represents a web crawl undertaken by the team from the Emerging Formats project.

Screen capture of UKWA search results
A screenshot showing the instances collected for Serge Bouchardon’s 2011 Main Prize winning piece, "Loss of Grasp".

One of the most difficult aspects of this work has been deciding what, exactly, constitutes an ‘acceptable’ capture. By nature digital works are highly complex—featuring audio, visual, and kinetic assets—and using bespoke platforms, formats, and code. These attributes are heightened by the speed at which technology changes; what was acceptable a decade ago may be entirely defunct today, as is the case with Adobe removing their Flash Player support.

After an initial overview of the collection, I came to the conclusion that a strict set of criteria wouldn’t be appropriate. Nor would the capture of all aspects of a work, as many—such as Amira Hanafi’s What I’m Wearing and J R Carpenter’s The Gathering Cloud—make use of external links or externally hosted image and video files. If these lie outside the UK Legal Deposit’s scope, capturing them in their entirety becomes more difficult and sometimes impossible.

Instead, I decided to focus on narrative, asking three questions as I approached each instance: 

  • Can viewers complete the narrative? 
  • Does the theme remain understandable?
  • Is the atmosphere (the overall mood of the piece) intact?

If the answer to all three questions is yes, the instance is acceptable, and the most complete of the acceptable captures is identified as suitable for display in the archive.
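As a rough illustration only, and not the tooling actually used during the placement, the three questions could be recorded for each instance and used to pick a display capture. The field names below are hypothetical, and `assets_captured` is an assumed stand-in for "most complete", which the review itself judges by eye.

```python
from typing import NamedTuple, Optional


class Instance(NamedTuple):
    """One archived capture of a work (all field names are hypothetical)."""
    crawl_date: str
    narrative_completable: bool   # Can viewers complete the narrative?
    theme_understandable: bool    # Does the theme remain understandable?
    atmosphere_intact: bool       # Is the overall mood of the piece intact?
    assets_captured: int          # assumed proxy for how complete the capture is


def is_acceptable(instance: Instance) -> bool:
    """An instance is acceptable only if all three answers are yes."""
    return (instance.narrative_completable
            and instance.theme_understandable
            and instance.atmosphere_intact)


def best_capture(instances: list[Instance]) -> Optional[Instance]:
    """Return the most complete acceptable instance, or None if there is none."""
    acceptable = [i for i in instances if is_acceptable(i)]
    if not acceptable:
        return None
    return max(acceptable, key=lambda i: i.assets_captured)
```

A work for which `best_capture` returns `None` would be a candidate for an additional crawl.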

At this point, I’m half-way through comparing instances for the collection. Of the pieces captured, just under half meet the criteria above. Out of these, most can be improved by additional crawls that capture the missing assets. Those that cannot be improved have, for the most part, been affected by software deprecation or EOL (end-of-life), where support has been completely removed.

I’m aiming to finish my review of the collection over the next couple of months, at which point I hope to provide further insight into the process. I’ve also started a collaboration with the BL's Wikimedian-in-Residence, Lucy Hinnie, to plan a Wikidata project related to the collection aiming to make use of contextual data points collected during its creation—I’m sure you’ll read about this work here soon!

This post is by Giulia Carla Rossi, Curator of Digital Publications (on Twitter as @giugimonogatari), and Tegan Pyke, a PhD student at Cardiff Metropolitan University currently undertaking a placement in Contemporary British Published Collections at the British Library.

09 July 2021

Subjects Wanted for Soothing Sounds Psychology Studies

Can you help University of Reading researchers with their studies examining the potential therapeutic effects of looking at ‘soothing’ images and listening to natural sounds on mental health and wellbeing?

Sound recordings for this research have been provided by Cheryl Tipp, Curator of Wildlife & Environmental Sounds, from the British Library Sound Archive.

One study focuses on young people; 13-17 year-olds are wanted for an easy online survey. Psychology Masters student Jasmiina Ryyanen from the University of Reading is asking young people to view and listen to 25 images and sounds, rating their moods before and after. Access the survey for 13-17 year-olds here: https://henley.eu.qualtrics.com/jfe/form/SV_eKaQjEf2H3Vqw9U.

Poster with details of Soothing Sounds student study for young people

There is also an online survey managed by Emily Witten, which is aimed at adults, so if you are over 18 please participate in this study: https://henley.eu.qualtrics.com/jfe/form/SV_cBa6tNtkN3fgkCO.  

Poster about Soothing Sounds student study for adults

Both surveys are completely randomised; some participants will be asked to look at images only, others to listen to sounds only, and the final group to look at images while listening to the sounds at the same time. These research projects have been fully approved by the University of Reading’s ethical standards board. If you have any questions about these surveys, please email Jasmiina Ryyanen (j.ryynanen(at)student.reading.ac.uk) and Emily Witten (e.i.c.witten(at)student.reading.ac.uk).

We hope you enjoy participating in these surveys and feel suitably soothed from the experience! 

This post is by Digital Curator Stella Wisdom (@miss_wisdom)

24 June 2021

My placement: Using Transkribus to OCR Two Centuries of Indian Print

I began a work placement with the British Library's Two Centuries of Indian Print project, working with my supervisor, Digital Curator Tom Derrick, to automatically transcribe the Library’s Bengali books digitised and catalogued as part of the project. The OCR application we use for transcription is Transkribus, a leading text recognition application for historical documents. We also use a Google Sheet to instantly update each book’s basic information and job status.

In the first two days, I received training in how to use the Transkribus application through a face-to-face (virtual) demonstration from my supervisor, since I didn't know how to use OCR. He also provided a manual for me to refer to in my practice. There are three main steps to complete a book transcription: uploading books, running layout analysis, and running text recognition. We upload books from the British Library’s IIIF image viewer to Transkribus. I first needed to confirm the name and digital system number of a book from our team’s shared Google Sheet so that I could find the digital content of this book within the BL online catalogue. At the same time, I would record the number of pages the book has in the Google Sheet. Then I copied the URL of the IIIF manifest and imported the book into our project's collection in Transkribus. After that, I would run layout analysis in Transkribus. It usually takes several minutes to run, and the more pages there are, the more time it takes. A perfect layout analysis is one where there is exactly one baseline for each line of text on a page.
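For readers curious about what a IIIF manifest contains, here is a minimal sketch of how the page count recorded in the Google Sheet could be read from one. It is not part of the placement workflow, where pages were counted by hand; the manifest URL is supplied by the caller, and the function names are my own. It assumes the manifest follows the IIIF Presentation API, where version 2.x nests canvases inside `sequences` and version 3.0 lists them under `items`.

```python
import json
from urllib.request import urlopen


def load_manifest(url: str) -> dict:
    """Download and parse a IIIF manifest from a URL supplied by the caller."""
    with urlopen(url) as response:
        return json.load(response)


def count_pages(manifest: dict) -> int:
    """Count the canvases (page images) in a IIIF Presentation manifest."""
    # Presentation API 3.0 lists canvases directly under "items".
    if "items" in manifest:
        return sum(1 for item in manifest["items"] if item.get("type") == "Canvas")
    # Presentation API 2.x nests canvases inside one or more sequences.
    return sum(len(seq.get("canvases", [])) for seq in manifest.get("sequences", []))
```

Calling `count_pages(load_manifest(url))` with the manifest URL copied from the viewer would give the number of page images in the book.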

Although Transkribus is trained on 100+ pages, it still makes mistakes for several reasons. Titles or chapter headers whose font size differs significantly from the other text are sometimes missed; patterned dividers and borders on the title page are easily misidentified as text; and sometimes the color of the paper is too dark, making it difficult to recognize the text. In these cases, the user needs to revise the recognition result manually. After checking the quality of the layout analysis, I could then run text recognition. The final step is to check the results of the text recognition and update the Google Sheet.
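One rough way to decide which pages to check first, sketched here under assumptions rather than taken from the project, is to count the text lines detected on each page and flag pages that differ sharply from the rest of the book. The sketch assumes the layout has been exported as PAGE XML, in which each detected line is a `TextLine` element; the 50% tolerance is an arbitrary starting point, and title pages will usually be flagged.

```python
import xml.etree.ElementTree as ET
from statistics import median


def local_name(tag: str) -> str:
    """Strip the XML namespace from a tag such as '{namespace}TextLine'."""
    return tag.rsplit("}", 1)[-1]


def count_text_lines(page_xml: str) -> int:
    """Count the TextLine elements in one exported PAGE XML document."""
    root = ET.fromstring(page_xml)
    return sum(1 for element in root.iter() if local_name(element.tag) == "TextLine")


def flag_outliers(line_counts: dict[str, int], tolerance: float = 0.5) -> list[str]:
    """Return pages whose line count differs from the median by more than tolerance."""
    if not line_counts:
        return []
    typical = median(line_counts.values())
    return [page for page, count in line_counts.items()
            if abs(count - typical) > tolerance * typical]
```

This only narrows down where to look; whether a flagged page really has missed headers or borders read as text still needs a manual check.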

Above: A view of a book in the Transkribus application, showing the page images and transcription underneath

During the three weeks of the placement, I handled a total of twelve books. In addition to the routine workflow described earlier, I was fortunate to come across several books that required special handling, and I used them to learn how to deal with various situations. For example, the image above shows the result of text recognition for a page of the first book I dealt with in Transkribus, Dhārāpāta: prathama bhāg. Pāṭhaśālastha śiśu digera śikshārtha/ Cintāmani Pāl. Every word in this book is very short and widely spaced, making it very difficult for Transkribus to identify the layout. Because the book is only 28 pages long, I manually labeled the layout of every page.

In addition to my work, I have had the pleasure of interacting with many British Library curators and investigators who are engaged in digitization. I attended a regular meeting of our project and learnt how the work is divided among the digital project members. My supervisor Tom also contacted some colleagues whose work relates to the digitization of Chinese collections and gave me the opportunity to meet them, which has benefited me a lot.

The Principal Investigator for our 2CIP project, Adi, who also has been involved with research and development of Chinese OCR/HTR at the British Library, shared with me the challenges of Chinese OCR/HTR and the progress of current research at the British Library.

Curator for the International Dunhuang Project, Melodie, and a project manager, Tan, presented the research content and outcomes of the project. This project has many partner institutions in different countries that have collections related to the Silk Road. It is a very meaningful digitization project and I admire the development of this project.

The lead Curator for the British Library’s Chinese collections, Sara, introduced me to the different types of Chinese collections and some representative collections in the British Library. She also shared with me the practical problems they encounter when digitizing collections.

Three weeks passed quickly and I gained a lot from my experience at the British Library. In addition to the specifics of how to use Transkribus for text recognition, I have learned about the achievements and problems faced in digitizing Chinese collections from a variety of perspectives.

This is a guest post by UCL Digital Humanities MSc student Xinran Gu.

18 June 2021

The VHS Tapes: Preserving Emerging Formats at the British Library

Researching how to collect, curate and preserve emerging formats is important work for us in the Library. Fortunately, we aren't alone in our quest to understand how to manage born-digital collections: we are active members of organisations such as the Digital Preservation Coalition and the Videogame Heritage Society, which are excellent networks and forums for us to share with and learn from fellow GLAM professionals working in this area.

The Videogame Heritage Society (VHS) is a subject specialist network for digital game preservation, led by the National Videogame Museum (NVM), based in Sheffield. They provide advocacy, support and expertise on the preservation of digital games and digital game culture through a network of museums, heritage institutions, developers, publishers, private collectors and anyone with an interest in videogame history.

The VHS launch event on 21 February 2020 was one of the last physical events I attended before the first Covid-19 lockdown started. Due to the global pandemic, the NVM had to completely re-think how to deliver their programme of planned VHS events, and this has produced a new series of online events called VHS Tapes, which started in February 2021.

At these events, VHS lead Mikey has been in conversation with members of the VHS community regarding the many issues surrounding digital game preservation, exhibition, and collection. Recordings of these can be found on the NVM's YouTube channel, in this playlist. They include conversations with the NVM's Conor Clarke, Foteini Aravani from the Museum of London and The Retro Hour Podcast. Not wanting to miss out on the fun, British Library staff are invited speakers at an upcoming online VHS Tapes event on Tuesday 29 June 2021, 14:00-15:00. Places are free, but please book here.

Lynda Clark, Giulia Carla Rossi and I will talk about the British Library’s research in collecting, curating and preserving emerging formats, including eBook mobile apps and web-based interactive works, such as those made with tools like Twine, which form the Interactive Narratives and New Media Writing Prize special collections in the UK Web Archive. We’ll discuss digital tools used to build these web archive collections, some of the content and themes of the interactive works collected, and the Library’s plans for the future. We hope to see you there!

A laptop screen showing the interface of the interactive writing tool Twine
An attendee working with the digital interactive writing tool Twine at a 2018 British Library Interactive Fiction Summer School course

This post is by Digital Curator Stella Wisdom (@miss_wisdom)

14 June 2021

Adding Data to Wikidata is Efficient with QuickStatements

Once I was set up on Wikipedia (see Triangulating Bermuda, Detroit and William Wallace), I got started with Wikidata. Wikidata is the part of the Wikimedia universe which deals with structured data, like dates of birth, shelf marks and more.

Adding data to Wikidata is really simple: it just requires logging into Wikidata (or creating an account if you don’t already have one) and then pressing edit on any page you want to edit.

Image of a Wikidata entry about Earth
Editing Wikidata

If the page doesn’t already exist, then creating it is also very simple: just select ‘create a new item’ from the menu on the left-hand side of the page.

When using Wikidata, there are some powerful tools which make adding data quicker and easier. One of these is QuickStatements. Unfortunately, using QuickStatements requires that you have made 50 edits on Wikidata before you make your first batch. Fortunately, making those edits is rather quicker than using Citation Hunt (for which, see Triangulating Bermuda, Detroit and William Wallace).

Image of Wikidata menu with 'Create a new item' highlighted
Creating a new item in Wikidata

I made those 50 edits very quickly, by setting up Wikidata item pages for each of the sample items from the India Office Records that we are working with (at the moment we are prioritising adding information about the records; further work will take place before any digitised items are uploaded to Wikimedia platforms). Basic information was added to each of the item pages.

Q107074264 (India Office List January 1885)

Q107074434 (India Office List July 1885)

Q107074463 (India Office List January 1886)

Q107074676 (India Office List July 1886)

Q107074754 (India Office List 1886 Supplement)

Q107074810 (1888-9 Report on the Administration of Bengal)

Q107074801 (1889-90 Report on the Administration of Bengal)

Once I had done this, it became clear that I needed to create more general pages, which could contain the DOIs that link back to the digitised records which are currently only accessible via batch download through the British Library research repository.

Q107134086 Page for administrative reports (V/10/60-1) in general.

Q107136752 Page for India lists (V/13/173-6) in general.

Image of the WikiProject page for the India Office Records
The WikiProject page for the India Office Records

The final preparatory step was to create a WikiProject page, which will facilitate collaboration on the project. This page contains links to all the pages involved in the project and will soon also contain useful resources such as templates for creating new pages as part of the project and queries for using the data.

After this, I began to experiment with QuickStatements, making heavy use of the useful guide to it available on Wikidata.

I decided to upload information on members of a particular regiment in Bengal, since this was information I could easily copy into a spreadsheet because the versions of the reports in the British Library research repository support Optical Character Recognition (OCR).
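To give a sense of what a QuickStatements batch looks like, here is a minimal sketch that turns spreadsheet rows into commands in the tab-separated V1 format. It is an illustration under assumptions, not the batch used for this project: the column names `name` and `gender` are hypothetical, and each row is modelled only as a human (P31, Q5) with a sex or gender statement (P21).

```python
import csv
import io

# Wikidata identifiers used in the sketch.
INSTANCE_OF = "P31"
HUMAN = "Q5"
SEX_OR_GENDER = "P21"
GENDER_QIDS = {"male": "Q6581097", "female": "Q6581072"}


def rows_to_quickstatements(rows) -> str:
    """Turn spreadsheet rows into QuickStatements V1 commands, one per line."""
    lines = []
    for row in rows:
        name = row["name"].strip()  # assumed not to contain double quotes
        lines.append("CREATE")                        # start a new item
        lines.append(f'LAST\tLen\t"{name}"')          # English label
        lines.append(f"LAST\t{INSTANCE_OF}\t{HUMAN}")
        gender = GENDER_QIDS.get(row.get("gender", "").strip().lower())
        if gender:                                    # skip if not recorded
            lines.append(f"LAST\t{SEX_OR_GENDER}\t{gender}")
    return "\n".join(lines)


def read_rows(csv_text: str) -> list[dict]:
    """Parse CSV text exported from the spreadsheet into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(csv_text)))
```

The resulting text can be pasted into the QuickStatements import box and reviewed there before the batch is run.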

Image of the original India Office List containing information on members of the 14th Infantry Regiment
Section of the original India Office List containing information on members of the 14th Infantry Regiment (IOR/V/6/175, page 258)

Finally, once I had done all of this, I met with the curators of the India Office Records for feedback and suggestions. It became clear from this that there was in fact some confusion about the exact identification of the regiment the officers were involved in. Fortunately, it turned out we had identified the correct regiment, but had we made a mistake, a simple batch of QuickStatements edits would have quickly put it right.
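A correction batch of that kind might look like the sketch below. In the QuickStatements V1 format, prefixing a line with `-` removes the matching statement, so each item gets one removal line and one replacement line. The function name is my own, and the property and item identifiers are supplied by the caller; how regiment membership is modelled on Wikidata is left open here.

```python
def correction_batch(item_ids, prop, wrong_value, right_value) -> str:
    """Build V1 commands that swap one statement value for another on each item."""
    lines = []
    for qid in item_ids:
        lines.append(f"-{qid}\t{prop}\t{wrong_value}")  # remove the wrong statement
        lines.append(f"{qid}\t{prop}\t{right_value}")   # add the corrected statement
    return "\n".join(lines)
```

Because the same two lines are generated for every officer, a mistake shared by a whole regiment is fixed in a single batch rather than page by page.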

Image of a section of a spreadsheet of members of the 14th Infantry Regiment
Section of my spreadsheet of members of the 14th Infantry Regiment

All in all, I can recommend using Wikidata, and I hope I have shown not only that it can be a useful tool, but also that it is easy to use. The next step for our Wikidata project will be to upload templates and case studies to help and support future volunteer editors to develop it further. We will also add resources to support research on the uploaded data.

Image of QuickStatements for adding gender to each of the pages for the officers
Screenshot of QuickStatements for adding gender to each of the pages for the officers

This is a guest post by UCL Digital Humanities MA student Dominic Kane.