Digital scholarship blog

Enabling innovative research with British Library digital collections

Introduction

Tracking exciting developments at the intersection of libraries, scholarship and technology.

14 September 2023

What's the future of crowdsourcing in cultural heritage?

The short version: crowdsourcing in cultural heritage is an exciting field, rich in opportunities for collaborative, interdisciplinary research and practice. It includes online volunteering, citizen science, citizen history, digital public participation, community co-production and, increasingly, human computation and other systems that will change how participants relate to digital cultural heritage. New technologies such as image labelling, text transcription and natural language processing, together with broader trends in organisations and society at large, mean constantly changing challenges (and potential). Our white paper makes recommendations for funders, organisations and practitioners in the near and distant future. You can let us know what we got right, and what we could improve, by commenting on Recommendations, Challenges and Opportunities for the Future of Crowdsourcing in Cultural Heritage: a White Paper.

The longer version: The Collective Wisdom project was funded by an AHRC networking grant to bring experts from the UK and the US together to document the state of the art in designing, managing and integrating crowdsourcing activities, and to look ahead to future challenges and unresolved issues that could be addressed by larger, longer-term collaboration on methods for digitally-enabled participation.

Our open access Collective Wisdom Handbook: perspectives on crowdsourcing in cultural heritage is the first outcome of the project; our expert workshops were the second.

Mia (me) and Sam Blickhan launched our White Paper for comment on PubPub at the Digital Humanities 2023 conference in Graz, Austria, in July this year, with Meghan Ferriter attending remotely. Our short paper abstract and DH2023 slides are online at Zenodo.

So - what's the future of crowdsourcing in cultural heritage? Head on over to Recommendations, Challenges and Opportunities for the Future of Crowdsourcing in Cultural Heritage: a White Paper and let us know what you think! You've got until the end of September…

You can also read our earlier post on 'community review' for a sense of the feedback we're after - in short, what resonates, what needs tweaking, what examples could we include?

To whet your appetite, here's a preview of our five recommendations. (To find out why we make those recommendations, you'll have to read the White Paper):

  • Infrastructure: Platforms need sustainability. Funding should not always be tied to novelty, but should also support the maintenance, uptake and reuse of well-used tools.
  • Evidencing and Evaluation: Help create an evaluation toolkit for cultural heritage crowdsourcing projects; provide ‘recipes’ for measuring different kinds of success. Shift thinking about value from output/scale/product to include impact on participants' and community well-being.
  • Skills and Competencies: Help create a self-guided skills inventory resource, tool, or worksheet to support skills assessment, and develop workshops to support its integrity and adoption.
  • Communities of Practice: Fund informal meetups, low-cost conferences, peer review panels, and other opportunities for creating and extending community. They should have an international reach, e.g. beyond the UK-US limitations of the initial Collective Wisdom project funding.
  • Incorporating Emergent Technologies and Methods: Fund educational resources and workshops to help the field understand opportunities, and anticipate the consequences of proposed technologies.

What have we missed? Which points do you want to boost? (For example, we discovered how many of our points apply to digital scholarship projects in general). You can '+1' on points that resonate with you, suggest changes to wording, ask questions, provide examples and references, or (constructively, please) challenge our arguments. Our funding only supported participants from the UK and US, so we're very keen to hear from folk from the rest of the world.

12 September 2023

Convert-a-Card: Past, Present and Future of Catalogue Cards Retroconversion

This blog post is by Dr Adi Keinan-Schoonbaert, Digital Curator for Asian and African Collections, British Library. She's on Mastodon as @[email protected].

 

It’s been more than eight years since June 2015, when the British Library launched its crowdsourcing platform, LibCrowds, with the aim of enhancing access to our collections. The first project series on LibCrowds was called Convert-a-Card, followed by the ever-so-popular In the Spotlight project. The aim of Convert-a-Card was to convert print card catalogues from the Library’s Asian and African Collections into electronic records, for inclusion in our online catalogue Explore.

A significant portion of the Library's extensive historical collections was acquired well before the advent of standard computer-based cataloguing. Consequently, even though the Library's online catalogue offers public access to tens of millions of records, numerous crucial research materials remain discoverable solely through searching the traditional physical card catalogues. The physical cards provide essential information for each book, such as title, author, physical description (dimensions, number of pages, images, etc.), subject and a “shelfmark” – a reference to the item’s location. This information still constitutes the basic set of data to produce e-records in libraries and archives.

Card Catalogue Cabinets in the British Library’s Asian & African Studies Reading Room © Jon Ellis

 

The initial focus of Convert-a-Card was the Library’s card catalogues for Chinese, Indonesian and Urdu books – you can read more about this here and here. Scanned catalogue cards were uploaded to Flickr (and later to our Research Repository), grouped by the physical drawer in which they were originally located. Several of these digitised drawers became projects on LibCrowds.

 

Crowdsourcing Retroconversion

Convert-a-Card on LibCrowds included two tasks:

  1. Task 1 – Search for a WorldCat record match: contributors were asked to look at a digitised card and search the OCLC WorldCat database using some of the metadata elements printed on it (e.g. title, author, publication date), to see if a record for the book already existed in some form online. If they found one, they selected the matching record.
  2. Task 2 – Transcribe the shelfmark: if a match was found, contributors then transcribed the Library's unique shelfmark as printed on the card.

Online volunteers worked on Pinyin (Chinese), Indonesian and Urdu records, mainly between 2015 and 2019. Their valuable contributions resulted in lists of new records which were then ingested into the Library's Explore catalogue – making these items so much more discoverable to our users. For cards only partially matched with online records, curators and cataloguers had a special area on the LibCrowds platform through which they could address and resolve the discrepancies.

An example of an Urdu catalogue card

 

After much consideration, we have decided to sunset LibCrowds. However, you can see a good snapshot of it thanks to the UK Web Archive (with thanks to Mia Ridge and Filipe Bento for archiving it), or access its GitHub pages – originally set up and maintained by LibCrowds creator Alex Mendes. We have mainly been using Zooniverse for crowdsourcing projects (see for example the Living with Machines projects), and you can see here some references to these and other crowdsourcing initiatives. Sunsetting LibCrowds gave us the opportunity to rethink Convert-a-Card and consider alternative, innovative ways to automate or semi-automate the retroconversion of these valuable catalogue cards.

 

Text Recognition

As a first step, we looked to automate the retrieval of text from the digitised cards using OCR/machine learning. As mentioned, this text includes shelfmark, title, author, place and date of publication, and other information. If extracted accurately enough, it could be used for WorldCat lookup, as well as for enhancing existing records. In most cases, the text was typewritten in English, often with additional information, or a translation, handwritten in other languages. To start with, we decided to focus only on the typewritten English – with the aspiration to address other scripts and languages in the future.

Last year, we ran some comparative testing with ABBYY FineReader Server (the software generally used for in-house OCR) and Transkribus, to see how accurately they performed this task. We trialled a set of cards with two different versions of ABBYY, and three different models for typewritten Latin scripts in Transkribus (Model IDs 29418, 36202, and 25849). Assessment was done by visually comparing the original text with the OCRed text, examining mainly the key areas of text important for this initiative, i.e. the shelfmark, author’s name and book title. For the purpose of automatically recognising the typewritten English on the catalogue cards, Transkribus Model 29418 performed better than the others – and more accurately than ABBYY’s recognition.

An example of a Pinyin card in Transkribus, showing segmentation and transcription

 

Using that as a base model, we incrementally trained a bespoke model to recognise the text on our Pinyin cards. We’ve also normalised the resulting text, for example removing spaces in the shelfmark, or excluding unnecessary bits of data. This model currently extracts the English text only, with a Character Error Rate (CER) of 1.8%. With more training data, we plan on extending this model to other types of catalogue cards – but for now we are testing this workflow with our Chinese cards.
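To give a flavour of what that normalisation involves, here is a minimal Python sketch. The shelfmark format and clean-up rules shown are illustrative only – the real rules are tailored to our cards:

```python
import re

def normalise_shelfmark(raw: str) -> str:
    """Collapse the whitespace OCR tends to introduce inside a shelfmark,
    e.g. '15298. a. 21' -> '15298.a.21' (example format is illustrative)."""
    return re.sub(r"\s+", "", raw.strip())

def tidy_line(line: str) -> str:
    """Drop unnecessary bits of data: trim edges, collapse runs of spaces."""
    return re.sub(r"\s{2,}", " ", line.strip())

print(normalise_shelfmark("15298. a. 21"))  # -> 15298.a.21
```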

 

Entity Extraction

Extracting meaningful entities from the OCRed text is our next step, and there are different ways to do that. One such method – if already using Transkribus for text extraction – is training and applying a bespoke P2PaLA layout analysis model. Such a model could identify text regions, improve automated segmentation of the cards, and help retrieve specific regions for further tasks. Former colleague Giorgia Tolfo tested this with our Urdu cards, with good results. Attempting to replicate this for our Chinese cards was less successful – perhaps because they are less consistent in structure.

Another possible method is by using regular expressions in a programming language. Research Software Engineer (RSE) Harry Lloyd created a Jupyter notebook with Python code to do just that: take the PAGE XML files produced by Transkribus, parse the XML, and extract the title, author and shelfmark from the text. This works exceptionally well, and in the future we’ll expand entity recognition and extraction to other types of data appearing on the cards. But for now, this information suffices to query OCLC WorldCat and see if a matching record exists.
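Harry’s notebook isn’t reproduced here, but a simplified sketch of the approach might look like the following, assuming a Transkribus PAGE XML export and a purely hypothetical shelfmark pattern (real cards need several pattern variants):

```python
import re
import xml.etree.ElementTree as ET

# Namespace used by Transkribus PAGE XML exports (the version date can vary).
NS = {"pc": "http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15"}

def page_text_lines(path):
    """Return the transcribed text lines from one PAGE XML file."""
    root = ET.parse(path).getroot()
    return [u.text
            for u in root.iterfind(".//pc:TextLine/pc:TextEquiv/pc:Unicode", NS)
            if u.text]

# A hypothetical shelfmark pattern -- the real cards need several variants.
SHELFMARK = re.compile(r"\b\d{5}\.[a-z]{1,3}\.\d+\b")

lines = page_text_lines("card_0001.xml")
shelfmarks = [m.group() for line in lines for m in SHELFMARK.finditer(line)]
print(shelfmarks)
```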

One of the 26 drawers of Chinese (Pinyin) card catalogues © Jon Ellis

 

Matching Cards to WorldCat Records

Entities extracted from the catalogue cards can now be used to search and retrieve potentially matching records from the OCLC WorldCat database. Pulling out WorldCat records matched with our card records would help us create new records to go into our cataloguing system Aleph, as well as enrich existing Aleph records with additional information. This matching was previously done by volunteers, and we aim to automate it as much as possible.

Querying WorldCat was initially done using the z39.50 protocol – the same one originally used in LibCrowds. This is a client-server communications protocol designed to support the search and retrieval of information in a distributed network environment. Victoria Morris and Giorgia Tolfo made an excellent start by developing a prototype that uses PyZ3950 and PyMARC to query WorldCat; Harry built upon this, refined the code, and tested it successfully for data search and retrieval. Moving forward, we are likely to use the OCLC API instead – which should be a lot more straightforward!
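For the curious, a rough sketch of the z39.50 approach might look like this – not the project’s actual code. The connection details and query values are placeholders (WorldCat z39.50 access requires OCLC credentials), and note that PyZ3950 is an older, Python 2-era library:

```python
from PyZ3950 import zoom   # older library; community forks exist for Python 3
from pymarc import Record

# Hypothetical connection details: WorldCat z39.50 access needs an OCLC account.
conn = zoom.Connection('zcat.oclc.org', 210)
conn.user = 'YOUR-OCLC-USER'          # placeholder credentials
conn.password = 'YOUR-OCLC-PASSWORD'
conn.databaseName = 'OLUCWorldCat'
conn.preferredRecordSyntax = 'USMARC'

# Build a CCL query from entities extracted from a card (made-up values here).
query = zoom.Query('CCL', 'ti="example title" and au="example author"')
for result in conn.search(query):
    record = Record(data=result.data)  # parse the raw MARC record with pymarc
    print(record.title())              # a method in pymarc 4.x, a property in 5.x
conn.close()
```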

 

Curator/Cataloguer Disambiguation

Getting potential matches from WorldCat is brilliant, but we would like an easy way for curators and cataloguers to make the final decision on the best match – which WorldCat record would be the best basis for a new catalogue record in our system. For this purpose, Harry is currently working on a web application based on Streamlit – an open source Python library for building and sharing web apps. Staff members will be able to use this app to view suggested matches and select the most suitable ones.
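Purely as an illustration of the idea (not the Library’s actual app), a minimal Streamlit review screen could be sketched like this, with all names and data made up:

```python
# app.py -- run with:  streamlit run app.py
import pandas as pd
import streamlit as st

# In the real workflow these would come from the card OCR and WorldCat steps.
card = {"title": "Example title", "shelfmark": "15298.a.21"}
candidates = pd.DataFrame([
    {"oclc": "12345", "title": "Example title", "year": "1957"},
    {"oclc": "67890", "title": "Example title, 2nd ed.", "year": "1962"},
])

st.header(f"Card: {card['title']} ({card['shelfmark']})")
st.table(candidates)  # show the candidate WorldCat matches side by side

choice = st.radio("Best match:", candidates["oclc"].tolist() + ["No match"])
if st.button("Save decision"):
    # The real app would write the decision somewhere persistent.
    st.success(f"Recorded {choice} for {card['shelfmark']}")
```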

I’ll leave it up to Harry to tell you about this work – so stay tuned for a follow-up blog post very soon!

 

11 September 2023

Join the British Library's Universal Viewer Product Team

The British Library has been a leading contributor to IIIF, the International Image Interoperability Framework, and the Universal Viewer for many years. We're about to take the next step in this work - and you can join us! We are recruiting for a Product Owner, a Research Software Engineer and a Senior Test Engineer (deadline 03 January 2024). 

In this post, Dr Mia Ridge, product owner for the Universal Viewer (UV) 2015-18, and Dr Rossitza Atanassova, UV business owner 2019-2023, share some background information on how new posts advertised for a UV product team will help shape the future of the Viewer at the Library and contribute to international work on the UV, IIIF standards and activities.

A lavishly decorated page from a fourteenth-century manuscript, 'The Sherborne Missal', showing an illuminated capital with the Virgin Mary holding the baby Jesus, surrounded by the three Kings, with other illuminations in the margins and the text.
Detail from Add MS 74236 'The Sherborne Missal' displayed in the Universal Viewer

 The creation of a Universal Viewer product team is part of wider infrastructure changes at the British Library, and marks a shift from contributing via specific UV development projects to thinking of the Viewer as a product. We'll continue to work with the Open Collective while focusing on Library-specific issues to support other activities across the organisation. 

Staff across the Library have contributed to the development of the Universal Viewer, including curators, digitisation teams and technology staff. Staff engage through bespoke training delivered by the IIIF Consortium, participation at IIIF workshops and conferences, and experimentation with new tools, such as the digital storytelling tool Exhibit, to engage wide audiences. Other Library work with IIIF includes a collaboration with Zooniverse to enable items to be imported to Zooniverse via IIIF manifests, making crowdsourcing more accessible to organisations with IIIF items. Most recently, with funding from the Andrew W. Mellon Foundation, we updated the UV to play audio from the British Library sound collections.
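For readers unfamiliar with IIIF: a manifest is a JSON document listing every image ('canvas') of a digitised item, which is what makes imports like the Zooniverse integration possible. A minimal sketch of reading one in Python, where the manifest URL is illustrative and the structure shown is the IIIF Presentation API 2.x layout:

```python
import requests

# Illustrative manifest URL -- any public IIIF Presentation 2.x manifest works.
manifest_url = "https://api.bl.uk/metadata/iiif/ark:/81055/vdc_EXAMPLE/manifest.json"
manifest = requests.get(manifest_url, timeout=30).json()

# A v2 manifest nests images under sequences -> canvases -> images.
for canvas in manifest["sequences"][0]["canvases"]:
    image_url = canvas["images"][0]["resource"]["@id"]
    print(canvas.get("label"), image_url)
```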

Over half a million items from the British Library's collections are already available via the Universal Viewer, and that number grows all the time. Work on the UV has already let us retire around 35 other image viewers, significantly reducing maintenance overheads and creating a more consistent experience for our readers.

However, there's a lot more to do! User expectations change as people use other document and media viewers, whether that's other IIIF tools like Mirador or the latest commercial streaming video platforms. We also need to work on some technical debt, ensure accessibility standards are met, improve infrastructure, and consolidate services for the benefit of users. Future challenges include enhancing UV capabilities to display annotations, formats such as newspapers, and complex objects such as 3D.

A view of the Library's image viewer, showing an early nineteenth century Javanese palm-leaf manuscript inside its decorated wooden covers. To the left of the image there is a list with the thumbnails of the manuscript leaves and to the right the panel displays bibliographic information about the item.
British Library Universal Viewer displaying Add MS 12278

 If you'd like to work in collaboration with an international open source community on a viewer that will reach millions of users around the world, one of these jobs may be for you!

Product Owner (job reference R00000196)

Ensure the strategic vision, development, and success of the project. Your primary goal will be to understand user needs, prioritise features and enhancements, and collaborate with the development team and community to deliver a high-quality open source product. 

Research Software Engineer (job reference R00000197)

Help identify requirements, and design and implement online interfaces to showcase our collections, help answer research questions, and support application of novel methods across team activities.

Senior Test Engineer (job reference R00000198)

Help devise requirements, develop high quality test cases, and support application of novel methods across team activities.

To apply, please visit the British Library recruitment site. Applications close on 3 January 2024. Interview dates are listed in the job ads.

Please ensure you answer all application questions (CVs cannot be submitted). At the BL we can only shortlist using the information that applicants provide in response to the questions on the application. Any questions about the roles or the process? Drop us a line at [email protected].

06 September 2023

Open and Engaged 2023: Community over Commercialisation

The British Library is delighted to host its annual Open and Engaged Conference on Monday 30 October, in-person and online, as part of International Open Access Week.

Open and Engaged 2023: Community over Commercialisation, includes headshots of speakers and lists location as The British Library, London and contact as openaccess@bl.uk

In line with this year’s #OAWeek theme, Open and Engaged 2023: Community over Commercialisation will address approaches and practices in open scholarship that prioritise the best interests of the public and the research community. The programme will focus on community governance, public-private collaborations, and the community-building aspects of the topic, keeping the public good at the heart of the talks. It will underline the different priorities and approaches for Galleries-Libraries-Archives-Museums (GLAMs) and the cultural sector in the context of open access.

We invite everyone interested in the topic to join us on Monday, 30 October!

This will be a hybrid event taking place at the British Library’s Knowledge Centre in St. Pancras, London, and streamed online for those unable to attend in-person.

You can register for Open and Engaged 2023 by filling in this form by Thursday, 26 October 18:00 BST. Please note that the places for in-person attendance are now full and the form is available only for online booking.

Registrants will be contacted with details for either in-person attendance or a link to access the online stream closer to the event.

Programme

Note that clocks in the UK go back to GMT on Sunday, 29 October.

9:30     Registration opens for in-person attendees. Entrance Hall at the Knowledge Centre.

10:00   Welcome

10:10   Keynote from Monica Westin, Senior Product Manager at the Internet Archive

Commercial Break: Imagining new ownership models for cultural heritage institutions.

10:40   Session on public-private collaborations for public good chaired by Liz White, Director of Library Partnerships at the British Library.

  • Balancing public-private partnerships with responsibilities to our communities. Mia Ridge, Digital Curator, Western Heritage Collections, The British Library
  • Where do I stand? Deconstructing Digital Collections [Research] Infrastructures: A perspective from Towards a National Collection. Javier Pereda, Senior Researcher of the Towards a National Collection (TaNC)
  • "This is not IP I'm familiar with." The strange afterlife and untapped potential of public domain content in GLAM institutions. Douglas McCarthy, Head of Library Learning Centre, Delft University of Technology.

11:40   Break

12:10   Lightning talks on community projects chaired by Graham Jevon, Digital Service Specialist at the British Library.

  • The Turing Way: Community-led Resources for Open Research and Data Science. Emma Karoune, Senior Research Community Manager, The Alan Turing Institute.
  • Open Online Tools for Creating Interactive Narratives. Giulia Carla Rossi, Curator for Digital Publications and Stella Wisdom, Digital Curator for Contemporary British Collections, The British Library

12:45   Lunch

13:30   Session on the community-centred infrastructure in practice chaired by Jenny Basford, Repository Services Lead at the British Library.

  • AHRC, Digital Research Infrastructure and where we want to go with it. Tao Chang, Associate Director, Infrastructure & Major Programmes, Arts and Humanities Research Council (AHRC)
  • The critical role of repositories in advancing open scholarship. Kathleen Shearer, Executive Director, Confederation of Open Access Repositories (COAR). (Remote talk)
  • Investing in the Future of Open Infrastructure. Kaitlin Thaney, Executive Director, Invest in Open Infrastructure (IOI). (Remote talk)

14:30   Break

15:00   Session on the role of research libraries in prioritizing the community chaired by Ian Cooke, Head of Contemporary British Publications at the British Library.

  • Networks of libraries supporting open access book publishing. Rupert Gatti, Co-founder and Director of Open Book Publishers, Director of Studies in Economics at Trinity College, Cambridge
  • Collective action for driving open science agenda in Africa and Europe. Iryna Kuchma, Open Access Programme Manager at EIFL. (Remote talk)
  • The Not So Quiet Rights Retention Revolution: Research Libraries, Rights and Supporting our Communities. William Nixon, Deputy Executive Director at RLUK-Research Libraries UK

16:00   Closing remarks

Social media hashtag for the event is #OpenEngaged. If you have any questions, please contact us at [email protected].

04 September 2023

ICDAR 2023 Conference Impressions

This blog post is by Dr Adi Keinan-Schoonbaert, Digital Curator for Asian and African Collections, British Library. She's on Mastodon as @[email protected].

 

Last week I came back from my very first ICDAR conference, inspired and energised for things to come! The International Conference on Document Analysis and Recognition (ICDAR) is the main international event for scientists and practitioners involved in document analysis and recognition. Its 17th edition was held in San José, California, 21-26 August 2023.

ICDAR 2023 featured a three-day conference, including several competitions to challenge the field, as well as post-conference workshops and tutorials. All conference papers were made available as conference proceedings with Springer. 155 submissions were selected for inclusion in the scientific programme of ICDAR 2023, of which 55 were delivered as oral presentations and 100 as posters. The conference also teamed up with the International Journal of Document Analysis and Recognition (IJDAR) for a special journal track: 13 papers were accepted and published in a special issue entitled “Advanced Topics of Document Analysis and Recognition,” and were included as oral presentations in the conference programme. Do have a look at the programme booklet for more information!

ICDAR 2023 Logo

Each conference day included a thought-provoking keynote talk. The first one, by Marti Hearst, Professor and Interim Dean of the UC Berkeley School of Information, was entitled “A First Look at LLMs Applied to Scientific Documents.” I learned about three platforms using Natural Language Processing (NLP) methods on PDF documents: ScholarPhi, Paper Plain, and SCIM. These projects help people read academic scientific publications, for example by providing definitions for mathematical notations or generating glossaries for nonce words (e.g. acronyms, symbols, jargon terms); making medical research more accessible through simplified summaries and Q&A; and classifying key passages in papers to enable quick and intelligent paper skimming.

The second keynote talk, “Enabling the Document Experiences of the Future,” was by Vlad Morariu, Senior Research Scientist at Adobe Research. Vlad addressed the need for human-document interaction, and took us through some future document experiences: PDFs that re-flow for mobile devices, documents that read themselves, and conversational functionalities such as asking questions and receiving answers. Enabling this type of ultra-responsive document relies on methods such as structural element detection, page layout understanding, and semantic connections.

The third and final keynote talk was by Seiichi Uchida, Distinguished Professor and Senior Vice President, Kyushu University, Japan. In his talk, “What Are Letters?,” Seiichi took us through the four main functions of letters and text: message (transmission of verbalised information), label (disambiguation of objects and environments), design (conveying nonverbal information, such as an impression), and code (readability under various noises and deformations). He provoked us to contemplate how our lives are affected by the texts around us, and how we could analyse the correlation between our behaviour and the texts that we read.

Prof Seiichi Uchida giving his keynote talk on “What Are Letters?”

When it came to papers submitted for review by the conference committee, the most prominent topic represented in those submissions was handwriting recognition, with a growing number of papers specifically tackling historical documents. Other submission topics included Graphics Recognition, Natural Language Processing for Documents (D-NLP), Applications (including for medical, legal, and business documents), and other types of Document Analysis and Recognition topics (DAR).

Screenshot of a slide showing the main submission topics for ICDAR 2023

Some of the papers that I attended tackled Named Entity Recognition (NER) evaluation methods and genealogical information extraction; papers dealing with Document Understanding, e.g. identifying the internal structure of documents and understanding the relations between different entities; papers on Text and Document Recognition, such as looking into a model for multilingual OCR; and papers looking into Graphics, especially the recognition of table structure and content, as well as extracting data from structure diagrams, for example in financial documents, or flowchart recognition. Papers on Handwritten Text Recognition (HTR) dealt with methods for Writer Retrieval, i.e. identifying documents likely written by specific authors, the creation of generic models, text line detection, and more.

The conference included two poster sessions, featuring an incredibly rich array of poster presentations, as well as doctoral consortia. One of my favourite posters was presented by Mirjam Cuper, Data Scientist at the National Library of the Netherlands (KB), entitled “Unraveling confidence: examining confidence scores as proxy for OCR quality.” Together with colleagues Corine van Dongen and Tineke Koster, she looked into the confidence scores provided by OCR engines, which indicate the level of certainty with which a word or character was recognised. However, other factors are at play when measuring OCR quality – you can watch a ‘teaser’ video for this poster.

Conference participants at one of the poster sessions

As mentioned, the conference was followed by three days of tutorials and workshops. I enjoyed the tutorial on Computational Analysis of Historical Documents, co-led by Dr Isabelle Marthot-Santaniello (University of Basel, Switzerland) and Dr Hussein Adnan Mohammed (University of Hamburg, Germany). Presentations focused on the unique challenges, difficulties, and opportunities inherent in working with different types of historical documents. The distinct difficulties posed by historical handwritten manuscripts and ancient artifacts necessitate an interdisciplinary strategy and the utilisation of state-of-the-art technologies – and this fusion leads to the emergence of exciting and novel advancements in this area. The presentations were interwoven with great questions and a rich discussion, indicative of the audience’s enthusiasm. This tutorial was appropriately followed by a workshop dedicated to Computational Palaeography (IWCP).

I especially looked forward to the next day’s workshop, which was the 7th edition of Historical Document Imaging and Processing (HIP’23). It was all about making documents accessible in digital libraries, looking at methods addressing OCR/HTR of historical documents, information extraction, writer identification, script transliteration, virtual reconstruction, and so much more. This day-long workshop featured papers in four sessions: HTR and Multi-Modal Methods, Classics, Segmentation & Layout Analysis, and Language Technologies & Classification. One of my favourite presentations was by Prof Apostolos Antonacopoulos, talking about his work with Christian Clausner and Stefan Pletschacher on “NAME – A Rich XML Format for Named Entity and Relation Tagging.” Their NAME XML tackles the need to represent named entities in rich and complex scenarios. Tags could be overlapping and nested, character-precise, multi-part, and possibly with non-consecutive words or tokens. This flexible and extensible format addresses the relationships between entities, makes them interoperable, usable alongside other information (images and other formats), and possible to validate.

Prof Apostolos Antonacopoulos talking about “NAME – A Rich XML Format for Named Entity and Relation Tagging”

I’ve greatly enjoyed the conference and its wonderful community, meeting old colleagues and making new friends. Until next time!

 

02 September 2023

Huzzah! Hear the songs from Astrologaster live at the Library

Digitised archives and library collections are rich resources for creative practitioners, including video game makers, who can bring history to life in new ways with immersive storytelling. A wonderful example of this is Astrologaster by Nyamyam, an interactive comedy set in Elizabethan London, based on the manuscripts of medical astrologer Simon Forman, which is currently showcased in the British Library’s Digital Storytelling exhibition.

Artwork from the game Astrologaster, showing Simon Forman surrounded by astrological symbols and with two patients standing each side of him

On Friday 15th September we are delighted to host an event celebrating the making and the music of Astrologaster, featuring game designer Jennifer Schneidereit in conversation with historian Lauren Kassell about how they created the game, followed by a vocal quartet singing madrigal songs from the soundtrack composed by Andrea Boccadoro. Each character in the game has their own Renaissance-style theme song, with witty lyrics written by Katharine Neil. This set has never before been performed live, so we can’t wait to hear these songs at the Library and we would love for you to join us – click here to book. We've had the title song, which you can play below, as an earworm for the last few months!

Simon Forman was a self-taught doctor and astrologer who claimed to have cured himself of the plague in 1592. Despite being unlicensed and scorned by the Royal College of Physicians he established a practice in London where he analysed the stars to diagnose and solve his querents’ personal, professional and medical problems. Forman documented his life and work in detail, leaving a vast quantity of papers to his protégé Richard Napier, whose archive was subsequently acquired by Elias Ashmole for the Ashmolean Museum at the University of Oxford. In the nineteenth century this collection transferred to the Bodleian Library, where Forman’s manuscripts can still be consulted today.

Screen capture of the Casebooks digital edition showing ‘CASE5148’, with an image of a manuscript page on the left and a transcript on the right.
Lauren Kassell, Michael Hawkins, Robert Ralley, John Young, Joanne Edge, Janet Yvonne Martin-Portugues, and Natalie Kaoukji (eds.), ‘CASE5148’, The casebooks of Simon Forman and Richard Napier, 1596–1634: a digital edition, https://casebooks.lib.cam.ac.uk/cases/CASE5148, accessed 1 September 2023.

Funded by the Wellcome Trust, the Casebooks Project, led by Professor Lauren Kassell at the University of Cambridge, spent over a decade researching, digitising, documenting and transcribing these records, producing The casebooks of Simon Forman and Richard Napier, 1596–1634: a digital edition, published by Cambridge Digital Library in May 2019. The project transformed the archive into a rich, searchable online resource, with transcriptions and editorial insights about the astrologers’ records alongside digitised images of the manuscripts.

In 2014 Nyamyam’s co-founder and creative director Jennifer Schneidereit saw Lauren present her research on Simon Forman’s casebooks, and became fascinated by this ambitious astrologer. Convinced that Forman and his patients’ stories would make an engaging game with astrology as a gameplay device, she reached out to Lauren to invite her to be a consultant on the project. Fortunately Lauren responded positively and arranged for the Casebooks Project to formally collaborate with Nyamyam to mine Forman’s patient records for information and inspiration to create the characters and narrative in the Astrologaster game.  

Screen capture image of a playthrough video of Astrologaster, showing a scene in the game where you select an astrological reading
Still image of a playthrough video demonstrating how to play Astrologaster made by Florence Smith Nicholls for the Digital Storytelling exhibition

At the British Library we are interested in collecting and curating interactive digital narratives as part of our ongoing emerging formats research. One method we are investigating is the acquisition and creation of contextual information, such as recording playthrough videos. In the Digital Storytelling exhibition you can watch three gameplay recordings, including one demonstrating how to play Astrologaster. These were made by Florence Smith Nicholls, a game AI PhD researcher based at Queen Mary University of London, using facilities at the City Interaction Lab within the Centre for Human-Computer Interaction Design at City, University of London. Beyond the exhibition, these recordings will hopefully benefit researchers in the future, providing valuable documentation on the original ‘look and feel’ of an interactive digital narrative, in addition to instructions on use whenever a format has become obsolete.

The Digital Storytelling exhibition is open until the 15th October 2023 at the British Library, displaying 11 narratives that demonstrate the evolving field of interactive writing. We hope you can join us for upcoming related events, including the Astrologaster performance on Friday 15th September, and an epic Steampunk Late on Friday 13th October. We are planning this Late with Clockwork Watch, Blockworks and Lancaster University's Litcraft initiative, so watch this blog for more information on this event soon.

30 August 2023

The British Library Loves Manuscripts on Wikisource

This blog post was originally published on Wikimedia’s community blog, Diff, by Satdeep Gill (WMF) and Dr Adi Keinan-Schoonbaert (Digital Curator for Asian and African Collections, British Library)

 

The British Library has joined hands with the Wikimedia Foundation to support the Wikisource Loves Manuscripts (WiLMa) project, sharing 76 Javanese manuscripts, including what is probably the largest Javanese manuscript in the world, digitised as part of the Yogyakarta Digitisation Project. The manuscripts, which are now held in the British Library, were taken from the Kraton (Palace) of Yogyakarta following a British attack in June 1812. The British Library’s digitisation project was funded by Mr. S P Lohia and included conservation, photography, quality assurance and publication on the Library’s Digitised Manuscripts website, and the presentation of complete sets of digital images to the Governor of Yogyakarta Sri Sultan Hamengkubuwono X, the National Library of Indonesia, and the Library and Archives Board of Yogyakarta.

3D model of Menak Amir Hamza (British Library Add MS 12309), probably the largest Javanese manuscript in the world

For the WiLMa project, the scanned images, representing more than 30,000 pages, were merged into PDFs and uploaded to Wikimedia Commons by Ilham Nurwansah, Wikimedian-in-Residence at PPIM, and User:Bennylin from the Indonesian community. The manuscripts are now available on Wikimedia Commons in the Category:British Library manuscripts from Yogyakarta Digitisation Project.

“Never before has a library of Javanese manuscripts of such importance been made available to the internet, especially for easy access to the almost 100 million Javanese people worldwide.”

User:Bennylin said about the British Library donation

As a global movement, Wikimedia is able to connect the Library with communities of origin, who can use the digitised manuscripts to revitalise their language online. The Library has a history of collaboration with the Wikimedia community, hosting Wikimedians-in-Residence and working with the Wikisource community. In 2021, we collaborated with the West Bengal Wikimedians User Group to organise two Wikisource competitions (in Spring and Autumn). Forty rare Bengali books, digitised as a part of the Two Centuries of Indian Print project, were made available on Wikisource. The Bengali Wikisource community has corrected more than 5,000 pages of text, which were OCRed as part of the project.

“As part of our global engagement with Wikimedia communities, we were thrilled to engage in a partnership with the Bengali Wikisource community for the proofreading of rare and unique books digitised as part of the Two Centuries of Indian Print project. We extend our gratitude towards the community’s unwavering commitment and the enthusiasm of its members, which have greatly enhanced the accessibility of these historic gems for readers and researchers.”

Dr Adi Keinan-Schoonbaert, Digital Curator, British Library

The developing Javanese Wikisource community has already started using the newly digitised Javanese manuscripts in their project, with plans ranging from transliteration and translation to recording the content being sung, as originally intended. (Recording of Ki Sujarwo Joko Prehatin singing (menembang) the texts of Javanese manuscripts at the British Library, 12 March 2019; recording by Mariska Adamson.)

Screenshot of a Javanese manuscript being used for training an HTR model using Transkribus

The Library’s collaboration with the Javanese community started earlier this year, when the Wikisource community included three manuscripts from the Library’s Henry D. Ginsburg Legacy Digitisation Projects in the list of focus texts for a Wikisource competition. Parts of these three long manuscripts were proofread by the community during the competition and now they are being used to create a Handwritten Text Recognition (HTR) model for the Javanese script using Transkribus, as part of our ongoing WiLMa initiative.

Stay tuned for further updates about WiLMa Learning Partners Network!

 

03 August 2023

My AHRC-RLUK Professional Practice Fellowship: A year on

A year ago I started work on my RLUK Professional Practice Fellowship project to computationally analyse the descriptions in the Library’s incunabula printed catalogue. As the project comes to a close this week, I would like to give an update on the work from the last few months, leading to the publication of the incunabula printed catalogue data, a featured collection on the British Library’s Research Repository. In a separate blogpost I will discuss the findings from the text analysis and next steps, as well as share my reflections on the fellowship experience.

Since Isaac’s blogpost about the automated detection of the catalogue entries in the OCR files, a lot of effort has gone into improving the code and outputting the descriptions in the format required for the text analysis and as open datasets. With the invaluable help of Harry Lloyd, who had joined the Library’s Digital Research team as Research Software Engineer, we verified the results and identified new rules for detecting sub-entries signalled by ‘Another Copy’ rather than a main entry heading. We also reassembled and parsed the XML files, originally split into two sets per volume for the purpose of generating the OCR, so that the entries are listed in the order in which they appear in the printed volume. We prepared new text files containing all the entries from each volume, with each entry represented as a single line of text, which I could use for the corpus linguistics analysis with AntConc (a simplified sketch of this step follows below). In consultation with the curator, Karen Limper-Herz, and colleagues in Collection Metadata, we agreed how best to store the data for evaluation and in preparation for updating the Library’s online catalogue.
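As a toy illustration of the entry-per-line step, here is a minimal Python sketch. The heading heuristic and sample lines are made up; the project’s real detection rules (described in Isaac’s blogpost) are considerably richer, e.g. handling ‘Another Copy’ sub-entries and OCR quirks:

```python
import re

def split_entries(ocr_lines):
    """Group OCR lines into catalogue entries, starting a new entry at a
    heading-like line: a run of capitals followed by a comma or full stop,
    e.g. 'ABSTEMIUS, Laurentius.' (a deliberately crude heuristic)."""
    entries, current = [], []
    for line in ocr_lines:
        if re.match(r"^[A-Z][A-Z'\- ]{2,}[,.]", line) and current:
            entries.append(" ".join(current))
            current = []
        current.append(line.strip())
    if current:
        entries.append(" ".join(current))
    return entries

ocr_lines = ["ABSTEMIUS, Laurentius.", "Fabulae ... [Venice, 1499?]", "IA.23456.",
             "AESOPUS.", "Vita et Fabulae ...", "IB.39673."]
# One entry per line -- the flat text format used for analysis with AntConc.
with open("volume_entries.txt", "w", encoding="utf-8") as out:
    out.write("\n".join(split_entries(ocr_lines)))
```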

Two women looking at the poster illustrating the text analysis with the incunabula catalogue data
Poster session at Digital Humanities Conference 2023

Whilst all this work was taking place, I started the computational analysis of the English text from the descriptions. The reason for using these partial descriptions was to separate what was merely transcribed from the incunabula from the language used by the cataloguer in their own ‘voice’. I have recorded my initial observations in the poster I presented at the Digital Humanities Conference 2023. Discussing my fellowship project with the conference attendees was extremely rewarding; there was much interest in the way I had used Transkribus to derive the OCR data, some questions about how the project methodology applies to other data, and agreement on the need to contextualise collections descriptions and reflect on any bias in the transmission of knowledge. In the poster I also highlight the importance of the cross-disciplinary collaboration required for this type of work, which resonated well with the conference theme of Collaboration as Opportunity.

I have started sharing the knowledge gained from the project with members of the GLAM community. At the British Library, Harry, Karen and I ran an informal ‘Hack & Yack’ training session showcasing the project aims and methodology through the use of Jupyter notebooks. I also enjoyed the opportunity to discuss my research at a recent Research Libraries UK Digital Scholarship Network workshop, and look forward to further conversations on this topic with colleagues in the wider GLAM community.

We intend to continue to enrich the datasets to enable better access to the collection and the development of new resources for incunabula research and digital scholarship projects. I would like to end by adding my thanks to Graham Jevon, for assisting with the timely publication of the project datasets, and above all to James, Karen and Harry for supporting me throughout this project.

This blogpost is by Dr Rossitza Atanassova, Digital Curator, British Library. She is on Twitter as @RossiAtanassova and Mastodon as @[email protected].