Digital scholarship blog

Introduction

Tracking exciting developments at the intersection of libraries, scholarship and technology.

23 April 2025

DHNB 2025 - Digital Humanities in the Nordic and Baltic Countries Conference Report

This post is by Helena Byrne, Curator of Web Archives.

Conference banner with an image of the Estonian National Museum on blue and purple background
DHNB 2025 Conference Banner

This year’s Digital Humanities in the Nordic and Baltic countries conference took place at the Estonian National Museum in Tartu. Last year was the first time I attended the DHNB conference (report available on the Digital Scholarship Blog). The theme for this year was “Digital Dreams and Practices”. Pre-conference workshops ran from March 3-4, with the main conference starting on the morning of March 5 and finishing on March 7. I participated in the Web Archive Collections as Data workshop, held in the morning session on day two.


This was a big conference with about 200 researchers and GLAM sector participants who attended from organisations based all over Europe as well as Japan. With such a big attendance there were multiple parallel sessions on each day. A detailed overview of the programme is available to download from the DHNB website. There was also a large poster presentation session at the end of day two of the conference. In the main hall all presenters had one minute to introduce their poster before going onto the floor to discuss the wide variety of topics in more detail.

Posters on 10 stands lined up against windows in the museum hallway.
Posters on display at the DHNB 2025 Conference

There was a keynote on each day of the conference. The second day keynote was by Andrea Kocsis of Edinburgh University, currently National Librarian’s Research Fellow in Digital Scholarship 2024-25 at the National Library of Scotland. She has worked closely with UK Web Archive colleagues across the UK Legal Deposit Libraries to make the collections more accessible to wider audiences.

All three keynotes are available to watch on the DHNB website - https://dhnb.eu/conferences/dhnb2025/keynote-speakers/ 

It is hard to pick one highlight out of such a rich conference, but I think it would be the presentation Collecting memories of the early internet by Johanna Arnesson, Evelina Liliequist and Coppélie Cocq from Umeå University, Sweden. The abstract is available on page 24 of the Programme Book of Abstracts. One of the key takeaways from this presentation was that more case studies from different countries are required. So far only a few case studies have reviewed early memories and/or experiences of the internet, yet people would have experienced the internet differently depending on their home country, age, socioeconomic status etc. It would be interesting to see researchers using the UK Web Archive resources to run a similar study in the UK.

Poster presenters lined up in front of the screen on stage in the conference auditorium.
Poster Slam at the DHNB 2025 Conference

Although the National Library of Estonia building is currently closed for renovation, I was delighted that I could meet up with their web archivist to discuss web archiving challenges and opportunities in Estonia. 

For a more detailed report on the Web Archive Collections as Data workshop see the UK Web Archive blog.

09 April 2025

Wikisource 2025 Conference: Collaboration, Innovation, and the Future of Digital Texts

This blog post is by Dr Adi Keinan-Schoonbaert, Digital Curator for Asian and African Collections, British Library. She's on Mastodon as @[email protected] and Bluesky as @adi-keinan.bsky.social

The Wikisource 2025 Conference, held in the lush setting of Bali, Indonesia between 14-16 February 2025, brought together a global community of Wikimedians, heritage enthusiasts, and open knowledge advocates. Organised by a coalition of Wikisource contributors, Wikimedia Foundation and Wikimedia Indonesia, the conference served as a dynamic space to discuss the evolving role of Wikisource, explore new technologies, and strengthen collaborations with libraries, cultural institutions, and other global stakeholders.

Wikisource Conference 2025 participants. Photo by Memora Productions for Wikimedia Indonesia.

The conference, themed “Wikisource: Transform & Preserve Heritage Digitally,” featured a rich programme of keynote talks, long presentations, lightning talks, and informal meet-ups. Central themes included governance, technological advancements, community engagement, and the challenge of scaling Wikisource as a set of collaborative, multilingual platforms. We also enjoyed a couple of fantastic cultural events, celebrating the centuries-old, unique heritage of Bali!

Keynotes and Indonesian Partnerships

Following a kick-off session on the state of Wikisource community and technology, several Indonesian partners shared insights into their work on heritage, preservation, and digital accessibility. Dr Munawar Holil (Kang Mumu) highlighted the efforts of Manassa (the Indonesian Manuscript Society) to safeguard over 121,000 manuscripts, the majority of which remain undigitised, with key collections located in Bali, Jakarta, and Aceh. Challenges include limited public awareness, perceptions of manuscripts as sacred objects requiring ceremonial handling, and structural gaps in institutional training.

Dr Cokorda Rai Adi Paramartha from Udayana University addressed the linguistic diversity of Indonesia – home to 780 languages and 40 scripts, only eight (!) of which are in Unicode – and stressed the importance of developing digital tools like a Balinese keyboard to engage the younger generation. Both speakers underscored the role of community collaboration and technological innovation in making manuscripts more accessible and relevant in the digital age.

Dr Munawar Holil (left), Dr Cokorda Rai Adi Paramartha (right) and session moderator Ivonne Kristiani (WMF; centre).

I had the honour – and the absolute pleasure! – of being invited as one of the keynote speakers for this conference. In my talk I explored collaborations between the British Library and Wikisource, focusing on engaging local communities, raising awareness of library collections, facilitating access to digitised books and manuscripts, and enhancing them with accurate transcriptions.

We have previously collaborated with Bengali communities on two competitions to proofread 19th century Bengali books digitised as part of the Two Centuries of Indian Print project. More recently, the Library partnered with the Wikisource Loves Manuscripts (WiLMa) project, sharing Javanese manuscripts digitised through the Yogyakarta Digitisation Project. I also highlighted past and present work with Transkribus to develop machine learning models for automating transcription in various languages, encouraged further collaborations that could benefit communities worldwide, and pointed to the potential of such partnerships in expanding access to digitised heritage.

Dr Adi Keinan-Schoonbaert delivering a keynote address at the conference. Photo by Memora Productions for Wikimedia Indonesia.

Another keynote was delivered by Andy Stauder from READ-COOP. After introducing the cooperative and Transkribus, Andy talked about a key component of their approach – CCR – which stands for Clean, Controllable, and Reliable data coupled with information extraction (NER), powered by end-to-end ATR (automated text recognition) models. This approach is essential for both training and processing with large language models (LLMs). The future may move beyond pre-training to embrace active learning, fine-tuning, retrieval-augmented generation (RAG), dynamic prompt engineering, and reinforcement learning, with an aim to generate linked knowledge – such as integration with Wikidata IDs. Community collaboration remains central, as seen in projects like the digitisation of Indonesian palm-leaf manuscripts using Transkribus.

Andy Stauder (READ-COOP) talking about collaboration around the Indonesian palm-leaf manuscripts digitisation

Cassie Chan (Google APAC Search Partnerships) gave a third keynote on Google's role in digitising and curating cultural and literary heritage, aligning with Wikisource’s mission of providing free access to source texts. Projects like Google Books aim to make out-of-copyright works discoverable online, while Google Arts & Culture showcases curated collections such as the Timbuktu Manuscripts, aiding preservation and accessibility. These efforts support Wikimedia goals by offering valuable, context-rich resources for contributors. Additionally, Google's use of AI for cultural exploration – through tools like Poem Postcards and Art Selfie – demonstrates innovative approaches to engaging with global heritage.

Spotlight on Key Themes and Takeaways

The conference featured so many interesting talks and discussions, providing insights into projects, sharing knowledge, and encouraging collaborations. I’ll mention here just a few themes and some key takeaways, from my perspective as someone working with heritage collections, communities, and technology.

Starting with the latter, a major focus was on Optical Character Recognition (OCR) improvements. Enhanced OCR capabilities on Wikisource platforms not only improve text accuracy but also encourage more volunteers to engage in text correction. The implementation of Google OCR, Tesseract and, more recently, Transkribus is driving increased participation, as volunteers enjoy refining text accuracy. Among other speakers, User:Darafsh, Chairman of the Iranian Wikimedians User Group, mentioned the importance of teaching how to use Wikisource and OCR, and the development of Persian OCR at the University of Hamburg. Other talks relating to technology covered the introduction of new extensions, widgets, and mobile apps, highlighting the push to make Wikisource more user-friendly and scalable.

Nicolas Vigneron showcasing the languages for which Google OCR was implemented on Wikisource

Some discussions explored the potential of WiLMa (Wikisource Loves Manuscripts) as a model for coordinating across stakeholders, ensuring the consistency of tools, and fostering engagement with cultural institutions. For example, Irvin Tomas and Maffeth Opiana talked about WiLMa Philippines. This project launched in June 2024 as the first WiLMa project outside of Indonesia, focusing on transcribing and proofreading Central Bikol texts through activities like monthly proofread-a-thons, a 12-hour transcribe-a-thon, and training sessions at universities.

Another interesting topic was that of Wikidata and Metadata. The integration of structured metadata remains a key area of development, enabling better searchability and linking across digital archives. Bodhisattwa Mandal (West Bengal Wikimedians User Group) talked about Wikisource content including both descriptive metadata and unstructured text. While most data isn’t yet stored in a structured format, using Wikidata enables easier updates, avoids redundancy, and improves search, queries, and visualisation. There are tools that support metadata enrichment, annotation, and cataloguing, and a forthcoming mobile app will allow Wikidata-based book searches. Annotating text with Wikidata items enhances discoverability and links content more effectively across Wikimedia projects.

Working for the British Library, I (naturally!) picked up on a few collaborative projects between Wikisource and public or national libraries. One talk was about a digitisation project for traditional Korean texts, a three-year collaboration with Wikimedia Korea and the National Library of Korea, which successfully revitalised the Korean Wikisource community by increasing participation and engaging volunteers through events and partnerships.

Another project built a Wikisource community in Uganda by training university students, particularly from library information studies, alongside existing volunteers. Through practical sessions, collaborative tasks, and support from institutions like the National Library of Uganda and Wikimedia contributors, participants developed digital literacy and archival skills.

Nanteza Divine Gabriella giving a talk on ‘Training Wikisource 101’ and building a Wikisource community in Uganda

A third Wikisource and libraries talk was about a Wikisource to public library pipeline project, which started in a public library in Hokitika, New Zealand. This pipeline enables scanned public domain books to be transcribed on Wikisource and then made available as lendable eBooks via the Libby app, using OverDrive's Local Content feature. With strong librarian involvement, a clear workflow, and support from a small grant, the project has successfully bridged Wikisource and library systems to increase accessibility and customise reading experiences for library users.

The final session of the conference focused on shaping a future roadmap for Wikisource through community-driven conversation, strategic planning, and partnership development. Discussions emphasised the need for clearer vision, sustainable collaborations with technology and cultural institutions, improved tools and infrastructure, and greater outreach to grow both readership and contributor communities. Key takeaways included aligning with partners’ goals, investing in editor growth, leveraging government language initiatives, and developing innovative workflows. A strong call was made to prioritise people over platforms and to ensure Wikisource remains a meaningful and inclusive space for engaging with knowledge and heritage.

Looking Ahead

The Wikisource 2025 Conference reaffirmed the platform’s importance in the digital knowledge ecosystem. However, sustaining momentum requires ongoing advocacy, technological refinement, and deeper institutional partnerships. Whether through digitising new materials or leveraging already-digitised collections, there is a clear hunger for openly accessible public domain texts.

As the community moves forward, a focus on governance, technology, and strategic partnerships will be essential in shaping the future of Wikisource. The atmosphere was so positive and there was so much enthusiasm and willingness to collaborate – see this fantastic video available via Wikimedia Commons, which successfully captures the sentiment. I’m sure we’re going to see a lot more coming from Wikisource communities in the future!

18 March 2025

Help us explore Automatic Text Recognition in cultural heritage institutions!

This post is by Dr Valentina Vavassori, Digital Curator for Automatic Text Recognition.

At the British Library, one of our core values is to "collaborate to do more than we could by ourselves."

In researching options for our Automatic Text Recognition (ATR) pipeline, it was clear to me from the start that I needed to talk with different cultural institutions about their own work and processes in ATR and how they integrate it into their digitisation projects.

As part of this research, I have come to realise that the field is full of innovative ideas, with a strong focus on solving problems and learning from one another. 

Therefore, I am now asking people from cultural heritage institutions to complete this survey on how they work (or plan to work) with Automatic Text Recognition.

In the spirit of open access and sharing, the anonymised results will be published so that other institutions can use them.

Additionally, one question at the end of the survey asks whether respondents would be interested in taking part in a working group on ATR and, if so, invites them to share their email so we can kick-start meetings and discussions.

The survey will only take 5-10 minutes to complete, and it is available here:

https://online1.snapsurveys.com/AutomaticTextRecognition 

I hope you will be able to answer the survey, and I look forward to meeting with anyone who is interested!