Digital scholarship blog

Introduction

Tracking exciting developments at the intersection of libraries, scholarship and technology.

13 June 2025

Reflections from the IIIF Annual Conference 2025

This year’s IIIF Annual Conference and Showcase took place in Leeds. The Universal Viewer (UV) product team were in attendance once again, as well as Optical Character Recognition/Handwritten Text Recognition (OCR/HTR) Digital Curator Valentina Vavassori. Research Software Engineer Saira Akhter reports on the event...

The conference was a packed few days of presentations, workshops, and discussions. The first two days featured talks on a variety of subjects such as 3D, sustainability, interoperability, and AI. From the UV product team, Research Software Engineer James Misson gave a lightning talk on IIIF Timeline, a new tool for viewing IIIF collections on an interactive timeline interface.

Sharing his thoughts, James noted that the talk inspired a lot of further discussion about how dates and times are encoded within IIIF, and the difficulties of handling material that has no specific dates associated with it. Conference attendees enjoyed experimenting with the tool, and it has already been integrated into other IIIF systems such as Digirati's Manifest Editor, where users can build a collection and instantly preview it in the timeline.

James Misson presenting “Viewing collections chronologically with IIIF Timeline”

In addition, UV Product Owner Erin Burnand talked about the development of the Universal Viewer community, and the shift from passive to active collaboration with increased engagement. She was joined by frequent collaborator Sara Weale, Head of Web Design & Development at Llyfrgell Genedlaethol Cymru - National Library of Wales, who serves as Scrum Master during Universal Viewer community development sprints. 

Sara Weale and Erin Burnand presenting “Developing a community to develop software: the Universal Viewer”

Birds of a Feather sessions were held on the final day. These sessions included workshops on using IIIF-related resources and informal discussions. Of particular interest to us was a session held by the National Archives UK and the Swiss Federal Archives which explored practical examples of implementing HTR within a IIIF ecosystem. Valentina reflected on the conference:

“HTR/OCR was an interesting topic that kept appearing in discussion. For example, the work presented by Jie Song, ‘Implementing IIIF Towards a Large-scale Image Native Archival Resources Management System’, used an OCR plugin integrated in Word, and RAG and GraphRAG coupled with GenAI, to give access to Chinese scientist archival materials. The HTR Birds of a Feather session was also particularly relevant as it opened up the possibility of discussion and collaboration with other participants, to explore in more detail the work done by The National Archives, UK and the Swiss Federal Archives and to discuss together topics such as HTR engines, models and pipelines and metadata standards.”

Recordings of the conference are available on YouTube.

29 May 2025

Discover Digital Sustainability

This blog post is by Dr Adi Keinan-Schoonbaert, Digital Curator for Asian and African Collections, British Library. She's on Mastodon as @[email protected] and Bluesky as @adi-keinan.bsky.social

 

As someone with an ongoing interest in all things environmental sustainability – especially where digital work and technologies are concerned – I was highly motivated to plan and coordinate a month-long series of training events entitled “Discover Digital Sustainability”. Run as part of the British Library’s Digital Scholarship Training Programme (DSTP) in March 2025, this training series explored the intersection of technology, digital practices, and environmental responsibility. Designed to spark practical climate action, it brought together staff from across the organisation, offering them the knowledge and tools needed to reduce the environmental impact of their digital work.

Building on the success of previous initiatives – including the DHCC/BL workshop in early 2024, ongoing Carbon Literacy Training, and the launch of the British Library’s Sustainability and Climate Change Strategy – this comprehensive programme featured a rich blend of talks, interactive workshops, and a reflective reading group session. Attendees engaged with topics ranging from data management, digital preservation, digitisation, and web design to the environmental implications of AI, hardware use, and procurement and supply chains.

Throughout the month, staff heard from experts in heritage, academia, and technology, as well as British Library colleagues leading by example in rethinking workflows and adopting greener digital practices. The training not only deepened our understanding of how to measure and reduce the carbon footprint of digital operations but also inspired a broader cultural shift toward sustainability in the way we create, manage, and deliver digital services.

 

Reading Group

The series kicked off with a dedicated reading group session focused on the theme of data and digital waste. It sparked thoughtful discussion around the environmental impacts of data creation, storage, and retention. Participants explored how to balance the social value of data with its environmental cost, drawing on key resources, mainly from Loughborough University’s Digital Decarbonisation project, which highlights the concept of “dark data” and the hidden carbon footprint associated with unused or unnecessary data.

The conversation set the stage for a timely follow-up webinar later that week, entitled “Your Data Carbon Footprint: What It Is & Why It Matters”, hosted by Leadership Through Data and featuring Professors Tom Jackson and Ian Hodgkinson from the Digital Decarbonisation project. Several attendees joined this session, which offered deeper insights into the scale and significance of data-related emissions and practical steps for managing digital waste more sustainably.

Screenshot of the “Digital Decarbonisation” homepage

 

Staff Talks

The training series featured six engaging staff talks (we call them 21st Century Curatorship Talks), each offering unique perspectives on the challenges and opportunities of embedding sustainability into digital practices across the cultural, academic and technology sectors.

Katie Espley from the Library’s Corporate Information Management Unit shared practical strategies in her “Records Management Refresh” session, highlighting best practices for organising digital workspaces to reduce clutter and improve efficiency. Dr Nicôle Meehan (University of St Andrews) examined the environmental impact of museum digitisation, encouraging institutions to take action to reduce its carbon cost. From the tech industry, Lewis Richards (Chief Sustainability Officer, Microsoft UK) provided insights into Microsoft’s sustainability initiatives and how they are addressing environmental impact at scale.

Stacey Anderson (The Box Plymouth) contributed a session on sustainable digital preservation in museums and archives, while David Mahoney (University of Edinburgh) explored sustainable web design, offering practical ways to lower the environmental footprint of online content. Finally, Jon Ray (Oxford GLAM) presented the University of Oxford GLAM’s efforts to develop a digital sustainability action plan, showcasing how cultural organisations can take meaningful steps to reduce their digital carbon footprint. Together, these talks illustrated the breadth of sustainable digital practice and inspired staff to think critically about their own work.

 

Workshops

Three hands-on workshops offered staff the opportunity to dive deeper into the practicalities of digital sustainability, each approaching the topic from a different angle. In the “Future of Digital Sustainability”, Jo Walton and Nathalie Huegler from the University of Sussex Digital Humanities Lab led an interactive session and card game that explored the environmental impact of digital technologies and strategies for building more sustainable digital practices. Bailey Bryan and Tommy Ferry from the digital agency Wholegrain Digital followed with a focused workshop on the carbon footprint of web design, guiding participants through actionable steps to create more environmentally friendly websites.

The “Future of Digital Sustainability” workshop instructors and participants in action

Finally, a Hack & Yack session, led by Nora McGregor, brought staff together in a collaborative effort to co-create a Digital Sustainability online guide tailored for library professionals. This guide forms part of a broader initiative by the Digital Scholarship and Digital Cultural Heritage Collections Working Group (a LIBER working group chaired by Nora), now working in partnership with LIBER’s Data Science in Libraries group to build a centralised staff skills hub – a trusted resource for training at the intersection of libraries, cultural heritage, and digital technologies.

 

Key Takeaways

The different presentations and discussions brought to light the complex challenges and opportunities that lie at the intersection of digital practice and environmental responsibility. One of the recurring themes was the widespread underestimation or complete lack of measurement around digital-related carbon emissions. Many organisations still operate without clear data hygiene policies, and there is a notable lack of consistent, trustworthy reporting from digital service providers on their sustainability practices. This is compounded by limited awareness of the environmental footprint of routine digital activities, including hardware procurement, software use, and long-term digital storage.

Speakers highlighted the tensions many institutions face in balancing environmental goals with the practicalities of funding requirements, user experience expectations, and organisational responsibilities. The use of AI was a particular area of concern – not only for its increasing energy demands but also for the need to use it more responsibly and strategically. The importance of informing policy change and providing the sector with practical, actionable resources was emphasised as a priority for enabling progress.

The talks and workshops also underscored the importance of data and facts in making the case for change. Data centres alone are responsible for between 2.5% and 3.7% of global carbon emissions – potentially exceeding the aviation sector. Additionally, up to 65% of stored data is considered “dark” (unused), with another 15% classified as redundant. Reducing this unnecessary data load is essential, and speakers offered a range of sustainability strategies to help. These included selective digitisation that prioritises high-value or at-risk collections, the use of lossless compression formats to reduce file size without compromising quality, and scheduling energy-intensive processes like integrity checks during off-peak hours.
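As a rough illustration of the last of these strategies, here is a minimal Python sketch of an integrity (fixity) check that only runs inside an off-peak window. The 22:00 to 06:00 window, the function names and the manifest layout (relative path mapped to expected SHA-256 checksum) are assumptions made for this example, not something the speakers prescribed; a real preservation system would take these from its own configuration.

```python
import hashlib
from datetime import datetime, time
from pathlib import Path
from typing import Dict, List, Optional

# Hypothetical off-peak window (an assumption for this sketch): 22:00 to 06:00.
OFF_PEAK_START = time(22, 0)
OFF_PEAK_END = time(6, 0)


def is_off_peak(now: datetime) -> bool:
    """Return True if `now` falls inside the window, which wraps past midnight."""
    current = now.time()
    return current >= OFF_PEAK_START or current < OFF_PEAK_END


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so that large files are never loaded whole."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def run_fixity_check(
    manifest: Dict[str, str], root: Path, now: datetime
) -> Optional[List[str]]:
    """Return the relative paths whose checksum no longer matches the manifest.

    Returns None, without reading any file, when called outside the
    off-peak window, so the caller can simply try again later.
    """
    if not is_off_peak(now):
        return None
    return [
        relative_path
        for relative_path, expected in manifest.items()
        if sha256_of(root / relative_path) != expected
    ]
```

Calling `run_fixity_check(manifest, Path("/some/collection"), datetime.now())` from a regularly scheduled job would then defer the energy-intensive hashing until the quieter hours. Whether off-peak hours are also lower-carbon hours depends on the local electricity grid, so the window itself is a judgment call.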

Other practical recommendations included sustainable web design principles, such as optimising images, reducing page weight, and minimising third-party scripts. Speakers also noted encouraging signs of progress in the form of evolving sector standards for sustainable digital archiving, and the growing emphasis on green procurement, evaluating vendors not just on price and functionality but also on their environmental credentials.

Finally, a clear call to action emerged around the need for collective effort. Sector-wide collaboration was viewed as essential for setting shared standards, influencing common suppliers, and fostering open dialogue. Tools such as carbon calculators and tracking toolkits can support these efforts, but lasting change will depend on sustained joint action and a willingness to embed sustainability into every aspect of digital practice. Importantly, this work must be done with a strong commitment to equity, ensuring that the push for digital efficiency does not come at the cost of access or inclusion.

 

What did participants think?

The training series saw a whopping 88 participants from many teams and departments across the Library, many of whom attended more than one event. A survey was conducted at the end of the training series, to assess the relevance of topics to both professional digital working and personal life, as well as gauge impact: what changes were participants going to make? How would their behaviour change? With some questions mandatory and some optional, the survey had 22 respondents in total.

On a scale from None (1) to Advanced (5), participants rated their knowledge of digital sustainability at 2.36 on average before the series, and at 3.64 on average afterwards. This demonstrates an increased level of confidence in understanding relevant topics. Survey results also show that almost everyone learned new things, or reinforced things they already knew. Most respondents also shared or intend to share knowledge acquired from the series, and most have made or intend to make changes in their professional role, with some also in their personal lives. Almost all of them would recommend similar events to colleagues in the future.
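To put these figures in context, the short Python sketch below works through the arithmetic using only the numbers reported here. It is a back-of-the-envelope reading rather than a statistical analysis: the ratings are self-reported, and the averages come from the 22 survey respondents, not all 88 participants.

```python
# Figures reported in this post.
participants = 88
respondents = 22
rating_before = 2.36  # average self-rated knowledge before the series (1 to 5)
rating_after = 3.64   # average self-rated knowledge after the series (1 to 5)

response_rate = respondents / participants
gain = rating_after - rating_before
relative_gain = gain / rating_before

print(f"Response rate: {response_rate:.0%}")          # Response rate: 25%
print(f"Average gain: {gain:.2f} points")             # Average gain: 1.28 points
print(f"Relative to baseline: {relative_gain:.0%}")   # Relative to baseline: 54%
```

In other words, a quarter of participants responded, and their average self-rating rose by 1.28 points on the five-point scale.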

Agree/Disagree statements from the series' feedback survey

Looking into the changes that respondents were going to make, it was evident that participants were thinking critically about digital sustainability, from practical day-to-day actions to influencing organisational change. Several mentioned file and data management, for example reducing the duplication of files, doing a ‘spring clean’, or deleting unnecessary emails. Others talked about sustainable digital practices, like encouraging a built-in, climate-aware task scheduler function for large jobs in development environments, or a desire to create a sustainability checklist for web editing. Several pointed to the importance of advocacy, knowledge sharing and team engagement, for example influencing policies, sharing knowledge about cloud storage considerations, encouraging mindful use of storage in platforms like MS Teams, or telling people about the Hugging Face list of models that require fewer resources.

Participants highlighted several aspects of the training series that they found especially valuable and enjoyable. The sessions were praised for being accessible to non-specialists, with clear and engaging presentations that made complex topics understandable. Attendees appreciated the opportunity to learn from experts across different fields, gaining new perspectives and recognising shared challenges. The diversity of talks throughout the month helped reinforce key ideas, making the learning experience more cohesive and impactful. Interactive elements, such as the “Future of Digital Sustainability” workshop and the educational card game, were particularly well received for being both fun and informative. Many found the enthusiasm and expertise of the speakers inspiring and valued the sense of community fostered throughout the series – feeling connected to others who are equally committed to addressing important issues.

We had some thoughtful and constructive feedback on how the training series could be improved. A recurring suggestion was to spread out the events over a longer period, with slightly shorter presentations and more interactive elements to keep engagement high. Several attendees expressed a desire for more practical, hands-on content, such as Hack & Yack sessions that focus on applying ideas directly to their work environments. There was also interest in exploring how digital sustainability can be balanced with usability and visual appeal, particularly for those working in online public engagement. A few attendees found the themes somewhat repetitive and proposed a structured progression – starting with theory and moving into application – for those attending multiple sessions. Additionally, ideas were raised about revisiting key topics in future sessions to keep the momentum going and share updates.

My favourite testimonial would probably be:

“I just wanted to drop a quick note to say how great it is that you're running these sessions. I think we often struggle to take environmental impacts into consideration when appraising options for digital systems and data projects so this is incredibly welcome.”

 

Feedback on the “Future of Digital Sustainability”

The “Future of Digital Sustainability” workshop received separate feedback, which was also really helpful. Participants found the workshop to be a highly engaging, enjoyable, and accessible way to explore the complex subject of digital sustainability. Many emphasised the fun, playful nature of the learning experience, noting how it made the subject matter more approachable and digestible. The interactive format encouraged rich discussions, peer learning, and collective reflection, allowing participants to learn not only from the facilitators but also from the expertise of their colleagues. Several appreciated the balance between educational depth and a light, participatory tone, saying it was both thought-provoking and practical. The workshop was seen as a great tool for building shared understanding and momentum within teams, with participants leaving more informed and motivated.

Interactive session at the end of the “Future of Digital Sustainability” workshop

Suggestions for improvement focused mostly on refining the structure of the card game used in the workshop. Some participants noted a need to better balance the depth of discussion with the pace of gameplay, proposing fewer cards, clearer ground rules, and a short introduction to the game strategy. Others recommended changes to the mechanics – such as modifying hand sizes, how actions and events are played, and how points are distributed – to reduce repetition and enhance flow. A few attendees also wished for a better way to track the many concepts raised during the game. Despite these suggestions, the consensus was that the game was a valuable and enjoyable tool. Participants also expressed interest in future collaboration and emphasised the importance of translating insights into actionable decisions across the organisation.

And as for my favourite quote –

“The workshop was soooo good! I hope you're able to run it again so that more BL folk can attend. I think we all left feeling more positive, informed and empowered than we entered.”

 

Next Steps

Organising the “Discover Digital Sustainability” training series at the British Library was an incredibly rewarding experience. I learned a great deal through the process and especially valued the chance to connect with other organisations and professionals working in this space. These conversations were not only inspiring but also opened up exciting possibilities for future collaboration. I’m proud that we were able to deliver something both meaningful and impactful, helping to bring the topic of digital sustainability to the forefront within our institution.

Looking ahead, I’m optimistic that many of the insights and ideas from the series will be put into practice – in line with the Library’s strategic action plan – across teams, departments, and individuals. There’s more to come too: I’ll be speaking at DH2025 in Lisbon this July, sharing our work through a paper on “Digital Humanities and Environmental Sustainability at the British Library,” and also plan to publish the topic guide on digital sustainability later this summer. Plus, we’ll be running the “Future of Digital Sustainability” workshop and card game again during Green Libraries Week, this time at our Boston Spa site. Our commitment to learning continues, and we’ll keep seeking new opportunities to support staff through the Digital Scholarship Training Programme.

 

23 April 2025

DHNB 2025 - Digital Humanities in the Nordic and Baltic Countries Conference Report

This post is by Helena Byrne, Curator of Web Archives.

Conference banner with an image of the Estonian National Museum on blue and purple background
DHNB 2025 Conference Banner

This year’s Digital Humanities in the Nordic and Baltic countries conference took place at the Estonian National Museum in Tartu. Last year was the first time I attended the DHNB conference (report available on the Digital Scholarship Blog). The theme for this year was “Digital Dreams and Practices”. There were pre-conference workshops on March 3-4, with the main conference starting on the morning of March 5 and finishing on March 7. I participated in the Web Archive Collections as Data workshop held in the morning session on day two.


This was a big conference with about 200 researchers and GLAM sector participants who attended from organisations based all over Europe as well as Japan. With such a big attendance there were multiple parallel sessions on each day. A detailed overview of the programme is available to download from the DHNB website. There was also a large poster presentation session at the end of day two of the conference. In the main hall all presenters had one minute to introduce their poster before going onto the floor to discuss the wide variety of topics in more detail.

Posters on 10 stands lined up against windows in the museum hallway.
Posters on display at the DHNB 2025 Conference

There was a keynote on each day of the conference. The second day keynote was by Andrea Kocsis from Edinburgh University, who is also the current National Librarian’s Research Fellow in Digital Scholarship 2024-25 at the National Library of Scotland. She has worked closely with UK Web Archive colleagues across the UK Legal Deposit Libraries to make the collections more accessible to wider audiences.

All three keynotes are available to watch on the DHNB website - https://dhnb.eu/conferences/dhnb2025/keynote-speakers/ 

It is hard to pick one highlight out of such a rich conference, but I think it would be the presentation “Collecting memories of the early internet” by Johanna Arnesson, Evelina Liliequist and Coppélie Cocq from Umeå University, Sweden. The abstract is available on page 24 of the Programme Book of Abstracts. One of the key takeaways from this presentation was that more case studies from different countries are required. So far there have only been a few case studies that have reviewed early memories and/or experiences of the internet, but people would have experienced the internet differently depending on their home country, age, socioeconomic status, etc. It would be interesting to see researchers using the UK Web Archive resources to run a similar study in the UK.

Poster presenters lined up in front of the screen on stage in the conference auditorium.
Poster Slam at the DHNB 2025 Conference

Although the National Library of Estonia building is currently closed for renovation, I was delighted that I could meet up with their web archivist to discuss web archiving challenges and opportunities in Estonia. 

For a more detailed report on the Web Archive Collections as Data workshop see the UK Web Archive blog.

09 April 2025

Wikisource 2025 Conference: Collaboration, Innovation, and the Future of Digital Texts

This blog post is by Dr Adi Keinan-Schoonbaert, Digital Curator for Asian and African Collections, British Library. She's on Mastodon as @[email protected] and Bluesky as @adi-keinan.bsky.social

 

The Wikisource 2025 Conference, held in the lush setting of Bali, Indonesia between 14-16 February 2025, brought together a global community of Wikimedians, heritage enthusiasts, and open knowledge advocates. Organised by a coalition of Wikisource contributors, Wikimedia Foundation and Wikimedia Indonesia, the conference served as a dynamic space to discuss the evolving role of Wikisource, explore new technologies, and strengthen collaborations with libraries, cultural institutions, and other global stakeholders.

Wikisource Conference 2025 participants. Photo by Memora Productions for Wikimedia Indonesia.

The conference, themed “Wikisource: Transform & Preserve Heritage Digitally,” featured a rich programme of keynote talks, long presentations, lightning talks, and informal meet-ups. Central themes included governance, technological advancements, community engagement, and the challenge of scaling Wikisource as a set of collaborative, multilingual platforms. We also enjoyed a couple of fantastic cultural events, celebrating the centuries-old, unique heritage of Bali!

Keynotes and Indonesian Partnerships

Following a kick-off session on the state of Wikisource community and technology, several Indonesian partners shared insights into their work on heritage, preservation, and digital accessibility. Dr Munawar Holil (Kang Mumu) highlighted the efforts of Manassa (the Indonesian Manuscript Society) to safeguard over 121,000 manuscripts, the majority of which remain undigitised, with key collections located in Bali, Jakarta, and Aceh. Challenges include limited public awareness, sacred perceptions requiring ceremonial handling, and structural gaps in institutional training.

Dr Cokorda Rai Adi Paramartha from Udayana University addressed the linguistic diversity of Indonesia – home to 780 languages and 40 scripts, only eight (!) of which are in Unicode – and stressed the importance of developing digital tools like a Balinese keyboard to engage the younger generation. Both speakers underscored the role of community collaboration and technological innovation in making manuscripts more accessible and relevant in the digital age.

Dr Munawar Holil (left), Dr Cokorda Rai Adi Paramartha (right) and session moderator Ivonne Kristiani (WMF; centre).

I had the honour – and the absolute pleasure! – of being invited as one of the keynote speakers for this conference. In my talk I explored collaborations between the British Library and Wikisource, focusing on engaging local communities, raising awareness of library collections, facilitating access to digitised books and manuscripts, and enhancing them with accurate transcriptions.

We have previously collaborated with Bengali communities on two competitions to proofread 19th century Bengali books digitised as part of the Two Centuries of Indian Print project. More recently, the Library partnered with the Wikisource Loves Manuscripts (WiLMa) project, sharing Javanese manuscripts digitised through the Yogyakarta Digitisation Project. I also highlighted past and present work with Transkribus to train machine learning models aimed at automating transcription in various languages, and encouraged further collaborations that could benefit communities worldwide and expand access to digitised heritage.

Dr Adi Keinan-Schoonbaert delivering a keynote address at the conference. Photo by Memora Productions for Wikimedia Indonesia.

Another keynote was delivered by Andy Stauder from the READ-COOP. After introducing the cooperative and Transkribus, Andy talked about a key component of their approach – CCR – which stands for Clean, Controllable, and Reliable data coupled with information extraction (NER), powered by end-to-end ATR (automated text recognition) models. This approach is essential for both training and processing with large language models (LLMs). The future may move beyond pre-training to embrace active learning, fine-tuning, retrieval-augmented generation (RAG), dynamic prompt engineering, and reinforcement learning, with an aim to generate linked knowledge, such as integration with Wikidata IDs. Community collaboration remains central, as seen in projects like the digitisation of Indonesian palm-leaf manuscripts using Transkribus.

Andy Stauder (READ-COOP) talking about collaboration around the Indonesian palm-leaf manuscripts digitisation

Cassie Chan (Google APAC Search Partnerships) gave a third keynote on Google's role in digitising and curating cultural and literary heritage, aligning with Wikisource’s mission of providing free access to source texts. Projects like Google Books aim to make out-of-copyright works discoverable online, while Google Arts & Culture showcases curated collections such as the Timbuktu Manuscripts, aiding preservation and accessibility. These efforts support Wikimedia goals by offering valuable, context-rich resources for contributors. Additionally, Google's use of AI for cultural exploration – through tools like Poem Postcards and Art Selfie – demonstrates innovative approaches to engaging with global heritage.

Spotlight on Key Themes and Takeaways

The conference featured so many interesting talks and discussions, providing insights into projects, sharing knowledge, and encouraging collaborations. I’ll mention here just a few themes and some key takeaways, from my perspective as someone working with heritage collections, communities, and technology.

Starting with technology, a major focus was on Optical Character Recognition (OCR) improvements. Enhanced OCR capabilities on Wikisource platforms not only improve text accuracy but also encourage more volunteers to engage in text correction. Implementations of Google OCR, Tesseract and, more recently, Transkribus are driving increased participation, as volunteers enjoy refining text accuracy. Among other speakers, User:Darafsh, Chairman of the Iranian Wikimedians User Group, mentioned the importance of teaching how to use Wikisource and OCR, and the development of Persian OCR at the University of Hamburg. Other talks relating to technology covered the introduction of new extensions, widgets, and mobile apps, highlighting the push to make Wikisource more user-friendly and scalable.

Nicolas Vigneron showcasing the languages for which Google OCR was implemented on Wikisource

Some discussions explored the potential of WiLMa (Wikisource Loves Manuscripts) as a model for coordinating across stakeholders, ensuring the consistency of tools, and fostering engagement with cultural institutions. For example, Irvin Tomas and Maffeth Opiana talked about WiLMa Philippines. This project launched in June 2024 as the first WiLMa project outside of Indonesia, focusing on transcribing and proofreading Central Bikol texts through activities like monthly proofread-a-thons, a 12-hour transcribe-a-thon, and training sessions at universities.

Another interesting topic was that of Wikidata and metadata. The integration of structured metadata remains a key area of development, enabling better searchability and linking across digital archives. Bodhisattwa Mandal (West Bengal Wikimedians User Group) talked about Wikisource content including both descriptive metadata and unstructured text. While most data isn’t yet stored in a structured format, using Wikidata enables easier updates, avoids redundancy, and improves search, queries, and visualisation. There are tools that support metadata enrichment, annotation, and cataloguing, and a forthcoming mobile app will allow Wikidata-based book searches. Annotating text with Wikidata items enhances discoverability and links content more effectively across Wikimedia projects.

Working for the British Library, I (naturally!) picked up on a few collaborative projects between Wikisource and public or national libraries. One talk was about a digitisation project for traditional Korean texts, a three-year collaboration between Wikimedia Korea and the National Library of Korea that successfully revitalised the Korean Wikisource community by increasing participation and engaging volunteers through events and partnerships.

Another project built a Wikisource community in Uganda by training university students, particularly from library information studies, alongside existing volunteers. Through practical sessions, collaborative tasks, and support from institutions like the National Library of Uganda and Wikimedia contributors, participants developed digital literacy and archival skills.

Nanteza Divine Gabriella giving a talk on ‘Training Wikisource 101’ and building a Wikisource community in Uganda

A third Wikisource and libraries talk was about a Wikisource-to-public-library pipeline project, which began in a public library in Hokitika, New Zealand. This pipeline enables scanned public domain books to be transcribed on Wikisource and then made available as lendable eBooks via the Libby app, using OverDrive's Local Content feature. With strong librarian involvement, a clear workflow, and support from a small grant, the project has successfully bridged Wikisource and library systems to increase accessibility and customise reading experiences for library users.

The final session of the conference focused on shaping a future roadmap for Wikisource through community-driven conversation, strategic planning, and partnership development. Discussions emphasised the need for clearer vision, sustainable collaborations with technology and cultural institutions, improved tools and infrastructure, and greater outreach to grow both readership and contributor communities. Key takeaways included aligning with partners’ goals, investing in editor growth, leveraging government language initiatives, and developing innovative workflows. A strong call was made to prioritise people over platforms and to ensure Wikisource remains a meaningful and inclusive space for engaging with knowledge and heritage.

Looking Ahead

The Wikisource 2025 Conference reaffirmed the platform’s importance in the digital knowledge ecosystem. However, sustaining momentum requires ongoing advocacy, technological refinement, and deeper institutional partnerships. Whether through digitising new materials or leveraging already-digitised collections, there is a clear hunger for openly accessible public domain texts.

As the community moves forward, a focus on governance, technology, and strategic partnerships will be essential in shaping the future of Wikisource. The atmosphere was so positive and there was so much enthusiasm and willingness to collaborate – see this fantastic video available via Wikimedia Commons, which successfully captures the sentiment. I’m sure we’re going to see a lot more coming from Wikisource communities in the future!

 

18 March 2025

Help us explore Automatic Text Recognition in cultural heritage institutions!

This post is by Dr Valentina Vavassori, Digital Curator for Automatic Text Recognition.

At the British Library, one of our core values is to "collaborate to do more than we could by ourselves."

When I began researching options for our Automatic Text Recognition (ATR) pipeline, it was clear from the start that I needed to talk with different cultural institutions about their own work and processes in ATR and how they integrate it into their digitisation projects.

As part of this research, I have come to realise that the field is full of innovative ideas, with a strong focus on solving problems and learning from one another. 

Therefore, I am now asking people from cultural heritage institutions to complete this survey on how they work (or plan to work) with Automatic Text Recognition.

In the spirit of open access and sharing, the anonymised results will be published so that other institutions can use them.

Additionally, a question at the end of the survey asks whether your institution would be interested in taking part in a working group on ATR and, if so, invites you to share an email address so we can kick-start meetings and discussions.

The survey will only take 5-10 minutes to complete, and it is available here:

https://online1.snapsurveys.com/AutomaticTextRecognition 

I hope you will be able to answer the survey, and I look forward to meeting with anyone who is interested!

13 March 2025

Fantastic Futures 2025 (FF2025) Call for Proposals

Fantastic Futures 2025: AI Everywhere, All at Once 

AI4LAM’s annual conference, December 3 – 5, 2025, British Library, London

The British Library and the Programme Committee for the Fantastic Futures 2025 conference are delighted to invite proposals for presentations and workshops for the Fantastic Futures 2025 conference.  

Fantastic Futures is the annual conference for the AI4LAM (Artificial Intelligence, for Libraries, Archives, Museums) community. Submissions are invited from colleagues around the world about organisations, collections, interest and experience with Artificial Intelligence (AI) and Machine Learning (ML) technologies applied to or developed with cultural, research and heritage collections. This includes practitioners in the GLAM (Galleries, Libraries, Archives, Museums) sector and Digital Humanities, Arts and Social Sciences, Data, Information and Computer Science researchers in Higher Education. 

Key information

  • Call for proposals shared: Thursday 13 March 2025 
  • Conference submission form opens: May 2025
  • Proposal submission deadline: midnight anywhere, Sunday 1 June 2025 
  • Notification of acceptance: 25 July 2025 
  • Conference dates: December 3 – 5, 2025 
  • Location: British Library, London, onsite – with some livestreams and post-event videos 

FF2025 Theme: AI Everywhere, All at Once 

We invite presentations on the theme of ‘AI Everywhere, All at Once’. While AI has a long history in academia and practice, the release of public language models like ChatGPT propelled AI into public consciousness. The sudden appearance of AI ‘tools’ in the software we use every day, government consultations on AI and copyright and the hype about Artificial Intelligence mean that libraries, museums and archives must understand what AI means for them. Should they embrace it, resist it or fear it? How does it relate to existing practices and services, how can it help or undermine staff, and how do we keep up with rapid changes in the field? 

There are many opportunities and many challenges in delivering AI that creates rich, delightful and immersive experiences of GLAM collections and spaces for the public, and meets the needs of researchers for relevant, reliable and timely information. Challenges range from the huge – environmental and economic sustainability, ensuring alignment with our missions, ethical and responsible AI, human-centred AI, ensuring value for money – to the practical – evaluation, scalability, cyber security, multimodal collections – and throughout it all, managing the pace of change. 

Our aim is to promote interdisciplinary conversations that foster broader understandings of AI methods, practices and technologies and enable critical reflections about collaborative approaches to research and practice. 

Themes   

We’re particularly interested in proposals that cover these themes:   

  • Ethical and Responsible AI 
  • Human-Centred AI / the UX of AI 
  • Trust, AI literacy and society 
  • Building AI systems for and with staff and users 
  • Cyber-security and resilience 
  • Interoperability and standards
  • FAIR, CARE, rights and copyright
  • Benchmarking AI / machine learning 
  • Regional, national, international approaches to AI 
  • Environmental sustainability  

Formats for presentations (Thursday, Friday December 4-5) 

  • Lightning talk: 5 mins. These might pitch an idea, call for collaborators, throw out a provocation or just provide a short update 
  • Poster: perfect for project updates – what went well, what would you do differently, what lessons can others take? 
  • Short presentation: 15 mins   
  • Long presentation: 30 mins 
  • Panel: 45 mins, multiple presenters with short position statements then discussion 

Formats for workshops or working group sessions (Wednesday December 3) 

  • Formal, instructor-led sessions, including working groups, tutorials, hands-on workshops – 1 or 2 hours 
  • Informal, unstructured sessions, including unconferences, meetups, hacking – 1 or 2 hours 
  • Digital showcase (demo): 30 mins 

We value the interactions that an in-person event enables, so the default mode for this event is in-person presentations. However, if your proposal is accepted for inclusion in the conference but you are not able to travel to London, we can consider arrangements for making a virtual presentation on a case-by-case basis. Please contact the Programme Committee at [email protected] to discuss. 

The conference will be held over three days: one day of workshops and other events, and two days of formal sessions. The social programme will include opportunities for informal networking.  

Plenary sessions on Thursday and Friday will be livestreamed, recorded and published. 

Find out more and get updates 

  • Check the AI4LAM FF2025 page for updates, including Frequently Asked Questions and information on our review criteria
  • Organisers: Rossitza Atanassova, Neil Fitzgerald and Mia Ridge, British Library

Further details about the conference submission process and registration will be supplied soon. 

This post was last updated 16 May 2025.

24 January 2025

Universal Viewer v4.1.0 is here!

We’re excited to announce the release of Universal Viewer (UV) version 4.1.0, packed with new updates and features. 

Universal Viewer image controls
New image manipulation controls in UV 4.1.0

 

This version builds on the momentum from our community accessibility sprint, where the wider UV community came together to address key usability challenges. Highlights of the new release include: 

Accessibility Improvements:

  • Easier navigation for keyboard-only users.
  • Better support for assistive technologies such as screen readers.
  • Improved contrast and visibility of page elements.

New Features:

  • Image Controls: Adjust brightness, contrast, and saturation directly within the viewer.
  • Index Panel Configuration: A new setting allows the index panel to open by default when viewing collections. 

Bug Fixes & Security Updates:

  • Several bugs resolved to enhance stability and performance.
  • Dependency updates to ensure the Universal Viewer remains secure and up to date.

For the full details of what’s new, check out the release notes on GitHub.

Interested in joining the Universal Viewer community? To get involved join us on Slack, or follow UV on Bluesky or Mastodon to stay connected.

08 January 2025

2024 Year in Review - Digital Scholarship Training Programme

Nora McGregor, Digital Curator and manager of the Digital Scholarship Training Programme, reflects on a year of delivering digital upskilling training to colleagues at the British Library, part of the Digital Research Team's focus on Embedding Digital Humanities in the British Library.

2024 was a strange and difficult year, to say the least, for us and all our lovely colleagues across the whole of the British Library as we contended daily with the ongoing effects of a cyber-attack disrupting just about every aspect of our work. Not to be cowed by criminality however, the Digital Research Team dug in and ensured the Digital Scholarship Training Programme (DSTP) continued without fail.

From our experience during the pandemic, we knew that in times of major disruption, British Library staff do not stand still. They focus on what they can do, including prioritising their upskilling, and have come to count on the DSTP as a kind of refuge whilst temporarily separated from their collections and normal workload.

So it’s with gratefulness to my colleagues in the Digital Research Team, and to BL staff for their engagement, that I reflect proudly on a challenging year where we managed to deliver a whopping 39 individual training events with nearly 900 attendees!   

What we learned in 2024

Our training programme this year covered these topic priorities through a variety of talks, hands-on sessions, reading groups and formal workshops & courses: 

  • State-of-the-art Automatic Text Recognition (ATR) technologies
  • Useful data science, machine learning and AI applications for analysing and enhancing GLAM digital collections and data​
  • The intersection of climate change + Digital Humanities
  • Digital tools and methods to support the Library's Race Equality Action Plan
  • Wikidata, Wikisource, Wikimedia Commons
  • OpenRefine for data-wrangling 
  • Collections as Data
  • Making the most of the IIIF standard

We’re especially thankful for all the academics & professionals who contributed to our learning throughout the year by sharing their projects, experience and expertise with us! If you’d like to be part of our programme in 2025 get in touch with us at [email protected] with your idea, we’d love to hear from you.

2024 Year in Review-External Infographic by Nora McGregor

My Personal Highlights 

In the coming months I will be interviewing my fellow Digital Curators to get their views on highlights from the 2024 Digital Scholarship Training Programme, either favourite events they attended or programmed in 2024 and topic areas they’re excited about this year. No easy ask actually, as I know they, like me, will have found every event spectacularly interesting and useful, but to highlight just a few for you...

21st Century Talks

Our 21st Century Curatorship talk series is looked after by Digital Curators Stella Wisdom and Adi Keinan-Schoonbaert. These are one-hour invited guest lectures, held once or twice a month, where we learn about exciting, innovative projects and research at the intersection of cultural heritage collections and new technologies. These talks are pitched for complete beginners – we try not to assume knowledge so that anyone from any department can come along! A few of my favourite talks in particular were from these projects:

  • DE-BIAS - Detecting and cur(at)ing harmful language in cultural heritage collections | Europeana PRO
    Kerstin Herlt and Kerstin Arnold introduced us to the DE-BIAS project which aims to detect and contextualise potentially harmful language in cultural heritage collections. Working with themes like migration and colonial past, gender and sexual identity, ethnicity and ethno-religious identity, the project collaborates with minority communities to better understand the stories behind the language used - or behind the gaps apparent. We learned about the development of the vocabulary and the tools the project has created.

  • The Print and Probability Project: From Restoration Era Printing to an Interim English Short Title Catalogue
    Nikolai Vogler gave us an entertaining view of a selection of findings from the University of California’s Print & Probability project, an interdisciplinary research group at the intersection of book history, computer vision, and machine learning that seeks to discover Restoration-era letterpress printers whose identities have eluded scholars for several hundred years. He also presented his work on creating an interim English Short Title Catalogue (ESTC) in response to the cyber-attack on the Library in 2023, a pursuit for which colleagues were incredibly grateful!

  • “Dark Matter: X%” - how many early modern Hungarian books disappeared without any trace?
    This was such a fascinating talk by Péter Király, software developer and digital humanities researcher at the Göttingen computation centre, Germany. Estimating the unknown is always an interesting endeavour. There is a registry of surviving books, and we have collective knowledge about lost books, but how many early Hungarian printings have been lost without any historical trace? His research group transformed the analytical bibliography "Régi Magyarországi Nyomtatványok" (Early Hungarian Printings) into a database and used mathematical models from the toolbox of biologists to help estimate that number. The analysis of the database also highlights unknown or less investigated areas and enables the group to extend previous research focusing on a particular time range to the whole period (such as religious trends during the Reformation and Counter-Reformation, and the changes of genres over time).
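To give a flavour of what borrowing from the biologists' toolbox can look like, here is a minimal sketch. The summary above does not say which model the research group used, so this uses the bias-corrected Chao1 "unseen species" estimator from ecology purely as an illustration, and the copy counts are invented, not taken from the Hungarian database. The idea: editions that survive in only one or two copies hint at how many editions survive in none.

```python
from collections import Counter


def chao1(copy_counts):
    """Estimate the total number of editions (surviving + lost).

    copy_counts holds, for each surviving edition, the number of known
    copies. This is the bias-corrected Chao1 estimator; choosing it here
    is an assumption made for illustration only.
    """
    if any(count < 1 for count in copy_counts):
        raise ValueError("every surviving edition needs at least one copy")
    observed = len(copy_counts)       # editions with at least one copy
    frequencies = Counter(copy_counts)
    singletons = frequencies[1]       # editions known from exactly 1 copy
    doubletons = frequencies[2]       # editions known from exactly 2 copies
    unseen = singletons * (singletons - 1) / (2 * (doubletons + 1))
    return observed + unseen


# Invented example: eight surviving editions and their copy counts.
sample_counts = [1, 1, 1, 1, 2, 2, 3, 5]
estimated_total = chao1(sample_counts)
estimated_lost = estimated_total - len(sample_counts)
print(estimated_total, estimated_lost)  # 10.0 2.0
```

With four single-copy editions and two double-copy editions, the sketch suggests roughly two editions lost without trace. Real analyses would also need to account for uneven survival rates between genres and periods.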

Hack & Yacks

I have the privilege of programming and leading this particular series of events and they are my favourite days in the calendar! These are our casual, two-hour monthly meet-ups where we all take some time to have a hands-on exploration of new tools, techniques, and applications. No previous experience is ever needed: these are aimed at complete beginners (we’re usually learning something new too!) and we welcome colleagues from across the Library to come and have a play! Some sessions are more "yack" than "hack", while others involve more quiet hacking, depending on the topic, but no matter the balance they're always illuminating.

  • Introduction to AI and Machine learning was great fun for me personally as I had the chance to give staff an interactive and hands-on introduction to concepts around AI and ML, as they relate to library work, and play around with some open machine learning tools. The session was based on much of the text and activities offered in this topic guide, AI & ML in Libraries – Digital Scholarship & Data Science Topic Guides for Library Professionals, and it was a useful way for me to test the content directly with its intended audience!

  • Catalogues as Data was a session run by Harry Lloyd, our Research Software Engineer Extraordinaire, and Rossitza Atanassova, Digital Curator, as a two-part guided exploration of printed catalogues as data, working with OCR output and corpus linguistic analysis. In the first half we followed steps in a Jupyter Notebook to extract catalogue entries from OCR text, troubleshoot errors in the algorithm, and investigate Named Entity Recognition techniques. In the second half we explored catalogue entries with corpus linguistic techniques in AntConc, gaining a sense of how cataloguing practice and the importance of different terms change over time.
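For readers curious about what extracting catalogue entries from OCR text can involve, here is a small sketch. It is not the notebook from the session; it assumes a hypothetical catalogue layout in which each entry begins with a running number followed by a full stop, and the sample text and the names `ENTRY_START` and `split_entries` are invented for illustration.

```python
import re

# Assumed layout: an entry starts with a running number and a full stop,
# e.g. "12. SMITH (John) ...". Real catalogues will need their own pattern.
ENTRY_START = re.compile(r"^(\d+)\.\s+(.*)$")


def split_entries(ocr_text):
    """Group OCR lines into numbered catalogue entries."""
    entries = []
    for raw_line in ocr_text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines left by the OCR layout
        match = ENTRY_START.match(line)
        if match:
            entries.append({"number": int(match.group(1)),
                            "text": match.group(2)})
        elif entries:
            # A line without a number continues the previous entry.
            entries[-1]["text"] += " " + line
    return entries


# Invented sample, standing in for a page of OCR output.
sample_page = """1. SMITH (John) A Treatise on Gardens. London, 1701.
With plates.

2. JONES (Mary) Letters from the Coast.
Bristol, 1745."""

for entry in split_entries(sample_page):
    print(entry["number"], entry["text"])
```

A rule this simple will misfire, for example when a continuation line happens to begin with a year and a full stop, which is exactly the kind of error in the algorithm that a hands-on session like this gets people troubleshooting.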

Digital Scholarship Reading Group

These monthly discussions, led by Digital Curators Mia Ridge and Rossitza Atanassova, are always open to any of our BL colleagues & students, regardless of job title or department. Discussions are regularly attended by colleagues from a range of departments including curators, reference specialists, technology, and research services.

My favourite session of the year by far was “No stupid questions, AI in Libraries”, a lovely meandering session we held in December and a great way to wrap up the year. Instead of discussing any particular reading, we all shared bits about what we had read or learned about independently on the topic of AI in Libraries and had some good-natured debate about where we believe it’s all headed for us on personal and professional levels. Though no readings were required, these were offered in case folks wanted to swot up:

Formal Workshops

We also programme formal courses as needed and this year we focussed very much on building our knowledge of the Wikimedia Universe. I thoroughly enjoyed the lessons we got from Lucy Hinnie and Stuart Prior, which covered nearly every aspect of Wikimedia, and we’ll be doing much more with this new knowledge, particularly Wikidata, in 2025!