Digital scholarship blog

Introduction

Tracking exciting developments at the intersection of libraries, scholarship and technology.

29 May 2025

Discover Digital Sustainability

This blog post is by Dr Adi Keinan-Schoonbaert, Digital Curator for Asian and African Collections, British Library. She's on Mastodon as @[email protected] and Bluesky as @adi-keinan.bsky.social

 

As someone with an ongoing interest in all things environmental sustainability – especially where digital work and technologies are concerned – I was highly motivated to plan and coordinate a month-long series of training events entitled “Discover Digital Sustainability”. Run as part of the British Library’s Digital Scholarship Training Programme (DSTP) in March 2025, this training series explored the intersection of technology, digital practices, and environmental responsibility. Designed to spark practical climate action, it brought together staff from across the organisation, offering them the knowledge and tools needed to reduce the environmental impact of their digital work.

Building on the success of previous initiatives – including the DHCC/BL workshop in early 2024, ongoing Carbon Literacy Training, and the launch of the British Library’s Sustainability and Climate Change Strategy – this comprehensive programme featured a rich blend of talks, interactive workshops, and a reflective reading group session. Attendees engaged with topics ranging from data management, digital preservation, digitisation, and web design to the environmental implications of AI, hardware use, and procurement and supply chains.

Throughout the month, staff heard from experts in heritage, academia, and technology, as well as British Library colleagues leading by example in rethinking workflows and adopting greener digital practices. The training not only deepened our understanding of how to measure and reduce the carbon footprint of digital operations but also inspired a broader cultural shift toward sustainability in the way we create, manage, and deliver digital services.

 

Reading Group

The series kicked off with a dedicated reading group session focused on the theme of data and digital waste. It sparked thoughtful discussion around the environmental impacts of data creation, storage, and retention. Participants explored how to balance the social value of data with its environmental cost, drawing on key resources, mainly from Loughborough University’s Digital Decarbonisation project, which highlights the concept of “dark data” and the hidden carbon footprint associated with unused or unnecessary data.

The conversation set the stage for a timely follow-up webinar later that week, entitled “Your Data Carbon Footprint: What It Is & Why It Matters”, hosted by Leadership Through Data and featuring Professors Tom Jackson and Ian Hodgkinson from the Digital Decarbonisation project. Several attendees joined this session, which offered deeper insights into the scale and significance of data-related emissions and practical steps for managing digital waste more sustainably.

Screenshot of the “Digital Decarbonisation” homepage

 

Staff Talks

The training series featured six engaging staff talks (we call them 21st Century Curatorship Talks), each offering unique perspectives on the challenges and opportunities of embedding sustainability into digital practices across the cultural, academic and technology sectors.

Katie Espley from the Library’s Corporate Information Management Unit shared practical strategies in her “Records Management Refresh” session, highlighting best practices for organising digital workspaces to reduce clutter and improve efficiency. Dr Nicôle Meehan (University of St Andrews) examined the environmental impact of museum digitisation, encouraging institutions to take action to reduce its carbon cost. From the tech industry, Lewis Richards (Chief Sustainability Officer, Microsoft UK) provided insights into Microsoft’s sustainability initiatives and how they are addressing environmental impact at scale.

Stacey Anderson (The Box Plymouth) contributed a session on sustainable digital preservation in museums and archives, while David Mahoney (University of Edinburgh) explored sustainable web design, offering practical ways to lower the environmental footprint of online content. Finally, Jon Ray (Oxford GLAM) presented the University of Oxford GLAM’s efforts to develop a digital sustainability action plan, showcasing how cultural organisations can take meaningful steps to reduce their digital carbon footprint. Together, these talks illustrated the breadth of sustainable digital practice and inspired staff to think critically about their own work.

 

Workshops

Three hands-on workshops offered staff the opportunity to dive deeper into the practicalities of digital sustainability, each approaching the topic from a different angle. In the “Future of Digital Sustainability”, Jo Walton and Nathalie Huegler from the University of Sussex Digital Humanities Lab led an interactive session and card game that explored the environmental impact of digital technologies and strategies for building more sustainable digital practices. Bailey Bryan and Tommy Ferry from the digital agency Wholegrain Digital followed with a focused workshop on the carbon footprint of web design, guiding participants through actionable steps to create more environmentally friendly websites.

The “Future of Digital Sustainability” workshop instructors and participants in action

Finally, a Hack & Yack session, led by Nora McGregor, brought staff together in a collaborative effort to co-create a Digital Sustainability online guide tailored for library professionals. This guide forms part of a broader initiative by the Digital Scholarship and Digital Cultural Heritage Collections Working Group (a LIBER working group chaired by Nora), now working in partnership with LIBER’s Data Science in Libraries group to build a centralised staff skills hub – a trusted resource for training at the intersection of libraries, cultural heritage, and digital technologies.

 

Key Takeaways

The different presentations and discussions brought to light the complex challenges and opportunities that lie at the intersection of digital practice and environmental responsibility. One of the recurring themes was the widespread underestimation or complete lack of measurement around digital-related carbon emissions. Many organisations still operate without clear data hygiene policies, and there is a notable lack of consistent, trustworthy reporting from digital service providers on their sustainability practices. This is compounded by limited awareness of the environmental footprint of routine digital activities, including hardware procurement, software use, and long-term digital storage.

Speakers highlighted the tensions many institutions face in balancing environmental goals with the practicalities of funding requirements, user experience expectations, and organisational responsibilities. The use of AI was a particular area of concern – not only for its increasing energy demands but also for the need to use it more responsibly and strategically. The importance of informing policy change and providing the sector with practical, actionable resources was emphasised as a priority for enabling progress.

The talks and workshops also underscored the importance of data and facts in making the case for change. Data centres alone are responsible for between 2.5% and 3.7% of global carbon emissions – potentially exceeding those of the aviation sector. Additionally, up to 65% of stored data is considered “dark” (unused), with another 15% classified as redundant. Reducing this unnecessary data load is essential, and speakers offered a range of sustainability strategies to help. These included selective digitisation that prioritises high-value or at-risk collections, the use of lossless compression formats to reduce file size without compromising quality, and scheduling energy-intensive processes like integrity checks during off-peak hours.
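To make the last of these strategies concrete, here is a minimal Python sketch of an integrity (fixity) check that only runs during an off-peak window. It is an illustration rather than anything presented in the series: the 22:00 to 06:00 window and the names `is_off_peak`, `sha256_of` and `check_if_off_peak` are all assumptions, and a sensible window will depend on your own infrastructure and local grid.

```python
import hashlib
from datetime import datetime, time
from pathlib import Path

# Assumed off-peak window (22:00 to 06:00); adjust for your own site.
OFF_PEAK_START = time(22, 0)
OFF_PEAK_END = time(6, 0)


def is_off_peak(moment: datetime) -> bool:
    """Return True if `moment` falls inside the overnight off-peak window."""
    t = moment.time()
    # The window wraps past midnight, so it is the union of two ranges.
    return t >= OFF_PEAK_START or t < OFF_PEAK_END


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute a SHA-256 checksum by streaming the file in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_if_off_peak(path: Path, expected: str, moment: datetime):
    """Run the fixity check only off-peak; return None when deferred."""
    if not is_off_peak(moment):
        return None  # deferred: the caller should try again later
    return sha256_of(path) == expected
```

A scheduled job could call `check_if_off_peak(path, stored_checksum, datetime.now())` and simply re-queue any file for which it returns `None`.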

Other practical recommendations included sustainable web design principles, such as optimising images, reducing page weight, and minimising third-party scripts. Speakers also noted encouraging signs of progress in the form of evolving sector standards for sustainable digital archiving, and the growing emphasis on green procurement, evaluating vendors not just on price and functionality but also on their environmental credentials.

Finally, a clear call to action emerged around the need for collective effort. Sector-wide collaboration was viewed as essential for setting shared standards, influencing common suppliers, and fostering open dialogue. Tools such as carbon calculators and tracking toolkits can support these efforts, but lasting change will depend on sustained joint action and a willingness to embed sustainability into every aspect of digital practice. Importantly, this work must be done with a strong commitment to equity, ensuring that the push for digital efficiency does not come at the cost of access or inclusion.

 

What did participants think?

The training series drew a whopping 88 participants from many teams and departments across the Library, many of whom attended more than one event. A survey was conducted at the end of the training series to assess the relevance of topics to both professional digital working and personal life, as well as to gauge impact: what changes were participants going to make? How would their behaviour change? With some questions mandatory and some optional, the survey had 22 respondents in total.

On a scale from None (1) to Advanced (5), participants rated their knowledge of digital sustainability at 2.36 on average before the series, and at 3.64 on average afterwards. This demonstrates an increased level of confidence in understanding relevant topics. Survey results also show that almost everyone learned new things, or reinforced things they already knew. Most respondents also shared or intend to share knowledge acquired from the series, and most have made or intend to make changes in their professional role, with some also in their personal lives. Almost all of them would recommend similar events to colleagues in the future.

Agree/Disagree statements from the series' feedback survey

Looking into the changes that respondents were going to make, it was evident that participants were thinking critically about digital sustainability, from practical day-to-day actions to influencing organisational change. Several mentioned file and data management, for example reducing the duplication of files, doing a ‘spring clean’, or deleting unnecessary emails. Others talked about sustainable digital practices, like encouraging a built-in, climate-aware task scheduler function for large jobs in development environments, or a desire to create a sustainability checklist for web editing. Several pointed to the importance of advocacy, knowledge sharing and team engagement, for example influencing policies, sharing knowledge about cloud storage considerations, encouraging mindful use of storage in platforms like MS Teams, or telling people about the Hugging Face list of models that require fewer resources.
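For anyone planning a file ‘spring clean’ of their own, the sketch below shows one possible way to spot duplicated files in a folder by comparing content checksums. It is an illustrative assumption rather than a tool used in the series; the function name `find_duplicates` is hypothetical, and because it reads each candidate file fully into memory it is best suited to modest folders.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_duplicates(root: Path) -> list[list[Path]]:
    """Group files under `root` that have identical content."""
    # First pass: bucket by size, since files of different sizes cannot match.
    by_size = defaultdict(list)
    for path in root.rglob("*"):
        if path.is_file():
            by_size[path.stat().st_size].append(path)

    # Second pass: hash only the files that share a size with another file.
    by_hash = defaultdict(list)
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a unique size cannot be a duplicate
        for path in same_size:
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_hash[digest].append(path)

    # Keep only checksums shared by two or more files.
    return [sorted(group) for group in by_hash.values() if len(group) > 1]
```

The function only reports groups of identical files; deciding which copy to keep, and whether deletion is appropriate under local records management policy, is left to a person.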

Participants highlighted several aspects of the training series that they found especially valuable and enjoyable. The sessions were praised for being accessible to non-specialists, with clear and engaging presentations that made complex topics understandable. Attendees appreciated the opportunity to learn from experts across different fields, gaining new perspectives and recognising shared challenges. The diversity of talks throughout the month helped reinforce key ideas, making the learning experience more cohesive and impactful. Interactive elements, such as the “Future of Digital Sustainability” workshop and the educational card game, were particularly well received for being both fun and informative. Many found the enthusiasm and expertise of the speakers inspiring and valued the sense of community fostered throughout the series – feeling connected to others who are equally committed to addressing important issues.

We had some thoughtful and constructive feedback on how the training series could be improved. A recurring suggestion was to spread out the events over a longer period, with slightly shorter presentations and more interactive elements to keep engagement high. Several attendees expressed a desire for more practical, hands-on content, such as Hack & Yack sessions that focus on applying ideas directly to their work environments. There was also interest in exploring how digital sustainability can be balanced with usability and visual appeal, particularly for those working in online public engagement. A few attendees found the themes somewhat repetitive and proposed a structured progression – starting with theory and moving into application – for those attending multiple sessions. Additionally, ideas were raised about revisiting key topics in future sessions to keep the momentum going and share updates.

My favourite testimonial would probably be:

“I just wanted to drop a quick note to say how great it is that you're running these sessions. I think we often struggle to take environmental impacts into consideration when appraising options for digital systems and data projects so this is incredibly welcome.”

 

Feedback on the “Future of Digital Sustainability”

The “Future of Digital Sustainability” workshop received separate feedback, which was also really helpful. Participants found the workshop to be a highly engaging, enjoyable, and accessible way to explore the complex subject of digital sustainability. Many emphasised the fun, playful nature of the learning experience, noting how it made the subject matter more approachable and digestible. The interactive format encouraged rich discussions, peer learning, and collective reflection, allowing participants to learn not only from the facilitators but also from the expertise of their colleagues. Several appreciated the balance between educational depth and a light, participatory tone, saying it was both thought-provoking and practical. The workshop was seen as a great tool for building shared understanding and momentum within teams, with participants leaving more informed and motivated.

Interactive session at the end of the “Future of Digital Sustainability” workshop

Suggestions for improvement focused mostly on refining the structure of the card game used in the workshop. Some participants noted a need to better balance the depth of discussion with the pace of gameplay, proposing fewer cards, clearer ground rules, and a short introduction to the game strategy. Others recommended changes to the mechanics – such as modifying hand sizes, how actions and events are played, and how points are distributed – to reduce repetition and enhance flow. A few attendees also wished for a better way to track the many concepts raised during the game. Despite these suggestions, the consensus was that the game was a valuable and enjoyable tool. Participants also expressed interest in future collaboration and emphasised the importance of translating insights into actionable decisions across the organisation.

And as for my favourite quote –

“The workshop was soooo good! I hope you're able to run it again so that more BL folk can attend. I think we all left feeling more positive, informed and empowered than we entered.”

 

Next Steps

Organising the “Discover Digital Sustainability” training series at the British Library was an incredibly rewarding experience. I learned a great deal through the process and especially valued the chance to connect with other organisations and professionals working in this space. These conversations were not only inspiring but also opened up exciting possibilities for future collaboration. I’m proud that we were able to deliver something both meaningful and impactful, helping to bring the topic of digital sustainability to the forefront within our institution.

Looking ahead, I’m optimistic that many of the insights and ideas from the series will be put into practice – in line with the Library’s strategic action plan – across teams, departments, and individuals. There’s more to come too: I’ll be speaking at DH2025 in Lisbon this July, sharing our work through a paper on “Digital Humanities and Environmental Sustainability at the British Library,” and also plan to publish the topic guide on digital sustainability later this summer. Plus, we’ll be running the “Future of Digital Sustainability” workshop and card game again during Green Libraries Week, this time at our Boston Spa site. Our commitment to learning continues, and we’ll keep seeking new opportunities to support staff through the Digital Scholarship Training Programme.

 

23 April 2025

DHNB 2025 - Digital Humanities in the Nordic and Baltic Countries Conference Report

This post is by Helena Byrne, Curator of Web Archives.

Conference banner with an image of the Estonian National Museum on blue and purple background
DHNB 2025 Conference Banner

This year’s Digital Humanities in the Nordic and Baltic countries conference took place at the Estonian National Museum in Tartu. Last year was the first time I attended the DHNB conference (report available on the Digital Scholarship Blog). The theme for this year was “Digital Dreams and Practices”. There were pre-conference workshops on March 3-4, with the main conference starting on the morning of March 5 and finishing on March 7. I participated in the Web Archive Collections as Data workshop held in the morning session on day two.


This was a big conference with about 200 researchers and GLAM sector participants who attended from organisations based all over Europe as well as Japan. With such a big attendance there were multiple parallel sessions on each day. A detailed overview of the programme is available to download from the DHNB website. There was also a large poster presentation session at the end of day two of the conference. In the main hall all presenters had one minute to introduce their poster before going onto the floor to discuss the wide variety of topics in more detail.

Posters on 10 stands lined up against windows in the museum hallway.
Posters on display at the DHNB 2025 Conference

There was a keynote on each day of the conference. The second-day keynote was by Andrea Kocsis of Edinburgh University, the current National Librarian’s Research Fellow in Digital Scholarship 2024-25 at the National Library of Scotland. She has worked closely with UK Web Archive colleagues across the UK Legal Deposit Libraries to make the collections more accessible to wider audiences.

All three keynotes are available to watch on the DHNB website - https://dhnb.eu/conferences/dhnb2025/keynote-speakers/ 

It is hard to pick one highlight out of such a rich conference, but I think it would be the presentation Collecting memories of the early internet by Johanna Arnesson, Evelina Liliequist, and Coppélie Cocq from Umeå University, Sweden. The abstract is available on page 24 of the Programme Book of Abstracts. One of the key takeaways from this presentation was that more case studies from different countries are required. So far only a few case studies have reviewed early memories and/or experiences of the internet, but people would have experienced the internet differently depending on their home country, age, socioeconomic status etc. It would be interesting to see researchers using the UK Web Archive resources to run a similar study in the UK.

Poster presenters lined up in front of the screen on stage in the conference auditorium.
Poster Slam at the DHNB 2025 Conference

Although the National Library of Estonia building is currently closed for renovation, I was delighted that I could meet up with their web archivist to discuss web archiving challenges and opportunities in Estonia. 

For a more detailed report on the Web Archive Collections as Data workshop see the UK Web Archive blog.

09 April 2025

Wikisource 2025 Conference: Collaboration, Innovation, and the Future of Digital Texts

This blog post is by Dr Adi Keinan-Schoonbaert, Digital Curator for Asian and African Collections, British Library. She's on Mastodon as @[email protected] and Bluesky as @adi-keinan.bsky.social

 

The Wikisource 2025 Conference, held in the lush setting of Bali, Indonesia from 14 to 16 February 2025, brought together a global community of Wikimedians, heritage enthusiasts, and open knowledge advocates. Organised by a coalition of Wikisource contributors, the Wikimedia Foundation and Wikimedia Indonesia, the conference served as a dynamic space to discuss the evolving role of Wikisource, explore new technologies, and strengthen collaborations with libraries, cultural institutions, and other global stakeholders.

Wikisource Conference 2025 participants. Photo by Memora Productions for Wikimedia Indonesia.

The conference, themed “Wikisource: Transform & Preserve Heritage Digitally,” featured a rich programme of keynote talks, long presentations, lightning talks, and informal meet-ups. Central themes included governance, technological advancements, community engagement, and the challenge of scaling Wikisource as a set of collaborative, multilingual platforms. We also enjoyed a couple of fantastic cultural events, celebrating the centuries-old, unique heritage of Bali!

Keynotes and Indonesian Partnerships

Following a kick-off session on the state of the Wikisource community and technology, several Indonesian partners shared insights into their work on heritage, preservation, and digital accessibility. Dr Munawar Holil (Kang Mumu) highlighted the efforts of Manassa (the Indonesian Manuscript Society) to safeguard over 121,000 manuscripts, the majority of which remain undigitised, with key collections located in Bali, Jakarta, and Aceh. Challenges include limited public awareness, sacred perceptions requiring ceremonial handling, and structural gaps in institutional training.

Dr Cokorda Rai Adi Paramartha from Udayana University addressed the linguistic diversity of Indonesia – home to 780 languages and 40 scripts, only eight (!) of which are in Unicode – and stressed the importance of developing digital tools like a Balinese keyboard to engage the younger generation. Both speakers underscored the role of community collaboration and technological innovation in making manuscripts more accessible and relevant in the digital age.

Dr Munawar Holil (left), Dr Cokorda Rai Adi Paramartha (right) and session moderator Ivonne Kristiani (WMF; centre).

I had the honour – and the absolute pleasure! – of being invited as one of the keynote speakers for this conference. In my talk I explored collaborations between the British Library and Wikisource, focusing on engaging local communities, raising awareness of library collections, facilitating access to digitised books and manuscripts, and enhancing them with accurate transcriptions.

We have previously collaborated with Bengali communities on two competitions to proofread 19th century Bengali books digitised as part of the Two Centuries of Indian Print project. More recently, the Library partnered with the Wikisource Loves Manuscripts (WiLMa) project, sharing Javanese manuscripts digitised through the Yogyakarta Digitisation Project. I also highlighted past and present work with Transkribus to develop machine learning models aimed at automating transcription in various languages, encouraged further collaborations that could benefit communities worldwide, and emphasised the potential of such partnerships in expanding access to digitised heritage.

Dr Adi Keinan-Schoonbaert delivering a keynote address at the conference. Photo by Memora Productions for Wikimedia Indonesia.

Another keynote was delivered by Andy Stauder from READ-COOP. After introducing the cooperative and Transkribus, Andy talked about a key component of their approach – CCR – which stands for Clean, Controllable, and Reliable data coupled with information extraction (NER), powered by end-to-end ATR (automated text recognition) models. This approach is essential for both training and processing with large language models (LLMs). The future may move beyond pre-training to embrace active learning, fine-tuning, retrieval-augmented generation (RAG), dynamic prompt engineering, and reinforcement learning, with an aim to generate linked knowledge, such as integration with Wikidata IDs. Community collaboration remains central, as seen in projects like the digitisation of Indonesian palm-leaf manuscripts using Transkribus.

Andy Stauder (READ-COOP) talking about collaboration around the Indonesian palm-leaf manuscripts digitisation

Cassie Chan (Google APAC Search Partnerships) gave a third keynote on Google's role in digitising and curating cultural and literary heritage, aligning with Wikisource’s mission of providing free access to source texts. Projects like Google Books aim to make out-of-copyright works discoverable online, while Google Arts & Culture showcases curated collections such as the Timbuktu Manuscripts, aiding preservation and accessibility. These efforts support Wikimedia goals by offering valuable, context-rich resources for contributors. Additionally, Google's use of AI for cultural exploration – through tools like Poem Postcards and Art Selfie – demonstrates innovative approaches to engaging with global heritage.

Spotlight on Key Themes and Takeaways

The conference featured so many interesting talks and discussions, providing insights into projects, sharing knowledge, and encouraging collaborations. I’ll mention here just a few themes and some key takeaways, from my perspective as someone working with heritage collections, communities, and technology.

Starting with the latter, a major focus was on Optical Character Recognition (OCR) improvements. Enhanced OCR capabilities on Wikisource platforms not only improve text accuracy but also encourage more volunteers to engage in text correction. The implementation of Google OCR, Tesseract and, more recently, Transkribus is driving increased participation, as volunteers enjoy refining text accuracy. Among other speakers, User:Darafsh, Chairman of the Iranian Wikimedians User Group, mentioned the importance of teaching how to use Wikisource and OCR, and the development of Persian OCR at the University of Hamburg. Other talks relating to technology covered the introduction of new extensions, widgets, and mobile apps, highlighting the push to make Wikisource more user-friendly and scalable.

Nicolas Vigneron showcasing the languages for which Google OCR was implemented on Wikisource

Some discussions explored the potential of WiLMa (Wikisource Loves Manuscripts) as a model for coordinating across stakeholders, ensuring the consistency of tools, and fostering engagement with cultural institutions. For example, Irvin Tomas and Maffeth Opiana talked about WiLMa Philippines. This project launched in June 2024 as the first WiLMa project outside of Indonesia, focusing on transcribing and proofreading Central Bikol texts through activities like monthly proofread-a-thons, a 12-hour transcribe-a-thon, and training sessions at universities.

Another interesting topic was that of Wikidata and metadata. The integration of structured metadata remains a key area of development, enabling better searchability and linking across digital archives. Bodhisattwa Mandal (West Bengal Wikimedians User Group) talked about Wikisource content including both descriptive metadata and unstructured text. While most data isn’t yet stored in a structured format, using Wikidata enables easier updates, avoids redundancy, and improves search, queries, and visualisation. There are tools that support metadata enrichment, annotation, and cataloguing, and a forthcoming mobile app will allow Wikidata-based book searches. Annotating text with Wikidata items enhances discoverability and links content more effectively across Wikimedia projects.

Working for the British Library, I (naturally!) picked up on a few collaborative projects between Wikisource and public or national libraries. One talk was about a digitisation project for traditional Korean texts, a three-year collaboration with Wikimedia Korea and the National Library of Korea, successfully revitalising the Korean Wikisource community by increasing participation and engaging volunteers through events and partnerships.

Another project built a Wikisource community in Uganda by training university students, particularly from library information studies, alongside existing volunteers. Through practical sessions, collaborative tasks, and support from institutions like the National Library of Uganda and Wikimedia contributors, participants developed digital literacy and archival skills.

Nanteza Divine Gabriella giving a talk on ‘Training Wikisource 101’ and building a Wikisource community in Uganda

A third Wikisource and libraries talk was about a Wikisource-to-public-library pipeline project, which started in a public library in Hokitika, New Zealand. This pipeline enables scanned public domain books to be transcribed on Wikisource and then made available as lendable eBooks via the Libby app, using OverDrive's Local Content feature. With strong librarian involvement, a clear workflow, and support from a small grant, the project has successfully bridged Wikisource and library systems to increase accessibility and customise reading experiences for library users.

The final session of the conference focused on shaping a future roadmap for Wikisource through community-driven conversation, strategic planning, and partnership development. Discussions emphasised the need for clearer vision, sustainable collaborations with technology and cultural institutions, improved tools and infrastructure, and greater outreach to grow both readership and contributor communities. Key takeaways included aligning with partners’ goals, investing in editor growth, leveraging government language initiatives, and developing innovative workflows. A strong call was made to prioritise people over platforms and to ensure Wikisource remains a meaningful and inclusive space for engaging with knowledge and heritage.

Looking Ahead

The Wikisource 2025 Conference reaffirmed the platform’s importance in the digital knowledge ecosystem. However, sustaining momentum requires ongoing advocacy, technological refinement, and deeper institutional partnerships. Whether through digitising new materials or leveraging already-digitised collections, there is a clear hunger for openly accessible public domain texts.

As the community moves forward, a focus on governance, technology, and strategic partnerships will be essential in shaping the future of Wikisource. The atmosphere was so positive and there was so much enthusiasm and willingness to collaborate – see this fantastic video available via Wikimedia Commons, which successfully captures the sentiment. I’m sure we’re going to see a lot more coming from Wikisource communities in the future!