This blog post is by Dr Adi Keinan-Schoonbaert, Digital Curator for Asian and African Collections, British Library. She's on Mastodon as @[email protected] and Bluesky as @adi-keinan.bsky.social
The Wikisource 2025 Conference, held in the lush setting of Bali, Indonesia from 14 to 16 February 2025, brought together a global community of Wikimedians, heritage enthusiasts, and open knowledge advocates. Organised by a coalition of Wikisource contributors, the Wikimedia Foundation and Wikimedia Indonesia, the conference served as a dynamic space to discuss the evolving role of Wikisource, explore new technologies, and strengthen collaborations with libraries, cultural institutions, and other global stakeholders.
The conference, themed “Wikisource: Transform & Preserve Heritage Digitally,” featured a rich programme of keynote talks, long presentations, lightning talks, and informal meet-ups. Central themes included governance, technological advancements, community engagement, and the challenge of scaling Wikisource as a set of collaborative, multilingual platforms. We also enjoyed a couple of fantastic cultural events, celebrating the centuries-old, unique heritage of Bali!
Keynotes and Indonesian Partnerships
Following a kick-off session on the state of the Wikisource community and technology, several Indonesian partners shared insights into their work on heritage, preservation, and digital accessibility. Dr Munawar Holil (Kang Mumu) highlighted the efforts of Manassa (the Indonesian Manuscript Society) to safeguard over 121,000 manuscripts, the majority of which remain undigitised, with key collections located in Bali, Jakarta, and Aceh. Challenges include limited public awareness, the perception of manuscripts as sacred objects requiring ceremonial handling, and structural gaps in institutional training.
Dr Cokorda Rai Adi Paramartha from Udayana University addressed the linguistic diversity of Indonesia – home to 780 languages and 40 scripts, only eight (!) of which are in Unicode – and stressed the importance of developing digital tools like a Balinese keyboard to engage the younger generation. Both speakers underscored the role of community collaboration and technological innovation in making manuscripts more accessible and relevant in the digital age.
Dr Munawar Holil (left), Dr Cokorda Rai Adi Paramartha (right) and session moderator Ivonne Kristiani (WMF; centre).
I had the honour – and the absolute pleasure! – of being invited as one of the keynote speakers for this conference. In my talk I explored collaborations between the British Library and Wikisource, focusing on engaging local communities, raising awareness of library collections, facilitating access to digitised books and manuscripts, and enhancing them with accurate transcriptions.
We have previously collaborated with Bengali communities on two competitions to proofread 19th century Bengali books digitised as part of the Two Centuries of Indian Print project. More recently, the Library partnered with the Wikisource Loves Manuscripts (WiLMa) project, sharing Javanese manuscripts digitised through the Yogyakarta Digitisation Project. I also highlighted past and present work with Transkribus to develop machine learning models aimed at automating transcription in various languages, and encouraged further collaborations that could benefit communities worldwide and expand access to digitised heritage.
Dr Adi Keinan-Schoonbaert delivering a keynote address at the conference. Photo by Memora Productions for Wikimedia Indonesia.
Another keynote was delivered by Andy Stauder from READ-COOP. After introducing the cooperative and Transkribus, Andy talked about a key component of their approach – CCR – which stands for Clean, Controllable, and Reliable data coupled with information extraction (NER), powered by end-to-end ATR (automated text recognition) models. This approach is essential for both training and processing with large language models (LLMs). The future may move beyond pre-training to embrace active learning, fine-tuning, retrieval-augmented generation (RAG), dynamic prompt engineering, and reinforcement learning, with the aim of generating linked knowledge, for example through integration with Wikidata IDs. Community collaboration remains central, as seen in projects like the digitisation of Indonesian palm-leaf manuscripts using Transkribus.
Andy Stauder (READ-COOP) talking about collaboration around the Indonesian palm-leaf manuscripts digitisation
Cassie Chan (Google APAC Search Partnerships) gave a third keynote on Google's role in digitising and curating cultural and literary heritage, aligning with Wikisource’s mission of providing free access to source texts. Projects like Google Books aim to make out-of-copyright works discoverable online, while Google Arts & Culture showcases curated collections such as the Timbuktu Manuscripts, aiding preservation and accessibility. These efforts support Wikimedia goals by offering valuable, context-rich resources for contributors. Additionally, Google's use of AI for cultural exploration – through tools like Poem Postcards and Art Selfie – demonstrates innovative approaches to engaging with global heritage.
Spotlight on Key Themes and Takeaways
The conference featured so many interesting talks and discussions, providing insights into projects, sharing knowledge, and encouraging collaborations. I’ll mention here just a few themes and some key takeaways, from my perspective as someone working with heritage collections, communities, and technology.
Starting with the latter, a major focus was on Optical Character Recognition (OCR) improvements. Enhanced OCR capabilities on Wikisource platforms not only improve text accuracy but also encourage more volunteers to engage in text correction. The implementation of Google OCR, Tesseract and, more recently, Transkribus is driving increased participation, as volunteers enjoy refining text accuracy. Among other speakers, User:Darafsh, Chairman of the Iranian Wikimedians User Group, mentioned the importance of teaching how to use Wikisource and OCR, and the development of Persian OCR at the University of Hamburg. Other talks relating to technology covered the introduction of new extensions, widgets, and mobile apps, highlighting the push to make Wikisource more user-friendly and scalable.
Nicolas Vigneron showcasing the languages for which Google OCR was implemented on Wikisource
Some discussions explored the potential of WiLMa (Wikisource Loves Manuscripts) as a model for coordinating across stakeholders, ensuring the consistency of tools, and fostering engagement with cultural institutions. For example, Irvin Tomas and Maffeth Opiana talked about WiLMa Philippines. This project launched in June 2024 as the first WiLMa project outside of Indonesia, focusing on transcribing and proofreading Central Bikol texts through activities like monthly proofread-a-thons, a 12-hour transcribe-a-thon, and training sessions at universities.
Another interesting topic was that of Wikidata and metadata. The integration of structured metadata remains a key area of development, enabling better searchability and linking across digital archives. Bodhisattwa Mandal (West Bengal Wikimedians User Group) talked about Wikisource content including both descriptive metadata and unstructured text. While most data isn’t yet stored in a structured format, using Wikidata enables easier updates, avoids redundancy, and improves search, queries, and visualisation. There are tools that support metadata enrichment, annotation, and cataloguing, and a forthcoming mobile app will allow Wikidata-based book searches. Annotating text with Wikidata items enhances discoverability and links content more effectively across Wikimedia projects.
Working for the British Library, I (naturally!) picked up on a few collaborative projects between Wikisource and public or national libraries. One talk was about a digitisation project for traditional Korean texts, a three-year collaboration between Wikimedia Korea and the National Library of Korea, which successfully revitalised the Korean Wikisource community by increasing participation and engaging volunteers through events and partnerships.
Another project built a Wikisource community in Uganda by training university students, particularly from library information studies, alongside existing volunteers. Through practical sessions, collaborative tasks, and support from institutions like the National Library of Uganda and Wikimedia contributors, participants developed digital literacy and archival skills.
Nanteza Divine Gabriella giving a talk on ‘Training Wikisource 101’ and building a Wikisource community in Uganda
A third Wikisource and libraries talk was about a Wikisource-to-public-library pipeline project, which started in a public library in Hokitika, New Zealand. This pipeline enables scanned public domain books to be transcribed on Wikisource and then made available as lendable eBooks via the Libby app, using OverDrive's Local Content feature. With strong librarian involvement, a clear workflow, and support from a small grant, the project has successfully bridged Wikisource and library systems to increase accessibility and customise reading experiences for library users.
The final session of the conference focused on shaping a future roadmap for Wikisource through community-driven conversation, strategic planning, and partnership development. Discussions emphasised the need for a clearer vision, sustainable collaborations with technology and cultural institutions, improved tools and infrastructure, and greater outreach to grow both readership and contributor communities. Key takeaways included aligning with partners’ goals, investing in editor growth, leveraging government language initiatives, and developing innovative workflows. A strong call was made to prioritise people over platforms and to ensure Wikisource remains a meaningful and inclusive space for engaging with knowledge and heritage.
Looking Ahead
The Wikisource 2025 Conference reaffirmed the platform’s importance in the digital knowledge ecosystem. However, sustaining momentum requires ongoing advocacy, technological refinement, and deeper institutional partnerships. Whether through digitising new materials or leveraging already-digitised collections, there is a clear hunger for openly accessible public domain texts.
As the community moves forward, a focus on governance, technology, and strategic partnerships will be essential in shaping the future of Wikisource. The atmosphere was so positive and there was so much enthusiasm and willingness to collaborate – see this fantastic video available via Wikimedia Commons, which successfully captures the sentiment. I’m sure we’re going to see a lot more coming from Wikisource communities in the future!