Digital scholarship blog

Introduction

Tracking exciting developments at the intersection of libraries, scholarship and technology.

14 October 2024

Research and Development activities in the Qatar Programme Imaging Team

This blog post is by members of the Imaging Team at British Library/Qatar Foundation Partnership (BLQFP) Programme: Eugenio Falcioni (Imaging and Digital Product Manager), Dominique Russell, Armando Ribeiro and Alexander Nguyen (Senior Imaging Technicians), Selene Marotta (Quality Management Officer), Matthew Lee and Virginia Mazzocato (Senior Imaging Support Technicians).

The Imaging Team has played a pivotal role in the British Library/Qatar Foundation Partnership (BLQFP) Programme since its launch in 2012. However, the journey has not been without hurdles. In October 2023, the infamous cyber-attack on the British Library severely disrupted operations across the organisation, impacting the Imaging Team profoundly. Inspired by the Library's Rebuild & Renew Programme, we used this challenging period to focus on research and development, refining our processes and deepening our understanding of the studio’s work practices.

At the time of the attack, we were in the process of recruiting new members of the team who brought fresh energy, expertise, and enthusiasm. This also coincided with the appointment of a new Studio Manager. The formation of this almost entirely new team presented challenges as we adapted to the Library's disrupted environment. Yet, our synergy and commitment led us to find innovative ways of working. Although the absence of an IT infrastructure, and therefore imaging hardware and software, posed significant difficulties for day-to-day activities in photography and digitisation, we had the time to focus on continuous improvement, without the usual pressures of deadlines. We enhanced our digitisation processes and expertise through a combination of quality improvements, strategic collaborations, and the development of innovative tools. Through teamwork and perseverance, we transformed adversity into an opportunity for growth.

As an Imaging Team, we aim to create the optimal digital surrogate of the items we capture. The BLQFP defined parameters for imaging which specify criteria such as colour and resolution accuracy, ensuring compliance with International Imaging Standards (such as FADGI or ISO 19264).

During this unusual time, we focused on research and development into imaging standards, and updated our guidelines, resulting in a 150-page document detailing our workflow. This has improved consistency between setups and photographers, and has been fundamental in training new staff. We engaged in skills-sharing workshops with Imaging Services, the Library’s core imaging department, and Heritage Made Digital (HMD), the Library’s department that manages digitisation workflows.

Over the months, we have tested our images and setup, cameras, lighting, and colour targets, all while shooting directly to camera cards and using a laser measuring device to check resolution (PPI). As a result of this work, we feel more confident in producing images that conform to International Imaging Standards and that truly represent the collection items.

A camera stand with a bound volume with a colour target ruler on top and a laser device next to it.

Colour target on a bound volume
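For readers unfamiliar with resolution checks, the arithmetic behind a PPI measurement can be sketched as follows. This is a minimal illustration, not the team's laser-based procedure: it assumes a feature of known physical length (for example, a span on the target ruler) has been measured in pixels in the captured image. The function names, the sample readings, and the 2% tolerance are all hypothetical and are not taken from FADGI or ISO 19264.

```python
MM_PER_INCH = 25.4


def measured_ppi(pixel_length: float, physical_length_mm: float) -> float:
    """Return pixels per inch for a feature of known physical length."""
    if pixel_length <= 0 or physical_length_mm <= 0:
        raise ValueError("lengths must be positive")
    return pixel_length * MM_PER_INCH / physical_length_mm


def within_tolerance(measured: float, target: float, tolerance: float = 0.02) -> bool:
    """True if the measured PPI is within a relative tolerance of the target.

    The 2% default is a hypothetical figure used for illustration only.
    """
    return abs(measured - target) <= tolerance * target


# Hypothetical reading: a 150 mm span on the target ruler covers 2362 pixels.
ppi = measured_ppi(pixel_length=2362, physical_length_mm=150.0)
print(f"{ppi:.1f} PPI, within tolerance of 400: {within_tolerance(ppi, 400)}")
```

In practice the acceptable deviation would come from whichever imaging standard a studio follows, so the tolerance is left as a parameter.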

Alongside our testing, we arranged visits to imaging studios at other institutions where we shared our knowledge and learnt from the working processes of those who are digitising comparable collection material. During these visits, we gained a better understanding of the different imaging set-ups, the various international quality standards followed, and how the images produced are analysed. We also shared our approaches to capturing and stitching oversized items such as maps and foldouts. Lastly, we discussed quality assurance and workflow management tools. Overall, these visits across the sector have been a valuable exercise in making new connections, sharing ideas, and understanding that other institutions face similar problems when digitising collection items.

Without the use of dedicated digitisation software, the capture of items such as manuscripts and large bound volumes has been challenging as we have been unable to check the images we were producing. For this reason, we prioritised items of the collection which were less demanding and postponed the quality assurance checks to a later date. We chose to capture 78 rpm records as they required only two shots (front and back), minimising any possible mistakes. The imaging of audio collection items was our first achievement as a team since the cyber-attack: we digitised over 1,100 shellac discs, in collaboration with the BLQFP Audio Team, who had previously catalogued and digitised the sound recordings.

A record with a green label reading Columbia

Image of a shellac disc (9CS0024993_ColumbiaGA3) digitised by the BLQFP

Through this capture task we gained the optimism and confidence to start capturing more material, starting with the bindings of all the available bound collection items. The binding capture process is time-consuming and requires a specific setup and position of the item to photograph the front, back, spine, edge, head, and tail of each volume. By capturing bindings now, we will be able to streamline the process when we resume the digitisation of entire volumes.

A camera stand with a red-bound volume supported by a frame over cardboard

Capturing the spine of a bound volume, using an L-shaped card on a support frame

During this time, we were also involved in scoping work to locate and assess the most challenging items and plan a digitisation strategy accordingly. We focused particularly on identifying oversized maps and foldouts, which will be captured in sections and subsequently digitally stitched. This task required frequent visits to the Library’s basement storage areas and collaboration with the BLQFP Workflow Team to optimise and migrate data from the scoping process into existing workflow management systems. By gathering this data, we could determine the physical characteristics of each collection series and select the most suitable capture device. It was also crucial to collaborate with the BLQFP Conservation Team to develop new digitisation tools for capturing oversized foldouts more quickly and securely.

A volume with an insert, folded and unfolded, over two black foam supports

Using c-shaped Plastazote created by the BLQFP Conservation Team to support an oversized fold-out
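As a rough illustration of how section-by-section capture might be planned before stitching, the sketch below estimates how many overlapping shots are needed to cover an oversized item. It is a simplified model under stated assumptions: the item and the camera's field of view are treated as rectangles, adjacent shots overlap by a fixed fraction so that stitching software has common detail to align on, and all dimensions and the 25% overlap are hypothetical values rather than figures from the BLQFP workflow.

```python
import math


def shots_along(item_mm: float, frame_mm: float, overlap: float = 0.25) -> int:
    """Number of shots needed along one axis, given fractional overlap."""
    if item_mm <= 0 or frame_mm <= 0:
        raise ValueError("dimensions must be positive")
    if not 0 <= overlap < 1:
        raise ValueError("overlap must be in the range [0, 1)")
    if item_mm <= frame_mm:
        return 1  # the item fits in a single frame along this axis
    step = frame_mm * (1 - overlap)  # new area covered by each extra shot
    return 1 + math.ceil((item_mm - frame_mm) / step)


def capture_grid(item_w: float, item_h: float,
                 frame_w: float, frame_h: float,
                 overlap: float = 0.25) -> tuple[int, int]:
    """Return (columns, rows) of shots needed to cover the whole item."""
    return (shots_along(item_w, frame_w, overlap),
            shots_along(item_h, frame_h, overlap))


# Hypothetical foldout of 1200 x 900 mm, with each shot covering 600 x 400 mm.
cols, rows = capture_grid(1200, 900, 600, 400)
print(f"{cols} columns x {rows} rows = {cols * rows} shots")
```

An estimate like this could sit alongside scoping data to gauge how much capture time an oversized item needs, although the real choice of overlap depends on the stitching software and the item's condition.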

The past nine months have presented many challenges for our Team. Nevertheless, in the spirit of “Rebuild & Renew”, we have been able to solve problems and develop creative ways of working, pulling together all our individual skills and experiences. As we expand, we have used this time productively to understand the intricacies of digitising fragile, complex, and oversized material while working to rigorous colour and quality standards. With the imminent return of imaging software, the next step for the BLQFP Imaging Team will be to apply our knowledge and understanding to a mass digitisation environment with the expectations of targets and monthly deliverables.

Team members standing around a stand on which a volume with a large foldout is prepared for photography, with lighting on both sides of the stand

Capturing a large foldout

Posted by Digital Research Team at 9:50 AM

Tags

Experiments, Projects

16 September 2024

memoQfest 2024: A Journey of Innovation and Connection

Attending memoQfest 2024 as a translator was an enriching and insightful experience. Held from 13 to 14 June in Budapest, Hungary, the event stood out as a hub for language professionals and translation technology enthusiasts.

Streetviews of Budapest, near the venue for memoQfest 2024. Captured by the author

A Well-Structured Agenda

The conference had a well-structured agenda with over 50 speakers, including two keynotes, who brought valuable insights into the world of translation.

Jay Marciano, President of the Association for Machine Translation in the Americas (AMTA), delivered his highly anticipated presentation on understanding generative AI and large language models (LLMs). While he acknowledged their significant potential, Marciano expressed only cautious optimism about their future in the industry, stressing the need for a deeper understanding of their limitations. As he laid out, machines can translate faster, but the quality of their output depends greatly on the quality of the training data, especially in certain domains or for specific clients. He believes that translators’ role will evolve so that they become more involved with data curation than with translation itself, in order to improve the quality of machine output.

Dr Mike Dillinger, the former Technical Lead for Knowledge Graphs in the AI Division at LinkedIn, and now a technical advisor and consultant, also delved into the challenges and opportunities presented by AI-generated content in his keynote speech, The Next 'New Normal' for Language Services. Dillinger holds a nuanced perspective on the intersection of AI, machine translation (MT), and knowledge graphs. As he explained, knowledge graphs can be designed to integrate, organize, and provide context for large volumes of data. They are particularly valuable because they go beyond simple data storage, embedding rich relationships and context. They can therefore make it easier for AI systems to process complex information, enhancing tasks like natural language processing, recommendation engines, and semantic search.

Dillinger therefore advocated for the integration of knowledge graphs with AI, arguing that high-quality, context-rich data is crucial for improving the reliability and effectiveness of AI systems. Knowledge graphs can significantly enhance the capabilities of LLMs by grounding language in concrete concepts and real-world knowledge, thereby addressing some of the current limitations of AI and LLMs. He concluded that, while LLMs have made significant strides, they often lack true understanding of the text and context.

Enhancing Translation Technology for BLQFP

The event also offered hands-on demonstrations of memoQ's latest features and updates such as significant improvements to the In-country Review tool (ICR), a new filter for Markdown files, and enhanced spellcheck.

Interior of the Pesti Vigado, Budapest's second largest concert hall, and venue for the memoQfest Gala dinner

As a participant, I was keen to explore how some of these features could be used to enhance translation processes at the British Library. For example, could machine translation (MT) be used to translate catalogue records? Over the last twelve years, the translation team of the British Library/Qatar Foundation Partnership project has built up a massive translation memory (TM) – a bilingual repository of all our previous translations. A machine could be trained on our terminology and style, using this TM and our other bilingual resources, such as our vast and growing term base (TB). With appropriate data curation, MT could be a cost-effective and efficient way to maximise our translation operations.

There are challenges, however. For example, before it can be used to train a machine, our TM would need to be edited and cleaned, removing repetitive and inappropriate content. We would need to choose the most appropriate translations, while maintaining proper alignment between segments. The same applies to our TB, which would need to be curated. Some of these data curation tasks cannot be pursued at this time, as we remain without access to much of our data following the cyberattack incident. Moreover, these careful preparatory steps would not suffice, as any machine output would still need to be post-edited by skilled human translators. As both the conference’s keynote speakers agreed, it is not yet a simple matter of letting the machines do the work.
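To make the kind of TM clean-up described above more concrete, here is a minimal sketch in plain Python. It does not use memoQ or any real TM file format (translation memories are commonly exchanged as TMX files, which would need parsing first); it simply assumes the TM has already been loaded as a list of (source, target) segment pairs. The function names and the sample strings are hypothetical. The sketch removes empty and exact-duplicate pairs while keeping each source aligned with its target, and flags sources that have more than one competing translation so that a human translator can choose the most appropriate one.

```python
from collections import defaultdict


def normalise(text: str) -> str:
    """Collapse runs of whitespace so trivially different segments match."""
    return " ".join(text.split())


def clean_tm(pairs):
    """Return (unique_pairs, conflicts) from a list of (source, target) pairs.

    unique_pairs keeps the first occurrence of each aligned pair, in order.
    conflicts maps a source segment to its competing target translations.
    """
    seen = set()
    unique = []
    for source, target in pairs:
        source, target = normalise(source), normalise(target)
        if not source or not target:
            continue  # drop pairs where either side is empty
        if (source, target) in seen:
            continue  # drop exact repetitions
        seen.add((source, target))
        unique.append((source, target))

    by_source = defaultdict(list)
    for source, target in unique:
        by_source[source].append(target)
    conflicts = {s: t for s, t in by_source.items() if len(t) > 1}
    return unique, conflicts


# Made-up sample segments; the targets are placeholders, not real translations.
sample = [
    ("Letter from the Resident", "target A"),
    ("Letter  from the Resident", "target A"),  # duplicate after normalising
    ("Letter from the Resident", "target B"),   # competing translation
    ("File cover", ""),                         # empty target
]
unique, conflicts = clean_tm(sample)
print(len(unique), "pairs kept;", len(conflicts), "source(s) need review")
```

Deciding which of the flagged translations to keep is deliberately left to a person, in line with the point that curation and post-editing still depend on skilled human translators.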

This blog post is by Musa Alkhalifa Alsulaiman, Arabic Translator, British Library/Qatar Foundation Partnership.

Posted by Digital Research Team at 10:00 AM

Tags

Digital scholarship, Events, Experiments, Tools

28 August 2024

Open and Engaged 2024: Empowering Communities to Thrive in Open Scholarship

The British Library is delighted to host its annual Open and Engaged Conference on Monday 21 October, in person and online, as part of International Open Access Week. The Conference is supported by the Arts and Humanities Research Council (AHRC) and Research Libraries UK (RLUK).

Open and Engaged 2024: Empowering Communities to Thrive in Open Scholarship will centre on leveraging the power of communities across open scholarship, open infrastructure, emerging technologies, collections as data, equity and integrity, skills development and sustainable models to elevate research of all kinds for the public good. We take a cross-sectoral approach to the conference programme, unifying around shared values for openness, by reflecting on practices within research libraries in both the higher education and GLAM (Galleries, Libraries, Archives, Museums) sectors, as well as in national and public libraries.

Everyone interested in the conference topics is welcome to join us on Monday, 21 October!

This will be a hybrid event taking place at the British Library’s Knowledge Centre in St. Pancras, London, and streamed online for those unable to attend in person.

The event will be recorded and recordings made available in the British Library’s Research Repository.

Registration

Registration is closed for in-person and online attendance. Registrants have been contacted with details. If you have any questions, please contact [email protected].

Programme

Slides and recordings of the talks are available as a collection in the British Library’s Research Repository.

09:30 Registration

10:00 Welcome remarks

10:10 Opening keynote panel: Cross disciplinary approach to open scholarship

Chaired by Sally Chambers, Head of Research Infrastructure Services at the British Library.

Paola Castano, Research Fellow at the University of Exeter
Hugh Shanahan, Professor of Open Science at Royal Holloway, University of London
Jane Winters, Professor of Digital Humanities & Director of the Digital Humanities Research Hub at the University of London

10:50 Empowering communities through equity, inclusivity, and ethics

Chaired by Beth Montague-Hellen, Head of Library and Information Services at the Francis Crick Institute.

This session addresses the role of the communities in governance, explores the ethical implications of AI for citizens and highlights the value of public engagement, and discusses the central importance of equity, inclusivity, and integrity in scholarly communications.

Community Power in AI. Jeni Tennison, Executive Director of Connected by Data
Delivering a Community-led Open Infrastructure. Gillian Daly, Executive Officer at the Scottish Confederation of University and Research Libraries (SCURL)
Advancing Equity and Inclusivity through AI Ethics and Governance. Ann Borda, Ethics Fellow in the Public Policy Programme at The Alan Turing Institute. (recorded presentation)

11:40 Break

12:10 Deepening partnership in skills development through shared values

Chaired by Kirsty Wallis, Head of Research Liaison at UCL.

This session explores initiatives that foster skills development in libraries with a cross-sectoral approach and dives into the role of libraries in supporting communities to build resilience.

Co-delivery: working with partners to create transformational services. Ed Jewel, President of Libraries Connected
Digital Skills in Arts and Humanities (DISKAH): Deepening Skills and Engagement with Digital Infrastructures. Karina Rodriguez Echavarria, Reader at the University of Brighton
Training & Skills Development at the DCC. Dominique Green, Research Data Specialist (Training Lead) at the Digital Curation Centre, The University of Edinburgh

13:00 Lunch

13:45 Open repositories for research of all kinds

This session addresses the role of infrastructure in carrying out open scholarship practices, explores practice as research in relation to diverse outputs and infrastructure, and discusses institutional resilience in digital strategies.

Chaired by William J Nixon, Deputy Executive Director at Research Libraries UK (RLUK).

Connecting Art, Science, and Open Repositories @ The National Gallery. Joseph Padfield, Principal Scientist at the National Gallery
Back to the Future: European Repositories for the Era of Open Science. Eloy Rodrigues, Director at the University of Minho Libraries and Executive Board member at OpenAIRE
COAR Notify: Supporting Next Generation Repositories. Paul Walk, Director at Antleaf Ltd

14:45 Break

15:15 Enabling collections as data: from policy to practice

Chaired by Jez Cope, Data Services Lead at the British Library.

This session dives into digital collections as data, exploring policies and practices across different sectors, public-private partnerships in making collections publicly available, and the dynamics of preservation versus access approaches in national libraries, whilst underlining the public good.

Developing Workflows to Publish Collections as Data. Gustavo Candela Romero, Assistant Professor at the University of Alicante
Enabling collections as data: the Museum Data Service. Kevin Gosling, Chief Executive at the Collections Trust
Exploring Collections as Data in the European context: current initiatives, future prospects. Sally Chambers, Head of Research Infrastructure Services at the British Library and Director of DARIAH-EU based at the Ghent Centre for Digital Humanities, Ghent University

16:15 Closing keynote: Stories Change Lives

Chaired by Liz White, Director of Library Partnerships at the British Library.

Erik Boekesteijn, Member of Board of Directors at the Storyhouse and Senior Advisor at the National Library of the Netherlands

16:45 Closing remarks

17:00 Networking session

19:00 End

The hashtag for the event is #OpenEngaged, on the social media platform of your choice. If you have any questions, please contact us at [email protected].

Posted by Digital Research Team at 1:00 PM

Tags

Data, Digital scholarship, Events, Research collaboration

26 July 2024

Charting the European D-SEA Conference at the Stabi

This blog post is by Dr Adi Keinan-Schoonbaert, Digital Curator for Asian and African Collections, British Library. She's on Mastodon as @[email protected].

Earlier this month, I had the pleasure of attending the “Charting the European D-SEA: Digital Scholarship in East Asian Studies” conference held at the Berlin State Library (Staatsbibliothek zu Berlin), also known as the Stabi. The conference, held on 11-12 July 2024, aimed to fill a gap in the European digital scholarship landscape by creating a research community and a space for knowledge exchange on digital scholarship issues across humanities disciplines concerned with East Asian regions and languages.

The event was a dynamic fusion of workshops, presentations and panel discussions. Over three days of workshops (8-10 July), participants were introduced to key digital methods, resources, and databases. These sessions aimed to transmit practical knowledge in digital scholarship, focusing on East Asian collections and data. The subsequent two days were dedicated to the conference proper, featuring a broad range of presentations on various themes.

The reading room in the Berlin State Library, Haus Potsdamer Straße

DH and East Asian Studies in Europe and Beyond

Conference organisers Jing Hu and Brent Ho from the Stabi, and Shih-Pei Chen and Dagmar Schäfer from the Max Planck Institute for the History of Science (MPIWG), set the stage for an enriching exchange of ideas and knowledge. The diversity of topics covered was impressive: from the more established digital resources and research tools to AI applications in historical research, the sessions provided a comprehensive overview of the current state and future directions of the field.

There were so many excellent presentations – and I often wished I could clone myself to attend parallel sessions! As expected, there was much focus on working with AI – machine learning and generative AI – and their potential in historical and humanities research. AI technologies offer powerful tools for data analysis and pattern recognition, and can significantly enhance research capabilities.

Damian Mandzunowski (Heidelberg University) talked about using AI to extract and analyse information from Chinese comics

Shaojian Li (Renmin University of China) looked into automating the classification of pattern images using deep learning

One notable session was "Reflections on Deep Learning & Generative AI," chaired by Brent Ho and discussed by Clemens Neudecker. The roundtable highlighted the evolving role of AI in humanities research. Calvin Yeh from MPIWG discussed AI's potential to augment, rather than just automate, research processes. He shared intriguing examples of using AI tools like ChatGPT to simulate group discussions and suggest research actions. Hongsu Wang from Harvard University presented on the use of Large Language Models and traditional Transformers in the China Biographical Database (CBDB) project, demonstrating the effectiveness of these models in data extraction and standardisation.

Calvin Yeh (MPIWG) discussed AI for “Augmentation, not only Automation” and experimented with ChatGPT discussing a research approach, designing a research process and simulating a group discussion

Hongsu Wang (Harvard University) talked about extracting and standardising data using LLMs and traditional Transformers in the CBDB project – here showcasing Jeffrey Tharsen’s research to create a network graph using a prompt in ChatGPT

Exploring the Stabi

Our group tour of the Stabi was a personal highlight for me. This historic library, part of the Prussian Cultural Heritage Foundation, is renowned for its extensive collections and commitment to making digitised materials publicly accessible. The library operates from two major public sites – Haus Unter den Linden and Haus Potsdamer Straße. Tours of both locations were available, but I chose to explore the more recent building, designed by Hans Scharoun and located in the Kulturforum on Potsdamer Straße in West Berlin – the history and architecture of which are fascinating.

A group of the conference delegates enjoying the tour of SBB’s Haus Potsdamer Straße

I really enjoyed catching up with old colleagues and making new connections with fellow scholars passionate about East Asian digital humanities!

To conclude

In conclusion, the Charting the European D-SEA conference at the Stabi was an enriching experience, providing deep insights into the integration of digital methods in East Asian studies and the advancements in digital scholarship, and allowing me to connect with a global community of scholars. The combination of traditional and more recent digital practices, coupled with the forward-looking discussions on AI and deep learning, made this conference a significant milestone in the field. I look forward to seeing how these conversations evolve and contribute to the broader landscape of digital humanities.

Posted by Digital Research Team at 1:56 PM

Tags

Data, Digital scholarship, East Asia, Experiments, Humanities

16 July 2024

'AI and the Digital Humanities' session at CILIP's 2024 conference

Digital Curator Mia Ridge writes... I was invited to chair a session on 'AI and the digital humanities' at CILIP's 2024 conference with Ciaran Talbot (Associate Director AI & Ideas Adoption, University of Manchester Library) and Glen Robson (IIIF Technical Co-ordinator, International Image Interoperability Framework Consortium). Here's a quick post with some reflections on themes in the presentations and the audience Q&A.

A woman stands on stage in front of slides; two men sit at a panel table on the stage

CILIP's photo of our session

I presented a brief overview of some of the natural language processing (NLP) and computer vision methods in the Living with Machines project. That project and other work at the British Library showed that researchers can create innovative Digital Humanities methods and improve collections data with current AI / machine learning tools. But is there a gap between 'utilities' and 'cutting edge research' that AI can't (yet) fill for libraries?

AI (machine learning) makes library, museum and archive collections more accessible in two key ways. Firstly, more and better metadata and links across collections can make individual items more discoverable (e.g. identifying places mentioned in text; visual search to find similar images). Secondly, thinking of 'collections as data' and sharing datasets for research lets others find insights and inspiration.

Some of the value in AI might lie in the marketing power of the term - we've had the technical ability to view collections across silos for some time, but the institutional will might have lagged behind. Identifying the real gaps that AI can meet is hard, cross-institutional work - you need to understand what time-consuming work could be automated with ML/AI. Ciaran's talk gave a sense of the collaborative, co-creative effort required to understand actual processes and real problems and devise ways to optimise them. An 'anarchy' phase might be part of that process, and a roadmap can help set a shared vision as you work out where AI tools will actually save time or just create more but different work.

Glen gave some great examples of how IIIF can help organisations and researchers, and how AI tools might work with IIIF collections. He highlighted the intellectual property questions raised when 'open access' collections are mined for AI models, and pointed people to HaveIBeenTrained to see if their collections have been scraped.

I was struck by the delicate balance between maintaining trust and secure provenance while also supporting creative and playful uses of AI in collections. Labelling generative AI images and texts is vital. Detecting subtle errors and structural biases requires effort and expertise. As a sector, we need to keep learning, talking and collaborating to understand what generative AI means for users and collection holders.

The first question from the audience was about the environmental impact of AI. I was able to say that our work-in-progress principles for AI at the British Library ask people to consider the environmental impact of AI (not just its carbon footprint, but also water usage and rare minerals mining) in balance with other questions of public value for proposed experiments and projects. Ciaran said that Manchester have appointed a sustainability manager, which is probably something we'll see more of in future. There were also questions about what employers are looking for in library and informatics students; about where to go for information and inspiration about AI in libraries (AI4LAM is a good start); and about how to update people's perceptions of libraries and the skills of library professionals.

Thanks to everyone at CILIP for all the work they put into the conference, and the fantastic AV team working in the keynote room at the Birmingham Hilton Metropole.

Posted by Digital Research Team at 3:17 PM

Tags

Digital scholarship, LIS research, Projects, Research collaboration

08 July 2024

Embracing Sustainability at the British Library: Insights from the Digital Humanities Climate Coalition Workshop

This blog post is by Dr Adi Keinan-Schoonbaert, Digital Curator for Asian and African Collections, British Library. She's on Mastodon as @[email protected].

Sustainability has become a core value at the British Library, driven by our staff-led Sustainability Group and bolstered by the addition of a dedicated Sustainability Manager nearly a year ago. As part of our ongoing commitment to environmental responsibility, we have been exploring various initiatives to reduce our environmental footprint. One such initiative is our engagement with the Digital Humanities Climate Coalition (DHCC), a collaborative and cross-institutional effort focused on understanding and minimising the environmental impact of digital humanities research.

Screenshot from the Digital Humanities Climate Coalition website

Discovering the DHCC and its toolkit

The Digital Humanities Climate Coalition (DHCC) has been on my radar for some time, primarily due to their exemplary work in promoting sustainable digital practices. The DHCC toolkit, in particular, has proven to be an invaluable resource. Designed to help individuals and organisations make more environmentally conscious digital choices, the toolkit offers practical guidance for building sustainable digital humanities projects. It encourages researchers to adopt climate-responsible practices and supports those who may lack the practical knowledge to devise greener initiatives.

The toolkit is comprehensive, providing tips on the planning and management of research infrastructure and data. It aims to empower researchers to make climate-friendly technological decisions, thereby fostering a culture of sustainability within the digital humanities community.

My primary goal in leveraging the DHCC toolkit is to raise awareness about the environmental impact of digital work and technology use. By doing so, I hope to empower Library staff to make informed decisions that contribute to our sustainability goals. The toolkit’s insights are crucial for anyone involved in digital research, offering both strategic guidance and practical tips for minimising ecological footprints.

Planning a workshop at the British Library

With the support of our Research Development team, I organised a one-day workshop at the British Library, inviting Professor James Baker, Director of Digital Humanities at the University of Southampton and a member of the DHCC, to lead the event. The workshop was designed to introduce the DHCC toolkit and provide guidance on implementing best practices in research projects. The in-person, full-day workshop was held on 5 February 2024.

Workshop highlights

The workshop featured four key sessions:

Session 1: Introductions and Framing: We began with an overview of the DHCC and its work within the GLAM sector, followed by an introduction to sustainability at the British Library, the roles that libraries play in reducing carbon footprints and raising awareness, the Green Libraries Campaign (of which the British Library was a founding partner), and perspectives on digital humanities and the use of computational methods.

CILIP’s Green Libraries Campaign banner

Session 2: Toolkit Overview: Prof Baker introduced the DHCC toolkit, highlighting its main components and practical applications, focusing on grant writing (e.g. recommendations on designing research projects, including Data Management Plans), and working practices (guidance on reducing energy consumption in day-to-day working life, e.g. communication and shared working, travel, and publishing and preserving data). The session included responses from relevant Library teams, on topics such as research project design, data management and our shared research repository.

DHCC Information, Measurement and Practice Action Group. (2022). A Researcher Guide to Writing a Climate Justice Oriented Data Management Plan (v0.6). Zenodo. https://doi.org/10.5281/zenodo.6451499

Session 3: Advocacy and Influencing: This session focused on strategies for advocating for sustainable practices within one's organisation and influencing others to adopt these practices. We covered the Library’s staff-led Sustainability Group and its activities, after which participants were asked to consider the actions that could be taken at the Library and beyond, taking into account the types of people that might be influenced (senior leaders, colleagues, peers in wider networks/community).

Session 4: Feedback and Next Steps: Participants discussed their takeaways from the workshop and identified actionable steps they could implement in their work. This session included conversations on ways to translate workshop learnings into concrete next steps, and generated light ‘commitments’ for the next week, month and year. One fun way to set oneself a yearly reminder is to schedule an eco-friendly e-card to send to yourself in a year!

Post-workshop follow-up

Three months after the workshop had taken place, we conducted a follow-up survey to gauge its impact. The survey included a mix of agree/disagree statements (see chart below) and optional long-form questions to capture more detailed feedback. While we had only a few responses, survey results were constructive and positive. Participants appreciated the practical insights and reported better awareness of sustainable practices in their digital work.

Participants’ agree/disagree ratings for a series of statements about the DHCC workshop’s impact

Judging from responses to the set of statements above, at least several participants have embedded toolkit recommendations, made specific changes in their work, shared knowledge and influenced their wider networks. We got additional details on these actions in responses to the open-ended questions that followed.

What did staff members say?

Here are some comments made in relation to making changes and embedding the DHCC toolkit’s recommendations:

“Changes made to working policy and practice to order vegetarian options as standard for events.”

“I have referenced the toolkit in a chapter submitted for a monograph, in relation to my BL/university research.”

“I have discussed the toolkit's recommendations with colleagues re the projects I am currently working on. We agreed which parts of the projects were most carbon intensive and discussed ways to mitigate that.”

“I recommended a workshop on the toolkit to my [research] funding body.”

“Have engaged more with small impacts - less email traffic, fewer attachments, fewer images.”

A couple of comments were made with regard to challenges or barriers to change making. One was about colleagues being reluctant to decrease flying, or travel in general, as a way to reduce one’s carbon footprint. The second point referred to an uncertainty on how to influence internal discussions on software development infrastructure – highlighting the challenge of finding the right path to the right people.

An interesting comment was made in relation to raising environmental concerns and advocating the Toolkit:

“Shared the toolkit with wider professional network at an event at which environmentally conscious and sustainable practices were raised without prompting. Toolkit was well received with expressions of relief that others are thinking along these lines and taking practical steps to help progress the agenda.”

And finally, an excellent point about the energy-intensive use of ChatGPT (or other LLMs), which was covered at the workshop:

“The thing that has stayed with me is what was said about water consumption needed to cool the supercomputers - how every time you run one of those Chat GPT (or equivalent) queries it is the equivalent of throwing a litre of water out the window, and that Microsoft's water use has gone up 30%. I've now been saying this every time someone tells me to use one of these GPT searches. To be honest it has put me off using them completely.”

In summary

The DHCC workshop at the British Library was a great success, underscoring the importance of sustainability in digital humanities, digital projects and digital working. By leveraging the DHCC toolkit, we have taken important steps toward making our digital practices more environmentally responsible, and spreading the word across internal and external networks. Moving forward, we will continue to build on this momentum, fostering a culture of sustainability and empowering our staff to make informed, climate-friendly decisions.

Thank you to workshop contributors, organisers and helpers:

James Baker, Joely Fake, Maja Maricevic, Catherine Ross, Andy Rackley, Jez Cope, Jenny Basford, Graeme Bentley, Stephen White, Bianca Miranda Cardoso, Sarah Kirk-Browne, Andrea Deri, and Deirdre Sullivan.

Posted by Digital Research Team at 7:38 AM

Tags

Collaborations, Data, Digital scholarship, Events, Humanities, Projects, Research collaboration, Tools

04 July 2024

DHNB 2024 - Digital Humanities in the Nordic and Baltic Countries Conference Report

This is a joint blog post by Helena Byrne, Curator of Web Archives, Harry Lloyd, Research Software Engineer, and Rossitza Atanassova, Digital Curator.

This year’s Digital Humanities in the Nordic and Baltic countries conference took place at the University of Iceland School of Education in Reykjavik. It was the eighth conference in the series, which was established in 2016, and the first to be held in Iceland. The theme for the conference was “From Experimentation to Experience: Lessons Learned from the Intersections between Digital Humanities and Cultural Heritage”. There were pre-conference workshops from May 27-29 with the main conference starting on the afternoon of May 29 and finishing on May 31. In her excellent opening keynote Sally Chambers, Head of Research Infrastructure Services at the British Library, discussed the complex research and innovation data space for cultural heritage. Three British Library colleagues report highlights of their conference experience in this blog post.

Helena Byrne, Curator of Web Archives, Contemporary British & Irish Publications.

I presented in the Born Digital session held on May 28. There were four presentations in this session and three were related to web archiving and one related to Twitter (X) data. I co-presented ‘Understanding the Challenges for the Use of Web Archives in Academic Research’. This presentation examined the challenges for the use of web archives in academic research through a synthesis of the findings from two research studies that were published through the WARCnet research network. There was lots of discussion after the presentation on how web archives could be used as a research data management tool to help manage online citations in academic publications.

Helena presenting to an audience during the conference session on born-digital archives

Helena presenting in the born-digital archives session

The conference programme was very strong and there were many takeaways that relate to my role. One strong theme was ‘collections as data’. At the UK Web Archive we have just started to publish some of our inactive curated collections as data, so these discussions were very useful. One highlight was the ‘Panel: Publication and reuse of digital collections: A GLAM Labs approach’. What stood out for me in this session was the checklist for publishing collections as data. It was very reassuring to see that we had pretty much everything covered for the release of the UK Web Archive datasets.

Rossitza and I were kindly offered a tour of the National and University Library of Iceland by Kristinn Sigurðsson, Head of Digital Projects and Development. We enjoyed meeting curatorial staff from the Special Collections who showed us some of the historical maps of Iceland that have been digitised. We also visited the digitisation studio to see how they process periodicals, and spoke to staff involved with web archiving. Thank you to Kristinn and his colleagues for this opportunity to learn about the library’s collections and digital services.

Rossitza and Helena standing by the moat outside the National Library of Iceland building

Rossitza and Helena outside the National and University Library of Iceland

Inscription in Icelandic reading National and University Library of Iceland outside the Library building

The National and University Library of Iceland

Harry Lloyd, Research Software Engineer, Digital Research.

DHNB2024 was a rich conference from my perspective as a research software engineer. Sally Chambers’ opening keynote on Wednesday afternoon demonstrated an extraordinary grasp of the landscape of digital cultural heritage across the EU. By this point there had already been a day and a half of workshops, including a session Rossitza and I presented on Catalogues as Data.

I spent the first half using a Jupyter notebook to explain how we extracted entries from an OCR’d version of the catalogue of the British Library’s collection of 15th century books. We used an explainable algorithm rather than a ‘black-box’ machine learning one, so we walked through the steps involved and discussed where it worked well and where it could be improved. You can follow along by clicking the ‘launch notebook’ button in the ReadMe here.
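To give a flavour of what an explainable, rule-based approach can look like, here is a minimal sketch in Python. It is not the algorithm from the workshop notebook: the rule (an entry starts at a line of capital letters ending in a full stop), the function name `split_entries` and the sample text are all assumptions invented for illustration. The point is only that every decision the code makes can be traced back to a rule a person can read.

```python
import re

# Hypothetical rule, assumed for illustration only: an entry begins at a
# line written in capitals and ending with a full stop, e.g. "CICERO."
HEADING = re.compile(r"^[A-Z][A-Z ,]+\.$")


def split_entries(ocr_text):
    """Group OCR'd lines into entries, each starting at a heading line."""
    entries = []
    current = None
    for line in ocr_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if HEADING.match(line):
            # A heading opens a new entry.
            current = {"heading": line.rstrip("."), "lines": []}
            entries.append(current)
        elif current is not None:
            # Any other line belongs to the entry currently open.
            current["lines"].append(line)
        # Lines before the first heading are ignored.
    return entries


# Invented sample text, not taken from the real catalogue.
SAMPLE = """\
AUGUSTINUS.
De civitate Dei.
Folio. 1470.

CICERO.
De officiis.
Quarto. 1465.
"""

entries = split_entries(SAMPLE)
for entry in entries:
    print(entry["heading"], "->", " | ".join(entry["lines"]))
```

When a rule like this misfires, for example on a heading that the OCR has broken across two lines, the failure is easy to locate and discuss, which is the kind of conversation an explainable method makes possible.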

Harry pointing to an image from the catalogue of printed books on a screen for the workshop audience

Harry explaining text recognition results during the workshop

Handing over to Rossitza in the second half to discuss her corpus linguistic analysis worked really well, as it gave attendees a feel for the complete workflow. This really showed in some great conversations we had with attendees over the following days about tricky problems like where to store the ‘true’ results of OCR.

Highlights from the rest of the conference included Clelia LaMonica’s work using a Latin large language model to analyse kinship in texts from Medieval Burgundy. Large language models trained on historic texts are important, as the majority are trained on modern material and struggle with historical language. Jørgen Burchardt presented some refreshingly quantitative work on bias across a digitised newspaper collection, very reminiscent of work by Kaspar Beelen. Overall it was a productive few days, and I very much enjoyed my time in Reykjavik.

Rossitza Atanassova, Digital Curator, Digital Research.

This was my second DHNB conference and I was looking forward to reconnecting with the community of researchers and cultural heritage practitioners, some of whom I had met at DHNB2019 in Copenhagen. Apart from the informal discussions with attendees, I contributed to DHNB2024 in two main ways.

As already mentioned, Harry and I delivered a pre-conference workshop showcasing some processes and methodology we use for working with printed catalogues as data. In the session we used the corpus tool AntConc to perform computational analysis of the descriptions for the British Library’s collection of books published in the 15th century. You can find out more about the project here and reuse the workshop materials published on Zenodo here.
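AntConc is a desktop application, so no programming is needed to use it. Purely as an illustration of two analyses that corpus tools of this kind offer, a word frequency list and a keyword-in-context (KWIC) concordance, here is a minimal Python sketch. The two descriptions are invented examples rather than text from the catalogue, and the helper names `tokenize` and `kwic` are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and return its alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def kwic(tokens, keyword, window=3):
    """Return (left context, keyword, right context) for each hit."""
    rows = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            rows.append((left, token, right))
    return rows


# Invented descriptions, for illustration only.
DESCRIPTIONS = [
    "Imperfect copy, wanting the first leaf.",
    "A fine copy with manuscript notes on the first leaf.",
]

# Word list: how often each word occurs across all descriptions.
freq = Counter(t for d in DESCRIPTIONS for t in tokenize(d))

# Concordance: each occurrence of "copy" with up to three words either
# side, computed per description so context never crosses into another entry.
lines = [row for d in DESCRIPTIONS for row in kwic(tokenize(d), "copy")]

print(freq.most_common(4))
for left, word, right in lines:
    print(f"{left:>20} [{word}] {right}")
```

Reading the hits side by side is what makes recurring cataloguing phrases easy to spot across a large set of descriptions.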

I also joined the pre-conference meeting of the international GLAM Labs Community held at the National and University Library of Iceland. This was the first in-person meeting of the community in five years and was a productive session during which we brainstormed ‘100 ideas for the GLAM Labs Community’. Afterwards we had a sneak peek of the archive of the National Theatre of Iceland which is being catalogued and digitised.

The main hall of the Library.

The DHNB community is so welcoming and supportive, and attracts many early career digital humanists. I was particularly interested to hear from doctoral students researching the use of AI with digitised archives, and using NLP methods with historical collections. One of the projects that stood out for me was Johannes Widegren’s PhD research into the ethical use of AI to enable access and discovery of Sami cultural heritage, and to develop library and archival practice.

I was also interested in presentations that discussed workflows for creating Named Entity Recognition resources for historical archives and I plan to try out the open-source Label Studio tool that I learned about. And of course, the poster session is always a highlight and I enjoyed finding out about a range of projects, including computational analysis of Scandinavian runic texts, digital reconstruction of Gothenburg’s 1923 Jubilee exhibition, and training large language models to track semantic variation in climate change vocabulary in Danish news articles.

A line up of people standing in front of a screen advertising the venue for DHNB25 in Estonia

The poster presentations session chaired by Olga Holownia

We are grateful to all DHNB24 organisers for the warm welcome and a great conference experience, with special thanks to the inspirational and indefatigable Olga Holownia!

Posted by Digital Research Team at 6:59 AM

Tags

Digital scholarship, Events, Experiments, Printed books, Projects, Research collaboration, Tools

28 June 2024

IIIF Annual Conference 2024: A Journey of Innovation and Inspiration

The British Library Universal Viewer team were delighted to attend the IIIF conference and showcase 2024 at UCLA in Los Angeles, California. This was our first official event since the team formed earlier in the year, and we felt incredibly fortunate to be travelling across numerous time zones to join over 70 members of the IIIF community for four days of innovation, learning and inspiration.

The Universal Viewer team outside the De Neve Plaza at UCLA

The first two days of the conference were held at the De Neve Plaza and took the form of lightning talks from delegates from a variety of different industries, and on many different topics. This format meant there was something to interest everyone, regardless of experience, and was great for keeping concentration levels high despite the jet lag!

Birds of a feather sessions were held on the third day of the conference, with a last-minute entry from the Universal Viewer team – although lack of space meant that this was an impromptu meeting in the Kerckhoff Coffee House. However, this meant we were able to plan future work, specifically on annotations, in the sunshine on the terrace.

Attendees of the UV Birds of a Feather session at the Kerckhoff Coffee House

Here are our takeaways!

Lanie Okorodudu: I was interested in how IIIF resources and IIIF-related tools could be used as part of curriculums in online learning platforms to create meaningful learning experiences for students. I was also intrigued by “Tropiiify”, a plug-in for exporting IIIF collections that is designed for non-technical users.

Erin Burnand: I loved hearing about how IIIF can provide innovative solutions for incredible (but complex) collections such as the Judy Chicago Research Portal (Pennsylvania State University Library) and the work on Eastern Silk Road collections for the International Dunhuang Programme (presented by the BL’s Anastasia Pineschi).

James Misson: The conference was an amazing opportunity to connect with fellow IIIF users, from IIIF newcomers to those who helped define the original specifications. I enjoyed hearing about work on the carbon footprint of OCR, and the transformation of historical textiles into sound to make an exhibition more accessible to visually impaired people. It was inspiring to see the range of uses IIIF has, and I was especially excited by Allmaps (allmaps.org), a toolbox for working with IIIF maps. The conference was a testament to how open the IIIF community is, and everyone generously shared their knowledge with our new team – conversations that continued in the bars of Westwood and In-N-Out Burger.

Saira Akhter: I found the discussions on the use of AI within IIIF interesting, such as for facial recognition within historic photographs and future integration with OCR/HTR tools and outputs. The showcase at the Getty was great for learning more about IIIF itself, and it was cool to see how the idea for IIIF was first written on a napkin at a restaurant. I also enjoyed seeing more novel uses of IIIF, such as for importing paintings into Animal Crossing.

Recordings of the conference are now available on YouTube.

Posted by Digital Research Team at 11:20 AM