Introduction

News and views from the British Library’s web archiving team and guests. Posts about the public UK Web Archive, and since April 2013, about web archiving as part of non-print legal deposit. Editor-in-chief: Jason Webber.

30 June 2025

UK Web Archive Report on Digital Methodologies for the Study of Religion Symposium

By Helena Byrne, Curator of Web Archives

Digital Methodologies for the Study of Religion event details

The UK Web Archive participated in the one-day symposium Digital Methodologies for the Study of Religion on 25th June 2025. This knowledge exchange symposium was organised as part of the ESRC-funded Digital British Islam Project. It was a hybrid event with a mix of online presentations and in-person presentations at Coventry University.

The fourteen presentations were divided into four thematic panels: Panel 1 – Innovative Methods and Platforms, Panel 2 – Digital Archives and Cataloguing, Panel 3 – Mixed Methods and Online-Offline Dynamics and Panel 4 – Emerging Ethical Challenges.

The UK Web Archive participated in Panel 2 – Digital Archives and Cataloguing. The first speaker, Emily Cottrell from Université de Strasbourg, outlined a project that produced an online database to study digitised religious texts. The final two presentations in the panel were from Gary R Bunt from Digital British Islam at University of Wales Trinity Saint David and Anna Grasso from Digital Islam Across Europe at University of Edinburgh. Professor Bunt outlined the scope of the Digital British Islam web archive collection as well as the lessons learnt from developing the curation skills needed to develop a web archive collection. Dr Grasso then gave an overview of the Digital Islam Across Europe web archive collection and how they were able to use the ARCH platform through their Archive-It subscription. It was really interesting to hear curatorial insights from these web archive collections and how the data collected can be used to further understand the lived experience of Islamic communities in Britain and across Europe.

The British Library presentation was Using the UK Web Archive to understand religion on the web. It gave a general introduction to the UK Web Archive, explaining who is involved in curating its collections, what Non-Print Legal Deposit is and how it shapes curation practices. It then looked at how religion is represented within the UK Web Archive. Religions are broadly represented across many of the over one hundred curated collections, and there are currently nine individual collections that focus on a topic related to religion. The presentation also covered our recent work to publish metadata from the UK Web Archive as data by co-developing the Datasheets for Web Archives Toolkit. So far, the Scottish Churches - Collection Seed List is the only data set related to religion that has been published, but keep an eye on the UK Web Archive for updates on when the next phase of data sets will be published.

Potential research example with the Scottish Churches - Collection Seed List data set

All the presentations gave methodological insights that could be reused by researchers studying a different subject and I would highly recommend checking out the recordings when they are made available through the project website: https://digitalbritishislam.com/

One highlight for anyone who manages a GLAM sector catalogue was the presentation by Dr. Nur Efeoglu, Curating Islam Online: Religious Heritage in UK Museum Digital Catalogues. This presentation reviewed three UK museum catalogues for content related to the Selçuk and Ottoman period. The lessons learnt from this review are valuable for running any effective catalogue. My favourite quote from this presentation was "curation should be a collaboration not a monologue". This is something we try to encourage in the UK Web Archive by collaborating with subject experts to curate collections on various topics and by gathering nominations for the archive from the public.

Posted by Helena Byrne at 11:18 AM

Tags

Contemporary Britain, eResources, Religion, Web/Tech

20 June 2025

RESAW 2025: Report from UK Web Archive Colleagues


Introduction

The RESAW (Research Infrastructure for the Study of Archived Web) 2025 conference took place at the University of Siegen in Germany. It was organized by the Collaborative Research Centre 1187 “Media of Cooperation” at the University of Siegen in cooperation with the Centre for Contemporary and Digital History (C²DH) at the University of Luxembourg.

This was a special conference, as the organisers of this, past and future conferences gave a special presentation (it included cake and balloons) to mark ten years since the first RESAW conference was held in Aarhus, Denmark. They all paid tribute to Niels Brügger from Aarhus University, who founded RESAW and helped develop the RESAW community.

The conference explored its theme, “The Datafied Web”, from a historical perspective. The call for papers stated that “we would like to explore the historical roots, trends, and trajectories that shaped the data-driven paradigm in web development and to examine the genealogies of the datafied and metrified web”. The opening panel discussion aimed to define what is meant by “the datafied web”.

UK Web Archive colleagues from Bodleian Libraries, the British Library and National Library of Scotland attended the conference. There was a packed programme with a variety of presentation forms and workshops that shared best practices and innovative projects in the world of web archiving. In this blog post they report highlights of their conference experience.

Reflections

Helena Byrne - Curator of Web Archives - British Library

I was part of the panel called Web archives practices along with colleagues from the Portuguese and Belgian web archives. My presentation, Lessons learnt from preparing collections as data: the UK Web Archive experience, gave an overview of the project that spanned from October 2022 to November 2024 to develop a framework for publishing UK Web Archive curated collections as data.

There were so many great presentations and panels at this conference that it is hard to pick just one highlight. The opening panel discussion defining “the datafied web” raised lots of interesting points. In this panel Anne Helmond made the important point that “while the front-end of the web has changed dramatically, the back-end has undergone a deeper transformation” and that the study of the web requires a mix of methodologies and resources. Another session that stood out was the panel on Past Metrics. We were reminded in this session about the visitor counters that used to be popular on early versions of websites. This was especially poignant as, just a few days before this presentation, I received an enquiry about a website, and when I used the Memento Time Travel search function to see whether any other web archives held a copy of it, I found one copy from its earlier years. This version had a prominent visitor counter and evoked a nostalgic response, as I realised I hadn’t seen one for many years and had forgotten about this feature.

Beatrice Cannelli - Curatorial and Policy Research Officer (Algorithmic Archive Project) - Bodleian Libraries

At this year’s RESAW conference, my colleague Pierre Marshall and I organised a workshop titled “Towards an ‘Algorithmic Archive’: Developing Collaborative Approaches to Persistent Social and Algorithmic Data Services for Researchers”. The workshop brought together diverse perspectives from practitioners and researchers working with social media data, fostering discussions regarding the development of sustainable strategies to collect social media platforms. The workshop was a valuable opportunity to gather insights for the Algorithmic Archive project, particularly regarding issues and expectations related to short- and long-term access to social media data.

Among the many engaging sessions, I found the one on “the challenges of archival practices” particularly interesting. Using the case of the web archive at Aix-Marseille University, the panellists underscored the importance of encouraging critical engagement with issues researchers face, such as data ethics, data surveillance and archival responsibility, especially when dealing with potentially sensitive web archived data. Similarly, the panel on “Data Regimes” reflected on the complexity of data stewardship, where open data policies often clash with ethical concerns, especially when dealing with sensitive content like social media data. This often leaves researchers and librarians to navigate these grey areas without clear guidance, raising questions about reuse and long-term preservation.

Pierre Marshall - Technical Research Officer (Algorithmic Archive Project) - Bodleian Libraries

Vasco Rato gave an overview of arquivo.pt’s API. Arquivo.pt runs a CDX(J) server, and about half of the traffic to the archive comes from the API. Rato mentioned that sometimes people _ask_ for WARCs, but what they really want is just the text or media content of a page. It would be a better user experience to provide text or image search directly through the API. The CDX(J) server also helps anyone wanting to page through the archive without downloading the whole thing. Most researchers don't have the capacity to store and process 1.5PB of WARC files.
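To illustrate why a CDX(J) index helps with paging through an archive without downloading WARCs, here is a minimal Python sketch that parses CDXJ-style index lines and filters them. It assumes the common CDXJ layout of a SURT-style URL key, a 14-digit timestamp and a JSON block on each line. The sample records, the field names (`mime`, `status`, `filename`, `offset`, `length`) and the helper names are hypothetical and are not taken from Arquivo.pt's actual API responses.

```python
import json
from datetime import datetime
from typing import Iterable, Iterator, NamedTuple


class Capture(NamedTuple):
    urlkey: str          # SURT-style key, e.g. "pt,example)/"
    timestamp: datetime  # capture time parsed from the 14-digit stamp
    fields: dict         # JSON block describing the capture


def parse_cdxj_line(line: str) -> Capture:
    """Split one CDXJ line into key, timestamp and JSON fields."""
    urlkey, stamp, payload = line.strip().split(" ", 2)
    return Capture(urlkey, datetime.strptime(stamp, "%Y%m%d%H%M%S"), json.loads(payload))


def html_captures(lines: Iterable[str]) -> Iterator[Capture]:
    """Yield only successful HTML captures, skipping blank lines."""
    for line in lines:
        if not line.strip():
            continue
        capture = parse_cdxj_line(line)
        if capture.fields.get("mime") == "text/html" and capture.fields.get("status") == "200":
            yield capture


# Hypothetical sample index records (not real Arquivo.pt data).
SAMPLE = """\
pt,example)/ 20100101120000 {"url": "http://example.pt/", "mime": "text/html", "status": "200", "filename": "sample-1.warc.gz", "offset": "0", "length": "1234"}
pt,example)/img/logo.png 20100101120005 {"url": "http://example.pt/img/logo.png", "mime": "image/png", "status": "200", "filename": "sample-1.warc.gz", "offset": "1234", "length": "567"}
pt,example)/ 20150601080000 {"url": "http://example.pt/", "mime": "text/html", "status": "301", "filename": "sample-2.warc.gz", "offset": "0", "length": "321"}
"""

captures = list(html_captures(SAMPLE.splitlines()))
for capture in captures:
    print(capture.timestamp.isoformat(), capture.fields["url"], capture.fields["filename"])
```

Each index record is small and points to a file name, offset and length, so a researcher can decide which captures matter before fetching any archived content at all.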

Helge Holzmann of the Internet Archive ran a workshop on the Archives Research Compute Hub (ARCH) service. Holzmann talked us through a series of recipes for the ArchiveSpark library, intended to make it easier for researchers to run data-centric queries against items in the Internet Archive. Besides the content of the workshop, I appreciated Holzmann's use of 2000s-era retro web graphics to illustrate his presentation. We are all here for the datafied web, but beyond the data I'm happy to celebrate the art of the early web.

The BnF also presented their Skyblogs collection, including work on parsing the page markup (back) into a data model for analysis across the corpus.

The common theme I took from these sessions is that there's a lot to learn from making large web datasets usefully available to academics. Hopefully next year Beatrice and I will be back with some examples of what internet researchers could do with our planned social media archive.

Andrea Kocsis - Chancellor’s Fellow in Humanities Informatics, University of Edinburgh / The National Librarian’s Fellow in Digital Scholarship 2024-25, The National Library of Scotland

I was glad to present our work on web archive engagement with Leontien Talboom, where we discussed how to support not only traditional readers and computational users, but also the digitally curious who often fall between categories. I also shared a glimpse into the creative process behind Digital Ghosts, the web archive exhibition I’m currently developing with artist Dorsey Kaufmann and the National Library of Scotland, which will take place in November at Inspace in Edinburgh.

One of the talks that stayed with me was Ian Milligan’s reflection on the ethical challenges of crowdsourced digital archives in the context of 9/11. I plan to bring this ethical dilemma of accessibility, metadata, and data protection into my teaching next year in Future Libraries and Archives at the Edinburgh Futures Institute. The most inspiring talk for me, though, was Nanna Bonde Thylstrup’s keynote on data loss. Her interdisciplinary framing - drawing equally from humanities, sociology, and STEM - challenged the usual discourse of data loss as an evolutionary narrative and instead reframed it as a question of digital politics and infrastructure. Overall, RESAW was inspiring both intellectually and as a generous, thoughtful community of dedicated netpreservers.

Conclusion

Attending the RESAW conference is a great opportunity to exchange ideas, learn about innovative research projects, and foster collaborations in the field of web archive studies. The UK Web Archive colleagues contributed significantly through presentations and active participation in other sessions. Participation at conferences in this manner supports the recognition and reuse of the UK Web Archive collections as a significant resource in the wider academic discourse on web archiving. We look forward to participating in the next edition of the conference which will take place in June 2027 at the University of Groningen, the Centre for Media and Journalism Studies & Centre for Digital Humanities. The theme for 2027 is “Engaging Public Internet Histories: New Ways of Telling the Story of & with the Web”. So keep an eye out for the call for papers for the seventh RESAW conference in 2026.

Posted by Helena Byrne at 2:07 PM

Tags

Digital scholarship, Research collaboration, Web/Tech

12 May 2025

IIPC Web Archiving Conference 2025: Report from UK Web Archive Colleagues


Introduction

This year’s IIPC General Assembly and Web Archiving Conference took place at the National Library of Norway in Oslo.

Many UK Web Archive colleagues from Bodleian Libraries, the British Library, Cambridge University Library and National Library of Scotland attended the Web Archiving Conference both as delegates and presenters. There was a packed programme with a variety of presentation forms and workshops that shared best practices and innovative projects in the world of web archiving. In this blog post they report highlights of their conference experience.

Reflections

Leontien Talboom – Technical Analyst - Cambridge University Libraries

This was my third time attending the WAC conference, but my first time visiting Oslo. It was great to reconnect with colleagues and hear about the range of projects currently happening across the community.

I found the update from Chris Royds and Tom Storrar on the UKGWA particularly interesting, especially their work on using Retrieval-Augmented Generation (RAG) to take into account takedown policy processes. The Poster Slam session also provided a good overview of the diverse work taking place in the field.

Together with Andrea Kocsis, I presented some of our recent work on improving access to web archives for different types of users, including readers, the digitally curious, and data users. This builds on previous work, and it was useful to share it in the context of web archives. We’ve also recently published an article on this, which is available here.

Overall, it was a valuable experience, and I appreciated the chance to hear from others and share some of our own work.

Andrea Kocsis - Fellow - National Library of Scotland

Our work covered how user research segmentation in web archives can reshape the way we engage with digital collections. Our talk focused on the power of metadata to create more intuitive and accessible experiences for different audiences. For digital researchers, we highlighted the potential of datasheets for datasets via the case study of the Archive of Tomorrow project, while for the digitally curious, we suggested using Jupyter notebooks with pre-processed enhanced metadata to make exploration easier, introducing the outcomes of The National Librarian’s Research Fellowship in Digital Scholarship 2024-25. For the general reader, we discussed the role of storytelling in turning web archives into something more than just data or collection. We also had the exciting opportunity to announce the “Digital Ghosts - Exploring Scotland’s Heritage on the Web” exhibition we are curating in November 2025 in Edinburgh, bringing together tactile artwork and Scottish web heritage in a fresh, dynamic way. The discussions we had about building inclusive, user-focused web archives were energising and reaffirmed how essential accessibility is for the future of these collections.

Eilidh MacGlone - Web Archivist - National Library of Scotland

The IIPC General Assembly and the conference in Oslo were an opportunity to think again about how the National Library of Scotland contributes to the consortium and the benefit we gain from our membership. The IIPC is a key international membership body for web archiving, which is a key collecting area for us, and some of its events are open to the public. Asking questions of the people who maintain tools I use (and recommend to the public!) is something I really value, along with the ability to meet and make plans for better services (watch this space!). A high point was being in the audience for the talk by Dr Andrea Kocsis, who was the Librarian’s Scholar this year. She presented work to enhance data originally created by my Collections and Research colleague, Trevor Thomson, aiming to help researchers discover content at scale, within the legal deposit environment. I am excited to experience the exhibition, which will physically express some of what we collect, with the artist Dorsey Bromwell Kaufmann at the Being Human Festival held in Edinburgh later this year.

Beatrice Cannelli - Curatorial and Policy Research Officer - Bodleian Libraries

This was my second time attending the IIPC WAC Conference, and once again, it was a fantastic opportunity to connect with colleagues from around the world and gain insights into current developments in the field.

At this year’s conference, I had the pleasure of participating as a speaker in the panel titled Beyond Preservation: Engaging Audiences and Researchers with Web Archives, organised by Eveline Vlassenroot, Peter Mechant, Friedel Geeraert, and Christina Vandendyck. Together with my fellow panellists—Cui Cui, Andrea Kocsis, and Anders Klindt Myrvoll—we explored how web archives can better engage with a broad range of users. Through case studies and collaborative initiatives, we highlighted effective ways in which archives are fostering connections with researchers, communities, and the wider public. The panel sparked valuable discussion on how web archives can enable innovative research methodologies and promote greater public involvement.

Given my particular interest in social media archiving, it is no surprise that one of the sessions that I particularly enjoyed was Curating Social Media. This session offered a rich overview of projects and initiatives in this area, featuring presentations from the British Library, the National Library of Singapore, the National Library and National Archives of Luxembourg, and the National Archives of the Netherlands. I left the session inspired by the diversity of approaches and full of new ideas and perspectives, many of which will certainly be considered in the context of the Algorithmic Archive project I’m currently working on at the Bodleian Libraries.

Gil Hoggarth - Web Archive Technical Lead - British Library

After an earlier potential weather warning, the Oslo conference was held in the National Library of Norway's main building with both nice weather and a warm welcome! It was great to hear the presentations, short talks and general conversation from the Web Archiving community on a wide range of topics - and to catch up with our previous Technical Lead, Andy Jackson. The progress made (or at least in development) by numerous institutions was impressive, from the ever-present quality assurance investigations and technical workshops, to new approaches and new large-scale projects - including the host's Building a Research Infrastructure for the Norwegian Web Archive programme. I presented an overview of the impact of the cyber-attack on the British Library and prompted people to consider such an awful event as likely to change an institution's culture as well as its technology. The event ended with a thought-provoking insight into how web data can be used by AI to identify public debate in online forums.

Caylin Smith - Head of Digital Preservation - Cambridge University Libraries

This WAC marked my third time attending the conference, and it’s continued to deliver valuable contributions to the web archiving discipline. I’m part of the Digital Preservation Coalition’s Carbon Footprint Task Group, so I attended the talks in the Sustainability session. All of the speakers provided helpful guidance and resources for how to take a sustainable approach to capturing online content and providing access. At CUL, my colleagues and I are factoring the carbon footprint for digital services into the new services we’re setting up for the libraries’ digital collections. The Curating Social Media session was full of useful lessons learned for archiving social media accounts, including those of government officials and the general public.

Cui Cui - PhD Researcher/ Customer services and Circulation Librarian - University of Sheffield/ Bodleian Libraries University of Oxford

I have been working on participatory web archiving practices for over five years as a part-time research student, and attending the IIPC conference always marks a milestone in my research journey. This year is particularly important as I shared the preliminary findings from interviews with web archivists, researchers and community members. I feel honoured to have been invited to a discussion panel to exchange ideas with colleagues and the audience, from which I learned so much about archivists’ aspirations and practices. Receiving feedback and listening to practical challenges shared by field experts was incredibly valuable and encouraging. Although I am not a web archivist myself, I could genuinely feel a sense of belonging within the community! I returned feeling inspired and energised, with fresh perspectives and renewed motivation to continue my journey, despite sometimes feeling I have taken too long to complete my research!

The conference featured numerous high-quality presentations, which I believe are valuable to other professionals. Some practices were innovative and highlighted unique web archiving practices that could also be applicable to other fields of the library and archive professions. The closing keynote, Quantifying Complexity: Using Web Data to Decode Online Public Debate, showcased how web data can be essential in understanding public discourse. It also addressed how marginalised communities could be “silenced” in online debates. The web sphere is a complex space, as I pointed out in my presentation, and it brings another layer of challenge when web archivists work toward a more diverse and representative collection development policy.

Helena Byrne - Curator of Web Archives - British Library

This year I presented a summary of the National Olympic and Paralympic Committees as well as the 2024 Summer Olympic and Paralympic Games collections at the IIPC General Assembly. The General Assembly was on Tuesday 8th April and the main conference was held on Wednesday 9th and Thursday 10th. On day one of the conference I co-facilitated a workshop on Web Archive Collections as Data. This workshop is part of a series of workshops to gather insights into what support is needed to be able to apply the GLAM Labs Collections as Data Checklist to web archive content. The first of these workshops was held at DHNB 2025.

As always there were so many good presentations at the conference and lots of corridor conversations that could lead to future collaborative projects. I chaired the Lightning Talk Session #3. This was a great mix of projects ranging from evaluating web archive workflows to addressing English language bias in tools. The last presentation in this session was “What you see no one saw”. This project aims to capture the diversity of web experiences, particularly in relation to web-based advertisements. It is really important that web archives can reflect the diversity of experiences that different people have on the web. However, the project is funded by IMLS and they had the funding withdrawn in the recent restructure of government funding in the US, so it will be interesting to see how it can progress.

Nicola Bingham - Lead Curator of Web Archives - British Library

Last month, I had the pleasure of attending my 11th IIPC Web Archiving Conference, hosted this year by the National Library of Norway in Oslo. This was my first time in Norway—and what a fantastic setting it was for such a dynamic and engaging event.

This year’s conference was particularly meaningful for me as I chaired my final session as co-chair of the IIPC’s Content Development Group (CDG), a role I’ve held since 2018. It’s been an incredibly rewarding experience, and although I’m stepping down from the position, I’ll still be involved—after all, no one really retires from the CDG! The group is in excellent hands, with Shereen Tay (National Library of Singapore), Anaïs Crinière-Boizet (Bibliothèque nationale de France), and Melissa Wertheimer (Library of Congress) taking the reins as co-chairs.

I also had the opportunity to present alongside our British Library colleague Jennie Grimshaw in a session titled Innovative Web Archiving Amid Crisis: Leveraging Browsertrix and Hybrid Working Models to Capture the UK General Election 2024. We shared our experience of using a hybrid model to archive the 2024 general election, a milestone as it was the first time we used the Browsertrix tool to capture social media content.

The conference was, as always, a space of learning, collaboration, and inspiration. I’m grateful for the opportunity to contribute, to reflect on my time with the CDG, and to look ahead to the evolving landscape of web archiving.

Conclusion

The IIPC General Assembly and Web Archiving Conference 2025 met the high standards set at previous conferences. It is a great opportunity to exchange ideas, learn about innovative projects, and foster collaborations in the field of web archiving. The UK Web Archive colleagues contributed significantly through presentations and active participation.

Posted by Helena Byrne at 10:00 AM
