UK Web Archive blog

Introduction

News and views from the British Library’s web archiving team and guests. Posts about the public UK Web Archive and, since April 2013, about web archiving as part of non-print legal deposit. Editor-in-chief: Jason Webber.

30 April 2025

Just launched - The Routledge Companion to Transnational Web Archive Studies

By Helena Byrne, Curator of Web Archives

The Routledge Companion to Transnational Web Archive Studies Edited By Susan Aasman, Anat Ben-David, Niels Brügger

On Monday 28 April 2025, The Routledge Companion to Transnational Web Archive Studies was launched. The book “explores the untapped potential of web archives for researching transnational digital history and communication. It covers cross-border, cross-collection, and cross-institutional examination of web archives on a global scale”.

It is an interdisciplinary collaboration and one of the last outputs from the WARCnet research network, comprising 28 chapters grouped into five sections. The last chapter in each section is a conversation in which multiple authors respond to questions set by the editors on the theme of that section.

Lead editor Susan Aasman stated: “The companion contains concrete examples on how to research national web domains through a transnational perspective; provides case studies with grounded explorations of the COVID-19 crisis as a distinctly transnational event captured by web archives; offers methodological considerations while unpacking techniques and skill sets for conducting transnational web archive research; and critically engages the politics and power dynamics inherent to web archives as institutionalised collections”.

UK Web Archive curators based at the British Library, together with curators at the University of Westminster, contributed to chapters and conversations in the book. The editors stated that “The Routledge Companion to Transnational Web Archive Studies is an essential read for graduate students and scholars from internet and media studies, cultural studies, history, and digital humanities. It will also appeal to web archiving practitioners, including librarians, web curators, and IT developers”.

To celebrate the launch of the book, Routledge is offering a 20% discount with the code 25AFLY2 on http://www.routledge.com/. This code expires on 30th September 2025 and cannot be used with any other special offers.

23 April 2025

Web Archives Collections as Data at the Digital Humanities in the Nordic and Baltic Countries (DHNB) Workshop Report

By Helena Byrne, Curator of Web Archives

DHNB 2025 Conference Banner

The UK Web Archive was one of five web archive organisations represented in the Web Archive Collections as Data workshop at the Digital Humanities in the Nordic and Baltic Countries (DHNB) 2025 conference, held at the National Museum of Estonia in Tartu. The UK Web Archive has participated in the 2023, 2024 and 2025 DHNB conferences. The workshop was organised by Olga Holownia, Senior Programme Officer at the International Internet Preservation Consortium (IIPC). It served as an introduction to web archives and web archive collections as data, with a focus on use cases but also on the challenges of producing, sharing and publishing collections as data.

The first stage of the workshop gave a brief overview of the collections as data movement within the GLAM sector, and introduced the Collections as Data Checklist developed by members of the GLAM Labs community. It also introduced what web archives are and where to access them, how a selection of web archives are making their collections available as data, and the potential research opportunities these collections offer. The panel included Olga Holownia (IIPC), Gustavo Candela (University of Alicante), Helena Byrne (British Library), Jon Carlstedt Tønnessen (National Library of Norway), Anders Klindt Myrvoll (Royal Danish Library), Sophie Ham and Steven Claeyssens (KB, National Library of the Netherlands).

The UK Web Archive presentation promoted the recently published Datasheets for Web Archives Toolkit and the new metadata data sets that are available through the British Library Research Repository. The presentation gave an overview of how the project started, the background to how the Toolkit was prepared and how it was implemented.

Web Archives Collections as Data Workshop at DHNB 2025. Photographer: Helena Byrne & Carmen Kurg.

The activity stage of the workshop focused on how we could adapt the Collections as Data Checklist for web archives. The participants were split into three groups. Each group reviewed the checklist with three questions in mind: whether it is applicable to web archives, how it could be adapted where it does not fit, and what solutions could be developed to overcome its more challenging sections. There was a rich discussion amongst the groups, which also benefited from having both researchers and library professionals involved in reviewing the checklist.

Web Archives Collections as Data Workshop at DHNB 2025. Photographer: Carmen Kurg.

The general consensus from the groups was that more detail is needed to accompany the Checklist before it can be applied to web archive collections. Some of the points on the Checklist are particularly difficult to apply to web archive collections. There was a lot of discussion on the first two points, as they cover licensing and citation. These are particularly difficult for web archives because of national legislation: most web archives operate on a dark or grey access model, and most onsite terminals used to access web archives have copy and paste functions disabled, so citation can become problematic. However, the participants were positive about the potential to apply an annotated or adapted Collections as Data Checklist specifically for web archives. The brainstorming session at this workshop was the first step in starting a discussion about what resources are needed to improve the process of publishing web archive collections as data. The discussion was picked up again at the IIPC Web Archiving Conference in April 2025.

For a more general report from the DHNB conference, see the Digital Scholarship blog: https://blogs.bl.uk/digital-scholarship/2025/04/dhnb-2025-digital-humanities-in-the-nordic-and-baltic-countries-conference-report.html

25 November 2024

Datasheets for Web Archives Toolkit is now live

By Helena Byrne, Curator of Web Archives

Datasheets for Web Archives Toolkit

Since autumn 2022, Emily Maemura from the University of Illinois and Helena Byrne from the UK Web Archive team at the British Library have been exploring how the Datasheets for Datasets framework, devised for machine learning by Gebru et al., could be applied to web archives. To explore the research question “can we use datasheets to describe the provenance of web archives, supporting research uses?”, a series of workshops was organised in 2023.

These workshops included a card sorting exercise with participants who had expertise in web archives as well as in general information management. After the card sorting exercise there was a general discussion about using this framework to describe web archive collections.

These workshops formed the core of the guidance documentation in the Datasheets for Web Archives Toolkit, published in the British Library Research Repository.

The Toolkit

This Toolkit provides information on the creation of datasheets for web archives datasets. The datasheet concept is based on past work from Gebru et al. at Microsoft Research. The datasheet template and samples here were developed through a series of workshops with web archives curators, information professionals, and researchers during Spring and Summer 2023. The toolkit is composed of several parts including templates, examples, and guidance documents. Documents in the toolkit are available at a single DOI (https://doi.org/10.22020/rq8z-r112) and include:

  1. Toolkit Overview 
  2. Datasheets Question Guide
  3. Datasheet Blank Template

Implementation 

The UK Web Archive has implemented this framework to publish data sets from its curation software, the W3 Annotation Curation Tool (ACT). These data sets are available to view in the UK Web Archive: Data folder in the British Library Research Repository. So far only a few collections have been published, but the number will grow over the coming months.