UK Web Archive blog

Information from the team at the UK Web Archive, the Library's premier resource of archived UK websites

Introduction

News and views from the British Library’s web archiving team and guests. Posts about the public UK Web Archive and, since April 2013, about web archiving as part of non-print legal deposit. Editor-in-chief: Jason Webber.

21 September 2023

How YouTube is helping to drive UK Web Archive nominations

By Carlos Lelkes-Rarugal, Assistant Web Archivist, British Library

Screenshot of the UK Web Archive website 'Save a UK website' page.
https://www.webarchive.org.uk/nominate

There is a plethora of digital platforms for all manner of online published works. YouTube itself has become more than just a platform for sharing videos; it has evolved into a platform for individuals and organisations to reach a global audience and convey powerful messages. Recently, a popular content creator on YouTube, Tom Scott, produced a short video helping to outline the purpose of Legal Deposit and, by extension, the work being carried out by UKWA.

Watch the video here: https://www.youtube.com/watch?v=ZNVuIU6UUiM

Tom Scott’s video, titled "This library has every book ever published", is a concise and authentic glimpse into the work being done by the British Library, one of the six UK Legal Deposit Libraries. The video highlighted some of the technology that enables preservation at scale, as well as the current efforts in web archiving. Dr Linda Arnold-Stratford (Head of Liaison and Governance for the Legal Deposit Libraries) stated, “The Library collection is around 170 million items. The vast majority of that is Legal Deposit”. Ian Cooke (Head of Contemporary British and Irish Publications) highlighted that, with the expansion of Legal Deposit to include born-digital content, “the UK Web Archive has actually become one of the largest parts of the collection. Billions of files, about one and a half terabytes of data”.

At the time of writing, the video has had over 1.4 million views. In addition, as the video continued to gain momentum, something remarkable happened. UKWA started receiving an influx of email nominations from website owners and members of the public. This was unexpected and the volume of nominations that have since come through has been impressive and unprecedented. 

The video has led to increased engagement with the public, with nominations representing an eclectic mix of websites. The comments on the video have been truly positive. We are grateful to Tom for highlighting our work, but we are also thankful and humbled that so many commenters have left encouraging messages, which are a joy to read. The British Library has the largest web archive team of all the Legal Deposit Libraries, but this is still a small team of three curators and four technical experts, and we do everything in-house, from curation to the technical side. Web archiving is a difficult task, but we are hopeful that we can continue to develop the web archive by strengthening our ties to the community and bringing together our collective knowledge.

If you know of a UK website that should be included in the archive, please nominate it here:  https://www.webarchive.org.uk/en/ukwa/info/nominate

28 July 2023

UK-Ireland Digital Humanities Association Launch Event Report from the British Library

By Helena Byrne, Curator of Web Archives, Frankie Perry, Music Manuscripts and Archives Cataloguer and Stella Wisdom, Digital Curator for Contemporary British Collections

UK-Ireland Digital Humanities Association Launch Event Banner with event details

The First Annual Event for the UK-Ireland Digital Humanities Association took place on 29th and 30th June 2023 at Senate House, University of London, as well as online. The Association “aims to build a collaborative vision for the field, and create new and sustainable long-term partnerships in alignment with the international community”. The programme, set across one and a half days, covered a wide variety of topics and included an opportunity for the Community Interest Groups to meet up.

The British Library was involved in four presentations either as an individual presentation or as part of a collaborative project. In this blog post we hear back from the British Library colleagues who attended.

Helena Byrne, Curator of Web Archives

I was involved in two collaborative presentations with Sharon Healy (Maynooth University) and Juan-José Boté-Vericad (Universitat de Barcelona). Our first presentation was a lightning talk on day one called 'Finding Web Archives under the ‘Big Tent’ of DH: A Case Study of Ireland and the UK'. This presented one element of a forthcoming chapter in a WARCnet edited collection on web archiving. It reviewed the provision of web archiving in postgraduate information management and digital humanities courses in Britain and Ireland. Our second presentation was part of Panel #2 on day two, called 'The Potential of a Reborn Digital Archival Edition for Collating a Corpus of Archived Web Materials'. This presentation outlined a methodology for researchers without coding skills to select, collate and analyse a corpus of archived websites.

The highlight for me was Panel #3, especially the presentation 'Towards a Critical Black Digital Humanities: A Critical Librarian’s Response' by Naomi L.A Smith (University of West London). This presentation and the discussion that followed highlighted some of the challenges as well as some of the positive action steps that can be taken to ensure digital humanities research is more inclusive. 

Frankie Perry, Postdoctoral Research Assistant, InterMusE project, University of York / Music Manuscripts and Archives Cataloguer, British Library

I gave a paper with Prof. Rachel Cowgill (University of York) who is Principal Investigator on the InterMusE project – a collaborative venture between musicologists, computer scientists, and archive and library specialists funded by the AHRC’s UK-US New Directions for Digital Scholarship in Cultural Institutions programme. The British Library is an institutional partner, with Dr Rupert Ridgewell (Lead Curator, Printed Music) as Co-Investigator; the universities of Swansea and Illinois at Urbana-Champaign are further partners, and we’re also working with the University of Waikato. In our paper, we introduced the complexities of sourcing, digitising, and piecing together ephemera relating to historical musical events (e.g. concert programmes, flyers, newspaper reviews), using as our case study materials relating to the British Music Society (1918-1933) and its regional centres and branches. We showed the interface of the digital archive built for the project, which uses a combination of the Greenstone Digital Library system, the Mirador Annotation Viewer, and the SimpleAnnotationServer to make materials browsable, searchable, and interactive for musicologists and community users alike.

I really enjoyed the event and the snapshot it provided into current digital humanities research and techniques. I especially enjoyed a paper by Orla Delaney (Cambridge) on 'Database ethnography and the museum object record', and one by Lisa Griffith (Digital Repository of Ireland) and Laura Molloy (CODATA) titled 'Pathways to collaboration – creating and sharing GLAM image collections as data'.

Stella Wisdom, Digital Curator for Contemporary British Collections

My lightning talk 'Collaborating to Curate and Exhibit Complex Digital Literature' reflected on the cooperation between curators, researchers, experimental writers and creative practitioners to plan and produce the British Library’s Digital Storytelling exhibition (2 June 2023 - 15 October 2023). This is a hands-on display which explores the ways that digital innovations have transformed and enhanced our narrative experiences. It showcases eleven examples of electronic literature that invite readers to become a part of the story themselves, through interactive narratives that respond to user input, reading experiences influenced and personalised by data feeds, and works that draw from multiple platforms and audience participation to create immersive story worlds. Preparing, and in some cases modifying, these interactive works to display them in a public gallery has only been possible through practical collaborations between Library staff and the writers and games studios who created these digital stories. I shared some insights from my experience of this co-curation work and encouraged attendees to visit the exhibition.

It was a pleasure to meet a number of people in real life who I had only previously spoken with online. A personal highlight was hearing Reham Hosny from the University of Cambridge and Minia University speak about 'DH and E-Lit Communities: Intersectional Perspectives'. In the refreshment breaks at this event I chatted with Reham about her novel, Al-Barrah (The Announcer), and she demonstrated to me how both augmented reality and hologram technologies work with the printed book to immerse readers in this thought-provoking narrative.

12 July 2023

UK Web Archive Technical Update - Summer 2023

By Andy Jackson, Web Archive Technical Lead, British Library

This is a summary of what’s been going on since the 2023 Q1 report.

At the end of the last quarter, we launched the 2023 Domain Crawl. This started well (as described in the 2023 Q1 report) but a few days later it became clear the crawl was going a bit too well. We were collecting so quickly, we started to run out of space on the temporary store we use as a buffer for incoming content.

The full story of how we responded to this situation is quite complicated, so I wrote up the detailed analysis in a separate blog post. But in short, we took the opportunity to move to a faster transfer process and switch to a widely-used open source tool called Rclone. After about a week of downtime, the crawl was up and running again, and we were able to keep up, storing and indexing all the new WARC files as they came in.

Since then, the crawl has been running pretty well, but there have been some problems…

2023 Domain Crawl Storage and Queues

The crawler uses disk space in two main ways: the database of queues of URLs to visit (a.k.a. the crawl frontier), and the results of the crawl (the WARCs and logs). The work with Rclone helped us get the latter under control, with the move from /mnt/gluster/dc2023 to sharing the main /opt drive and uploading directly to Hadoop. These uploads run daily, leading to a saw-tooth pattern as free space gets rapidly released before being slowly re-consumed.

But the frontier shares the same disk space, and can grow very large during a crawl. So it’s important we keep an eye on things to make sure we don’t run out of space. In the past, before we made some changes to Heritrix itself, it was possible for a domain crawl to consume huge amounts of disk space. Once, we hit over 100TB for the frontier, which becomes very difficult to manage. In recent domain crawls, with our configuration changes, we’ve managed to get this down to more like 10TB.

But, as you can see, around the 13th of June, we hit some kind of problem, where the apparent number of queues in the frontier started rapidly increasing, as did the rate at which we were consuming disk space. We deleted some crawler checkpoints to recover some space, as we very rarely need to restart the crawl from anything other than the most recent daily checkpoint, but this only freed up modest amounts of space. Fortunately, the aggressive frontier growth seemed to subside before we ran out of space, and the crawl is now stable again.

Unfortunately, it’s not clear what happened. Based on previous crawls, it seems unlikely that the crawler suddenly discovered many more millions of web hosts at this point in the crawl. In the past, the number of queues has been consistently up to around 20 million at most, so this leap to over 30 million is surprising. It is possible we hit some weird web structures, but it’s difficult to tell as we don’t yet have reliable tools for quickly analysing what’s going on in this situation.

Suspiciously, just prior to this problem, we resolved a different issue with the system used to record what URLs had been seen already. This had been accidentally starved of resources, causing problems when the crawler was trying to record what URLs had been seen. This led to the gaps in the crawl monitoring data just prior to the frontier growth, as the system stopped working and required some reconfiguration. It’s possible this problem left the crawler in a bit of a confused state, leading to mis-management of the frontier database. Some analysis of the crawl will be needed to work out what happened.
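As an illustration of the kind of check that helps when frontier growth speeds up, here is a minimal sketch that estimates how many days of disk space remain from a handful of free-space readings. It is not part of our monitoring stack: the function name and the (day, free terabytes) input format are assumptions made for this example. Because the daily uploads produce a saw-tooth pattern, the readings should all be taken at the same point in the cycle, for example just after each upload completes.

```python
from typing import Optional, Sequence, Tuple


def days_until_full(samples: Sequence[Tuple[float, float]]) -> Optional[float]:
    """Estimate the days left until free space reaches zero.

    `samples` holds (day, free_terabytes) readings (a hypothetical format).
    A least-squares line is fitted through the readings and extended to the
    point where free space hits zero. The result is counted from the last
    reading. Returns None if free space is not shrinking.
    """
    if len(samples) < 2:
        raise ValueError("need at least two readings")
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in samples)
    if var_x == 0:
        raise ValueError("readings must cover more than one point in time")
    # Slope of the fitted line: change in free terabytes per day.
    slope = sum((x - mean_x) * (y - mean_y) for x, y in samples) / var_x
    if slope >= 0:
        return None  # free space is steady or growing
    intercept = mean_y - slope * mean_x
    zero_day = -intercept / slope  # day on which the fitted line crosses zero
    return zero_day - samples[-1][0]
```

A straight-line fit is a deliberately crude choice: it will underestimate the risk if growth is accelerating, as it appeared to be in mid-June, so it is best treated as an early warning rather than a forecast.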

In the last quarter, the new URL search feature was deployed on our BETA service. Following favourable feedback on the new feature, the main https://www.webarchive.org.uk/ service has been updated to match. We hope you find the direct URL search useful.

We’ve also updated the code that recognises whether a visitor is in a Legal Deposit reading room, as it wasn’t correctly identifying readers at Cambridge University Library. Finally, there was an issue with how the CAPTCHAs on the contact and nomination forms were being validated, which has also been resolved.

Our colleagues from Webrecorder delivered the initial set of changes to the ePub renderer, making it easier to cite a paragraph of one of our Legal Deposit eBooks. Given how long the ePub format has been around, it is perhaps surprising that support for ‘obvious’ features like citations and printing is still quite immature, inconsistent and poorly-standardised. To make citation possible, we have ended up adopting the same approach as Calibre’s Reference Mode and implemented a web-based version that integrates with our access system.

We’ve also worked on updating the service documentation based on feedback from our Legal Deposit Library partners, resolved some problems with how the single-concurrent-use locks were being handled and managed, and implemented most of the translations for the Welsh language service. The translations should be complete shortly, and an updated service can then be rolled out, including the second set of changes from Webrecorder (focused on searching the text of ePub documents).

Replication to NLS

The long process of establishing a replica of our holdings at the National Library of Scotland (NLS) is finally nearing completion. We have an up-to-date replica, and have been attempting to arrange the transfer of the servers. This turned out to be a bit more complicated than we expected, so has been delayed, but should be completed in the next few weeks.

Minor Updates

For curators, one small but important fix was improving how the W3ACT curation tool validates URLs. This was thought to have been fixed already, but the W3ACT software was not using URL validation consistently and this meant it was still blocking the creation of crawl target records with top-level domains like .sport (rather than the more familiar .uk or .com etc.). On June 23rd, we released version 2.3.5 of W3ACT, which should finally resolve this issue.
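To show why newer top-level domains can trip up URL validation, here is a small sketch of a validator that accepts any well-formed TLD instead of checking against a fixed list. This is not W3ACT's code: the function name and the rules it applies are assumptions chosen for illustration, and it only handles ASCII hostnames (internationalised names would need converting to their punycode form first).

```python
import re
from urllib.parse import urlsplit

# A hostname label: letters, digits and hyphens, with no leading or trailing hyphen.
_LABEL = re.compile(r"^(?!-)[a-z0-9-]{1,63}(?<!-)$")
# A TLD: any alphabetic label of two or more letters, or a punycode (xn--) label.
# No fixed list of "known" TLDs is consulted, so .sport passes just like .uk.
_TLD = re.compile(r"^(?:[a-z]{2,63}|xn--[a-z0-9-]{1,59})$")


def is_valid_target_url(url: str) -> bool:
    """Return True if `url` looks like a crawlable http(s) URL (hypothetical rule set)."""
    try:
        parts = urlsplit(url.strip())
        host = parts.hostname
    except ValueError:
        return False  # e.g. a malformed IPv6 literal
    if parts.scheme not in ("http", "https") or not host:
        return False
    labels = host.lower().rstrip(".").split(".")
    if len(labels) < 2:
        return False  # bare names such as "localhost" are rejected
    return all(_LABEL.match(label) for label in labels) and bool(_TLD.match(labels[-1]))
```

The point of the sketch is that one shared function is used everywhere a URL is checked; the bug described above came from validation being applied inconsistently rather than from any single rule.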

Apart from that, we also updated Apache Airflow to version 2.5.3, and leveraged our existing Prometheus monitoring system to send alerts if any of our SSL certificates are about to expire.
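For readers curious about what a certificate expiry check involves, the sketch below works out how many days remain before a certificate's expiry date and compares that against a threshold. It does not show how our alerts are wired into Prometheus; the function names and the 14-day threshold are assumptions made for the example. The date string is in the format Python reports in the `notAfter` field of `ssl.SSLSocket.getpeercert()`.

```python
import ssl
import time
from typing import Optional


def days_until_cert_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days remaining until a certificate expires.

    `not_after` is a certificate date string such as "Jan  5 09:34:43 2018 GMT".
    `now` is a Unix timestamp; it defaults to the current time and is only
    a parameter so the calculation can be tested with a fixed clock.
    """
    expires_at = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires_at - current) / 86400.0


def should_alert(not_after: str, threshold_days: float = 14.0,
                 now: Optional[float] = None) -> bool:
    """True if the certificate expires within `threshold_days` (assumed default)."""
    return days_until_cert_expiry(not_after, now) < threshold_days
```

A negative result from `days_until_cert_expiry` means the certificate has already expired, which `should_alert` also reports as an alert.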

03 July 2023

RESAW 2023 Conference Report from the UK Web Archive

By Cui Cui Bodleian Libraries/University of Sheffield Information School, Nicola Bingham, Helena Byrne, British Library, Alice Austin Edinburgh University.

RESAW 2023 Exploring the Archived Web During a Highly Transformative Age

2023 was the fifth RESAW conference. RESAW stands for Research Infrastructure for the Study of Archived Web Materials. Established in 2012, it aims to promote a collaborative European research infrastructure for the study of archived web materials and holds a conference every two years. The 2023 conference was held in Marseille on June 5-6 under the theme ‘Exploring the Archived Web During a Highly Transformative Age’. There was a packed programme with a number of UK-based presentations, especially from the UK Web Archive teams based at the Bodleian Libraries and the British Library, and from Archive of Tomorrow project partner the University of Edinburgh.

The keynote presentations from the conference were streamed live and the recording of the day two keynote ‘Saving Ukrainian Cultural Heritage Online' by Sebastian Majstorovic (European University Institute) is available on the Inspé Aix-Marseille YouTube channel.

In this blog post participants from the UK Web Archive teams have reported back on their conference experience.

Bodleian Libraries/University of Sheffield Information School 

Cui Cui, Web Archivist / PhD researcher

The experience of presenting two papers at the fifth RESAW conference turned out to be a highly emotional one for me. The first presentation, alongside my fellow web archivist Alice Austin from the University of Edinburgh, marked the end of the Archive of Tomorrow project. It gave me a chance to reflect on the work we carried out for the project. The second presentation concluded the initial phase of my PhD research project on participatory web archiving. Presenting at the conference compelled me to summarise the findings from a survey I delivered last year, aiming to gain insights into the current practices of participatory web archiving. This experience not only marked a significant milestone, but also served as a starting point to bring theories and practices together to develop better web archives.

During a panel discussion titled “Interrogating the logics of web archiving in the era of platformization”, Jessica Ogden, Katie Mackinnon and Emily Maemura posed some critical questions about web archiving practices. Who are we collecting for, what shall we collect and how can we approach this process ethically? They put content creators at the centre of considerations and challenged web archivists to reflect critically on our practices and ethical considerations. It is reassuring that we are not alone in grappling with these complex issues as web archivists. These questions echo the constant dilemmas we face. In particular, the Archive of Tomorrow project highlighted the double-bind situations we encountered when dealing with ethical considerations, and piloted engagement work with content creators. From both researchers’ and archivists’ perspectives, it is evident that these concerns call for more evidence-based studies and a deeper understanding of the views held by content creators and a wide range of other stakeholders.

Overall, the RESAW conference provided a thought-provoking experience. It allowed me to reflect on our work, consolidate my understanding, and recognise the need for continued efforts to address these complex issues.

British Library

Nicola Bingham, Lead Curator of Web Archives

I felt very privileged to attend this conference at the Mucemlab in Marseille, set in the courtyard of Fort Saint-Jean, with a stunning mix of old and new architecture and amazing sea views. During the conference, I found numerous presentations informative, engaging, thought-provoking and humorous. Among them, however, two in particular sparked profound reflections on curatorial praxis within the context of my own work.

Henrik Smith-Sivertsen took the audience on a captivating journey into the world of digital music archiving. With a focus on three distinct songs, he illustrated how the mediascapes in which they were published have a significant impact on the archiving process. Through his exploration, he highlighted the challenges of capturing and preserving complex digital objects from social media platforms and streaming services. The question of which version(s) to capture became a pivotal point of discussion, raising awareness of the dynamic nature of digital music and the evolving digital landscape it resides in. A thought-provoking video presentation showcased the different online iterations of Lukas Graham's "7 Years" from 2015. The variations in platforms, remixes, and user-generated content surrounding this song demonstrated the diverse ways in which music proliferates and evolves online. The presentation served as a powerful reminder of the challenges faced by archivists when attempting to capture and preserve such dynamic and multi-faceted digital musical artefacts.

Tiancheng Leo Cao from the University of Texas at Austin's intriguing paper focused on the changing meanings of openness within the museum context. He shed light on the gradual shift from an institution-oriented understanding to an access-oriented interpretation, prioritising the needs and participation of the public. I was struck by how this ideology parallels our thinking in the UK Web Archive where efforts are being made to embed more participation in the curatorial process. By involving communities, ensuring diverse perspectives, and including multiple voices, heritage organisations can create a more inclusive and representative platform for preserving our digital heritage.

Helena Byrne, Curator of Web Archives 

This was my second time attending a RESAW conference. The first I attended was in 2017, as part of the Web Archiving Week event held in London, when the IIPC Web Archiving Conference and RESAW collaborated on organising a full week of web archiving activities. At RESAW 2023 I co-presented two presentations, both on day two of the conference. These were both collaborations that came out of the WARCnet network. The first was a joint presentation with Emily Maemura (University of Illinois), where we fed back some initial findings from the series of workshops we facilitated on ‘Describing Collections with Datasheets for Datasets’. The second was a joint presentation with Sharon Healy (Maynooth University) on ‘Assessing the Scholarly Use of Web Archives in Ireland’. In this presentation we highlighted a section from a much larger report that will be published as part of the WARCnet Papers and Special Reports.

A key highlight for me in the programme was the session 'Building the Next Generation of Web Archive Analysis Service'. This panel gave an overview of the development of the Archives Unleashed project from 2017. The project is now winding up and will be supported by the Internet Archive who will be releasing a subscription service to Archives Research Compute Hub (ARCH) this summer. I've been lucky enough to attend Archives Unleashed events in 2017 and 2019 so it was really great to see how the project has changed over time. I wish the Archives Unleashed team all the best.

University of Edinburgh

Alice Austin, Web Archivist

The Archive of Tomorrow project team took two papers to RESAW this year. The first was a deep-dive into the Trans Health sub-category within the Talking About Health collection. The second, presented jointly with my fellow web archivist Cui Cui of the Bodleian Libraries, delivered a condensed version of the project’s Final Report, and reflected on the challenges, wins and losses of the project as a whole.

A few related themes emerged from this year’s papers. A number of speakers reflected on the value of the archived web as a source for ‘bottom-up’ perspectives on the impact of online spaces in the development of narratives at a personal and social level. Arguing that the events of 9/11 galvanised emerging web archiving efforts, Ian Milligan’s paper explored how the resultant archived pages provide a rich source for future historians wanting to understand how that day evolved; Dana Diminescu’s paper on the archive of the ‘Comme a la maison’ platform examined how changes in the language of hospitality used online can reflect changes in societal understanding of the migrant experience; and Anya Shchetvina’s paper discussed how web-based communication objects can become recontextualised as memory objects.

Another theme concerned how to do web archiving in an age of ‘platformisation’. A trio of papers by Emily Maemura, Jess Ogden, and Katie Mackinnon explored this in detail, raising important questions about how web archiving practices might better serve the communities that they draw from. Camille Riou considered the vulnerability of data in a capitalist world in the context of the withdrawal of Twitter’s API for academic research, and Cade Diehm and Benjamin Royer of the New Design Congress presented an excellent overview of the sector’s readiness to grapple with issues of the polycrisis such as colonialism, privatisation and datafication.

The sixth RESAW Conference will be held in 2025 at University of Siegen in Germany. The theme for the conference is ‘Histories of the Datafied Web: Infrastructures, metrics, aesthetics’. More details about the conference and the call for papers will be announced in due course. 

28 June 2023

IIPC Web Archiving Conference 2023 Report from the UK Web Archive

By Nicola Bingham, Helena Byrne, Ian Cooke, Carlos Lelkes-Rarugal, Andrew Jackson, Richard Price British Library, Leontien Talboom Cambridge University Library, Mark Simon Haydn National Library of Scotland.

IIPC WAC2023 Conference Banner with details of the online and in person conference details.

The IIPC 2023 Web Archiving Conference was hosted by the Netherlands Institute of Sound and Vision in Hilversum and co-organised by KB, National Library of the Netherlands. There was an online session held on May 3rd and the main in-person event took place on May 11th and 12th. There was a packed programme that included Q&A sessions for pre-recorded presentations on the online day, and presentations, workshops, lightning talks and posters at the in-person event. This was the first in-person IIPC conference since 2019, when the event was hosted by the National and University Library in Zagreb (NSK), Croatia.

Many UK Web Archive colleagues from Bodleian Libraries, the British Library, Cambridge University Library and National Library of Scotland attended the conference both as delegates and presenters. In this blog post they have reported back on their conference experience.

British Library

Nicola Bingham, Lead Curator of Web Archiving

Attending the IIPC conference in person for the first time since 2019 was a great experience. The combination of reconnecting with colleagues after four long years and the (literally) colourful ambience of the Beeld & Geluid (Institute for Sound & Vision), created an atmosphere brimming with renewed energy and optimism. I will highlight just a few of the presentations and conversations that were interesting from my point of view.

I enjoyed hearing about De Digitale Stad Herleeft (the Digital City Revived) from Marleen Stikker, founder and ‘mayor’ of DDS, Marieke Brugman of UNESCO and Tjarda de Haan, Bits and Bytes United. Presentations focused on the “webarchaeological excavations” which took place to reconstruct, preserve, store and make accessible this unique digital heritage based on KB’s XS4ALL web collection, which was listed as UNESCO Memory of the World Heritage for the Dutch list and is now under review for the worldwide list.

I enjoyed insights into diversity and co-curation from Jesper Verhoef, a Researcher-in-Residence at KB working on "Mapping the Dutch LGBT+ Web Archive". Jesper's work utilises KB's collections to explore the unique web sphere formed by LGBTQ+ (or queer) people and how this evolved over time. It sparked intriguing insights and perspectives which could be applied to our own LGBTQ+ collection.

Collaboration and innovation in web archiving were recurring themes at the conference. Valuable insights were shared by the team from the Library of Congress, emphasising their investment in and education of curators to effectively participate in the web archiving process. 

Finally, I had the privilege of presenting the research by WG2 of the WARCnet project, ‘Surveying the Landscape of COVID-19 Web Collections in European GLAM Institutions’ in a session dedicated to Covid-19 collections. Our findings shed light on the scope of these collections, how they were defined, and the common challenges institutions face in making them accessible for research purposes. 

Helena Byrne, Curator of Web Archives 

I participated in both the online and in-person event as a collaborator in a presentation in the online day and co-facilitating a workshop at the in-person event. I was involved in the ‘Developing a Reborn Digital Archival Edition as an Approach for the Collection, Organisation, and Analysis of Web Archive Sources’ project with Sharon Healy (Maynooth University) and Juan-José Boté-Vericad (Universitat de Barcelona). Along with Emily Maemura (University of Illinois) we facilitated Workshop-01 ‘Describing Collections with Datasheets for Datasets’. This was part of a series of workshops we hosted to see if the Datasheets for Datasets framework could be applied to UK Web Archive collections published as data. 

As a participant there were so many great takeaways from this conference. One of the sessions that stands out most for me is ‘Renewal in Web Archiving: Towards More Inclusive Representation and Practices’, which took place on day two of the conference. The conversations in this session were really useful in helping me ensure that we continue to develop more inclusive collections and opportunities to engage in the curation process. In this session we heard about the next steps for the Archiving the Black Web (ATBW) project. Although this is a USA-based project, its impact will be global as they are currently developing a training programme to improve the curation and research use of the archived black web.

Andrew Jackson, Web Archive Technical Lead

I was involved in a couple of tool workshops during the conference, where it was great to see the interest in shared tooling, and the collaborative commitments this implies. I was also interested in how many of the presentations related to issues around information literacy. For more, see my blog post Reflections on the IIPC Web Archiving Conference 2023.

Ian Cooke, Head of Contemporary British & Irish Publications

This year’s conference was a strong reminder that web archiving is about people - the people whose lives and experiences are expressed in the collections we build; the people whose imaginations shaped the way we use, and have used, the web over time; and the people who are working across collecting, preserving and researching the archived web.

There was a great mix of presentations, blending new developments in technologies, evolving research methods, and approaches to creating and understanding collections, in ways that were accessible to all attendees. Giulia Carla Rossi and I were both pleased to talk about the development of our practice at the British Library, and legal deposit libraries, in collecting ‘emerging formats’.  

The IIPC itself is celebrating its 20th year, and the conference reflected that sense of celebration. It also demonstrated the maturing of practice, and reflection on web archiving methods and goals, at many of the organisations represented. A highlight of the conference was the presentations by Makiba Foster and Zakiya Collier on the Archiving the Black Web project, and the potential of web archiving to contribute to ‘black self-education practices, collective study and librarianship’. Foster and Collier argued for well-resourced institutions to take responsibility for providing support to community heritage organisations in building inclusive collections, and also stressed the need for ethical considerations when building collections, in particular regarding the rights of people represented within them.

Overall, it was a privilege to take part in the conference and to have the time to connect in person with a community of web archive practitioners and researchers, being able to share knowledge and experience and reminding ourselves of what we have in common.

Carlos Lelkes-Rarugal, Assistant Web Archivist

I very much enjoyed my second IIPC annual web archiving conference; 2019 was my first, so I didn’t quite know what to expect. Suffice to say, the 2023 WAC was just as successful, and another enjoyable, unique experience.

The attendees come from such diverse backgrounds. I think this is because web archiving is approached very differently, as each organisation has its particular way of going about it, which is why there is such an emphasis on sharing knowledge and information. I attended many talks and learnt about new methods of quality assurance, the infrastructure set-up of institutions, and policies on collecting; whichever presentation it is, you can be sure there’s something innovative going on that could be applied to your domain.

The UK Web Archive itself represents the six UK Legal Deposit Libraries, and as such, we’re inherently maintaining relationships but more importantly trying to build new relationships for new opportunities, collaborations, and potential partnerships. We’re a small team (larger than others) but still relatively small when considering the scope of our work, and I think this is exactly what the IIPC can help with. Like many organisations, the UK Web Archive does at times find web archiving to be a challenge, and as such, the IIPC helps foster a network of people who are willing to share their knowledge and expertise so that we can connect with them to tackle these emerging and ever-evolving challenges. There’s a collective effort to further web archiving, we’re trying to advance a field that has a lot of potential, so if you’re interested, please join this invaluable community.

Richard Price, Head of Contemporary British Collections

I attended this conference to reacquaint myself with web archiving in a little more detail than I have for some years. It was a privilege to attend, seeing so many different kinds of response from the international community and, if I may say so, I felt especially proud of my colleagues at the British Library for their presentations and workshops. If there was a common thread through the papers it was that the problem-solving and information-sharing intrinsic to the web archiving community are values translated from the early days of the web itself – that substantial part of the early Internet that was altruistic and public-minded – and, in today’s archiving world, underpinned by layers of technical, social, and curatorial expertise. Thank you to IIPC and to Sound and Vision at Hilversum, and to all those presenting and attending!

Cambridge University Library

Leontien Talboom, Technical Analyst

This was my first time attending IIPC apart from a very brief appearance on a panel in 2022. I was fortunate enough to be a co-presenter on two talks during the conference. One was with my colleague Mark Haydn where we presented on the datasets that we were able to create during the Archive of Tomorrow project and the other was with my colleague Caylin Smith where we explored the difficulties and opportunities of capturing the University of Cambridge domain. 

Both presentations were really enjoyable and it was great to get feedback and questions from colleagues across our field. As this was my first time attending IIPC I wasn’t sure what to expect. However, I was pleasantly surprised by the wide range of topics and formats discussed. One that really stood out to me was the work of Emily Escamilla who talked about reference rot and what would happen if GitHub was to disappear. This really showcased how much as an academic sector we rely on these types of sources to be around when referencing them, but this is not necessarily a given. 

National Library of Scotland

Mark Haydn, Metadata Analyst

It has been a few years since I've been at an in-person conference, and I had forgotten how nice it can be to visit another city and spend a few days immersed in presentations and conversations with people working in the same area. Sometimes this meant hearing about something immediately relevant to my own metadata work at the National Library of Scotland, like hearing Tom Storrar of the UK Government Web Archive assess how effective their work ramping up collecting early in the pandemic to capture frequent website updates had been, or listening to members of the ResPaDon Project detail their experiences extending regional access to web archives collections across France. Other presentations served as an opportunity to better understand topics being explored further afield: there were many demonstrations of potential uses of AI, not all of them ominous, ranging from automatically producing descriptive summaries of technical metadata, for use in Library of Congress catalogue records, to generating a generic Stirring Plenary Speech at short notice.

As well as listening in, my colleague Leontien Talboom and I presented some of our work on the Archive of Tomorrow project, summarising the progress that's been possible since the development of the British Library's web archive metadata export. We heard about other institutional and international approaches and platforms for looking at web archives at scale, like Archive-It's ARCH tools, and caught fellow Archive of Tomorrow web archivist Cui Cui's discussion of knowledge sharing before heading back to the UK.

The 2024 IIPC Web Archive Conference will be hosted by the Bibliothèque nationale de France (BnF) 24-26 April. Follow the IIPC Twitter account for updates and the call for papers due out in early autumn.

26 June 2023

LGBTQ+ Connections and Community

By Ash Green, CILIP LGBTQ+ Network, and Goldsmith University

The Marlborough Pub and Theatre

I was browsing through the LGBTQ+ Lives Online collection recently, and reminded myself that I had added The Marlborough Pub and Theatre to it when I first began co-curating the collection. As far as I can remember, it was one of the first sites I added to the archive. I wanted it in there because it had been an important part of my coming out around 2017. I had a personal connection to it, and I wanted there to be a record of the impact it had on me. I know future explorers of the UK Web Archive won’t know why that site is archived, but maybe they will stumble across this blog post in connection to it and understand its importance to at least one LGBTQ+ person – me.

So, why did I specifically want this site in there? Well, in 2017, when I was working out what support there was for me as a trans/gender non-conforming person, I discovered The Clare Project, which is a trans support group in Brighton. I went along to it, and afterwards we went to The Marlborough Pub and Theatre, which was a venue with a long history of support for the LGBTQ+ community. The pub was the sort of place where I didn’t know anyone, but just being there made me feel okay about who I was. It was the first time being in an LGBTQ+ venue had felt like that to me. And I realised that there were other people there who seemed to be on similar paths in their lives. It was a reassuring place, and it was a place where I learnt about how diverse the LGBTQ+ community was. I remember going to a queer cabaret there, and it was such an amazing, heart-warming, queer, eye-opening and fun night. The pub is still there – now called The Actors. I’m not a regular visitor, and if you mention my name in there, they won’t know who I am. But when I call in from time to time when I’m in Brighton, I still get that sense of belonging to a community even if I’m quietly sitting in a corner reading on my own. It is a place that re-energises me.

It got me wondering about other sites in the LGBTQ+ Lives Online collection focused on artistic communities that may have had a similar impact on others in the same way that The Marly did on me.

So, for example, what joy did members of South Wales Gay Men's Chorus, Songbirds Choir, or True Colours LGBT Choir feel when they first sang with these choirs?

How excited were listeners when they heard a new track on LGBT Underground that struck a strong emotional chord with them, and has stayed with them forever?

How did filmmakers feel when their first film appeared at the Scottish Queer International Film Festival, LezDiff, or the Iris Prize? And who in the audience saw something for the first time at these film festivals that resonated strongly with them?

And what sense of connection and belonging did those in queer / LGBTQ+ art groups such as The Queer Dot, Sanctuary Queer Arts, Wise Thoughts, and VFD find within their arts communities?

And maybe there are LGBTQ+ people who attended Queen Jesus, Teatro do Mundo, or even The Marlborough theatre performances, who realised the voice on stage was talking directly to them, and they clearly understood its message in relation to who they are as an LGBTQ+ person.

I know I can’t possibly be the only LGBTQ+ person who feels a strong connection with a place or community like these. Maybe you have a story to share about one of the sites in the collection? Or maybe you have a site like one of these that you would like us to add. You can nominate sites for inclusion here: https://www.webarchive.org.uk/nominate

We can’t curate the whole of the UK web on our own. We need your help to ensure that information, discussions, personal experiences and creative outputs related to the LGBTQ+ community are preserved for future generations. Anyone can suggest UK published websites to be included in the UK Web Archive by filling in the above nominations form.

If you would like to explore any of the sites mentioned in this blog post, you can find them in the Arts, Literature, Music & Culture subsection of the LGBTQ+ Lives Online collection: https://www.webarchive.org.uk/en/ukwa/collection/3090

19 June 2023

Reflections on the IIPC Web Archiving Conference 2023

By Andrew Jackson, Web Archive Technical Lead

Tessa Walsh (Webrecorder), Anders Klindt (Royal Danish Library), Ilya Kreymer (Webrecorder) & Andy Jackson (British Library) demonstrating the new Browsertrix features in the workshop 'Browser-Based Crawling for All: Getting Started with Browsertrix Cloud'

My main goal for the conference was to support the adoption and development of shared open source tools. I've been involved in the IIPC project Browser-based crawling for all, and at the conference I helped run a workshop where attendees could start exploring Browsertrix Cloud and give feedback to the project and to Webrecorder. There were some initial problems with the capacity of the demo system, but these were quickly resolved and the workshop was a success and provided useful feedback for future work.

I also ended up chairing the SolrWayback session, which showed many great examples of how that search interface and the underlying indexing tools (developed by UKWA) have been used by different web archives to help explore and analyse their collection. It's heartening to see more and more web archives doing this kind of thing.

There were a lot of good presentations and discussions around tools, but I'd particularly like to recommend that you all check out Warchaeology by the National Library of Norway Web Archive, and Scoop by the Harvard Library Innovation Laboratory.

Both the Scoop presentation and the Bellingcat keynote provided important insights into what it takes for web archives to be legally-admissible evidence (see also e.g. this post about Scoop and this post from Bellingcat). There are interesting questions here about our tools and workflows, like whether the WARC or WACZ formats are sufficient in their current form, and whether there are opportunities for deeper collaboration across the domains of cultural heritage, law, and open source investigation.

Finally, across a number of presentations, the conference also raised questions about the current and future role of cultural heritage institutions. Are our approaches to information literacy fit for an age of fake news and ChatGPT pollution? Is there something libraries and archives can learn from how Bellingcat and fact checkers like Full Fact are helping people find reliable information and avoid conspiracy theories? Can web archives do more to fight disinformation? I look forward to seeing more about this at future conferences!

04 May 2023

Regal Reflections: Exploring a New UK Web Archive Collection on King Charles III

Nicola Bingham, Lead Curator of Web Archiving, British Library

It has been 70 years since a new monarch was crowned in the UK. As we bear witness to a new era of the British monarchy and reflect on its role within the UK, the UK Web Archive is recording and preserving this momentous occasion by capturing websites in a special collection about King Charles III. Work started in earnest on this collection on 8th September 2022, when the late Queen, Elizabeth II, passed away and Charles became King. However, it also forms part of a larger series of collections about the British monarchy in the early 21st Century, curated by staff in the UK Legal Deposit Libraries.

Through this series of special collections, we can trace how the Royal Family has adopted the internet to communicate more efficiently with their supporters, members of the public, and other stakeholders as well as to promote their charitable causes and connect with younger generations who are more likely to engage with social media. As well as ‘official’ information, the UK Web Archive is also capturing user-generated content from a wide range of publishers including the general public, as recorded in websites, blogs, and social media posts, much of which is not available through traditional historical records.

In building this collection we have several priorities. As with all our collecting activity, our mission is to save ephemeral digital content ensuring it is preserved for the historical record. A good illustration of this is that the official website of Charles, Prince of Wales, published in his former position as heir apparent, no longer exists on the internet and is only available in the web archive.

Screenshot of the archived website of the Prince of Wales. Image of the Prince walking in a garden

Archived copy of www.princeofwales.gov.uk/ in the UK Web Archive (21/06/2019) https://www.webarchive.org.uk/wayback/archive/20190621085304/https://www.princeofwales.gov.uk/

We hope that the collection can help to provide a more comprehensive understanding of King Charles III and his impact on society, by preserving a diverse range of viewpoints and perspectives. There is a huge groundswell of affection for the new King, and the Royal Family in general, and a great sense of celebration and optimism in the lead-up to the Coronation on 6th May. However, there is, of course, opposition, skepticism, and criticism, all of which is reflected online. It is important to capture all sides of the conversation to provide a balanced view of the Royal Family and create a digital legacy that will be of interest to researchers to study, and future generations to appreciate.

Another of our aims is to represent different communities across the UK and Commonwealth in the UK Web Archive. The collection will reflect how towns, cities, and villages celebrate the Coronation. Many people will be holding street parties, such as the residents of Calderdale, West Yorkshire, where residents are encouraged to get together and make the Coronation Weekend a community celebration to remember.

Seal of King Charles III - red background and white seal

In Glasgow organisations and communities are encouraged to engage in various Coronation initiatives and events in order to create a positive lasting legacy. The Big Help Out, for example, is an opportunity to highlight the positive impact of volunteering. It is hoped that the extra bank holiday for the Coronation will be remembered as a day of donating time and skills to help charities, causes, and the vulnerable.

Along with street parties, other traditions surrounding significant royal events include the manufacture and purchase of souvenirs. This article on the V&A’s website, preserved in the UK Web Archive, shows a few examples of souvenirs from past events, such as the 'Jubilee' biscuit tin made in 1887 for the Carlisle-based biscuit manufacturer Carr & Co. to commemorate Queen Victoria's Golden Jubilee, and the 'Coronation Coach' biscuit tin resembling the ornate coach used by King George VI and Queen Elizabeth on their Coronation Day on 12 May 1937. Of course, now that online shopping is ubiquitous, any type of royal-themed memorabilia or amenity can be purchased, from the more traditional, such as this mug from the National Archives shop, to the more esoteric, such as hiring a King Charles look-a-like.

One of the more peculiar aspects of the British monarchy is that special occasions are often associated with an official dish. Queen Elizabeth had curried chicken for her Coronation, which was a relatively exotic choice in the Britain of the 1950s, while King Charles has a ceremonial quiche (disappointingly not named Quiche l’Reign), which is intended for people to cook at home as part of the Coronation Big Lunch.

Tweet from the Prime Minister's Twitter account discussing the upcoming Coronation.

Image from UK Government Twitter showing the Queen’s Coronation banquet: UK Prime Minister (@10DowningStreet) / Twitter (webarchive.org.uk)

In conclusion, the UK Web Archive is a collection affording a unique opportunity to witness and record unfolding historical events. As a historical figure, Charles III and the events that occur during his reign will be of significant interest to researchers, scholars, and the general public. Please do visit the King Charles III collection in the UK Web Archive, and if you know of a website that should be included in this collection, please nominate it here: https://www.webarchive.org.uk/en/ukwa/info/nominate