UK Web Archive blog

Information from the team at the UK Web Archive, the Library's premier resource of archived UK websites

Introduction

News and views from the British Library’s web archiving team and guests. Posts about the public UK Web Archive, and since April 2013, about web archiving as part of non-print legal deposit. Editor-in-chief: Jason Webber.

29 May 2024

IIPC Web Archiving Spring/Summer School and Conference 2024: Report from UK Web Archive Colleagues

Nicola Bingham, Helena Byrne, Ian Cooke, Gil Hoggarth, Cameron Huggett (British Library), Caylin Smith (Cambridge University Library) and Eilidh MacGlone (National Library of Scotland).

This year’s IIPC General Assembly and Web Archiving Conference took place at the Bibliothèque nationale de France (BnF) in Paris. Before this year's conference there was an Early Scholars Spring School on Web Archives aimed at early career researchers interested in working with web archive materials.

Many UK Web Archive colleagues from Bodleian Libraries, the British Library, Cambridge University Library and National Library of Scotland attended the Spring/Summer School and the Web Archiving Conference both as delegates and presenters. In this blog post they report highlights of their conference experience.

Nicola Bingham, Lead Curator of Web Archives, British Library

The IIPC conference lived up to its reputation for being incredibly informative, inspiring, and intense! It was wonderful to reconnect with ‘old’ friends and to meet many new colleagues who are bringing diverse skills and perspectives to the field of web archiving.

As Co-Chair of the IIPC’s Content Development Group, alongside Alex Thurman of Columbia University Libraries, I delivered the keynote speech at the Early Scholars Spring School on Web Archives, which preceded the conference. Our presentation reflected on the history, importance, and legacy of the collaborative transnational web archive collections initiated by IIPC members over the past 14 years.

It was fascinating and gratifying to hear from web archive scholars about their diverse approaches and the variety of research questions they are exploring using web archives. Having worked in web archiving for 20 years, I find the increasing use of collections by researchers, particularly through data-mining approaches, especially interesting and rewarding.

Another interesting and informative highlight was the conference opening keynote speech by Pierre Bellanger, Pauline Ferrari, Jérôme Thièvre, and Sara Aubry. Pierre Bellanger, the founder and CEO of Skyrock and Skyrock.com, emphasised that "there is no freedom without memory," setting the tone for a discussion on the archiving of Skyblogs. Sara Aubry, web archiving technical lead at BnF, detailed the challenges they faced, including working with the Skyblog technical team on short notice to archive the blogs and altering web pages to display more articles and comments before the platform went offline. They managed to collect a substantial amount of content before the closure, amassing 5 million media files and providing API access for metadata extraction. This initiative highlights the importance of preserving the vernacular web, capturing personal pages rather than corporate content. The Skybox project further explores data-oriented methods of access and structural metadata to enhance discovery, with potential future projects aiming to build large language models to analyse and identify regional content within the blogs.

Helena Byrne, Curator of Web Archives, British Library

At this year's conference I presented in the Lightning Talk and Poster sessions. The abstracts are available to read on the IIPC website. IIPC WAC 2024 was a really great conference and there were so many takeaways to help improve my practice. One session I’d like to focus on for this blog post was SESSION #10: Digital Preservation. This session focused on citation practices for researchers using web archives in their research. This is an area that is not fully understood in the academic publishing world. I particularly liked the Citation Saver tool from Arquivo.pt as this is a simple but effective tool to bulk upload online citations from an academic publication. At the British Library we support a variety of researchers and the tools and methods discussed in this session will be useful to support them using web archives in their work.

Gil Hoggarth, Web Archive Technical Lead, British Library

I personally had not been able to attend the last few IIPC annual conferences, so it was fabulous to meet up and connect with old faces, and new, and learn about all the exciting projects going on. As I take a technical view (of most things), I found it particularly interesting that so many institutions were trying to establish, and expand, their web archiving services. Plus, the number of people involved in joint projects, with a combined aim but also with a community benefit in mind, was quite striking. Now, having returned to challenges ahead for The British Library and the UK Web Archive, I feel far more informed and aware of these community efforts - and have been in contact with many conference attendees to follow up!

Caylin Smith, Head of Digital Preservation, Cambridge University Libraries 

This was my second time attending the IIPC conference; I attended last year in Hilversum. I enjoy attending this conference for its presentations about solving operational challenges relating to web archiving and ones about how web archiving supports an institution’s strategic mission. 

I chaired a panel titled “Striking the Balance: Empowering Web Archivists and Researchers In Accessible Web Archives” whose presenters included Leontien Talboom (Technical Analyst on the CUL Digital Preservation team), Alice Austin (Web Archivist at Edinburgh University Library), Tom Storrar (Head of Web Archiving at The National Archives, UK), and Andrea Kocsis (Heritage and Digital Humanities researcher formerly at Northeastern University London; now Chancellor’s Fellow at the University of Edinburgh). 

This panel focused on different perspectives to using web archives, including as a leader of a web archiving service, as a web archivist, and as a researcher. It highlighted evolving user expectations for web archives as well as the challenges around communicating what users can and cannot do because of technical and/or legislative requirements.

Cameron Huggett, PhD Student (CDP), British Library/Teesside University

I attended the IIPC Early Scholars Spring School on Web Archives. You can read more about my reflections on this event in this blog post - https://blogs.bl.uk/webarchive/2024/05/reflections-on-the-iipc-early-scholars-spring-school-on-web-archives-2024.html

Eilidh MacGlone, Web Archivist, National Library of Scotland

This was my second IIPC conference in Paris; the last was in 2014, when I was a nervous first-timer, so I was happy to take part in the new mentorship programme. It was a good way to share experience across different points in our professional arcs.

Planning my conference agenda, presentations on machine learning were at the top of my list. These outlined services to classify and retrieve items from large, complex stores of resources. I knew these would be interesting, as attempts to solve a problem with no complete answer.

Ben Charles Germain Lee spoke about working with born-digital government publications. He introduced these ideas using a published experiment. This combination of text and visual analysis provides at least one way to organise retrieval of a very large collection; in the presented case, born-digital government publications derived from the End of Term web archive. In future, these techniques could offer information retrieval to readers for collections which are too big to catalogue.

The IIPC’s Training Working Group session, led by Claire Newing (TNA) and Ricardo Basílio (Arquivo.pt) was another highlight. It gave me a chance to speak briefly on the most important thing in training colleagues (practice!) and the group shared a lot of really good ideas for training. I had the opportunity to use the information almost immediately on my return, training a colleague to self-archive. All in all, this IIPC was a conference with many good lessons.

Ian Cooke, Head of Contemporary British & Irish Publications, British Library

This year, I was struck by how big, and how varied, web archiving has become. The conference covered a huge array of topics and approaches. Many thanks to the Programme Committee, and especially to the team at BnF for being such excellent hosts. For me, the conference got off to a great start a day early, as I attended the pre-conference workshop on appraisal strategies for web archive curated collections, led by Melissa Wertheimer (Library of Congress). The hands-on session was a very clear reminder of the importance of professional librarians and archivists in creating focused and meaningful collections. The conference was also an opportunity for me to dive into some of the more technical sessions. Kristi Mukk and Matteo Cargnelutti’s (Harvard University Library) presentation on using AI to support search in web archives was both very clear and inspiring. I particularly liked Kristi’s assertion that ‘AI literacy is information literacy’ and the importance of thinking like a librarian. Katherine Boss’ (New York University Library) paper on an experimental project to preserve dynamic and database-driven websites using server-side web archiving (not something to be done at scale!) was also brilliant. Both also emphasised the importance of working collaboratively in teams, bringing principles from librarianship to work alongside software engineering in developing and testing new responses to preservation and discovery challenges.          

Conclusion

The IIPC Web Archiving Spring/Summer School and Conference 2024 at the Bibliothèque nationale de France provided a dynamic platform for exchanging ideas, learning about innovative projects, and fostering collaborations in the field of web archiving. UK Web Archive colleagues contributed significantly through presentations and active participation. This conference highlighted the evolving landscape of web archiving, emphasising the importance of preserving the vernacular web, improving researcher access, and leveraging new technologies like AI for better archival practices. As we return to our respective roles, we carry forward new insights and strengthened connections, ready to tackle the challenges ahead with renewed vigour and informed strategies.




22 May 2024

Reflections on the IIPC Early Scholars Spring School on Web Archives 2024

By Cameron Huggett, PhD Student (CDP), British Library/Teesside University

IIPC Early Scholars Spring School on Web Archives banner

My name is Cameron, and I am currently undertaking an AHRC funded Collaborative Doctoral Partnership (CDP) project, between the British Library and Teesside University. My research centres on racial discourses within association football fanzines and e-zines from c.1975 to the present, and aims to examine the broader connections between football fandom, race and identity. 

I attended the Early Scholars Spring School on Web Archives, prior to commencement of the conference, which allowed me to share knowledge with colleagues from a number of different countries, institutions and disciplines, offering new perspectives on my own research. Within this school, I was fortunate enough to be able to deliver a short lightning talk, outlining my own use of web archiving within my research into the history of racial discourses within football fanzines. This generated an engaging discussion around my methodologies and led me to reflect upon how quantitative techniques can be better adopted within historical research practices.

I also particularly enjoyed discovering more about the collections of the Bibliothèque nationale de France (BnF) and Institut national de l'audiovisuel (INA). The scope of the collections and innovative user interfaces were particularly impressive. For example, INA had created a programme that allowed the user to view a collection item, such as an election debate broadcast, alongside archived tweets relating to the event in real time.

My primary takeaway was how web archives can be innovatively employed to record the breadth and depth of online communities and discourses, as well as supplement more traditional sources within a historian’s research framework.

24 January 2024

Exploring Alternative Access: Making the Most of Web Archives During UK Web Archive Downtime

Nicola Bingham, Lead Curator of Web Archiving, British Library

The British Library is continuing to experience disruption following a cyber-attack and we are working hard to restore services. Disruption to some services is, however, expected to persist for several months. In the meantime, our buildings are open and we’ve released a searchable online version of our main catalogue, which contains records of the majority of our printed collections as well as some freely available online resources. Our reference team are on hand to answer queries, advise on collection item availability and help with other ways to complete your work. Please email [email protected] or find out more. The disruption is affecting our website, online systems and services. Please see our temporary website for up-to-date information.

Despite the disruption to access to the UK Web Archive, we continue to crawl or acquire copies of websites, as well as add new websites to our acquisition process which is being undertaken with Amazon Web Services in the Cloud, ensuring that the UK Web Archive collection is updated and preserved as usual.

We appreciate that for regular users of the UK Web Archive, the temporary unavailability of this valuable resource is inconvenient and disruptive. There exist several alternative openly accessible web archives that can serve as sources of information while the UK Web Archive is offline.

Other Openly Accessible Web Archives

Internet Archive: Known as the largest and most comprehensive web archive globally, it includes the famous Wayback Machine and boasts an extensive collection of archived web pages.

Understanding the Differences

While the Internet Archive captures a broad spectrum of global content, the UK Web Archive focuses specifically on the UK web. The UK Web Archive offers comprehensive crawls, curated collections, and secondary datasets for research. However, access is primarily restricted to legal deposit libraries, with some resources available openly.

The Internet Archive allows remote access to archived websites, but its search functionalities and scope differ from the UK Web Archive.

Memento Time Travel: This innovative platform operates under the Memento protocol, allowing users to view archived websites across various openly accessible web archives. It acts as a bridge, enabling access to past versions of web resources stored in archives such as the Internet Archive, Archive-It, UK Web Archive, archive.today, GitHub, and more. While it displays links to Mementos, it doesn’t retain the content itself.

Portuguese Web Archive (Arquivo.pt): Developed by the Portuguese Foundation for Science and Technology, this archive aims to preserve and grant access to the Portuguese web domain and its contents. It also archives a significant amount of European Union and transnational content. It's a valuable resource for preserving the digital heritage of Portugal and contributing to the preservation of European and Portuguese-language online information.

UK Government Web Archive: An openly accessible archive preserving UK central government information, encompassing videos, tweets, images, and websites dating from 1996 to the present day.

UK Parliament Web Archive: This openly accessible archive covers parliamentary websites and social media content from 2009 to the present day.

National Records of Scotland Web Archive: Offering open access, this archive allows browsing and searching of websites related to Scotland’s people and history.
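
For readers who want to query Memento Time Travel, described above, from a script, the following is a minimal sketch of how a Memento request is formed. It builds the `Accept-Datetime` header defined by the Memento protocol (RFC 7089) and a TimeGate address for a page. The TimeGate base URL and the helper names `accept_datetime` and `memento_request` are assumptions made for illustration, so please check the service's own documentation before relying on them. The code only constructs the request and does not contact any archive.

```python
from datetime import datetime, timezone
from email.utils import format_datetime

# Assumed TimeGate base address for the Time Travel service (verify before use).
TIMEGATE_BASE = "http://timetravel.mementoweb.org/timegate/"


def accept_datetime(when: datetime) -> str:
    """Format a datetime as the HTTP date used in the Accept-Datetime header."""
    return format_datetime(when.astimezone(timezone.utc), usegmt=True)


def memento_request(url: str, when: datetime) -> tuple[str, dict[str, str]]:
    """Return the TimeGate URL and headers asking for the capture nearest `when`."""
    return TIMEGATE_BASE + url, {"Accept-Datetime": accept_datetime(when)}


# Hypothetical example: ask for the UK Web Archive home page as it was in October 2023.
target, headers = memento_request(
    "https://www.webarchive.org.uk",
    datetime(2023, 10, 1, 12, 0, tzinfo=timezone.utc),
)
print(target)
print(headers)
```

The returned URL and headers could then be passed to any HTTP client; a Memento-aware service would normally answer with a redirect to the archived copy closest to the requested date.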

Seeking Information and Resources

While the UK Web Archive is offline, the UK Web Archive blog remains accessible and serves as a useful source of information about the archive.

Additionally, although the UK Web Archive itself might be temporarily inaccessible, its information pages have been preserved by the Internet Archive, accessible [here](https://web.archive.org/web/20240000000000*/https://www.webarchive.org.uk).
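
The Internet Archive link above follows the Wayback Machine's address pattern of a timestamp followed by the original URL. As a small sketch, the hypothetical helper `wayback_url` below builds such addresses for any page: with a date it produces a link that the Wayback Machine resolves to the nearest capture, and without one it produces a link to the overview of all captures. The function name and the example date are illustrative assumptions, not part of any official library.

```python
from datetime import datetime
from typing import Optional

WAYBACK_BASE = "https://web.archive.org/web/"


def wayback_url(url: str, when: Optional[datetime] = None) -> str:
    """Build a Wayback Machine address for `url`.

    With `when`, use a 14-digit YYYYMMDDhhmmss timestamp (nearest capture).
    Without it, use "*" to request the overview of all captures.
    """
    if when is None:
        return f"{WAYBACK_BASE}*/{url}"
    return f"{WAYBACK_BASE}{when:%Y%m%d%H%M%S}/{url}"


# Hypothetical example: the UK Web Archive home page around 24 January 2024.
print(wayback_url("https://www.webarchive.org.uk", datetime(2024, 1, 24)))
print(wayback_url("https://www.webarchive.org.uk"))
```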

For those keen on delving deeper, the British Library Research Repository houses supporting documents related to the UK Web Archive, such as collection scoping documents, annual reports, statistics, and research publications. The repository can be accessed [here](https://doi.org/10.23636/hj5v-3c07).

While the UK Web Archive takes a brief hiatus, we hope these alternative resources help. And perhaps embracing these other openly accessible archives might even unveil new avenues and perspectives for exploration.

While we work hard to recover all our online services you can find regular updates on progress published on our Knowledge Matters blog.