UK Web Archive blog

Information from the team at the UK Web Archive, the Library's premier resource of archived UK websites


Introduction

News and views from the British Library’s web archiving team and guests. Posts about the public UK Web Archive, and since April 2013, about web archiving as part of non-print legal deposit. Editor-in-chief: Jason Webber.

31 July 2024

If websites could talk (part 6)

By Ely Nott, Library, Information and Archives Services Apprentice

After another extended break, we return to a conversation between UK domain websites as they try to parse out who among them should be crowned the most extraordinary…

“Where should we start this time?” asked Following the Lights. “Any suggestions?”

“If we’re talking weird and wonderful, clearly we should be considered first.” urged Temporary Temples, cutting off Concorde Memorabilia before they could make a sound.

“We should choose a website with a real grounding in reality.” countered the UK Association of Fossil Hunters.

“So, us, then.” shrugged the Grampian Speleological Group. “Or if not, perhaps the Geocaching Association of Great Britain?”

“We’ve got a bright idea!” said Lightbulb Languages, “Why not pick us?”

“There is no hurry.” soothed the World Poohsticks Championships, “We have plenty of time to think, think, think it over.”

“This is all a bit too exciting for us.” sighed the Dull Men’s Club, who was drowned out by the others.

“The title would be right at gnome with us.” said The Home of Gnome, with a little wink and a nudge to the Clown Egg Gallery, who cracked a smile.

“Don’t be so corny.” chided the Corn Exchange Benevolent Society. “Surely the title should go to the website that does the most social good?”

“Then what about Froglife?” piped up the Society of Recorder Players.

“If we’re talking ecology, we’d like to be considered!” the Mushroom enthused, egged on by Moth Dissection UK. “We have both aesthetic and environmental value.”

“Surely, any discussion of aesthetics should prioritise us.” preened Visit Stained Glass, as Old so Kool rolled their eyes.

The back and forth continued, with time ticking on until they eventually concluded that the most extraordinary site of all had to be… Saving Old Seagulls.

Check out previous episodes in this series by Hedley Sutton - Part 1, Part 2, Part 3 and Part 4.

 

29 May 2024

IIPC Web Archiving Spring/Summer School and Conference 2024: Report from UK Web Archive Colleagues

Nicola Bingham, Helena Byrne, Ian Cooke, Gil Hoggarth, Cameron Huggett (British Library), Caylin Smith (Cambridge University Library) and Eilidh MacGlone (National Library of Scotland).


This year’s IIPC General Assembly and Web Archiving Conference took place at the Bibliothèque nationale de France (BnF) in Paris. Before this year's conference there was an Early Scholars Spring School on Web Archives aimed at early career researchers interested in working with web archive materials.

Many UK Web Archive colleagues from Bodleian Libraries, the British Library, Cambridge University Library and National Library of Scotland attended the Spring/Summer School and the Web Archiving Conference both as delegates and presenters. In this blog post they report highlights of their conference experience.

Nicola Bingham, Lead Curator of Web Archives, British Library

The IIPC conference lived up to its reputation for being incredibly informative, inspiring, and intense! It was wonderful to reconnect with ‘old’ friends and to meet many new colleagues who are bringing diverse skills and perspectives to the field of web archiving.

As Co-Chair of the IIPC’s Content Development Group, alongside Alex Thurman of Columbia University Libraries, I delivered the keynote speech at the Early Scholars Spring School on Web Archives, which preceded the conference. Our presentation reflected on the history, importance, and legacy of the collaborative transnational web archive collections initiated by IIPC members over the past 14 years.

It was fascinating and gratifying to hear from web archive scholars about their diverse approaches and the variety of research questions they are exploring using web archives. Having worked in web archiving for 20 years, I find the increasing use of collections by researchers, particularly through data-mining approaches, especially interesting and rewarding.

Another interesting and informative highlight was the conference opening keynote speech by Pierre Bellanger, Pauline Ferrari, Jérôme Thièvre, and Sara Aubry. Pierre Bellanger, the founder and CEO of Skyrock and Skyrock.com, emphasised that "there is no freedom without memory," setting the tone for a discussion on the archiving of Skyblogs. Sara Aubry, web archiving technical lead at BnF, detailed the challenges they faced, including working with the Skyblog technical team on short notice to archive the blogs and altering web pages to display more articles and comments before the platform went offline. They managed to collect a substantial amount of content before the closure, amassing 5 million media files and providing API access for metadata extraction. This initiative highlights the importance of preserving the vernacular web, capturing personal pages rather than corporate content. The Skybox project further explores data-oriented methods of access and structural metadata to enhance discovery, with potential future projects aiming to build large language models to analyse and identify regional content within the blogs.

Helena Byrne, Curator of Web Archives, British Library

At this year's conference I presented in the Lightning Talk and Poster sessions. The abstracts are available to read on the IIPC website. IIPC WAC 2024 was a really great conference and there were so many takeaways to help improve my practice. One session I’d like to focus on for this blog post was SESSION #10: Digital Preservation. This session focused on citation practices for researchers who use web archives in their work. This is an area that is not fully understood in the academic publishing world. I particularly liked the Citation Saver tool from Arquivo.pt as this is a simple but effective tool to bulk upload online citations from an academic publication. At the British Library we support a variety of researchers, and the tools and methods discussed in this session will be useful to support them using web archives in their work.

Gil Hoggarth, Web Archive Technical Lead, British Library

I personally had not been able to attend the last few IIPC annual conferences, so it was fabulous to meet up and connect with old faces, and new, and learn about all the exciting projects going on. As I take a technical view (of most things), I found it particularly interesting that so many institutions were trying to establish, and expand, their web archiving services. Plus, the number of people involved in joint projects, with a combined aim but also with a community benefit in mind, was quite striking. Now, having returned to challenges ahead for The British Library and the UK Web Archive, I feel far more informed and aware of these community efforts - and have been in contact with many conference attendees to follow up!

Caylin Smith, Head of Digital Preservation, Cambridge University Libraries 

This was my second time attending the IIPC conference; I attended last year in Hilversum. I enjoy attending this conference for its presentations about solving operational challenges relating to web archiving and ones about how web archiving supports an institution’s strategic mission. 

I chaired a panel titled “Striking the Balance: Empowering Web Archivists and Researchers In Accessible Web Archives” whose presenters included Leontien Talboom (Technical Analyst on the CUL Digital Preservation team), Alice Austin (Web Archivist at Edinburgh University Library), Tom Storrar (Head of Web Archiving at The National Archives, UK), and Andrea Kocsis (Heritage and Digital Humanities researcher formerly at Northeastern University London; now Chancellor’s Fellow at the University of Edinburgh). 

This panel focused on different perspectives on using web archives, including those of a leader of a web archiving service, a web archivist, and a researcher. It highlighted evolving user expectations for web archives as well as the challenges around communicating what users can and cannot do because of technical and/or legislative requirements.

Cameron Huggett, PhD Student (CDP), British Library/Teesside University

I attended the IIPC Early Scholars Spring School on Web Archives. You can read more about my reflections on this event in this blog post: https://blogs.bl.uk/webarchive/2024/05/reflections-on-the-iipc-early-scholars-spring-school-on-web-archives-2024.html

Eilidh MacGlone, Web Archivist, National Library of Scotland

This was my second IIPC in Paris; the last was in 2014. That was when I was a nervous first-timer, so I was happy to take part in the new mentorship programme. It was a good way to share experience across different points in our professional arcs.

Planning my conference agenda, presentations on machine learning were at the top of my list. These outlined services to classify and retrieve items from large, complex stores of resources. I knew these would be interesting, as attempts to solve a problem with no complete answer.

Ben Charles Germain Lee spoke about working with born-digital government publications. He introduced these ideas using a published experiment. This combination of text and visual analysis provides at least one way to organise retrieval of a very large collection. In the case presented, the born-digital government publications were derived from the End of Term web archive. In future, these techniques could provide information retrieval to readers for collections which are too big to catalogue.

The IIPC’s Training Working Group session, led by Claire Newing (TNA) and Ricardo Basílio (Arquivo.pt), was another highlight. It gave me a chance to speak briefly on the most important thing in training colleagues (practice!) and the group shared a lot of really good ideas for training. I had the opportunity to use the information almost immediately on my return, training a colleague to self-archive. All in all, this IIPC was a conference with many good lessons.

Ian Cooke, Head of Contemporary British & Irish Publications, British Library

This year, I was struck by how big, and how varied, web archiving has become. The conference covered a huge array of topics and approaches. Many thanks to the Programme Committee, and especially to the team at BnF for being such excellent hosts. For me, the conference got off to a great start a day early, as I attended the pre-conference workshop on appraisal strategies for web archive curated collections, led by Melissa Wertheimer (Library of Congress). The hands-on session was a very clear reminder of the importance of professional librarians and archivists in creating focused and meaningful collections. The conference was also an opportunity for me to dive into some of the more technical sessions. Kristi Mukk and Matteo Cargnelutti’s (Harvard University Library) presentation on using AI to support search in web archives was both very clear and inspiring. I particularly liked Kristi’s assertion that ‘AI literacy is information literacy’ and the importance of thinking like a librarian. Katherine Boss’ (New York University Library) paper on an experimental project to preserve dynamic and database-driven websites using server-side web archiving (not something to be done at scale!) was also brilliant. Both also emphasised the importance of working collaboratively in teams, bringing principles from librarianship to work alongside software engineering in developing and testing new responses to preservation and discovery challenges.          

Conclusion

The IIPC Web Archiving Spring/Summer School and Conference 2024 at the Bibliothèque nationale de France provided a dynamic platform for exchanging ideas, learning about innovative projects, and fostering collaborations in the field of web archiving. UK Web Archive colleagues contributed significantly through presentations and active participation. This conference highlighted the evolving landscape of web archiving, emphasising the importance of preserving the vernacular web, improving researcher access, and leveraging new technologies like AI for better archival practices. As we return to our respective roles, we carry forward new insights and strengthened connections, ready to tackle the challenges ahead with renewed vigour and informed strategies.




22 May 2024

Reflections on the IIPC Early Scholars Spring School on Web Archives 2024

By Cameron Huggett, PhD Student (CDP), British Library/Teesside University

IIPC Early Scholars Spring School on Web Archives banner

My name is Cameron, and I am currently undertaking an AHRC funded Collaborative Doctoral Partnership (CDP) project, between the British Library and Teesside University. My research centres on racial discourses within association football fanzines and e-zines from c.1975 to the present, and aims to examine the broader connections between football fandom, race and identity. 

I attended the Early Scholars Spring School on Web Archives, prior to commencement of the conference, which allowed me to share knowledge with colleagues from a number of different countries, institutions and disciplines, offering new perspectives on my own research. Within this school, I was fortunate enough to be able to deliver a short lightning talk, outlining my own use of web archiving within my research into the history of racial discourses within football fanzines. This generated an engaging discussion around my methodologies and led me to reflect upon how quantitative techniques can be better adopted within historical research practices.

I also particularly enjoyed discovering more about the collections of the Bibliothèque nationale de France (BnF) and Institut national de l'audiovisuel (INA). The scope of the collections and innovative user interfaces were particularly impressive. For example, INA had created a programme that allowed the user to view a collection item, such as an election debate broadcast, alongside archived tweets relating to the event in real time.

My primary takeaway was how web archives can be innovatively employed to record the breadth and depth of online communities and discourses, as well as supplement more traditional sources within a historian’s research framework.