UK Web Archive blog

Information from the team at the UK Web Archive, the Library's premier resource of archived UK websites

Introduction

News and views from the British Library’s web archiving team and guests. Posts about the public UK Web Archive, and since April 2013, about web archiving as part of non-print legal deposit. Editor-in-chief: Jason Webber.

01 December 2022

History on the move: Curating a collection on the Queen’s Platinum Jubilee

By Daniela Major, PhD Student, School of Advanced Studies, University of London

Note: This blog post was written before the death of Her Majesty Queen Elizabeth II. The Jubilee collection has documented the end of an extraordinary reign and will hopefully serve as a basis for future researchers to understand this historical moment.

Before I started my placement at the UK Web Archive, my project idea was to build a collection about the History of London. I had thought it would give me an opportunity to delve into history blogs and history websites, and to explore how people interpret historical events; it was, however, a Jubilee year, and the opportunity came up instead to curate a collection about this very modern event, which would, moreover, unfold as I built the collection.

Queen's Platinum Jubilee 2022 logo in english and welsh

The particular challenges of this exercise were very attractive to someone who still considers herself an historian. It is fairly straightforward to build a collection about events that have gone past and that have been analysed by countless historians. It is a very different thing to curate a collection about events that are happening, whose consequences remain unknown. In this sense, the Queen’s Platinum Jubilee was a great opportunity because in many ways Queen Elizabeth II already belongs to History. It is entirely possible to historicise her existence and her years in power. It is also possible to use her reign as a way to look into the making of modern Britain and modern Europe, as she was present through many key historical moments in the last 70 years.

A priority which was defined early on was representing different parts of the UK, rather than focusing only on the big cities. We looked into how towns, villages and cities were celebrating the Jubilee, what events they were organizing, where street parties would take place and how councils involved local communities in the celebrations. From a geographical representation came the necessity to represent different voices and opinions, both from the UK and the Commonwealth. It was vital the collection didn’t turn out to be laudatory. Future researchers would be interested in knowing whether there was resistance to the monarchy and whether consensus was real or fabricated.

As with so many questions in History, the answer is both yes and yes. Yes, there is resistance, but yes there is genuine and even widespread appreciation for the Queen.

For the majority of my academic career, I have looked to the past to study it. Historians are used to questioning the archives. We have to question the silences and the omissions; we have to remember who created records, who kept them, and why. Curating this collection placed me firmly on the other side of these interrogations. I was the one deciding what should go into the collection, what should be kept for posterity. The web is vast, and content is being produced every minute of every hour. It is not conceivable to include everything. The responsibility is enormous, but it made me all the more aware of the need to hear different sides, so as to not exclude voices which have often been silenced in the past.

The Web affords researchers the possibility to glimpse into facets of life and points of view that many previous historical records have omitted. It is a rich source with enormous democratic potential, and one which will become even more essential in the years to come; it must be protected and looked after. The work that web archivists do, and that I have been privileged enough to take part in, is vital to safeguard the history of the present and the future.

View the Queen's Platinum Jubilee, 2022 collection

Also the Queen's Diamond Jubilee, 2012 collection 

Queen's platinum jubilee collection screenshot

30 November 2022

If Websites Could Talk - Part 5

By Hedley Sutton, Team Leader, Asian & African Studies Reference Services

Check out previous episodes in this series - Part 1, Part 2, Part 3 and Part 4.

Over a year has passed since we last eavesdropped on the ongoing debate among U.K. domain websites as to which of them deserves to be recognised as the most extraordinary site of all. 

“We think we should be considered,” said *Heritage Cast Iron Radiators*. “We’re not a site that you come across every day.”

Screenshot of the Carrotworkers collective website

“Agreed, but you could surely say the same about us,” retorted the *Carrotworkers’ Collective*. “What do you reckon, *Angelfish Opinions*?”

There was no response, the latter being in deep conversation about matters piscine with the *Catfish Study Group*.

“Let’s hear it for the mammals!” cried *Platypus Research*. “You’re with us, *Led by Donkeys*, are you not? And you, *Absolute Dogs*? Not quite sure if you count, *Hatching Dragons*?”

“We insects always get overlooked,” muttered the *British Bee Veterinary Association*.

“We know how you feel,” commiserated *Polly Parrot Rescue UK*.

“What about us?” said the *UK Soft Power Group*. “Our charm, our intelligence …”

“Look, we want to take this tired debate to a whole new dimension,” said the *Quantum Communications Hub*. “With the help of the *Cosmic Shambles Network*, nothing can possibly stop us!”

“That’s not quite fair,” said the *Tuneless Choir*. “If you’re going to work together on your bid, then we might well hook up with the *London Vegetable Orchestra*”.

“Wait a minute – two can play at that game,” said the *Museum of Human Kindness*. “Can’t they, *Empathy Museum*?”

Fortunately at this point the *Centre for Effective Dispute Resolution* made a useful suggestion. It was decided that the fairest way forward was for candidate sites to first contact the *UK Anonymisation Network*, and then let the *Academy of Experts* make the final choice.  

And thus it came to pass that the chosen site was … *Much Better Adventures*.

03 November 2022

Calling All Digital Preservers!

By Andy Jackson, Web Archive Digital Lead, British Library

World Digital Preservation Day logo -WDPD2022

The digital preservation community is small and under-resourced. This means we must work together if we want to make the biggest impact. To this end, a small group of us have been attempting to help the members of the digital preservation community better support each other. As it is World Digital Preservation Day (https://www.dpconline.org/events/world-digital-preservation-day), we'd like to encourage you all to (re)discover what we've built so far:

If you'd like to help, we'd love to hear from you....

  • What have we missed from the Awesome List?
  • Can you answer any of the unanswered DigiPres questions? Do you need to ask questions of your own? Are there old questions and answers on mailing lists that need a more visible home, so others can find them again?
  • Can you contribute to the COPTR Tool Registry?
  • Are these resources useful? Should we change our approach?

The last one is really important. We've been in digital preservation long enough to see a lot of portals and projects come and go, and we recognize that making it possible to build on past work sometimes requires changing what we've built so far.

Please get in touch if you have any questions. You could talk to us directly via Twitter or Mastodon (e.g. https://digipres.club/), or use the digipres.org discussion forums. We're happy to hear any and all ideas!

In particular, in the last few weeks, the digipres.org homepage has been modified and the Awesome List has been set up, based on community feedback (https://github.com/orgs/digipres/discussions/34). Now would be a great time to get some feedback on what we've been doing!

Thanks for reading, and thanks to everyone who has contributed so far.

Andy Jackson (@anjacks0n/@[email protected]) & Paul Wheatley (@prwheatley), on behalf of all the digipres.org contributors.

With thanks to the Open Preservation Foundation for hosting many of these resources, and to the Digital Preservation Coalition for their support.

18 October 2022

UK Web Archive Technical Update - Autumn 2022

By Andy Jackson, Web Archive Technical Lead, British Library

This is a summary of what’s been going on since the update at the start of the summer.

Website Refresh
On 16 August 2022 we relaunched the UK Web Archive website, although you might not have noticed!

The previous version of the website treated page content like it was software, so updating what the pages said was far too difficult. This quarter, we finally got to release some changes we’d made so that most of the website pages are statically generated from Markdown source held on GitHub, using Hugo. This means we could add in a content management system called NetlifyCMS, which should make editing and translating the pages of our site much easier.

We’ve taken care to match the old website presentation and carefully overlay the new system while falling back on the old system for more complex dynamic pages. You might notice some minor differences to the styling between the two, if you look closely…

An important part of this was our automated accessibility testing. While accessibility evaluation cannot be fully automated, these tools help us manage the process of making changes to our website and minimise the risks of making things worse in time periods between full accessibility evaluations.
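To illustrate the kind of check that can be automated, here is a minimal sketch in Python. It is not the tooling used for the UK Web Archive website (the post does not name it); the class and function names are hypothetical, and the check covers only one simple rule: every image should carry an alt attribute.

```python
from html.parser import HTMLParser


class MissingAltChecker(HTMLParser):
    """Hypothetical helper: collect <img> tags that have no alt attribute."""

    def __init__(self):
        super().__init__()
        self.missing = []  # src values of images lacking an alt attribute

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr_map = dict(attrs)
        # An empty alt="" is allowed (decorative image); a missing one is not.
        if "alt" not in attr_map:
            self.missing.append(attr_map.get("src", "(no src)"))


def find_images_without_alt(html):
    """Return the src of every <img> in the HTML string that lacks alt."""
    checker = MissingAltChecker()
    checker.feed(html)
    return checker.missing


if __name__ == "__main__":
    page = '<p><img src="logo.png" alt="Logo"><img src="chart.png"></p>'
    print(find_images_without_alt(page))
```

A check like this can run on every change to the site's source, which is where automation helps most: it flags regressions between full manual evaluations rather than replacing them.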

Computer server and cables

2022 Domain Crawl Launch
As the British Library networks are in the final stages of being upgraded, 2022 is the last year we expect to run the domain crawl on Amazon Web Services.

We launched the 2022 crawl on 17 August 2022, and since the British Library is now a member of Nominet we were able to use an up-to-date list of UK domains as our starting point.

So far, we’ve processed around 500 million URLs, totalling over 20TiB of data (uncompressed).
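As a rough sanity check on those figures, the average amount of data per URL can be estimated. The sketch below assumes round values of 500 million URLs and 20 TiB, both approximations of the numbers quoted above, so the result is only indicative.

```python
# Rough figures from the crawl summary above (both are approximations).
URLS_PROCESSED = 500_000_000
DATA_TIB = 20

BYTES_PER_TIB = 2 ** 40
BYTES_PER_KIB = 2 ** 10


def average_kib_per_url(total_tib, url_count):
    """Return the mean uncompressed payload per URL, in KiB."""
    return total_tib * BYTES_PER_TIB / url_count / BYTES_PER_KIB


avg = average_kib_per_url(DATA_TIB, URLS_PROCESSED)
print(f"Average of about {avg:.0f} KiB per URL")
```

Under these assumptions the average comes out at about 43 KiB per URL.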

However, we’ve noticed what seems to be an uptick in systems like fail2ban automatically mis-reporting our crawler activity as abusive behaviour. This means we have to put more work into managing our relationship with AWS, and has slowed things down a bit. Nevertheless, we expect the crawl to run successfully until the end of the year, as in previous years.

Hadoop Replication
After many weeks of steady progress, our replica Hadoop storage service is now pretty much at capacity. Filling the thing up with about one petabyte of content took a while, but it’s been taking us a bit longer to be sure we’ve double-checked the transfer worked.

We are now awaiting a decision on whether we can purchase another server for this cluster, so we can make sure there’s room for the most recent crawls, and for content we expect to get in the near future. Either way, we’ll then start to plan shifting the hardware up to the National Library of Scotland.

Exporting Collection Metadata
Working with the Archives of Tomorrow project, we’ve been developing a way to export our collection metadata so it’s more suitable for reuse.

Having real use cases drive the work has been useful, and over the next weeks we’re hoping to integrate the outputs into the UKWA API so anyone can use that data.

Legal Deposit Access & NPLD Player
Working with Webrecorder we’ve seen some good progress on a new version of PyWB that supports direct rendering of PDFs and ePubs, and on the secure player application that will be used to provide access in some reading rooms.

Much of the work has focussed on the challenges around testing and preparation for a new version of a service that works across multiple independent institutions. But it’s been good to start to get some user feedback on how the system works in practice, which has already flushed out some additional requirements for the first release.

iPres 2022
As covered in this dedicated blog post, iPres 2022 included a presentation partly based on lessons learned from managing the technical aspects of the UK Web Archive. The plan is to publish a longer version of that work later in the year.

Major Outage
After the successes of the iPres conference, we were quickly brought back down to earth by a severe hardware failure on the 25th of September. One of the network switches failed, and the whole UKWA dedicated network locked up in a way that made it difficult to understand and route around the failure.

This took a while to diagnose and resolve, so we moved some critical components onto other machines so our curators and users could use our services. While this was relatively successful, it also showed that some of our automated tasks need breaking down so that different functions can be managed independently. For example, we need crawl launches to be able to proceed even if nothing else is running. These problems meant that our daily crawling activity was delayed and patchy for most of last week.

These complications mean it’s taken a bit longer than expected to undo all the interim changes that were made during the hardware outage. However, as of last week, everything is back to normal.

07 October 2022

The UEFA Women’s EURO 2022 Arts and Heritage Programme

by Caterina Loriggio, UEFA Women’s EURO Arts and Heritage Lead

Jan Lyons (Manchester Corinthians) and Gail Redston (Manchester City) looking at the 1921 Ban. Part of Trafford's heritage programme. Photo by Rachel Adams for UEFA WEURO 2022 heritage programme

The UK Web Archive has been collaborating with the UEFA Women’s EURO 2022 Arts and Heritage Programme to develop the UEFA Women's Euro England 2022 web archive collection. In this guest blog post, we hear about the wider arts and heritage programme around the tournament from Caterina Loriggio.

The UEFA Women’s EURO 2022 arts and heritage programme was designed to promote community engagement, develop cultural leadership, support health and wellbeing, reinforce civic pride and to support local economies post-pandemic. Host City partners (Rotherham, Sheffield, Trafford, Wigan, Manchester, Milton Keynes, Brent, Hounslow, Brighton, and Southampton) were all keen to amplify the opportunity the tournament provided to engage and inspire their residents and visitors.

The £3m programme was supported by National Lottery players through Arts Council England and National Lottery Heritage Fund grants and through funding from the Host Cities. It included four arts commissions, eight museum/archive exhibitions, eight outdoor exhibitions, heritage outreach and education programmes, 45 memory films and new online content covering the history of the women’s game. The project also researched for the first time the full line-up of all the women who have played for England over the past 50 years. Many of those women will be honoured at Wembley Stadium on October 7th in front of a sell-out crowd when they will take a lap of honour during half time in the England USA match.

It was the first time The FA had ever delivered a cultural programme. A key priority for The FA is to establish female role models for both girls and boys. When Host City partners requested a cultural programme to support the tournament, the Association saw that this could be a great opportunity to further fulfil this objective. It was also clear that partnering with cultural organisations in Host Cities, and with national institutions such as the UK Web Archive and British Library, would be a great way to promote the UK’s cultural sector and a very effective tool to capture, for the first time on a national scale, the hidden history of women’s football.

Prior to writing funding applications, I led, with the support of the Football Supporters’ Association, four online fan consultations to ensure the programme spoke to the wants of women’s football fans. We also commissioned the organisation ‘64 Million Artists’ to lead half-term virtual workshops for young people aged 12 – 18 in Host Cities (many of whom played football). The fans and young people’s feedback was shared with artists, archivists and curators and was clearly reflected in all elements of the programme. The fans were clear that they could ‘never get enough history’.

Archives and contemporary collecting played an important part in the heritage programme. It was apparent that many stories of women’s football (fans as well as players) had been lost already and that women who had played during the ban (1921-1970) were of an age that if we did not collect their stories now, then there was a real risk that they might never be captured. As well as collecting physical objects for museums and archives, like caps, pennants, and programmes, there was a significant degree of online archiving. Many of the Host Cities created online exhibitions, hosted films and imagery on digital archive platforms, and digitally captured objects which retired footballers were happy to loan but not donate. Nationally we made 36 memory films live on The FA website. These will be moved to EnglandFootball.com in time for the 50th Anniversary of the Lionesses in November, plus there will be some new content made especially for the anniversary. We were greatly supported in our programme by The National Football Museum and Getty Images, who gave us access to their photography archives, which greatly enriched all our work. We also sought to create content for the future by commissioning Getty photographers and by running fan and young people’s photography campaigns to capture the atmosphere of match day and the fan experience beyond the pitch. Some of these images will be shared in an online Getty Images Gallery to be launched in November.

It is hoped that the learnings from this programme will help to secure cultural content in future UK bids for major sporting events. I hope that archiving and collecting will remain important components in all these future projects.

Related Links
This is the ninth blog post published so far about the women’s Euros; the others can be found on the UK Web Archive blog under the 'sports' tag.

There is still an active call for nominations for the UEFA Women's Euro England 2022 web archive collection. Anyone can suggest UK-published websites to be included in the archive by filling in our nomination form.

06 October 2022

WARCnet Special Report: Skills, Tools and Knowledge Ecologies in Web Archive Research, 2022

by Sharon Healy, Maynooth University (Project Lead)

WARST report image - skills, tools and knowledge ecologies in web archive research

The WARST team are delighted to announce the publication of a WARCnet Special Report, titled: Skills, Tools and Knowledge Ecologies in Web Archive Research. This study is part of a collaborative project by researchers from Maynooth University, the British Library, the International Internet Preservation Consortium, Bayerische Staatsbibliothek, and the University of Siegen. The research team are all members of the Web ARChive studies network researching web domains and events (WARCnet).

The study focuses on individuals around the globe who participate in web archive research, in the context of web archiving, curation, and the use of web archives and archived web content for research or other purposes. We consider web archive research to be representative of the processes and activities described in Archive-It’s web archiving life cycle model from appraisal, acquisition, and preservation, to replay, access, use and reuse (Bragg & Hannah, 2013).

The methodology for the study entailed desk research, participation in WARCnet meeting discussions, and an online questionnaire. The study sought to identify and document the skills, tools, and knowledge required to achieve a broad range of goals within the web archiving life cycle and to explore the challenges for participation in web archive research, and the interludes of such challenges across communities of practice. We suggest that there is a perpetual need to examine the roles of skills, tools, and methods associated with the web archiving life cycle as long as internet, web and software technologies keep advancing, upgrading, and changing.

The Executive summary offers an overview of the findings, and is translated into Danish, French, Spanish and Catalan.

The Report is available to download from the WARCnet website:

https://cc.au.dk/fileadmin/dac/Projekter/WARCnet/Healy_et_al_Skills_Tools_and_Knowledge_Ecologies.pdf

A section of the Report that focused on the software, tools and methods used in the web archive research life cycle was presented in a poster at iPres 2022.

05 October 2022

iPres 2022 Conference Report from the UK Web Archive

By Helena Byrne, Nicola Bingham, Dr Andrew Jackson, British Library, Eilidh MacGlone, National Library of Scotland and Caylin Smith, Cambridge University Libraries

IPres2022-logo

iPres is the largest international conference on digital preservation. The conference has been held every year since 2004. The 2022 edition was hosted by the DPC in Glasgow. This meant that the official conference website ipres2022.scot was within scope for the UK Web Archive to preserve. You can view the archived version of the website here: 

https://www.webarchive.org.uk/wayback/archive/20220914105705/https://ipres2022.scot/ 

Screenshot of the iPres 2022 conference website
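The archived address above combines a capture timestamp with the original URL. As a minimal sketch, the Python below splits such an address into those two parts. The function name is hypothetical, and the code assumes the common Wayback-style layout of a 14-digit YYYYMMDDhhmmss timestamp, interpreted here as UTC, followed by the original URL; other archives may structure their addresses differently.

```python
import re
from datetime import datetime, timezone

# Assumed layout: <archive prefix>/archive/<14-digit timestamp>/<original URL>
ARCHIVED_URL = re.compile(
    r"^(?P<prefix>https?://.+?/archive/)(?P<ts>\d{14})/(?P<original>.+)$"
)


def split_archived_url(url):
    """Return (capture datetime, original URL) for a Wayback-style address."""
    match = ARCHIVED_URL.match(url.strip())
    if match is None:
        raise ValueError(f"Not a recognised archived URL: {url!r}")
    captured = datetime.strptime(match.group("ts"), "%Y%m%d%H%M%S")
    # Treating the timestamp as UTC is an assumption, labelled as such.
    return captured.replace(tzinfo=timezone.utc), match.group("original")


if __name__ == "__main__":
    when, original = split_archived_url(
        "https://www.webarchive.org.uk/wayback/archive/"
        "20220914105705/https://ipres2022.scot/"
    )
    print(when.isoformat(), original)
```

For the iPres address, this reports a capture on 14 September 2022, during the conference week.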

iPres 2022 was held from Monday 12 to Friday 16 September. There was a mix of presentations over the week with workshops, long papers, short papers, poster presentations and lightning talks as well as show and tell sessions in the form of a ‘Bake Off’. On the final day of the conference, there were a number of site visits to organisations that are running a digital preservation programme.

This year’s conference also coincided with the 20th anniversary celebrations of the DPC, as well as the DPC Preservation Awards that are held every two years. In 2020, the UK Web Archive won The National Archives (UK) Award for Safeguarding the Digital Legacy at the virtual Digital Preservation Awards 2020 ceremony.

There are also a number of awards given at iPres in various categories. This year’s winner of the Angela Dappert Memorial Award, established in 2021, was Dr Andrew Jackson, Technical Lead for the UK Web Archive, for his presentation ‘Design Patterns in Digital Preservation: Understanding Information Flows’.

Many UK Web Archive colleagues from the British Library, National Library of Scotland and Cambridge University Library attended the conference both as delegates and presenters. In this blog post they have reported back on their conference experience.

British Library

Dr Andrew Jackson
As well as presenting my Design Patterns paper, I was also involved in a workshop on format registries in digital preservation. Both sessions were well-attended and seemed to go well, and I’m planning to post about both in more detail in the future. 

I particularly enjoyed the session on DNA storage, especially because of Euan Cochrane’s approach: working with a DNA lab at Yale University to independently verify the work being done by Twist Bioscience.  It’s still a long way from being a storage option we can depend on, but it’s starting to look like it might actually happen!

There were a lot of good quality papers but I particularly enjoyed “Monitoring Bodleian Libraries' Repositories with Micro Services” presented by James Mooney. The overall approach was very similar to how I like to work, from the design of the overall architecture (federated monitoring of resources in situ rather than centralised and ingest-driven) to the style of implementation (microservices combined with best-in-class open source service components).

Nicola Bingham
This was the first iPres conference I have attended. I wish I could have been there in person but due to practicalities, I attended online. Some of my highlights were the presentation from William Kilbride in which he stated that one of the aims of the DPC was to build “the social infrastructure of digital preservation” (as opposed to focussing on technical aspects), which I think has always been true but is now more so than ever especially when it comes to diversifying our archives and enabling communities to have agency in telling their own stories, as articulated by Tamar Evangelista-Dougherty in her keynote. 

Other highlights were hearing from Garth Stewart, Head of Digital Records at National Records Scotland. Garth presented on NRS’s two-year project to ingest and make available Scottish Government Cabinet Records and had practical advice for negotiating the transfer of good quality metadata from the depositors - it’s all about gaining trust and explaining to depositors that the quality of metadata provided impacts the experience of the end users. I was also intrigued that they had the challenge of building and maintaining two access solutions, one for journalist access and one for the public.

A final highlight for me was the long paper, “A Digital Preservation Wikibase” by Kenneth Seals-Nutt of Yale University. Kenneth’s presentation set down the practical steps taken by Yale University Library’s department of digital preservation to implement a Wikibase instance and how this was used to transform a data set related to software into a knowledge base using technologies of the Semantic Web. This is particularly useful to us at the UK Web Archive as we consider the next steps in our web archiving roadmap. 

Helena Byrne
This was my first time attending iPres but I wasn’t able to make it in person so I was delighted that they had an option to join the conference remotely. I was also involved in a collaborative poster presentation with Katharina Schmid (Bayerische Staatsbibliothek) and Sharon Healy (Maynooth University). Our poster ‘Exploring Software, Tools and Methods used in Web Archive Research’ was part of a bigger study that will be published through WARCnet in the coming weeks. 

There were so many great talks, especially around inclusion and diversity in the wider digital preservation field. This along with activism was also a common theme in the three keynotes. These were all very different in scope so it is hard to pick one over the other but I will definitely be watching back over these in the coming weeks and I will share them with colleagues when they are published online.

National Library of Scotland

Eilidh MacGlone
I was grateful to have the opportunity to attend iPres this year. This was my first experience of the conference, and it was a happy one. There were lots of opportunities to meet up with new people and catch up with those I knew from the preservation world. And it was useful! The continuous improvement models are a very handy way to set achievable targets for professionals who are often the only preservationists in their organisation. I know this will be useful to me, even though I am not on my own. I was fascinated to hear about DNA data storage, which, although not yet operating at scale, has interesting properties of robustness at room temperature.

You can read more about one of Eilidh’s takeaways from iPres in her blog post - iPres report: a simple workshop exercise using Robust Links.

Cambridge University Library

Caylin Smith
Glasgow 2022 was the second in-person iPres I’ve attended; I previously attended in 2019 when the conference was held in Amsterdam. I was grateful to attend again this year to present about ongoing research as well as catch up with friends and colleagues in the field and meet some new faces. 

Along with Sara Day-Thomson (Edinburgh University Library) and Patricia Falcao (TATE), I led a workshop on the first day of the conference. Titled “Preserving Complex Digital Objects: Revisited”, this workshop picked up on the workshop we gave at iPres in 2019 and focused on supporting the collection management of digital materials for which few or no solutions currently exist. 

There were many great submissions to iPres this year. One paper on the topic of web archiving that stood out to me was “These Crawls Can Talk. Context Information for Web Collections” by Susanne van den Eijkel and Daniel Steinmeier from the KB (National Library of the Netherlands). I’m looking forward to thinking further about their research in the context of web archiving activities at Cambridge University Libraries. 

The next iPres conference will be held in Champaign-Urbana, Illinois in the U.S.A. from September 19-22, 2023.

04 October 2022

iPres report: a simple workshop exercise using Robust Links 

By Eilidh MacGlone, Web Archivist, National Library of Scotland

Inspiration at iPres
I had the opportunity to attend iPres 2022, an international conference dedicated to digital preservation. One of the sessions - Robust Links - run by the Digital Preservation Coalition (DPC), really sparked ideas for me. Robust Links offers anyone the opportunity to make links more permanent and less susceptible to 'link rot'. You add a link and it offers several options, one being to link to a 'memento' version of the web page.

It initially seemed out of reach, a bit too technical; but, listening, I recalled using Glitch. It is a platform which can handle JavaScript and style sheets. I have known about Robust Links for a few years, but it delighted me to have it function in a page I built. This step was valuable to me: it helped me phrase the question I need to ask within my own organisation.
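For readers curious what such a link looks like in markup, here is a minimal sketch in Python that builds one. It is not the code used in the workshop page. The function name is hypothetical, and the attribute names (data-originalurl, data-versionurl, data-versiondate) follow the Robust Links convention as I understand it, so they should be checked against the Robust Links documentation before use.

```python
from html import escape


def robust_link(original_url, snapshot_url, snapshot_date, text):
    """Build an <a> element carrying Robust Links style data attributes.

    original_url  -- the live address being cited
    snapshot_url  -- an archived ('memento') copy of that address
    snapshot_date -- date of the archived copy, e.g. "2022-09-14"
    text          -- the visible link text
    """
    return (
        f'<a href="{escape(original_url)}" '
        f'data-originalurl="{escape(original_url)}" '
        f'data-versionurl="{escape(snapshot_url)}" '
        f'data-versiondate="{escape(snapshot_date)}">'
        f"{escape(text)}</a>"
    )


if __name__ == "__main__":
    print(
        robust_link(
            "https://ipres2022.scot/",
            "https://www.webarchive.org.uk/wayback/archive/"
            "20220914105705/https://ipres2022.scot/",
            "2022-09-14",
            "iPres 2022",
        )
    )
```

The link still points at the live page, but it carries enough information for a reader, or a script, to fall back on the archived copy if the original disappears.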

NLS workshop
I was therefore inspired to include Robust Links in this workshop exercise for National Library of Scotland staff. I asked attendees to create another category for an imaginary "Scottish Music collection". I built this with websites we already collect. I was going to share this as a document file, but it became a web page following a quick refresher on HTML. 

Screenshot of the 'scottish music collection' website 

In this way, Robust Links create a kind of distributed collection through “archived near” links without the risk of cutting each other off. Legal deposit items have to be read by one person at a time, which can make a task that shares the same titles a little tricky. It also gives us the chance to talk about how the new categories interact with the original list. Here were our results: 

Screenshot of the results section of the 'scottish music collection' website

It was also a starting point for retrieving information through public directories. These included OSCR, the charities register for Scotland, and the Companies House register. Finally, it is a kind of crowdsourcing exercise. More than a quarter (six out of twenty-one) were not in the archive.

Colleagues gave positive feedback about our workshop, and this exercise. I plan to continue developing the idea and would love to hear from anyone making their own version.