UK Web Archive blog

Information from the team at the UK Web Archive, the Library's premier resource of archived UK websites


Introduction

News and views from the British Library’s web archiving team and guests. Posts about the public UK Web Archive, and since April 2013, about web archiving as part of non-print legal deposit. Editor-in-chief: Jason Webber.

26 July 2022

Birmingham 2022 Commonwealth Games

By Helena Byrne, Curator of Web Archives, The British Library

A screenshot of the Commonwealth Games logo used in an article by Sport England on their website. The article was archived by the UK Web Archive on 4/20/2022, 4:44:51 AM. You can view the article here: https://www.webarchive.org.uk/wayback/archive/20220420034451/https://www.sportengland.org/campaigns-and-our-work/birmingham-2022-commonwealth-games

Introduction
The Birmingham 2022 Commonwealth Games are taking place from July 28th to August 8th. There is also an extensive cultural programme running alongside the event till the end of September 2022.

The first Commonwealth Games was held in 1930 and the 2022 event is the twenty-second edition of the competition. This is the sixth time that Britain has hosted the Commonwealth Games: Scotland has hosted it three times and, including Birmingham 2022, England has hosted it three times. However, this is only the second time that Britain has hosted this event since the formation of the UK Web Archive in 2005.

Sport collecting in the web archive
In late 2017, the UK Web Archive started to formally curate sports websites by establishing three main collections on sport. They are the Sports Collection, Sports: Football and Sports: International Events. The final collection in this series, Sports: International Events, documents major sporting events mostly hosted in the UK. It is in this collection that the Commonwealth Games Glasgow 2014 and the Commonwealth Games Birmingham 2022 collections sit.

You can view the Glasgow 2014 collection here:  https://www.webarchive.org.uk/en/ukwa/collection/22 

You can view the Birmingham 2022 collection here: www.webarchive.org.uk/en/ukwa/collection/4228

The Birmingham 2022 collection overview
We’ve broken this collection down into six areas:

  • Competitors: Athletes' websites and social media collected during the Games
  • Cultural Programme: Any websites and social media accounts related to the cultural programme during the Games
  • Organisational Bodies/Venues: UK national Commonwealth Games bodies' sites, local government sites etc.
  • Press Media and Comment: News and comment, including the Commonwealth Games, interest groups and others
  • Sponsors: UK Websites and news articles relating to some of the official sponsors of the Games
  • Sports: The Sports subsection has twenty subsections. All governing body and club websites related to these sports and the Commonwealth Games will be tagged under their relevant sport

Get involved 
The UK Web Archive works across the six UK Legal Deposit Libraries and with other external partners to try and bridge gaps in our subject expertise. But we can’t curate the whole of the UK web on our own; we need your help to ensure that information, discussions and creative output related to the Commonwealth Games Birmingham 2022 are preserved for future generations.

Anyone can suggest UK published websites to be included in the UK Web Archive by filling in our nomination form.

14 July 2022

Web Archiving the UEFA Women’s Euro England 2022 tournament in Northern Ireland

By Rosita Murchan, Web Archivist, Public Record Office of Northern Ireland (PRONI)

Black and white photo of a female footballer in a black and white striped shirt, keeping the ball up
Thanks to the Deputy Keeper of the Records, Public Record Office of Northern Ireland and the Northern Ireland Women’s Football Association for the photo

The Public Record Office of Northern Ireland (PRONI) is the official archive of Northern Ireland and is situated in the historic Titanic Quarter in Belfast. PRONI was established by the Public Records Act (Northern Ireland) in 1923, which means that in June next year we look forward to celebrating our centenary. PRONI has been collecting websites for over ten years, focusing on Government departments, local councils and websites deemed historically or culturally important to Northern Ireland. Over the years our collection has grown in both size and scope and we now capture one terabyte of data per year. PRONI does not have legal deposit status, so working with the UK Web Archive enables us to widen the scope of our collections, and ensure that other relevant content is captured.

PRONI has a rich history of celebrating women in sport, having previously curated ‘A Level Playing field – Women in sport’, an exhibition from the archives held by PRONI. With images from the late nineteenth century onwards, this exhibition reminds us that women have a long history of participation in a wide range of sporting activities. PRONI also holds the papers of the Northern Ireland Women’s Football Association, which include official minutes and documents, as well as scrapbooks, programmes, newspaper clippings and other ephemera (PRONI Reference: D4633).

We are delighted to be working in partnership once again with the British Library and adding a Northern Irish perspective to their UEFA Women’s Euro England 2022 collection.

The Northern Ireland team has defied the odds to book their place in this summer’s tournament, and PRONI’s collaboration with the British Library will enable us to capture web content documenting the progress of the players who are set to make history for Northern Ireland this summer.

We plan to select as much of the news and media coverage as we can, capturing the local views, hype and excitement of Northern Ireland’s historic qualification to the Euros, as well as content from the Northern Ireland women’s official home page within the IFA (NI Women's Football) detailing all fixtures, news, team profiles and updates throughout the tournament. We will also include social media content about the tournament, Twitter feeds of organisations and team members, and general social media coverage of the competition.

In recent years, PRONI has developed a number of creative and digital engagement projects that put the public at the heart of archives, making archives more welcoming and inclusive. We plan to use our social media channels to put out a call for nominations for sites from PRONI followers, but anyone can suggest UK published websites to be included in the UK Web Archive by filling in our nominations form: www.webarchive.org.uk/nominate

PRONI Logo white background

13 July 2022

Web Archiving the UEFA Women’s Euros in Scotland and Wales

By Eilidh MacGlone (National Library of Scotland) and Aled Betts (National Library of Wales) 

A blue banner image with the UK Web Archive, British Library, Inspired by England 2022 and the National Football Museum. A female football player kicking a ball and the text, Can you help us preserve football history? We are collecting websites about the UEFA Women’s EURO 2022. Nominate a website for us to archive: https://www.webarchive.org.uk/en/ukwa/info/nominate

The UEFA Women's Euro 2022 competition is taking place across England from July 6 to July 31, 2022. We are collecting websites about the 2022 UEFA Women’s EURO from around the UK.

You can view the UEFA Women’s Euro England 2022 collection here:  

https://www.webarchive.org.uk/en/ukwa/collection/4278 

Although Scotland and Wales didn’t qualify for this year's tournament, football fans in both countries will be getting involved in the celebrations. In this blog post we hear about what content the National Library of Scotland and the National Library of Wales have added to the UK Web Archive collection. 

National Library of Scotland 
As Scotland supporters are aware, we won’t be competing in this year’s Euros (ah, the 95th minute!). Yet, our collecting has captured a valiant qualifying effort, through news sites and the national team's social media. Caroline Weir is one player keeping an eye on the competition, writing an online column with some collegiate support for old Man City teammates. Also evident is that the writing we are collecting describes a sport reaching larger audiences.

Teams are competing in national stadiums, a departure from the smaller arenas we found when collecting for the last Women’s World Cup. The team is taking advantage of this to share a broad message with more football fans. Captain Rachel Corsie gave one example by wearing pride colours on her captain’s armband at Scotland’s game with Hungary. The national team is now looking towards plans for its second World Cup next year, and players are looking to a new Scottish Women's Premier League, starting in August. We will continue to preserve Scottish women's football, recording the growing interest in the sport.

National Library of Wales 
Wales were agonisingly close to qualifying for UEFA Women's Euro England 2022, which would have been a historic moment as it would have meant Wales reaching their first ever major tournament. Northern Ireland narrowly secured the play-off place at the expense of Wales as their head-to-head away goal count was superior! As the Euros are being held in England, the National Library of Wales will focus on archiving sites looking at the competition from a Welsh perspective.

Women’s football in Wales has never been stronger, as the national team maintain their push for qualification for next year’s World Cup and the huge rise in domestic clubs over the last 20 years provides opportunities to so many. This is reflected in the many websites and Twitter feeds that have been archived by the National Library of Wales. For instance, we archive the FAW website and Twitter account, the Twitter feeds of our most famous players and Premier clubs' websites, as well as delving into grassroots football by archiving domestic league websites. We will look at adding many more sites to the rich collection that we already have.

Get involved with preserving women’s football online with the UK Web Archive
The UK Web Archive works across the six UK Legal Deposit Libraries and with other external partners to try and bridge gaps in our subject expertise. But we can’t curate the whole of the UK web on our own; we need your help to ensure that information, discussions and creative output related to the UEFA Women’s Euro England 2022 are preserved for future generations. Anyone can suggest UK published websites to be included in the UK Web Archive by filling in our nomination form.

11 July 2022

UK Web Archive Technical Update - Summer 2022

By Andy Jackson, Web Archive Technical Lead, The British Library

Following on from the spring quarterly update, we’ve been able to make some good progress despite being understaffed during this period.

Hadoop storage and replication
We are still in the process of replicating content onto a second Hadoop cluster, to be moved to the National Library of Scotland. The cluster capacity is 1PB, and it’s now about 70% full. Next steps will involve double-checking the files have been replicated correctly, and planning the relocation of the servers.

Legal Deposit Access Solution
There has been significant progress on developing the new reading room access system for the UK Web Archive and other Legal Deposit content. The Webrecorder team has delivered an initial version of the NPLD Player app, which will be needed to access Legal Deposit material on some reading room access terminals. Once some final issues have been addressed, and some documentation added, we can start to plan the roll out in detail.

Before that, we need the centralized services deployed, which use our PyWB system to render PDFs and ePubs as well as archived web pages. The Webrecorder team have implemented most of the necessary changes to PyWB, and we have been working towards deploying the new access services, in partnership with the British Library’s Application Support team.

The whole project team has been busy planning, capturing use cases and test cases, considering security issues, publishing internal communications about the work, and responding to feedback from those communications. There are still a few areas of uncertainty, which means we don’t yet have a solid time-scale for the full transition to the new services, but this should become clear over the next few months.

Web Crawlers
While the core crawl system has not been changed in the last quarter, we have made improvements to how the crawls are launched and how the current Document Harvester is implemented.

Specifically, all services have now been moved from our older workflow system to our new Airflow platform (as mentioned in the 2022-01 technical update). This means these automated tasks are now easier to monitor and manage. In particular, the older workflow system has been struggling for some months due to the large number of tasks involved in the Document Harvester workflow. The underlying tools have been heavily refactored to make sure the document identification and extraction processes are much more efficient and reliable.

W3ACT (Annotation and Creation Tool)
While W3ACT itself has not been updated during the last quarter, the version of PyWB it uses has been updated to the latest 2.6.7 release.

UKWA Website
The new searchable Topics & Themes page is now live, making it much easier to explore our curated collections. We’ve found a few minor issues, such as some collections not appearing on the page, but we’ll work on ironing these out over the next few weeks.

To help us update our website with confidence, we’ve made a number of improvements to our automated testing system. This has been refactored to make it easier to run, and extended to cover almost all critical web services and APIs. As well as making changes easier to implement, this also means we can automatically run the test suite every morning, and will be alerted if anything isn’t working as expected (UKWA staff and partners can access the most recent test report at this URL).

This new test suite includes experimental support for running the Pa11y accessibility evaluation tool, and including the results in the test report. In time, this will help us ensure any changes we make to the website do not negatively affect the accessibility of the site (at least to the extent that automated testing can determine).

Archive of Tomorrow
Finally, we’ve enjoyed starting to get into some detailed conversations with our Archive of Tomorrow project colleagues. Among other things, these conversations will help drive our nascent UKWA API work, by helping us explore how best to make our curated collections and other data and metadata available for re-use. These discussions also reminded us to polish off some updates to our screen-shotting services, which means the Twitter and Open Graph social card support we’ve added to our playback pages should now be significantly more responsive and reliable.

To find out more about the Archive of Tomorrow project, you can check out this IIPC blog post: Archive of Tomorrow – Capturing online health (mis)information.

05 July 2022

What to expect on the UK Web Archive blog during UEFA Women’s Euro England 2022

By Helena Byrne, Curator of Web Archives, British Library

The UEFA Women's Euro 2022 competition is taking place across England from July 6 to July 31, 2022. We are collecting websites about the UEFA Women’s Euro 2022 from around the UK.

You can view the UEFA Women’s Euro England 2022 collection here:  https://www.webarchive.org.uk/en/ukwa/collection/4278

a blue banner image with the British Library, Inspired by England 2022, the National Football Museum and the UK Web Archive. A female football player kicking a ball and the text, Can you help us preserve football history? We are collecting websites about the UEFA Women’s EURO 2022. Nominate a website for us to archive QR code and link to the nomination form: https://www.webarchive.org.uk/en/ukwa/info/nominate

Over the next few weeks there will be a number of guest blog posts from the UK Web Archive and collaborators from around the UK. 

First up, we will have a blog post from the National Library of Scotland and the National Library of Wales. Neither Scotland nor Wales qualified for this edition of the tournament, but as part of the UK Web Archive, both national libraries will be contributing to the collection and ensuring that any fan events taking place are preserved. 

From the 18th July there will be a number of blog posts published each week in July. There will be a guest blog post from the Public Record Office of Northern Ireland (PRONI), who will be contributing a range of content from Northern Ireland. The team from Northern Ireland made history by qualifying for their first UEFA Women’s Euro tournament.

There will be a series of blog posts from the tournament’s Arts and Heritage partners in the host cities. There were three specially commissioned projects to celebrate the rich history of women’s football and its players and to encourage more people to be inspired by the tournament. These blog posts will also include updates from across the UEFA Women’s Euro England 2022 host cities. These blog posts will give a summary of their local cultural programme activities, as well as an overview of what websites they nominated to the collection that are important for telling the story of the UEFA Women’s Euro England 2022 tournament in their area.

The final blog post in the series will be published in late September. It will be a reflection on the collection activities and give an overview of some personal favourites from the curator of the web archive collection, Helena Byrne.

Get involved 
Anyone can suggest UK published websites to be included in the UK Web Archive by filling in our nomination form: https://www.webarchive.org.uk/en/ukwa/info/nominate 

29 June 2022

What content should I nominate on the UEFA Women’s Euro to the UK Web Archive?

By Helena Byrne, Curator of Web Archives, British Library

a blue banner image with the UK Web Archive, British Library, Inspired by England 2022 and the National Football Museum. A female football player kicking a ball and the text, Can you help us preserve football history? We are collecting websites about the UEFA Women’s EURO 2022. Nominate a website for us to archive:

The UEFA Women's Euro 2022 competition is taking place across England from July 6 to July 31, 2022. We are collecting websites about the 2022 UEFA Women’s EURO from around the UK. You can view the collection here:  

https://www.webarchive.org.uk/en/ukwa/collection/4278 

This blog post runs through some examples of the type of content you might like to nominate to the collection. 

We archive websites:

  1. That are on a .uk or other UK geographic top-level domain such as .scot or .cymru
  2. That are published in the UK

We do not archive:

  1. Online Sound or Video platforms, in which audio-visual material is the predominant content
  2. Private Intranets and Emails
  3. Personal data in social networking sites or websites only available to restricted groups
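If you want a quick way to check the first criterion before nominating, here is a minimal Python sketch. It only tests the top-level domains named above; the `UK_TLDS` tuple and the `in_uk_domain_scope` function are our own illustrative names, not part of any UK Web Archive tool, and the real scope list may well be longer.

```python
from urllib.parse import urlparse

# Only the TLDs named in this post; treat the list as an assumption,
# not the archive's full scope definition.
UK_TLDS = (".uk", ".scot", ".cymru")


def in_uk_domain_scope(url: str) -> bool:
    """Return True if the URL's host ends with one of the listed UK TLDs."""
    # urlparse only finds a hostname when the URL includes a scheme (https://...).
    host = (urlparse(url).hostname or "").lower().rstrip(".")
    return host.endswith(UK_TLDS)


print(in_uk_domain_scope("https://www.rpo.co.uk/rpo-resound/womens-euro-anthem"))
print(in_uk_domain_scope("http://www.thefa.com/competitions/uefa-womens-euro-2022"))
```

The first call prints True and the second prints False. A False result does not rule a site out: a .com website can still qualify under the second criterion if it is published in the UK, which is something a domain check alone cannot tell you.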

We archive as much openly available online content as we can identify as being published in the UK. Archiving is carried out through a mix of automated processes such as an annual domain crawl, manual selection by the UK Web Archive teams, and the public nomination form.

UEFA Women’s Euro England 2022
For the UEFA Women’s Euro England 2022 we want content that specifically refers to the tournament. Some websites might only have a subsection or even just one page dedicated to the tournament so you can nominate that specific URL. 

We add the following types of web content to the collection:

  1. Full website
  2. Subsection of a website
  3. Individual page from a website
  4. Event page
  5. Twitter accounts

Unfortunately, due to technical challenges, the only social media content we can successfully archive is Twitter. If you know of any high-profile Twitter accounts that aren’t personal accounts of ordinary people, then please nominate them.

Examples of some website content we have added so far include:

Full website
Have you seen any new websites set up just for the UEFA Women’s Euro 2022 tournament? Most websites will, at most, just have a dedicated subsection or page for the tournament. Some websites such as the official sponsor, Visa, highlight the tournament on their home page in the run-up to and during the tournament. This is why we have added the whole website to the collection, as it is easy for the user to navigate from the home page of the archived website during the tournament to the dedicated section for the tournament. 

Subsection of a website
The FA website has a subsection dedicated to UEFA Women’s Euro 2022. The earliest captures of this subsection are from July 2020 which you can view here:

https://www.webarchive.org.uk/wayback/archive/20200726095218/http://www.thefa.com/competitions/uefa-womens-euro-2022 

A screenshot of the UEFA Women’s Euro 2022 subsection of the FA website from July 26 2020. The text reads Women’s Euro set for 2022. The UEFA Women’s Euro 2021 in England is postponed until the summer of 2022. https://www.webarchive.org.uk/wayback/archive/20200726095218/http://www.thefa.com/competitions/uefa-womens-euro-2022

Link to archived website: https://www.webarchive.org.uk/wayback/archive/20200726095218/http://www.thefa.com/competitions/uefa-womens-euro-2022 
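The archived links in this post all follow the same visible shape: a 14-digit capture timestamp followed by the original address. As a rough illustration, the Python sketch below splits one of these links into its two parts. The pattern is inferred from the links shown here rather than taken from documentation, `split_archived_url` is a hypothetical helper name, and the timestamp is returned without a timezone (web archive timestamps are conventionally UTC, but that is an assumption).

```python
import re
from datetime import datetime

# Pattern inferred from the archived links in this post; an assumption,
# not a documented UK Web Archive API.
WAYBACK_RE = re.compile(
    r"^https?://www\.webarchive\.org\.uk/wayback/archive/(\d{14})/(.+)$"
)


def split_archived_url(archived_url: str):
    """Return (capture datetime, original URL) for an archived link."""
    match = WAYBACK_RE.match(archived_url.strip())
    if match is None:
        raise ValueError("Not a recognised archived URL: " + archived_url)
    # The 14 digits read as year, month, day, hour, minute, second.
    captured_at = datetime.strptime(match.group(1), "%Y%m%d%H%M%S")
    return captured_at, match.group(2)


captured, original = split_archived_url(
    "https://www.webarchive.org.uk/wayback/archive/20200726095218/"
    "http://www.thefa.com/competitions/uefa-womens-euro-2022"
)
print(captured.isoformat())  # 2020-07-26T09:52:18
print(original)
```

This matches the July 2020 capture date mentioned above for the FA subsection.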

Individual page from a website
In some cases there is just one page on a website relevant to the collection subject. When thinking about women’s football, the Royal Philharmonic Orchestra (RPO) doesn’t always come top of the list of potential websites. However, they have partnered with the FA to ‘engage fans in a range of musical opportunities and public events celebrating the history, ethos and future of women’s football’. What other websites have you seen that have posted an article about the UEFA Women’s Euro 2022 tournament? 

You can listen back to the archived versions of the anthems on the RPO website here: https://www.webarchive.org.uk/wayback/archive/20220621111257/https://www.rpo.co.uk/rpo-resound/womens-euro-anthem 

Event pages:
There are lots of events going on around the UEFA Women’s Euro 2022. These range from official events to fan-led events and venues organising their own events such as talks, book launches or watch parties for the matches. Eventbrite is one of the most popular platforms for ticketing these events, but have you seen any other platforms or websites?

A search on Eventbrite for Euro 2022 in the United Kingdom on the day of writing comes back with 500 pages.

Twitter accounts:
Archived copies of Twitter accounts are only accessible through a reading room, but you can view what we have selected here: https://www.webarchive.org.uk/en/ukwa/collection/4284

We have already added the Twitter accounts of the players for England, Northern Ireland and other players based in the UK. However, we may have missed some, so please let us know through the nomination form.

Get involved 
Anyone can suggest UK published websites to be included in the UK Web Archive by filling in our nomination form.

15 June 2022

Breaking the News - News collections in the Web Archive

By Jason Webber, Web Archive Engagement Manager, British Library

The British Library is currently running the wonderful ‘Breaking the News’ exhibition. If you’ve not seen it yet, make sure you check it out. It is open until Sun 21 Aug 2022. The exhibition explores how the News has impacted and influenced our society. This exploration includes modern digital forms of news, much of which is contained in the UK Web Archive (UKWA).

Breaking The News

The ‘News’ collection in UKWA contains over 2700 news sites that we archive. The scope ranges from major national news outlets (BBC, Guardian, Daily Mail etc.) to many local and even hyper-local news websites. The collection includes one newspaper, The Independent, that ceased being a print paper to become exclusively a digital one.

The majority of these archived news sites and Twitter accounts can only be viewed in reading rooms of UK Legal Deposit Libraries. Many, however, are openly available to view from home. Let's see some examples:

Local news
In addition to major national news outlets we collect thousands of local and hyper-local news websites. Many towns, suburbs and villages maintain a local news website and we do our best to archive them.

Brixton blog

Bristol cable

Archived website - Bristol Cable

Cranfield and Marston Vale Chronicle 

International
Whilst the focus of our collection is UK-based news, we do also collect some international or overseas publications. Tristan da Cunha, one of the remotest places on earth, maintains a news website for its residents.

Irish news - TheJournal.ie

Tristan da Cunha News 


About journalism
As well as news outlets aimed at us, the public, we also collect websites for journalists themselves.

The Bureau of Investigative Journalism

Media helping media


You can discover everything we have collected in the News collection via our website.

If you know of a UK news website (this might be about your local area), nominate it to the UK Web Archive.

31 May 2022

Can you help the UK Web Archive preserve football history?

By Helena Byrne, Curator of Web Archiving, British Library

image of a female footballer kicking a ball on a blue background

The UEFA Women's Euro 2022 competition is taking place across England from July 6 to July 31, 2022. We’re collecting websites about the 2022 UEFA Women’s EUROs. Nominate a website for us to archive – it’s free and easy to do.

Since the launch of the UK Web Archive in 2005, this is the second time that England has hosted the Women’s European Championships. England hosted the 2005 edition of the tournament, but this is the first time that the UK Web Archive has a dedicated collection on the event. In late 2017, the UK Web Archive started to formally curate sports websites by establishing three main collections on sport. They are the Sports Collection, Sports: Football and Sports: International Events.

The Sports: Football collection is divided into subsections based on the code of football and was given its own collection as football is the most popular sport in the UK. The final collection in this series, Sports: International Events, documents major sporting events mostly hosted in the UK. It is in this collection that the UEFA Women's Euros England 2022 collection will sit.

The British Library is working in partnership with the official Women's Euros cultural programme led by the FA, the National Football Museum and the five other UK Legal Deposit Libraries that make up the UK Web Archive to curate this collection, but we also want fans to get involved.

text that says in partnership and then the logos for the British Library, Inspired by England 2022, National Football Museum and the UK Web Archive

This collection has six subsections that cover events both on and off the playing field:

Cultural Programme: Any websites and social media accounts related to the cultural programme during the tournament. This includes arts, heritage and learning events.

Fans: Websites, blogs and social media accounts written by fans of the sport.

Organisational Bodies/Venues: Football Association, Irish Football Association, match stadiums and local government websites.

Press Media & Comment: News and comment, including the UEFA Women's Euro England 2022 landing pages on BBC and other media websites etc.

Sponsors: UK Websites and news articles relating to some of the official sponsors of the UEFA Women's Euro England 2022.

Teams: Websites and social media accounts of players based in the UK. This will mostly be made up of players from England and Northern Ireland but also a few players from the other countries that qualified for the competition and live in the UK.

We need your help to ensure that information, discussion and creative output related to women’s football are preserved for future generations. Anyone can suggest UK published websites to be included in the UK Web Archive by filling in our nominations form: www.webarchive.org.uk/en/ukwa/info/nominate