UK Web Archive blog

Information from the team at the UK Web Archive, the Library's premier resource of archived UK websites


04 September 2013

Scaling up to archive the UK web

The non-print legal deposit legislation became effective on 6 April 2013, which has fundamentally changed the way we archive the UK web. We are now allowed to collect many more websites, enabling us to preserve the nation’s digital heritage at scale, in partnership with the other five legal deposit libraries for the UK (LDLs).

You may have noticed that little new content has been added to the UK Web Archive recently, but we have been busy behind the scenes: crawling billions of URLs, establishing new workflows and adapting our tools. The archived websites are being made available in LDL reading rooms, and some of them will also be added to the open UK Web Archive as we progress.

Our strategy is a mixed collection model, combining periodic crawls of the entire UK web with prioritisation of the parts the six LDLs deem curatorially important. These parts will receive greater attention in curation and quality checking. The components of the collection model are:

  • the annual / biannual domain crawl, intended to capture the UK domain as comprehensively as possible, providing the overview and the “big picture”;
  • key sites: those representing UK organisations and individuals of general interest in a particular sector of life in the UK and/or its constituent nations;
  • news websites, containing news published frequently on the web by journalistic organisations; and
  • events-based collections, which will capture political, cultural, social and economic events of national interest.


[Figure: LD collection framework. Broad collection framework under non-print legal deposit]

The legal deposit regulations allow us to archive in this way on the proviso that users may only access the archived material itself from premises controlled by one of the six LDLs. However, we are also working to provide greater access to high-level data and analytics about the archive, and we will be seeking permission from website owners to provide online access to selected websites in the UK Web Archive.

Look out for blog posts about the collection based on the reform of the NHS in England and Wales, and our first broad UK domain crawl.

Helen Hockx-Yu is head of web archiving at the British Library
