UK Web Archive blog

Information from the team at the UK Web Archive, the Library's premier resource of archived UK websites


News and views from the British Library’s web archiving team and guests. Posts about the public UK Web Archive, and since April 2013, about web archiving as part of non-print legal deposit. Editor-in-chief: Jason Webber.

17 May 2016

Saving BBC Recipes Website

There's been much coverage today of plans to remove the recipe pages from the BBC website.


The UK Web Archive has been collecting selected pages from the BBC, mainly news, for over ten years and since 2013 we have attempted to capture the entirety of the BBC web estate. A small number of pages are available on the Open UK Web Archive website. Most of the BBC's online presence, however, is only available in the reading rooms of UK Legal Deposit libraries, including both of the British Library sites at St. Pancras and Boston Spa in Yorkshire.

We have today instigated a further crawl of the BBC website with the specific aim of ensuring that we save the recipes from the food pages. We can also report that the Internet Archive, the Library of Alexandria and the National Library of Iceland have captured these pages, so their future is assured.

Polly Russell, British Library Curator and Food Historian, says:

"Cookery books, like cookery websites, obviously serve a practical purpose but that is not all. For historians, sociologists and anthropologists they also tell us about people's culinary aspirations and anxieties, cultural tastes and trends, dietary preoccupations, social expectations and economic conditions. They are, therefore, a rich source for researchers. So while it's sad news to hear about plans to close the much trusted and well-loved BBC Food website, it's a relief that the British Library is going to be able to archive the website for posterity."



26 April 2016

Easter Rising 1916 Centenary in Print and Digital

Ireland has been gripped by commemorations of the Easter Rising in the last month. The Rising took place in Dublin from 24 April to 29 April 1916. A packed programme of events and activities took place across Ireland and in Irish communities further afield to commemorate this centenary.

In March 2016, addressing a colloquium at the Bodleian Library, Oxford, the Irish Ambassador to the United Kingdom, His Excellency Daniel Mulhall, emphasised the transnational and inclusive nature of the commemoration programme in his opening remarks. The 1916 Rising had a global impact with ripples felt as far as Asia and India. This is reflected in the range of events taking place in the United Kingdom, supported by the Irish Embassy.

In military terms the Rising was a failure and had consequences for the people of Dublin with 415 people killed, the majority of whom were civilians.

Turning to the documentation of the Rising, the Library’s collections hold a number of interesting documents relating to it. The British Library does not hold an original broadside of the Proclamation of an Irish Republic. Nevertheless, later examples of the document were acquired retrospectively.

The earliest example of a version of the proclamation in the British Library’s collections can be found at C.S.A.24/3.(1.). This is interesting from a bibliographical standpoint because it is the first entry under the new heading in the British Library Printed Catalogue to 1975:


Provisional Government of the Irish Republic 1916. Miscellaneous Public documents. 

That the Library classified this proclamation as a public document and gave it the C.S.A. pressmark prefix for official publications, which originates from the 1890s, is of particular interest. A third point of interest is that this version of the proclamation is the only item in the green-bound guard-book, which is embossed on the spine in gold.

Poblacht na heireann 1916


Although the red (purchase) stamp appears on the reverse of the document, because of the way it has been mounted in the volume it is unclear when the item was acquired. It appears to read 15 May ‘59. The volume itself bears the British Museum binder's stamp B.M.1961 on the inside of the rear board. These dates indicate that this item, as with other ephemera relating to the 1916 Rebellion, was acquired retrospectively.

Poblacht na heireann 1941

The second example of the proclamation is a more ornate affair. It is a single sheet dating from 1941, measuring approximately 325mm x 255mm. The text of the document is laid out in the same fashion as the original, but the typeface has been standardised, removing the anomalies from the original, and the list of signatories has been centred rather than justified to the right as in the original. What is most striking about this item are the portraits of the seven signatories surrounding the text and connected by the decorative border. At the bottom centre, surrounded by a circle, is the Irish Army sunburst emblem, designed by Eoin MacNeill, and interestingly it is reproduced without the inscription "Óglaigh na hÉireann" or Irish Volunteers.

Irish War News Irish War News p4

The third document is a piece of contemporary ephemera which traces its lineage to the focal point of the rebellion. On the last page of the first issue of Irish War News, dated Tuesday April 25 1916, there is an article headed:

“Stop Press (Irish) ‘War News’ is published to-day because a momentous thing has happened. The Irish Republic has been declared in Dublin and a Provisional Government has been appointed to administer its affairs.”

The article goes on to name the signatories of the proclamation as the Provisional Government while outlining the situation in Dublin from the rebel perspective.

The Rising, or more particularly the centenary of the events in Dublin a hundred years ago, is being explored and represented in new ways thanks to technology and the work of colleagues at Trinity College Dublin and the Bodleian Library Oxford. In the last year they have built and curated a collection of websites related to the commemoration.

These have been archived as part of the open UK Web Archive. To have the opportunity to build this collection of Irish and UK websites is an exciting prospect for the future of web-published content. This endeavour illustrates how the internet is not confined by national boundaries. The work on the Easter Rising collection exemplifies how archivists working together can build a contemporary collection which provides a range of perspectives from all corners of the .uk and .ie domains.

Archiving websites about anniversaries and centenaries such as Easter 1916 is of prime importance because such sites can be transient and are soon overwritten or taken down. Archiving them creates a research resource for the future which offers scholars and anyone interested the opportunity to explore and examine the response to this centenary on the published web.

The Easter Rising collection is currently a growing part of the UK Web Archive special collections where it can be freely consulted online.

By Jeremy Jenkins, Curator Emerging Media, The British Library


Further Reading

Bouch, Joseph J. “The Republican Proclamation of Easter Monday, 1916,” Bibliographical Society of Ireland, Publications, vol. 5, no. 3, 1936. General Reference Collection: Ac.9708/2 [A reissue].

The Easter Proclamation of the Irish Republic, MCMXVI
Dublin : Dolmen Press, 1960. General Reference Collection: Cup.510.ak.37

The Easter Proclamation of the Irish Republic 1916,
[S.l.] : Dolmen Press, 1976. Document Supply Shelfmark: D76/23312



15 February 2016

Introducing SHINE 2.0 - A Historical Search Engine


In 2015, as part of the Big UK Domain Data for the Arts and Humanities project, we released our first ‘historical search engine’ service. We’ve publicised it at IDCC15, the 2015 IIPC GA and at the first RESAW conference, and so far it has been very well received. Not only has it led to some excellent case studies that we can use to improve our services, but other web archives have shown interest in re-using the underlying open source code. In particular, some of our Canadian colleagues have successfully launched their own service, which lets users search ten years’ worth of archived websites from Canadian political parties and political interest groups (see here for more details).

Even bigger data!
But we remained frustrated for two reasons. Firstly, when we built that first service, we could not cope with the full scale of the 1996-2013 dataset, and we only managed to index the two billion resources up to 2010. Secondly, we had not yet learned how to cope with more than one or two users at a time, so we were loath to publicise the website too widely in case it crashed. So, over the last six months, and with the guidance of Toke Eskildsen and Thomas Egense at the State Library of Denmark, we’ve been working on resolving these scaling issues (their tech blog is definitely worth a look if you’re into this kind of thing).

Thanks to their input, I’m happy to announce that our historical search prototype now spans the whole period from 1996 to the 6th April 2013, and contains 3,520,628,647 distinct records.


Broken down by year, you can see there’s a lot of variation, depending on the timings of the global crawls from which this collection was drawn. This is why our trends visualisation plots query results as a percentage of all the resources crawled in each year rather than absolute figures. However, the overall variation and the fact that the 2013 chunk only covers the first three months should be kept in mind when interpreting the results.
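To make the normalisation concrete, here is a minimal Python sketch of plotting hits as a percentage of each year's crawl. It is an illustration only, not the SHINE code: the function name and the figures are hypothetical.

```python
from typing import Dict


def normalise_by_year(hits: Dict[int, int], totals: Dict[int, int]) -> Dict[int, float]:
    """Express query hits per year as a percentage of all resources crawled that year."""
    percentages = {}
    for year, total in sorted(totals.items()):
        if total <= 0:
            continue  # nothing crawled that year, so a percentage is meaningless
        percentages[year] = 100.0 * hits.get(year, 0) / total
    return percentages


# Hypothetical figures, for illustration only (not taken from the real index).
hits = {2005: 50, 2006: 200}
totals = {2005: 1_000, 2006: 10_000}
result = normalise_by_year(hits, totals)
print(result)
```

With these made-up numbers, 2006 has four times as many raw hits as 2005 but a smaller share of that year's crawl, which is the distortion the percentage view is meant to avoid.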

Time travel?
You might also notice there seem to be a few data points from as early as 1938, and even from 2072! This tiny proportion of results corresponds to malformed or erroneous records, although currently it’s not clear if the 1,714 results from 1995 are genuine or not. No one ever said Big Data would be Clean Data.
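One simple way to set such records aside for review is to compare each year against the expected crawl window. The sketch below assumes a 1996 to 2013 window, taken from the dataset description; the function name and most of the counts are hypothetical, and only the 1995 figure comes from this post.

```python
# Assumed bounds: the dataset is described as spanning 1996 to April 2013.
FIRST_YEAR, LAST_YEAR = 1996, 2013


def split_by_plausibility(year_counts):
    """Split per-year record counts into in-window years and years needing review."""
    plausible, suspect = {}, {}
    for year, count in year_counts.items():
        target = plausible if FIRST_YEAR <= year <= LAST_YEAR else suspect
        target[year] = count
    return plausible, suspect


# Illustrative counts; only the 1995 figure is taken from the post.
sample = {1938: 3, 1995: 1714, 2004: 1000, 2072: 1}
plausible, suspect = split_by_plausibility(sample)
print(plausible)
print(suspect)
```

Note that a year landing in the review group is not proof of an error: the 1995 records may turn out to be genuine.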

De-duplication of records
Furthermore, we’ve decided to change the way we handle web archiving records that have been ‘de-duplicated’. When the crawler visits a page and finds precisely the same item as before, instead of storing another copy, we can store a so-called “revisit record” and refer to the earlier copy rather than duplicating it. This crude form of data compression can save a lot of disk space for frequently crawled material, and its use has grown over time. For example, looking at the historical dataset, you can see that 30% of the 2013 results were duplicates.
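The idea can be sketched in a few lines of Python. This is a simplified illustration, not how any particular crawler implements it: the dictionary fields are hypothetical stand-ins for real WARC records, and SHA-1 is simply assumed as the payload digest.

```python
import hashlib


def store_captures(captures):
    """Store each distinct payload once per URL; log exact repeats as revisit records."""
    first_seen = {}  # (url, payload digest) -> timestamp of the capture holding the bytes
    records = []
    for url, timestamp, payload in captures:
        digest = hashlib.sha1(payload).hexdigest()
        key = (url, digest)
        if key in first_seen:
            # Identical to an earlier capture: keep a pointer, not the bytes.
            records.append({"type": "revisit", "url": url, "timestamp": timestamp,
                            "digest": digest, "refers_to": first_seen[key]})
        else:
            first_seen[key] = timestamp
            records.append({"type": "response", "url": url, "timestamp": timestamp,
                            "digest": digest, "payload": payload})
    return records


# Made-up captures of one URL: the second is byte-identical to the first.
captures = [
    ("http://example.org/", "2013-01-01", b"<html>hello</html>"),
    ("http://example.org/", "2013-02-01", b"<html>hello</html>"),
    ("http://example.org/", "2013-03-01", b"<html>changed</html>"),
]
records = store_captures(captures)
print([r["type"] for r in records])
```

The second capture carries no payload of its own, which is where the space saving comes from, and also why an indexer that only reads payloads will miss it.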


However, as these records don’t hold the actual item, our indexing process was not able to index these items properly. Over the next few weeks, we shall scan through these 65 million revisit records and ‘reduplicate’ them. This does mean that, for now, the results from 2013 might be a bit misleading in some cases. We also failed to index the last 11,031 of the 515,031 WARC files that make up this dataset (about 2% of the total, likely affecting the 2010-2013 results only), simply because we ran out of disk space. The index is using up 18.7TB of SSD storage, and if we can find more space, we’ll fill in the rest.
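As a rough sketch of what ‘reduplicating’ involves, the Python below resolves each revisit record back to the payload of the earlier capture so that it can be indexed like any other record. The record layout, the digest values and the function name are all hypothetical, and a real implementation working over hundreds of thousands of WARC files would need an external lookup rather than an in-memory dictionary.

```python
def reduplicate(records):
    """Resolve revisit records to the stored payload so every capture can be indexed."""
    # Look-up of stored payloads, keyed by URL and payload digest.
    payloads = {(r["url"], r["digest"]): r["payload"]
                for r in records if r["type"] == "response"}
    resolved, unresolved = [], []
    for r in records:
        if r["type"] == "response":
            payload = r["payload"]
        else:
            payload = payloads.get((r["url"], r["digest"]))
        if payload is None:
            unresolved.append(r)  # original copy not found in this set of records
        else:
            resolved.append((r["url"], r["timestamp"], payload))
    return resolved, unresolved


# Hypothetical records; the digests are placeholders, not real hashes.
records = [
    {"type": "response", "url": "http://example.org/", "timestamp": "2013-01-01",
     "digest": "aaa", "payload": b"<html>hello</html>"},
    {"type": "revisit", "url": "http://example.org/", "timestamp": "2013-02-01",
     "digest": "aaa"},
    {"type": "revisit", "url": "http://example.org/other", "timestamp": "2013-02-01",
     "digest": "bbb"},
]
resolved, unresolved = reduplicate(records)
print(len(resolved), len(unresolved))
```

Revisits whose original copy cannot be found are reported separately rather than silently dropped, since they would otherwise vanish from the yearly totals.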

Do try it at home
In the meantime, please explore our historical archive and tell us what you find! It might be slow sometimes (maybe 10-20 seconds), so please be patient, but we’re pretty confident that it will be stable from now on.




By Andy Jackson, British Library Web Archiving Technical Lead