There's been some news coverage in the last few weeks of the decision of the Conservative Party to reorganise their website, removing an archive of speeches up to 2010. The original report appeared in Computer Weekly (here) and subsequently the story was picked up by media including The Guardian, the Financial Times and Channel 4 News. In the subsequent debate there were a few factual inaccuracies, and so we thought it worth blogging about archival copies of these pages, and of other UK political party content.
Firstly, the copies held by the Internet Archive (archive.org) were not erased or deleted; access to them was simply blocked. Due to the legal environment in which the Internet Archive operates, it has adopted a policy that allows websites to use robots.txt to directly control whether the archived copies can be made available. The robots.txt protocol has no legal force, but observing it is considered good manners online. A robots.txt file requests that search engines and other web crawlers, such as those used by web archives, do not visit or index specified pages. The Internet Archive policy extends the same courtesy to playback.
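To make the mechanism concrete, here is a minimal sketch in Python of how a well-behaved crawler might consult a robots.txt file before fetching a page. It uses the standard library's `urllib.robotparser`. The rules, the host `www.example.org`, the `/speeches/` path and the user agent name `example-crawler` are all assumptions made up for illustration; they are not the party's actual rules, nor the Internet Archive's code.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: asks all crawlers to stay out of /speeches/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /speeches/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "http://www.example.org/speeches/2009/01.html",  # illustrative URL
        "http://www.example.org/news/index.html",        # illustrative URL
    ):
        print(url, is_allowed(ROBOTS_TXT, "example-crawler", url))
```

Nothing in the file enforces anything: the check only has an effect because the crawler chooses to run it and to respect the answer.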
At some point after the content in question was removed from the original website, the party added the corresponding pages to its robots.txt file. As the practice of the Internet Archive is to observe robots.txt retrospectively, it began to withhold its copies, even though they had been made before the party applied robots.txt rules to the archive of speeches. Since then, the party has reversed that decision, and the Internet Archive copies are live once again.
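The effect of retrospective observance can be sketched in a few lines of Python. This is only an illustration of the policy as described above, not the Internet Archive's implementation: the `Capture` class, the `playable` function, the user agent name and the example URLs and dates are all hypothetical. The point is that the capture date plays no part in the decision; only the robots.txt currently published on the live site does.

```python
from dataclasses import dataclass
from datetime import date
from urllib.robotparser import RobotFileParser


@dataclass(frozen=True)
class Capture:
    """One archived copy of a page (hypothetical structure)."""
    url: str
    captured_on: date


def playable(captures, current_robots_txt, user_agent="example-archive-bot"):
    """Return the captures that may be shown, judged by today's robots.txt.

    Note that captured_on is deliberately ignored: rules published now are
    applied to copies made at any earlier time.
    """
    parser = RobotFileParser()
    parser.parse(current_robots_txt.splitlines())
    return [c for c in captures if parser.can_fetch(user_agent, c.url)]


# Illustrative captures, made before any exclusion rule existed.
captures = [
    Capture("http://www.example.org/speeches/2008/05.html", date(2008, 5, 1)),
    Capture("http://www.example.org/news/index.html", date(2008, 5, 1)),
]

# Rules added later: the older speech capture is now withheld.
blocked_rules = "User-agent: *\nDisallow: /speeches/\n"
print([c.url for c in playable(captures, blocked_rules)])

# Rules withdrawn again: every capture becomes playable once more.
print([c.url for c in playable(captures, "")])
```

The copies themselves are never touched in either case, which is why access could be restored as soon as the rule was withdrawn.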
Whatever the details of this particular case, it's worth noting that the Internet Archive's playback policy is not widely known. Most webmasters only consider search engine crawlers when they configure their robots.txt rules. For example, it is not uncommon to use this mechanism to prevent crawlers from generating lots of 'Not Found' errors as they follow incoming links to content that is no longer available.
For our own part, we had been archiving the whole Conservative Party site since 2004, by the express permission of the party, and those archived copies are available in the public UK Web Archive (UKWA). We have also archived the sites of the Labour party and the Liberal Democrats since around the same time. In contrast with the Internet Archive, we do not use recent changes to robots.txt to determine access to archived sites.
There are many other sites for which we do not have the same permission. However, since the advent of Non-Print Legal Deposit in April 2013, we may archive any site from within the UK, although users must visit one of the six legal deposit libraries for the UK in order to see the archived copy.
It isn't only the sites of the main political parties that we archive. Also in UKWA are extensive collections for the 2005 and 2010 general elections and the 2009 elections to the European Parliament. Alongside the sites of the main parties, these include the sites of local party organisations and individual candidates, together with news media coverage, opinion polls and the contributions of interested groups and individuals. There are also many websites of sitting MPs, many of which have since disappeared from the live web after the member lost their seat. Examples of these include Kitty Ussher, minister in the Labour government between 2007 and 2009, and the Conservative former minister Peter Bottomley.
We have also archived materials relating to major changes in public administration, such as the abolition of the police authorities in England and Wales in 2012, and the reorganisation of the NHS (also in England and Wales) in April 2013.