THE BRITISH LIBRARY

UK Web Archive blog

Information from the team at the UK Web Archive, the Library's premier resource of archived UK websites

Introduction

News and views from the British Library’s web archiving team and guests. Posts about the public UK Web Archive and, since April 2013, about web archiving as part of non-print legal deposit. Editor-in-chief: Jason Webber.

29 May 2015

Beginners’ Guide to Web Archives, Part 1


Arriving at the British Library as an intern, one of the tasks laid out before me was to create and curate a special collection for the UK Web Archive. To some readers of this blog this activity may seem fairly self-explanatory. However, before arriving at the Library I had never even heard of web archiving, let alone considered why we do it and who might find it useful. In a short series of blog posts I will explore these questions from the novice’s point of view, both my own and that of academic researchers hoping to use the resource. I hope to convey a new user’s perceptions of the challenges and opportunities of the archive, and to provide an introduction for interested beginners.

Spiders spinning furiously

The web is a vast resource. In 2008 Google reported having found one trillion (10^12) unique URLs. It has been suggested that the web represents a rapid expansion in human knowledge; it certainly gives billions of people greater access to that knowledge. It is also a place where a huge range of opinions are openly expressed. However, the content of the web turns over very rapidly, with around 40% of websites changing their content within a week. Without web archiving (the practice of collecting and storing websites), much of what people write online is inevitably lost, often by accident.

The UK Web Archive now collects almost the entire UK web space. One of the problems facing users of the archive is the astounding amount of data they must sift through. One way around this problem is to create so-called ‘special collections’: groups of websites that fall under a particular theme. These give the user a smaller set of data that is easier to sort and search.

My special collection

Image: Walk Against Warming, https://www.flickr.com/photos/erlandh/270904893/

As a science PhD student, I felt my special collection should be built with the aim of answering research questions related to a scientific topic. I specialise in oceanography and past climate change, and I am aware of the almost constant debate on hundreds of climate-related websites about climate science, the social impacts of climate change and the policies that should be adopted. A special collection on these issues might be useful for answering questions such as: How has the web influenced public opinion on climate change? As new science rolls in, how do the viewpoints expressed on the web change? How do different organisations use the web as a platform for promoting their beliefs?

Image: Global warming in perspective, https://www.flickr.com/photos/wheatfields/4688140998/in/photolist-2XsBdQ-92Bik-7u74nu-a55CZL-s6TSND-89gWZd-8FJLyQ

To provide a resource for answering these questions, I plan to select web pages from organisations including environmental charities, climate-sceptic think tanks, energy companies and government, as well as many more pages of blogs, articles and discussion. I hope that this collection will become a useful resource for anyone interested in the climate change issue. But would researchers actually use it? And how might they go about doing so? Find out in my next post.

Peter Spooner, Science Policy Intern

 

23 April 2015

Web archiving as a challenging business


As my internship with the British Library’s Web Archiving team comes to an end, I would like to sum up my impressions. I have been struck by how daunting a task web archiving is, and by how many challenges it creates for professionals.

Displaying an open collection

The British Library provides the public with an open collection of websites, accessible from anywhere. These open collections are resource-heavy, because they are enriched with metadata and descriptions. This work is done by web curators and web archivists; the archivists are also responsible for quality assurance, checking that the web crawling software harvested each site properly. Providing open access requires permission from the website owners. This is a slow, labour-intensive process that could easily require two or three times the resources currently available. For time-sensitive events such as the next General Election, websites are selected now, while permission requests are postponed to a less busy time. For some resources open access is not an option at all, for example news websites that charge for access to their own archives.

Providing search tools

You might think things would have got easier since the Legal Deposit Libraries (Non-Print Works) Regulations 2013 allowed the British Library to collect and preserve UK websites without asking permission. But new issues arise: collecting a huge quantity of data, indexing it, preserving it for the long term, and dealing with the fact that an archived website may not look the same as its live version. And then all this content must be made available to users (in the reading rooms only, for websites whose owners have not given permission).

But how does one search a web archive? Anyone who has tried will probably have had the nagging sense that there is simply too much data to deal with. One challenge, then, is to give users efficient tools for finding their way through this maze of data. Users in turn need to learn how to use these tools, bearing in mind that their expectations may be shaped by the habit of using Google. Yet using a web archive for scholarly purposes calls for a completely different approach, and a historical search engine must meet specific requirements: instead of Google-like relevance sorting, results are ranked chronologically, with powerful filters for refining them, for example by event or along a timeline. This research project from the L3S Research Centre in Germany is one of several involving web archives, and it shows that tools are being built hand in hand with the researchers who use web archives as material for their work.

Being involved in web archiving today is fascinating: it means observing, and being part of, an emerging field. This was also discussed in the opening presentation of the 2014 IIPC (International Internet Preservation Consortium) General Assembly.

A new job?

Web archiving is not yet really part of librarians’ training, so professionals have to learn by doing. At present web archiving involves only a few people, no more than a handful, mostly based in national libraries (though this is becoming less true over time, as the changing composition of the IIPC shows).

But the issues raised by web archiving are in line with general trends in libraries, such as the management of electronic journals (mostly bought and displayed as packages) and mass digitisation projects. The new challenge is one of scale. The core business of librarians seems to be shifting from selecting resources to highlighting them, and social media channels are one of the new librarian’s tools for doing so. Most digital libraries have a Twitter account (see the often humorous @GallicaBnF), as do web archives (@internetarchive, @UKWebArchive, @DLWebBnF).

Apart from the archiving work these specialist teams do, another task is promoting web archives within the libraries themselves. Reference staff may not yet be comfortable with this new material, and very few readers use the web archive so far. Another challenge to come!

Clémence Agostini (intern at the BL Web Archiving team from ENSSIB)

25 March 2015

Political parties in the UK Web Archive


With only six weeks to go until the General Election, this might be a good time to look back at the web sphere of previous elections, through the 2005 and 2010 General Election websites collected by the UK Web Archive.

Websites of political parties currently in Parliament

The Conservative Party: http://www.webarchive.org.uk/ukwa/target/101940/source/search
The Labour Party: http://www.webarchive.org.uk/ukwa/target/101311/source/search
Liberal Democrats: http://www.webarchive.org.uk/ukwa/target/102621/source/search
UKIP: http://www.webarchive.org.uk/ukwa/target/109998/source/search
Green Party: http://www.webarchive.org.uk/ukwa/target/108088/source/search

Scottish parties

Scottish National Party (SNP): http://www.webarchive.org.uk/ukwa/target/30441472/source/search
Scottish Socialist Party: http://www.webarchive.org.uk/ukwa/target/99112/source/search

Welsh parties

Plaid Cymru - The Party of Wales: http://www.webarchive.org.uk/ukwa/target/102036/source/search

Northern Ireland parties

Democratic Unionist Party (DUP): http://www.webarchive.org.uk/ukwa/target/106592/source/search
Sinn Féin: http://www.webarchive.org.uk/ukwa/target/106020/source/search
Ulster Unionist Party (UUP): http://www.webarchive.org.uk/ukwa/target/105944/source/search
Social Democratic and Labour Party (SDLP): http://www.webarchive.org.uk/ukwa/target/107880/source/search
Alliance Party of Northern Ireland: http://www.webarchive.org.uk/ukwa/target/106002/source/search

Other parties

Respect Party: http://www.webarchive.org.uk/ukwa/target/40632374/source/search
British National Party (BNP): http://www.webarchive.org.uk/ukwa/target/106040/source/search
The Liberal Party: http://www.webarchive.org.uk/ukwa/target/40632386/source/search
Socialist Labour Party: http://www.webarchive.org.uk/ukwa/target/107243/source/search
English Democrats: http://www.webarchive.org.uk/ukwa/target/29261833/source/search
The Christian Party: http://www.webarchive.org.uk/ukwa/target/43810817/source/search
Health Concern (Independent Community & Health Concern): http://www.webarchive.org.uk/ukwa/target/37617688/source/search
Monster Raving Loony Party: http://www.webarchive.org.uk/ukwa/target/110017/source/search

Candidates

You can also find the websites of former candidates in the UK Web Archive, which might be an interesting way to check whether old promises have been fulfilled. Below are some examples, but you can also look up any other candidate by typing his or her name into the quick search box: http://www.webarchive.org.uk/ukwa/subject/89/page/1

David Miliband (2010): http://www.webarchive.org.uk/ukwa/target/49905672/source/search
Nick Clegg (2010): http://www.webarchive.org.uk/ukwa/target/43188235/source/search
Nigel Farage (2010): http://www.webarchive.org.uk/ukwa/target/44695591/source/search
Caroline Lucas (2010): http://www.webarchive.org.uk/ukwa/target/44695599/source/search
David Cameron (2005): http://www.webarchive.org.uk/wayback/archive/20050524120000/http://www.votedavidcameron.com/index.html
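
The David Cameron (2005) link has a different form from the others: it points straight into the archive's Wayback interface, with a 14-digit timestamp (YYYYMMDDhhmmss, here 24 May 2005 at 12:00:00) between the archive prefix and the original URL. The short Python sketch below builds and splits links of this shape. It assumes that other archived pages follow the same pattern as this one link; that has not been checked against the archive, and the prefix may since have changed. The names `wayback_url` and `parse_wayback_url` are hypothetical helpers, not part of any UK Web Archive tool.

```python
from datetime import datetime

# Prefix copied from the David Cameron (2005) link above. Assumption: other
# archived pages use the same prefix; the archive's URLs may have changed since.
WAYBACK_PREFIX = "http://www.webarchive.org.uk/wayback/archive/"

# Wayback-style timestamps are 14 digits: YYYYMMDDhhmmss.
TIMESTAMP_FORMAT = "%Y%m%d%H%M%S"


def wayback_url(original_url: str, when: datetime) -> str:
    """Hypothetical helper: build a link to original_url as captured at `when`."""
    return f"{WAYBACK_PREFIX}{when.strftime(TIMESTAMP_FORMAT)}/{original_url}"


def parse_wayback_url(url: str) -> tuple[datetime, str]:
    """Hypothetical helper: split a link into capture time and original URL."""
    if not url.startswith(WAYBACK_PREFIX):
        raise ValueError(f"not a link under {WAYBACK_PREFIX}: {url}")
    # The timestamp contains no '/', so the first '/' ends it; everything
    # after that (including the original URL's own slashes) is kept intact.
    timestamp, original_url = url[len(WAYBACK_PREFIX):].split("/", 1)
    return datetime.strptime(timestamp, TIMESTAMP_FORMAT), original_url


if __name__ == "__main__":
    link = wayback_url("http://www.votedavidcameron.com/index.html",
                       datetime(2005, 5, 24, 12, 0, 0))
    print(link)
    print(parse_wayback_url(link))
```

Run as a script, the first line printed reproduces the David Cameron link above. Wayback software generally redirects a request to the closest available capture when none exists at exactly the requested second, but this has not been verified for this archive.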

Enjoy!

Clémence Agostini (intern at the BL Web Archiving team from ENSSIB)