UK Web Archive blog

Information from the team at the UK Web Archive, the Library's premier resource of archived UK websites

Introduction

News and views from the British Library’s web archiving team and guests. Posts about the public UK Web Archive, and since April 2013, about web archiving as part of non-print legal deposit. Editor-in-chief: Jason Webber.

12 February 2013

What’s in a name ? Domain names and website longevity

I wrote about how to make websites more archivable in a previous post. Having websites archived and making an effort to make websites “archive-friendly” are all good steps which can help increase their longevity. This blog post is about domain names, the name you use to call your website and the address which identifies it on the Web.

To obtain a domain name, you need to pay an annual fee with a registrar for the right to use it. The rented nature of domain names means that they are not permanent and the same domain name could host completely different content at different times if it changes hands.

When planning the take-down or replacement of a website, the question of what to do with the domain name requires some thought. As well as being relevant to record-keeping, it is an important part of (business) continuity.

In most cases the existing domain name is used to host the new version of the website. This is usually the right thing to do – users expect it and (if you chose the right one) a domain name often becomes a part of the identity of the website and/or the brand. Unless there are good reasons to switch to a new one, most domain names are kept when changing websites. Many websites also provide users with the option to view historical versions of the website by linking to a web archive or putting in place a landing page which points to old versions as well as new.

When a website is taken out of service, keeping the domain name and redirecting it to the archival version is also an option. This will incur a small charge in retaining the domain name; but this is much less than paying for the hosting fee and technical support to keep a website live. The advantage of this approach is seamless continuity: users are automatically referred to an archival version of the website without having to be aware of the existence of the web archive. For example, www.oneandother.co.uk, the domain name of the One and Other Project, featuring artist Antony Gormley’s commission for Trafalgar Square’s ‘empty’ fourth plinth in July 2009, points directly to the archival version in the UK Web Archive. Users can type the same web address or click on a link as they used to do and get to the website, despite the fact that it disappeared from the live web years ago.
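
As a rough illustration of the redirect described above, the sketch below maps a request for a page on a retired domain to an address in a web archive and returns a permanent (301) redirect. The archive address, the capture timestamp and the function names are all hypothetical, and the "base/timestamp/original address" layout is an assumed Wayback-style convention rather than the UK Web Archive's confirmed scheme.

```python
from urllib.parse import urlsplit

# Hypothetical archive endpoint and capture timestamp, for illustration only.
ARCHIVE_BASE = "https://archive.example.org/wayback"
CAPTURE_TIMESTAMP = "20100101000000"


def archival_location(requested_url: str) -> str:
    """Return the archive address that a retired domain could redirect to."""
    parts = urlsplit(requested_url)
    path = parts.path or "/"  # treat a bare domain as a request for the home page
    original = f"http://{parts.netloc}{path}"
    if parts.query:
        original += f"?{parts.query}"
    # Assumed layout: <archive base>/<timestamp>/<original address>
    return f"{ARCHIVE_BASE}/{CAPTURE_TIMESTAMP}/{original}"


def redirect_response(requested_url: str) -> tuple[int, dict[str, str]]:
    """Build the status code and headers for a permanent redirect."""
    return 301, {"Location": archival_location(requested_url)}
```

In practice a rule like this would usually live in the web server or DNS provider's redirect settings; the point is only that the old address keeps working while the content is served from the archive.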

Keeping the domain name may not be the right solution for everyone but it’s a possibility well worth considering.

Helen Hockx-Yu

[Image courtesy of Roberto Zingales, Creative Commons CC-BY 2.0, via Flickr]

07 February 2013

Archiving social media: a workshop report

I was very pleased to be invited to a recent workshop on social media archiving. It was organised by Laura Lannin and colleagues at the Museum of London, to whom many thanks for a wide-ranging and stimulating afternoon.

The day saw a cluster of diverse and useful presentations. Among them was our very own Helen Hockx-Yu, on the potential and problems relating to social media archiving on a national scale, as we experience them at the UK Web Archive. Web archiving is always a technological arms race, with the archiving technologies having to adapt constantly as the way the web works continues to change.

The other presentations between them showed the wide variety of perspectives from which the whole issue needs to be approached. Two projects examined the way in which Twitter can be used as a means of identifying content on the wider web that should be preserved, as well as an archive resource in itself. Both projects came from within specialist museums, and both were concerned with the Olympics. The Victoria and Albert Museum (represented by Catherine Flood) had monitored Twitter to identify graphically significant visual resources, shared on Flickr as the Collect London 2012 collection. The Museum of London (in partnership with Peter Ride of the University of Westminster) had gone a step further, bringing together a team of Citizen Curators to keep eyes and ears open during the Games for important resources, and to identify them by means of the Twitter hashtag #citizencurators for later harvesting.

In contrast, Ruth Page (University of Leicester, or @ruthtweetpage) gave us the perspective of a linguist interested in the analysis of large corpora of tweets, for the patterns of language usage within them. And although there was not a presentation from this perspective, several of those present were responsible for social media engagement between museums and their users, and are faced with working out how best to archive their own social media output.

In a previous post, Nicola Johnson reported on the difficulties of implementing web archiving activity in national libraries charged with archiving the web outside their own walls. This workshop neatly showed the different concerns of a wider group of interested parties. Whether it is national libraries, museums or users; whether it is social media content itself or the other resources they link to, there is much to think about when it comes to social media archiving. 

Peter Webster (@pj_webster)

30 January 2013

Surfing the web in time: Mementos

Have you ever needed to see a copy of a now-lost website, and didn't know where to start ? Help is at hand, with Mementos.

The Memento protocol has been around for a while (since 2009). It's a way of adding a time dimension to our common HTTP-based way of browsing the web, and has been available as a plug-in for Firefox. (See mementoweb.org for details.)
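
For the curious, here is a minimal sketch of what the "time dimension" looks like at the HTTP level. In the Memento protocol a client asks a "TimeGate" for a page and states the date it wants in an Accept-Datetime request header. The TimeGate address below is hypothetical, as is the function name, and the "TimeGate address followed by the original address" layout is an assumption; the request is only built here, not sent.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from urllib.request import Request

# Hypothetical TimeGate address; a real archive publishes its own.
TIMEGATE = "https://timegate.example.org/timegate/"


def memento_request(original_url: str, when: datetime) -> Request:
    """Build (but do not send) a request for original_url as it was around `when`."""
    # Accept-Datetime uses the HTTP date format, always expressed in GMT.
    accept_datetime = format_datetime(when.astimezone(timezone.utc), usegmt=True)
    return Request(
        TIMEGATE + original_url,
        headers={"Accept-Datetime": accept_datetime},
    )


# Example: ask for the BBC homepage as it stood at midday on 1 June 2005.
req = memento_request(
    "http://www.bbc.co.uk/",
    datetime(2005, 6, 1, 12, 0, tzinfo=timezone.utc),
)
```

A TimeGate that understands the header would then answer with a redirect to the archived copy closest to the requested date.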

On the UKWA site, we have launched an alternative web-based way of delivering Memento, without needing to amend your browser. Mementos allows you to search across multiple different web archives around the world at once - particularly helpful if you don't know by which territorial web archive a site is most likely to be kept. It gives a breakdown of how many versions each archive holds, and from when, and leads users through to the archived versions themselves.

Get started with the search page; or, see it in action for the Google homepage (over 4,000 snapshots in four archives since 1999) and the BBC homepage (more than 5,000, in five archives, since 1996).

For those interested in the detailed workings and in reusing the web client, the source code is hosted on Github.

Peter Webster

24 January 2013

Web archiving: how to fit it in ? A workshop report

[A report by Nicola Johnson, Web Archivist at the British Library]

I attended a workshop, “How to fit in – integrating a web archiving program in your organization”, at the Bibliothèque nationale de France in Paris, 26th – 30th November 2012. It was sponsored by the International Internet Preservation Consortium (IIPC).

The workshop was intended for curators, archivists and managers involved in (or about to embark on) web archiving at their institutions. The BnF has been archiving websites since late 1999 and has a vast amount of expertise. France was an early adopter of legal deposit for websites, with legislation in August 2006 meaning that websites from the French national domain can be collected by the BnF for preservation and public use. I was particularly interested in the transition that they have made to this large-scale operation, as Legal Deposit legislation is expected in the UK this April and we will have the task of integrating large scale archiving with our current selective undertaking.

Several IIPC member organisations attended the workshop, hosted in one of the four ‘towers of open books’ at the BnF’s main site. The Francois Mitterrand building was one of the grands projets of the former president and is one of the largest and most modern libraries in the world. Participants included the British Library, the national libraries of Germany, Slovenia, Estonia, Spain and the Netherlands. Also represented were the Bavarian State Library, the California Digital Library, the National Library and Archives of Quebec, the Bibliotheca Alexandrina and the Library of Congress. Participants represented a range of experience in web archiving and were at different stages of national legal deposit legislation.

A wide range of topics were covered, including the integration of web archiving in acquisition practices; the role of subject librarians in selecting websites; and how web collections should align with general collection development policies. As the business of web archiving involves several parts of a library, we also heard representatives of various departments at the BnF speak of their role, including IT, conservation, legal deposit, collections co-ordination and digital and bibliographic information. There were subject specialists from the music, literature and art departments, who spoke about their collection development policies and how to incentivise staff to select websites when they have a multitude of other duties to perform. Given my role as Web Archivist I was particularly interested in the role of the 70 or so curators or “recommending officers” who select websites for the focussed crawls undertaken by the BnF.

A presentation was also made by the Internet Memory Foundation, a non-profit institution based in Amsterdam and Paris. The foundation provides a shared platform for institutions to collect websites and is archiving dozens of terabytes of data every month. They are also involved in various research projects with institutions and are developing a new crawler and architecture for web-scale crawling. Later in the week we also had the opportunity to visit the National Audiovisual Institute (INA), a repository containing 70 years of French radio programmes and 60 years of TV. The INA shares responsibility for collecting legal deposit online content with BnF and began collecting broadcast-related websites in February 2009. It holds approximately 10,000 websites, employing multiple crawlers for different types of content. Access is available at six sites in France, but some material under open licence is available online.

Our hosts succeeded in creating an atmosphere that was relaxed and stimulating (see the pictures); a great many ideas were exchanged and the commonality of purpose among the participants was encouraging. I have returned to work with renewed vigour and positivity towards web archiving and, judging by their messages after the event, so have the other participants. Positive changes are being made in our respective institutions as a result of the workshop.

[Image of the BNF (Creative Commons BY-NC-SA) from Images et Voyages ]

21 January 2013

What could you do with an archive of the UK web, 1996-2010 ?

The Analytical Access to the Domain Dark Archive (AADDA) project has brought together a group of scholars to help us formulate which analytical tools users will need to make the most of the JISC UK Web Domain Dataset, a dataset of all the holdings of the Internet Archive for the UK from 1996 to 2010.

A (very large) geo-index of the data is already available for download, and the dataset can also be visualised using the Ngram. But this group of scholars of the humanities and social sciences are beginning to imagine the projects they would like to pursue using the data. I myself began to sketch an answer in a previous post on the AADDA blog.

Since then, summaries of those projects have been appearing on the project blog. Here are some of them.

(i) Dr Richard Deswarte will be Exploring and uncovering Euroscepticism in the Dark Archive.

(ii) Saskia Huc-Hepher (University of Westminster) will be exploring the spatial dimensions of the French community in London.

(iii) Professor Gemma Moss (Institute of Education) will be examining the use of statistical data in setting agendas for education change, and the PISA rankings in particular.

(iv) Carole Taylor is investigating the decline of parliamentary political engagement and its implications.

(v) Helen Taylor (Royal Holloway, University of London) will be examining the reception of the Liverpool poets.

Watch out for more posts here on this project as it unfolds. It is a collaboration between ourselves at the British Library, the Institute of Historical Research (University of London) and the University of Cambridge, and is funded by the JISC.

Creative Commons image courtesy of Wikimedia Commons.

14 January 2013

Religion, politics and the law: a new special collection

It has been over two years in the making, but I am delighted to be able to say that my own special collection in the UK Web Archive is now online.

A couple of years ago, long before coming to the BL, I joined the Researchers and the UK Web Archive project at the Library which brought together a group of scholars to guest-curate special collections on our own particular research interests. As an historian, I was interested in the marked sharpening of the terms of discourse about the place of religion in British public life, particularly since 9/11 and the London bombings in 2005. It struck me that a good deal of this debate had already shifted online, and so new ways and means of capturing and preserving it were going to be needed. And so, the ‘politics of religion collection’ (as it was then known) was born.

As has been noted many times in this blog, the problem for web archiving is that we’re dealing with other people’s copyright work, and so an individual permission is needed for each site. I have a long list of sites which I would dearly love to add to the collection, but for which (for various reasons) we’ve had no response. So, if you are the owner of Protest the Pope, or Holy Redundant, or Christians in Politics, please get in touch. For now, even if the collection cannot be anything like comprehensive, I do hope that it is at least coherent.

There are particular strengths, and some gaps. It includes many campaigning organisations, both secularist and religious, and is heavy on the conservative Christian organisations about which I myself know most. It is relatively light on non-Christian faiths, since I know the field much less well. It is still very much open, however, and so suggestions of sites that ought to be included are very welcome, via this blog or via the UK Web Archive site.

See a previous post about my progress in 2012.

Peter Webster

07 January 2013

Oral history in the UK: a new special collection

[A guest post from Elspeth Millar, Oral History Archive Assistant in the National Lifestory Collection at the British Library.]

I have been involved in the pilot Curators' Choice project, led by the Digital Curator team. The Curators' Choice project is helping curators within the British Library to establish collections in the UK Web Archive, based on the subject expertise of their curatorial department. As Archive Assistant for Oral History and National Life Stories at the British Library my natural topic of choice was going to be websites relating to 'Oral History in the UK'. I have nominated organisational or individual project websites which give information about a project (project background, participants, funding information), and websites which provide access to finding aids for oral history interviews, but ideally sites which provide direct online access to oral history archive material (either clips or full interviews).

I was lucky to have existing resources at my disposal to discover relevant websites, in particular our own Oral History section resources page, the Oral History Society website and the Oral History Journal; the journal includes a 'Current British Work' section which helpfully lists current oral history projects around the UK.

Oral History in the UK was traditionally concerned with community history and uncovering 'history from below', although it is now widely used within many academic disciplines. I hope that the websites so far included in the 'Oral History in the UK' collection demonstrate the variety of ways in which Oral History is now used, by community and local history groups and charities as well as by universities. The range of websites in the collection includes those which document local history (Durham in Time, St. Helier Memories); the experiences of people who have emigrated to the UK (such as the Birmingham Black Oral History Project); disability history (Speaking Up For Disability); health (Testimony - inside stories of mental health care); industry (Songs of Steel); and memories of war (The Workers' War, Captive Memories).

The websites vary widely in the way they present oral history. Many websites, although not all, provide access to extracts from oral history audio or video archive material; and most sites also provide information on the project background, participants and funding arrangements.

There are many more websites I would love to include in the collection; indeed many more websites have been nominated for inclusion, but the Web Archive team is awaiting permission from the website owners to include them. We'll carry on nominating sites, but we welcome nominations from the public as well - if you think there is an important UK oral history website that is not being included in the UK Web Archive at the moment, please contact the Web Archive team.

02 January 2013

Slavery and Abolition in the Caribbean: a new special collection

[A guest post by Dr Philip Hatfield, Curator for Canadian and Caribbean Studies at the British Library.]

Back in July I added a short post to this blog about the first stages of selecting material for the UK Web Archive Special Collection, ‘Slavery and Abolition in the Caribbean’. Now, after much trawling of the web and selection of sites, and brilliant work from my colleagues from the UK Web Archive (whose determination and technical wizardry know no bounds) I’m delighted to say that the first iteration is now live for public use. You can access it here, and I hope you find it of use.

Before I go though, some further thoughts about web archiving in the context of this collection. The first thing to note is how important this kind of work is for maintaining a record of not just the Web but writing, publishing and commemoration more generally in the early 21st century. There are many websites and pages produced during the 2007 bicentenary of the abolition of the slave trade that have either disappeared or no longer have a contactable administrator who can grant the Web Archive rights to collect and display the site. And so, valuable resources for understanding the UK’s engagement with the history of slavery and the politics of remembrance are lost to us.

Following on from this it is impossible to overstate the importance of permissions to the construction of viable collections within the UK Web Archive. Permissions allow sites to be archived and made available to the public and are key to providing a comprehensive research resource. Without them, a collection may not reflect completely the selections of the curator or material that is live on the Web, which is partly the case with the ‘Slavery and Abolition in the Caribbean’ collection. We’ll keep trying with those sites for which we have not yet got permission;  but I am very grateful to those institutions and individuals who have taken the time to consider our request and grant permission.

Highlighting these problems brings me to my main point: this is an evolving collection driven by the need to continue to archive what already exists on the Web and also relevant sites created in future. This is where, hopefully, readers of this post and users of the collection come in. I hope the process of building and maintaining this collection can become a dialogue between users, myself and the UKWA. If you know or moderate a site you think should be part of the collection please do get in touch with me, at [email protected].

[The image is part of a plan of the slave ship Brookes, found in various archived sites, including that of Brycchan Carey. Originally from Thomas Clarkson’s ‘The history of the rise, progress and accomplishment of the abolition of the African slave trade by the British Parliament’ (BL Shelfmark: 522.f.23).]