UK Web Archive blog

Information from the team at the UK Web Archive, the Library's premier resource of archived UK websites


Introduction

News and views from the British Library’s web archiving team and guests. Posts about the public UK Web Archive, and since April 2013, about web archiving as part of non-print legal deposit. Editor-in-chief: Jason Webber.

22 August 2012

Visualising the UK Web Domain


The UK Web Archive is a selective archive containing Websites selected and preserved by the British Library and partners since 2004.

“.uk” is one of the largest country-code top-level domains in the world, with 10 million registrations in March 2012. Selective archiving has many advantages but is costly and fails to capture a comprehensive picture of the national domain. The Legal Deposit Libraries in the UK will be able to collect Web resources at scale when the non-print Legal Deposit legislation is in place, expected sometime in 2013.

The benefits of archived Web resources can only be realised when these are actively used for research, learning and teaching. This was the impetus for us to work with the Joint Information Systems Committee (JISC) and the Internet Archive on a collaborative project which extracted a copy of UK Websites from the Internet Archive’s collection. This research dataset, supported by JISC funding, contains Websites crawled between 1996 and 2010 by the Internet Archive and is the largest historical dataset of the UK domain in existence. One of the objectives of the project is to develop visualisations and services to demonstrate how large-scale Web archive collections can be used for analytics, revealing embedded trends and patterns which could not be found by consulting historical copies of Websites individually.

The visualisations and secondary datasets are now released on the UK Web Archive at http://www.webarchive.org.uk/ukwa/visualisation. The N-gram search is a phrase-usage visualisation tool which charts the monthly occurrence of user-defined search terms or phrases over time, as found in the JISC UK Web domain dataset (1996-2010). The link visualisation shows the relationship between domain suffixes over time. The format profile is a visualisation of the format analysis, summarising the data formats (MIME types) contained within all of the HTTP 200 OK responses. We have also released two downloadable secondary datasets which can be used to develop further applications: a list of MIME types and a postcode index.
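To make the idea behind the N-gram search and the format profile concrete, here is a minimal Python sketch. It is not the code behind the released tools. The record layout (crawl month, HTTP status, MIME type, extracted text), the sample data and the function names are all assumptions made for illustration. The phrase count uses plain substring matching, so "blog" would also match "blogs".

```python
from collections import Counter

# Hypothetical records: (crawl month "YYYY-MM", HTTP status, MIME type, extracted text).
# This layout is an assumption for illustration, not the dataset's real schema.
SAMPLE_RECORDS = [
    ("1996-11", 200, "text/html", "Welcome to our home page on the information superhighway"),
    ("2004-05", 200, "text/html", "Read our new blog and visit the home page"),
    ("2004-05", 404, "text/html", "Page not found"),
    ("2010-02", 200, "application/pdf", "Annual report: our blog and social media"),
    ("2010-02", 200, "text/html", "Follow our blog for updates"),
]


def monthly_phrase_counts(records, phrase):
    """Count case-insensitive substring occurrences of `phrase` per crawl month."""
    needle = phrase.lower()
    counts = Counter()
    for month, _status, _mime, text in records:
        hits = text.lower().count(needle)
        if hits:
            counts[month] += hits
    # Sort by month so the result can be charted as a time series.
    return dict(sorted(counts.items()))


def format_profile(records):
    """Tally MIME types across HTTP 200 OK responses only."""
    return Counter(mime for _month, status, mime, _text in records if status == 200)


if __name__ == "__main__":
    print(monthly_phrase_counts(SAMPLE_RECORDS, "blog"))
    print(format_profile(SAMPLE_RECORDS))
```

Charting the first result against time gives the kind of phrase-usage curve the N-gram search displays. The second result is the raw tally that a format profile would summarise.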

The JISC has also funded two additional projects, using the JISC UK Web domain dataset (1996-2010) to develop analytical access to large-scale Web archive collections. These are Analytical Access to the Domain Dark Archive and Big Data: Demonstrating the Value of the UK Web Domain Dataset for Social Science Research. We are running a joint workshop at the Digital Research 2012 Conference: Digital Research Using Web Archives. If you would like to find out more about our projects and Web archiving in general, please come along and join us.

01 August 2012

Diamond Jubilee Collection live


We are pleased to announce that our new web collection about the Queen’s Diamond Jubilee is now live. This collection represents an important historical record of online resources which, it is hoped, will provide a lasting legacy of the event and fulfil our aim to prioritise selection of websites that feature political, cultural, social and economic events of national importance.

The collection, comprising over 130 titles, was initiated in late 2011 by the British Library in collaboration with the Royal Archives and the Institute of Historical Research. Content has been selected by subject specialists from a variety of sources including the Twittervane tool developed by the British Library which enables curators to identify sites frequently shared on social media relevant to specified search terms. Websites were also selected by members of the public who submitted nominations on the UK Web Archive’s online nomination form.

Archiving of websites commenced in January 2012, with a focused period of frequent, intensive crawls in the weeks directly before and after the Jubilee weekend of June 2nd – 5th. All harvested websites were checked for quality and completeness before submission to the archive. We will continue to collect websites until December 2012 in order to capture analysis and debate on the issues around the Jubilee.

The aim of the collection was to cover the event as comprehensively as possible and to reflect a multiplicity of strands and themes including official events, the economic impact, public sentiment and political and constitutional debate. Staff at the Royal Household nominated sites of official interest such as the website of the British Monarchy and the official website of The Queen’s Diamond Jubilee.

Websites of official events initiated by Buckingham Palace have been archived including the Thames Diamond Jubilee Pageant, the Queen’s Diamond Jubilee Beacons, the Big Lunch and the BBC Concert at Buckingham Palace.

The Jubilee inspired local, unofficial celebrations such as street parties and other community-based events, and a selection of their websites has been captured, for example Newry Drama Festival, the Horsted Keyes Diamond Jubilee Organising Committee and Wetherby’s Diamond Jubilee Website.

Beginning in March 2012, The Queen, accompanied by The Duke of Edinburgh, conducted a series of royal tours throughout the UK to mark the Diamond Jubilee year. We have captured samples of local press coverage to cover Her Majesty’s regional visits. See for example the Queen’s visit to Ebbw Vale, Gwent and the blog by photographer Chris Seddon capturing the Queen’s Diamond Jubilee Tour of Leicester.

As much of the UK geared up to celebrate the Diamond Jubilee, the occasion also prompted debate about the future of the monarchy. Dissenting voices and opposition to the monarchy have been captured in the archive; see for example the website of the Jubilee Protest ‘Protest at the Pageant’ and Republic: campaigning for a democratic alternative to the monarchy.

The Mass Observation Project worked with us to record online observations from members of the public about the Diamond Jubilee. The observations were hosted on a blog which has been harvested as part of the Diamond Jubilee collection.

New content will continue to be added until December 2012. The British Library would be delighted to receive your nominations for this collection via our online form.  

Nicola Johnson, Web Archivist 1st August 2012


25 July 2012

Archiving the history of the British slave trade, from the web


The following is a guest post by Dr Philip Hatfield, Curator for Canadian and Caribbean Studies at the British Library, who is curating a special collection on ‘Slavery and Abolition in the Caribbean’ for the UK Web Archive.

Exterior of an Antigua Boiling House, William Clark 1823 (BL Shelfmark: 1796.c.9). From the Library’s ‘Caribbean Views’ gallery 

When I started working as a curator (only in 2010), one thing I did not expect was how much time I would spend using the Web as part of my work. However, as Curator for Canadian and Caribbean Studies it makes sense, not least because the Internet connects me to the international audience who use the Library’s collections. Also, the Internet contributes to the Library’s collections and, through the UK Web Archive, is becoming part of them too.

This means Library curators are now trialling the development of special collections for the UK Web Archive and when I was invited I jumped at the chance, knowing exactly what required my attention. Back in 2007 the Internet was an important engagement space for museums, archives, libraries and various other institutions to relay the history of slavery and mark the bicentenary of the abolition of the British slave trade. Since then a number of websites featuring online galleries, teaching resources and other materials related to the bicentenary have disappeared from the web. Moreover, this is not the only situation in which such valuable work has been lost to the general public.

So, a special collection focussing on ‘Slavery and Abolition in the Caribbean’ seemed a suitable framework through which to preserve relevant parts of the contemporary UK Web from being lost. I’m currently in the process of selecting websites from a range of UK government, heritage institution, local history and other sites for the collection, which is developing nicely. However, there are a number of stages (permission to archive being but one) before the sites can be collected and the selection goes live. My hope is that, once it does, it will be a useful resource to specialist and general users of the UK Web Archive.

I’m also beginning to realise that there is much more material out there than even I had anticipated, and this has a couple of consequences. First, the title needs to change; there are a number of sites which deserve adding to the collection but don’t quite fit with ‘Slavery and Abolition in the Caribbean’. Second, I’m increasingly aware that, despite my best efforts, the chances are I will miss some excellent material, so if anyone wants to suggest sites from the UK Web, please get in touch.

19 July 2012

UK Web Archive in the eyes of scholars


We commissioned IRN Research earlier this year to gather a scholarly perspective on the UK Web Archive. This work has now been completed and we have received feedback on the Archive’s perceived research value, and particularly on the content and access mechanisms which should be further developed to support research use.

The feedback came from two groups of users: those who already use the Archive for research (26%) and those who have not used the Archive (74%). The overwhelming majority are from Arts and Humanities or Social Sciences disciplines. The participants were interviewed over the telephone and a small group also undertook a second phase where they searched the Archive based on specific case studies, detailing each step of the search and results.

All participants appreciated the potential scholarly value of the Archive. Those interested in web history, statistics and digital preservation research value the Archive particularly highly. However, the selective nature of the Archive seems to affect the perception of those using it for the first time, in that they could not find content relevant to their research. This is further related to the search tool, which has been seen by some as complex, with the presentation of the search results perceived as unstructured. By contrast, existing users are generally satisfied with the search tool, suggesting that increased familiarity with the Archive may help overcome the perceived weakness.

Special Collections were thought by all users to be useful. However, users would like to understand our selection criteria and how the themes for Special Collections are established. There is a desire to see more Special Collections and a facility to nominate themes. “UK politics” and “Contemporary British History” are the two broad themes which have been suggested. All users asked for more images and rich media, as well as more blogs.

Many first-time users are unsure about the usefulness of the visualisation tools, especially the N-gram search. However, a small group of users is extremely enthusiastic about them. Again, there is more interest in visualisation tools from existing users, suggesting the need to add better explanations of the functions and features of the Archive.

The study has given us some insight on how the UK Web Archive is perceived by scholars, which will direct us through the next stage of development. Things to consider for improvement or adjustment include not only the user interface, but also the underlying search and the scope of our collection.

Many thanks to IRN and those who took part in the project.

Helen Hockx-Yu, Head of Web Archiving

 

04 July 2012

Religious Websites and the Diamond Jubilee


The following is an edited version of two posts on Peter Webster's blog: one in March before the main Jubilee weekend, and a second in June. They are mainly concerned with sites relating to the Jubilee produced by or in connection with the mainline Christian denominations in the UK.

Although we are still a couple of months away from the event itself, I thought it would be worth starting to pull together some of the various sites for the Queen's jubilee that come from within or relate to the Christian churches. This will include press sources that the UKWA don't ordinarily take. I thought I'd make a start with some of the more predictable and national ones.

Official church resources

As you would expect, the several denominations have made various preliminary statements. The Church of England's site refers to several linked ventures: the Big Jubilee Lunch, with a specially composed grace; a special service at St Paul's on June 5th; and the Big Jubilee Thankyou, where Anglicans are invited to sign a copy letter displayed in churches, all of which will then be combined and presented to the Queen - a petition, as it were, without demands. The lunch is being coordinated by HOPE, a pan-church organisation which is evangelical in origin, but has partnerships in place with most of the Protestant denominations in the UK.

See also the Bishop of London's sermon on the accession (Feb 6) in his role as Dean of the Chapels Royal.

The Catholic bishops in England and Wales have urged parishes to pray for the Queen on Sunday June 3 (which is also Trinity Sunday), as reported in the Catholic Herald. (The press release is here.)

Churches Together in England are assembling resources as they appear here, and there is a joint presidential statement from Canterbury, Westminster, the Free Churches Group, and the Lutheran church, although it is rather lost amongst references to the Olympics.

The Jubilee Churches Festival is looking to co-ordinate celebrations at a local level.

Oppositional voices

One has to dig quite deep to find many Christians voicing opinions critical of either the event or the monarchy itself. Ekklesia noted the beginnings of the campaign of protest by Republic, and complaints about the BBC's coverage, but refrained from comment. (Incidentally, Republic's position on the established church is also interesting.) However, one would expect this type of comment to appear more reactively, and nearer the event; and so watch this space for later posts.

My earlier post looked at some of the preparatory statements from official church sources, and some very early oppositional voices. Here are some examples of reportage and comment after the event.

Rowan Williams' sermon at St Paul's

Perhaps predictably, the archbishop did not allow the pieties of the situation to restrict his thinking on the subject, making some robust comments about aspects of current economic life. See the full text, and the reactions of the Daily Mail (negative) and the Guardian and Nelson Jones in the New Statesman (rather more positive).

Local events

The Church Times gave a useful digest of local events, including a street party in the nave of Ripon Cathedral and various sermons, including that of the Dean of Belfast. Events in local communities included an inter-faith Family Fun Day in Tooting, south London.

The 'real meaning' of Jubilee

A good few campaigning sites sought to draw a distinction between the biblical concept of jubilee and the pattern of the celebrations, often making a more or less explicit connection with the current climate of austerity. See Christianity Uncut, Ekklesia and Symon Hill. The work of the Jubilee Debt Campaign predates this year's events, although their site did draw attention to the connection.

Dr Peter Webster

14 June 2012

Crowdsourcing and Web Archiving


There has been a long history of members of the public acting as volunteers to refine, enhance and improve the collections of cultural heritage institutions for the benefit of others. Crowdsourcing can be seen as a continuation of this tradition.

The term crowdsourcing can be problematic as it is not necessarily about massive numbers of people or about outsourcing labour but rather about inviting participation from interested and engaged members of the public.

A workshop at the IIPC General Assembly in Washington DC in May 2012 addressed issues around applying crowdsourcing to web archiving. A paper by Trevor Owens, entitled The Crowd & the Library – the Agony and the Ecstasy of “Crowdsourcing our Cultural Heritage”, was used as a framework for the workshop and a number of use case scenarios were evaluated by participants on the day.

A number of key observations were made and extracted from the overall discussion. It was observed that there are advantages and disadvantages in engaging ‘the crowd’ in web archiving both for the institutions carrying out the initiatives and those members of the public involved.

There may be sensitivity around areas where there is already professional expertise within the organisation (e.g. cataloguing). It is important to design the project in such a way that the crowd and the expert each do what they are best at. Advanced users and regular users should be given different tasks, fully utilising the wisdom of the crowd.

Humans are capable of processing information and making judgements in ways that computers cannot. It is a waste of time to ask the public to do tasks that a computer can.

Putting the right tools in place will magnify the user’s effort by making it easier to accomplish tasks. There is often a trade-off between richer functionality on a crowdsourcing website and barriers to participation by users. Requiring users to log in, for example, makes it possible to store information and offer personalised services, but being able to start immediately without logging in is appealing.

It was pointed out that people feel motivated by doing something that matters to them and get a sense of belonging to something bigger than themselves. Crowdsourcing should be engaging, especially when users are asked to carry out repetitive tasks. It is important to provide feedback to users on how they are doing and how their contribution is furthering the overall progress of the project. This helps to keep users engaged.

Key challenges include devising an appropriate project and attracting an audience sizable enough to participate in the work. It was felt that crowdsourcing within web archiving would suit smaller discrete projects rather than ongoing open ended challenges. Suggested areas for the involvement of the crowd could apply to elements of the web archiving workflow such as identifying websites, quality assurance and cataloguing.

 

28 May 2012

Associations and Citizenship: Researching the ‘Big Society’ on the Web


The following is a guest post by Tom Hulme, a final-year PhD researcher at the Centre for Urban History, University of Leicester. His research interests are in associational culture, citizenship, local government, and education in the interwar period, focusing on Manchester and Chicago. He has further interests in contemporary civil society and citizenship. He is also currently undertaking an internship at the British Library in the Social Sciences department.

 

With the 2010 Coalition Government’s vision for a ‘Big Society’ of volunteers, supporting or even running public services, the need for researchers to study associations and civil society has never been more pressing. Volunteering, after all, makes up a large part of the social picture of Britain in the twenty-first century; a 2003 Home Office citizenship survey, for example, found that almost a third of people were engaging in ‘active community participation’, a trend that seems to be rising rather than falling. I am interested in examining the organisations that contribute to this culture, especially the ways in which it contributes to the imagined or real vision of a ‘community’ of citizens. What are their motivations and methods? How do they interact with local and national government? Does volunteering and associating with others for the ‘civic good’ create better citizens?

 

Some of the best known associations are, of course, the biggest and longest running. They are the bedrock of civil society, performing duties that have a never-ending purpose: educating the citizenry, fundraising, and acting as local, national, and even international pressure groups. Charities like the Royal Society for the Prevention of Accidents (1917), or non-governmental organisations like Amnesty International (1961), are still operating today because the problems they tackle are probably interminable. Giants of civil society like these can, however, obscure the myriad of local associations that have fleeting appearances, living for just a couple of years, before fading and sometimes vanishing altogether.

 

For the study of this type of parochial civil society, the web archive is an indubitable boon. Pressure groups have long realised the importance of web interaction for engagement and campaigns, and there is a wide variety of such sites in the Web Archive, like the Bengali Cultural Association of London, the Fernherst Society in West Sussex, or the Stop Norwich Urbanisation Blog. Websites like these are a treasure-trove of information and opinion; preserving them gives us a vital glimpse into the way that civil society ‘works’. Not just for contemporary researchers but for the historians of the future, it is vital that these small windows into civic life are recorded and maintained for generations to come. After all, their disappearance from the ‘real web’ can tell us as much about their purpose and operation, as it does about their demise.

11 May 2012

Scholarly value of the UK Web Archive? (correction)


Tell us what you think about the UK Web Archive

If you are a postgraduate researcher or a university lecturer we would like to get your feedback on the research value of the UK Web Archive. It doesn’t matter whether you have already used the archive or not.

We have commissioned an independent research agency – IRN Research – to gather information on the needs of archive users and potential users. If you would like to help shape the future development of the archive please register your interest.

In the next few weeks you will be contacted by a researcher and emailed an online facilitated walkthrough of the archive which will explain how the site works in just a few minutes. Using this walkthrough, you will be asked to answer questions about the content, functions, and tools available and your interest in, and likely use of, the archive.

All your answers will be treated in the strictest confidence and all those taking part will have the chance to win one of a number of £20 book tokens.

To take part in the research, please register.