UK Web Archive blog

Information from the team at the UK Web Archive, the Library's premier resource of archived UK websites

Introduction

News and views from the British Library’s web archiving team and guests. Posts about the public UK Web Archive and, since April 2013, about web archiving as part of non-print legal deposit. Editor-in-chief: Jason Webber.

24 August 2018

How is the UK Web Archive documenting the ‘bodily autonomy’ debate online?

This blog post follows on from Kelly Burchmore’s post, Building collections on Gender Equality at the UK Web Archive. If you’ve not read it yet, we would encourage you to do so first.

Background
The UK Web Archive (UKWA) aims to collect online material connected with nationally important issues and debates. Recently this has included the long-running discussions around bodily autonomy. Much of this material is published via social media, which can be very challenging to collect.

Archivingthe8th

See the trend online.

Why is UKWA #Archivingthe8th? [1]
Although the UK Web Archive only collects material related to the UK, many individuals and groups connected with the referendum on the 8th Amendment [1] campaigned in the UK, so much of the material falls within our remit.

In Britain, the Irish-based Abortion Rights Campaign group has sections in various cities, starting with the London Irish Abortion Rights Campaign. In the lead-up to the referendum they ran a home to vote campaign through the website hometovote.com. The pro-life group London Irish United For Life ran a similar campaign through the website hometovote.uk. All of these websites, and many more related to this subject, are archived in the Bodily Autonomy subsection of the Gender Equality collection.

The UK Web Archive only archives content published in the UK, but other web archives also collected content on this subject. The National Library of Ireland built a special collection on the referendum and George Washington University archived over 2 million tweets that used popular hashtags related to the referendum.

How to get involved?
If there are any UK websites or Twitter accounts that you think should be added to the Bodily Autonomy subsection of the Gender Equality collection, then you can take up the UK Web Archive’s call for action and nominate content by following this link:

beta.webarchive.org.uk/en/ukwa/info/nominate

By Helena Byrne, Curator of Web Archives, The British Library

[1] #Archivingthe8th
On the 25th of May 2018 the Republic of Ireland held a referendum on the 8th Amendment. If repealed, this would make way for the government to implement legislation on access to abortion services. Although the referendum on the 8th Amendment only affected the laws of the Republic of Ireland, its significance spread across the world and it received a lot of international media attention. Both pro-choice and pro-life solidarity campaign groups formed around the world, mostly made up of the Irish diaspora and other campaigners passionate about the subject. After the result was announced, the hashtag #archivingthe8th started trending on Twitter as people wanted to know how this part of public history was going to be preserved for future generations.

06 August 2018

Building collections on Gender Equality at the UK Web Archive

This is a guest blog by Kelly Burchmore, a graduate trainee digital archivist on the Bodleian Libraries’ Developing the Next Generation Archivist programme. The Bodleian is one of the 6 legal deposit libraries in the UK. One of her projects this year is to help curate special collections in the UK Web Archive. Since May she’s been working on the Gender Equality collection.

Why are we collecting gender equality websites?
2018 is the centenary of the Representation of the People Act 1918. UK-wide memorials and celebrations of this journey, and of the victory of women’s suffrage, are all evident online in events, exhibitions, commemorations and campaigns. Popular topics being discussed at the moment include the hashtags #timesup and #metoo, gender pay disparity and the recent referendum on the 8th Amendment in the Republic of Ireland. These discussions produce a lot of ephemeral material, and without web archiving this material is at risk of moving or even disappearing. Web archives are able to demonstrate that gender equality is increasingly being discussed in the media and that these discussions have been developing over many years.

Through the UK Web Archive SHINE interface we can see that the share of crawled resources with text matching the phrase ‘gender equality’ increased from 0.002% (24 out of 843,204) in 1996 to 0.044% (23,289 out of 53,146,359) in 2013.
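The percentages quoted above can be re-derived from the raw counts. The short sketch below does only that arithmetic; the function name is made up for illustration and is not part of SHINE. Worked through this way, the 1996 share comes out at roughly 0.0028%, so the quoted 0.002% looks truncated rather than rounded, and the share grows roughly fifteen-fold between the two years.

```python
def share_percent(matches: int, total: int) -> float:
    """Return matches as a percentage of all crawled resources."""
    return 100 * matches / total


# Counts taken from the figures quoted in the post.
share_1996 = share_percent(24, 843_204)
share_2013 = share_percent(23_289, 53_146_359)

# How many times larger the 2013 share is than the 1996 share.
growth = share_2013 / share_1996

print(f"1996: {share_1996:.4f}%")
print(f"2013: {share_2013:.4f}%")
print(f"Relative growth: {growth:.1f}x")
```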

SHINEgenderequality

If we search UK web content relating to gender equality we get a very large number of results. For example, organisations have published their gender pay gap reports online, and there is a lot to engage with from the social media accounts of both individuals and organisations campaigning for gender equality. When we browse this web content it becomes apparent that gender equality means something different to each of the many presences online: charities, societies, employers, authorities, heritage centres and individuals such as social entrepreneurs, teachers, researchers and more.

What are we collecting?
The Gender Equality special collection, which is now live on the UK Web Archive, comprises material that provides a snapshot of attitudes towards gender equality in the UK. Web material is harvested under the areas of:

• Bodily autonomy
• Domestic abuse/Gender based violence
• Gender equality in the workplace
• Gender identity
• Parenting
• The gender pay gap
• Women’s suffrage

100 years on from the introduction of limited women’s suffrage, the fight for gender equality continues. The collection is still undergoing curation and growing in archival records - and you can help too!

How to get involved?
If there are any UK websites that you think should be added to the Gender Equality collection then you can take up the UK Web Archive’s call for action and nominate.

Fawcett_teachingequalrights.jpeg

03 August 2018

Work Experience at the UK Web Archive

By Emily Mahoney

Upon hearing that I had a work experience placement in the British Library, I immediately thought of books and reading, a main passion of mine from a young age. When I found out about the many other sides to working in such an immense organisation (the British Library employs just over 1,500 people), I realised it would be far more fascinating than I had imagined.

I was assigned a position in Web Archiving with Helena Byrne for the week. Coming into a week of work experience in Web Archiving seemed overwhelming to me as someone with no previous experience in the topic. However, the team working in the department made me feel reassured immediately. Instead of being nervous, I could then focus on the multitude of interesting new information coming my way.

My first task was to identify images for the covers of the newer Special Collections on the UK Web Archive website. I was then informed that I would be working on a project with Leila Nassereldein, a PhD placement student focused on archiving a collection of online zines that are independent, self-published, and authored by Asian, African or Caribbean people in the UK. This was extremely exciting to me, as this is an area most people don’t necessarily think of when considering the British Library. Leila was keen on making a space for these zines, through which the smaller, independent and sometimes radical publications could also leave their mark in our web history. While working on this project with Leila I learnt to appraise, curate and archive contemporary websites using the Annotation and Curation Tool (W3ACT).

Before this week I had never come across the UK Web Archive, and this experience has made me aware of just how important it is that we have access to this information in years to come. The online public archive is also an area with a large number of research points that I will definitely be using during any further study. When writing this I was asked what the ‘most interesting’ part of my placement was. It would be too hard to choose, given how many things I have learnt during this week that I had never encountered before. Overall, my experience at the British Library was an enriching one that I will never forget, and it helped me consider an aspect of our online life that had never occurred to me before.

11 May 2018

Online Hours: Supporting Open Source

Encouraging collaboration
Here at the UK Web Archive, we're very fortunate to be able to work in the open, with almost all code on GitHub. Some of our work has been taken up and re-used by others, which is great. We’d like to encourage more collaboration, but we've had trouble dedicating time to open project management, and our overall management process and our future plans are unclear. For example, we've experimented with so many different technologies over the years that our list of repositories gives little insight into where we're going next. There are also problems with how issues and pull requests have been managed: they often languish unanswered, waiting for us to get around to looking at them. This also applies to the IIPC repositories and other projects we are involved in, as well as the projects we lead.

I wanted to block out some time to deal with these things promptly, but also to find a way of making it a bit more, well, fun. A bit more social. Some forum where we can chat about our process and plans without the formality of having to write things up.

Taking inspiration from Jason Scott's live-streamed CD-ripping sessions, we came up with the idea of something like Office Hours for Open Source -- a kind of open video conference or live stream, where we'll share our process, discuss issues relating to open source projects and have a forum where anyone can ask questions about what we’re up to.

Who is this for?
All welcome, from lurkers to those brimming with burning questions. Just remember that being *kind* beats being right.

Furthermore, anyone else who manages open source projects like ours is also welcome to join and take the lead for a while! I can only cover the projects we’re leading, but there are many more that would be interesting to hear from.

When?
The plan is to launch the first Online Hours session on the 22nd of May, and then hold regular weekly slots every Tuesday from then on. We may not manage to run it every single week, but if it’s regular and frequent that should mean we can cope more easily with missing the odd one or two.

On the 22nd, we will run two sessions - one in the morning (for the west-of-GMT time-zones) and one in the evening (for the eastern half). Following that, we intend to switch between the two slots, making each a.m. and p.m. slot a fortnightly occurrence.

How?
The sessions will be webcast with a slack channel available for chat. See the IIPC Trello board for more information.

The IIPC (International Internet Preservation Consortium) have kindly agreed to help support this event and further Online Hours sessions. Running this initiative in a more open manner should raise the profile of our open source work both inside and outside of the IIPC, and encourage greater adoption of, and collaboration around, open source tools.

For full details, see the IIPC Trello Board card or ask a question in the NetPreserve Slack Channel #oh-sos (ask @NetPreserve to join the Slack).

See you there!

By Andrew Jackson, Web Archive Technical Lead, The British Library

 

04 May 2018

Star Wars in the Web Archive

May the fourth be with you!

It's Star Wars day and I imagine that you are curious to know which side has won the battle of the UK web space?

Looking at the trends in our SHINE dataset (.uk websites 1996-2013 collected by Internet Archive) I first looked at the iconic match-up of Luke vs Darth.

Shine-darth-vader

Bad news: evil seems to have won this round, mainly due to the popularity of Darth Vader costume mentions on retail websites.

How about a more general 'Light Side vs Dark side'? 

Shine-lightside-v-darkside

It appears that discussing the 'dark side' of many aspects in life is a lot more fun and interesting than the 'light side'. 

How about just analysing the phrase 'may the force be with you'?

Shine-may the force be with you

This phrase doesn't seem to have been particularly popular on the UK web until it started to be used a lot on websites offering downloadable ringtones. Go figure.

Try using the trends feature on this dataset yourself here: www.webarchive.org.uk/shine/graph

Happy Star Wars day!

by Jason Webber, Web Archive Engagement Manager, The British Library

@UKWebArchive

 

01 February 2018

A New Playback Tool for the UK Web Archive

We are delighted to announce that the UK Web Archive will be working with Rhizome to build a version of pywb (Python Wayback) that we hope will greatly improve the quality of playback for access to our archived content.

What is playback of a web archive?

When we archive the web, just downloading the content is not enough. Data can be copied from the web into an archive in a variety of ways, but to make this archive actually accessible takes more than just opening downloaded files in a web browser. Technical details of pages and scripts coming out of the archive need to be presented in a way that enables them to work just like the originals, although they aren’t located on their actual servers anymore. Today’s web users have come to expect interactive features and dynamic layouts on all types of websites. Faithfully reproducing these behaviors in the archive has become an increasingly complex challenge, requiring web archive playback software that is on par with the evolution of the web as a whole.
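To give a rough feel for one small part of this, the sketch below rewrites the links in an archived page so that they point at archived copies rather than at the live web. It is an illustration only and not how pywb or OpenWayback are implemented: the archive prefix, the function name and the timestamp-then-URL path layout are assumptions made for the example, and real playback tools also have to handle scripts, stylesheets, HTTP headers and much more.

```python
import re
from urllib.parse import urljoin

# Hypothetical playback prefix, used only for this example.
ARCHIVE_PREFIX = "https://archive.example/wayback"

# Matches href="..." or src="..." with either quote style.
LINK_ATTR = re.compile(r"\b(href|src)=([\"'])(.*?)\2")


def rewrite_links(html: str, page_url: str, timestamp: str) -> str:
    """Point href/src attributes at archived copies instead of the live web."""

    def replace(match: re.Match) -> str:
        attr, quote, target = match.group(1), match.group(2), match.group(3)
        # Resolve relative links against the original page address.
        absolute = urljoin(page_url, target)
        return f"{attr}={quote}{ARCHIVE_PREFIX}/{timestamp}/{absolute}{quote}"

    return LINK_ATTR.sub(replace, html)


page = '<a href="/about">About</a><img src="http://example.co.uk/logo.png">'
rewritten = rewrite_links(page, "http://example.co.uk/index.html", "20180201000000")
print(rewritten)
```

A regular expression is enough for a toy page like this one, but it would miss links built by JavaScript at run time, which is exactly the kind of behaviour that makes faithful playback hard.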

Why change?

Currently, we use the OpenWayback playback system, originally developed by the Internet Archive. But in more recent years, Rhizome have led the development of a new playback engine, called pywb (Python Wayback). This Python toolkit for accessing web archives is part of the Webrecorder project, and provides a modern and powerful alternative implementation that is being run as an open source project. This has led to rapid adoption of pywb, as the toolkit is already being used by the Portuguese Web Archive, perma.cc, the UK National Archives, the UK Parliamentary Archive, and a number of others.

Open development
To meet our needs we need to modify pywb, but as strong believers in open source development, all work will be in the open, and wherever appropriate, we will fold the improvements back into the core pywb project.

If all goes to plan, we expect to contribute the following back to pywb for others to use:

Other UKWA-specific changes, like theming, implementing our Legal Deposit restrictions, and deployment support, will be maintained separately.

Initially we will work with Rhizome to ensure our staff and curators can access our archived material via both pywb and OpenWayback. If the new playback tool performs as expected, we will move towards using pywb to support public access to all our web archives.

23 January 2018

Archiving the UK Copyright Literacy blog

By Louise Ashton, Copyright & Licensing Executive, The British Library
Re-posted (with permission) from copyrightliteracy.org/

We were excited to discover recently that copyrightliteracy.org had been selected for inclusion in the UK Web Archive as an information resource with historical interest. However, even we faced some trepidation when considering the copyright implications of allowing archiving of the site (i.e. not everything on the site is our copyright). Firstly, this allowed us to get our house in order, contact our fellow contributors and ensure we had the correct re-use terms on the site (you can now see a CC-BY-SA licence at the footer of each web page). Secondly, this provided opportunity for another guest blog post and we are delighted that Louise Ashton who works in the Copyright & Licensing Department at The British Library has written the following extremely illuminating post for us. In her current role Louise provides copyright support to staff and readers of the British Library, including providing training, advising on copyright issues in digitisation projects and answering copyright queries from members of the public on any of their 150 million collection items!  Prior to this, Louise began her career in academic libraries, quickly specialising in academic liaison and learning technologist roles. 

Screenshot-beta-home-01

When people think of web archiving their initial response usually focuses on the sheer scale of the challenge. However, another important issue to consider is copyright; copyright plays a significant role both in shaping web archives and in determining if and how they can be accessed. Most people in the UK Library and Information Science (LIS) sector are aware that in 2013 our legal deposit legislation was extended to include non-print materials which, as well as e-books and online journal articles, also covers websites, blogs and public social media content. This is known as the snappily titled ‘The Legal Deposit Libraries (Non-Print Works) Regulations 2013’ and is enabling the British Library and the UK’s five other legal deposit libraries to collect and preserve the nation’s online life. Indeed, given that the web will often be the only place where certain information is made available, the importance of archiving the online world is clear.

UKWA-poster
UK Web Archive poster © British Library Board

What is less well known is that, unless site owners have given their consent, the Non-Print Legal Deposit Archive is only available within the reading rooms of the legal deposit libraries themselves and even then can only be accessed if using library PCs. Although this mirrors the terms for accessing print legal deposit, because of the very nature of the non-print legal deposit collection (i.e. websites that are generally freely available to anyone with an internet connection) people naturally expect to be able to access the collection off-site. The UK Web Archive offers a solution to this by curating a separate archive of UK websites that can be freely viewed and accessed online by anyone, anywhere, and with no need to travel to a physical reading room. The purpose of the UK Web Archive is to provide permanent online access to key UK websites, with regular snapshots of the included websites being taken so that a website’s evolution can be tracked. There are no political agendas governing which sites are included in the UK Web Archive; the aim is simply to represent the UK’s online life as comprehensively and faithfully as possible (inclusion of a site does not imply endorsement).

However, a website will only be added to the (openly-accessible) UK Web Archive if the website owners’ permission has been obtained and if they are willing to sign a licence granting permission for their site to be included in the Archive and allowing for all versions of it to be made publicly accessible. Furthermore, the website owner also has to confirm that nothing in their site infringes the copyright or other intellectual property rights of any third party and, if their site does contain third party copyright, that they are authorised to give permission on the rights-holders’ behalf. Although the licence has been carefully created to be as user-friendly as possible, the presence of any formal legal documentation is often perceived as intimidating. So even if a website owner is confident that their use of third party content is legitimate they may be reluctant to formally sign a licence to this effect – seeing it in black and white somehow makes it more real! Or, despite best efforts, site owners may have been unable to locate the rights-holders of third party content used in their site and, although they may have been happy with their own risk assessments, this absence of consent prevents them from signing the licence to include the site in the UK Web Archive.

For other website owners this may be the first time they have thought about copyright. Fellow librarians will not be surprised to hear that some people are bewildered to learn that they may have needed to obtain permission to borrow content from elsewhere on the internet for use in their own sites! And then of course there are the inherent difficulties in tracking down rights-holders more generally; unless sites are produced by official bodies it can be difficult to identify who the primary site owners are and in big organisations the request may never make it to the relevant person. Others may receive the open access request but, believing it to be spam, ignore it. And of course site owners are perfectly entitled to refuse the request if they do not wish to take part. Information literacy plays its part and for sites where it is crucial that site visitors access the most recent information and advice (for example websites giving health advice) then for obvious reasons the site owners may not wish for their site to be included.

The reason Jane and Chris asked me to write this blog post is because the UK Copyright Literacy website has been selected for potential inclusion in the UK Web Archive. It was felt important that the Archive should contain a site that documented and discussed copyright issues given that copyright and online ethics are such big topics at the moment, particularly with the new General Data Protection Regulation coming into force in May 2018. Another reason why the curators wanted to include the Copyright Literacy blog is that, because the website isn’t hosted in the UK and does not have a UK top level domain (for example .uk or .scot), it had never been automatically archived as part of the annual domain crawl. This is an unfortunate point which affects many websites, as it means that many de facto UK sites are not captured unless manual intervention occurs. To try and minimise the number of UK websites that unwittingly evade inclusion, the UK Web Archive team therefore welcomes site nominations from members of the public. Consequently, if you would like to nominate a site to be added to the archive, and in doing so perhaps help to play a role in preserving UK websites, you can do so via https://www.webarchive.org.uk/ukwa/info/nominate.

As a final note, we are pleased to report that Jane and Chris have happily agreed to their site being included which is great news as it means present day copyright musings will be preserved for years to come!

22 December 2017

What can you find in the (Beta) UK Web Archive?

We recently launched a new Beta interface for searching and browsing our web archive collections but what can you expect to find?

UKWA have been collecting websites since 2005 in a range of different ways, using changing software and hardware. This means that behind the scenes we can't bring all of the material collected into one place all at once. What isn't there currently will be added over the next six months (look out for notices on Twitter).

What is available now?

At launch on 5 December 2017 the Beta website includes all of the websites that have been 'frequently crawled' (collected more often than annually) 2013-2016. This includes a large number of 'Special Collections' selected over this time and a reasonable selection of news media.

DC07-screenshot-brexit

What is coming soon?

We are aiming to add 'frequently crawled' websites from 2005-2013 to the Beta collection in January/February 2018. This will add our earliest special collections (e.g. 2005 UK General Election) and should complete all of the websites that we have permission to publicly display.

What will be available by summer 2018?

The largest and most difficult task for us is to add all of the websites that have been collected as part of the 'Legal Deposit' since 2013. We do a vast 'crawl' once per year of 'everything' that we can identify as being a UK website. This includes all .uk (and .london, .scot, .wales and .cymru) websites plus any website we identify as being on a server in the UK. This amounts to tens of millions of websites (and billions of individual assets). Due to the scale of this undertaking we thank you for your patience.
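The scoping rule described above can be sketched as a simple check. This is a minimal illustration of the rule as stated in this post, not the crawler's actual code: the function name is made up, and the question of whether a site sits on a server in the UK is passed in as a plain flag because working that out is a separate problem.

```python
from urllib.parse import urlsplit

# Top-level domains named in the post as part of the annual domain crawl.
UK_TLDS = ("uk", "london", "scot", "wales", "cymru")


def in_domain_crawl_scope(url: str, hosted_in_uk: bool = False) -> bool:
    """Return True if a URL would fall within the annual crawl as described."""
    host = (urlsplit(url).hostname or "").lower().rstrip(".")
    tld = host.rsplit(".", 1)[-1]
    # In scope if the address ends in a UK top-level domain,
    # or if the site is known to be on a server in the UK.
    return tld in UK_TLDS or hosted_in_uk


print(in_domain_crawl_scope("https://www.bl.uk/"))
print(in_domain_crawl_scope("https://copyrightliteracy.org/"))
print(in_domain_crawl_scope("https://copyrightliteracy.org/", hosted_in_uk=True))
```

A site with neither a UK top-level domain nor a UK server falls outside this check, which is why nominations from the public are needed to catch such sites.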

We would love to hear about your experiences of using the new Beta service. Let us know here: www.surveymonkey.co.uk/r/ukwasurvey01

By Jason Webber, Web Archive Engagement Manager, The British Library