THE BRITISH LIBRARY

UK Web Archive blog

Information from the team at the UK Web Archive, the Library's premier resource of archived UK websites

Introduction

News and views from the British Library’s web archiving team and guests. Posts about the public UK Web Archive, and since April 2013, about web archiving as part of non-print legal deposit. Editor-in-chief: Jason Webber.

11 May 2018

Online Hours: Supporting Open Source

Encouraging collaboration
Here at the UK Web Archive, we're very fortunate to be able to work in the open, with almost all code on GitHub. Some of our work has been taken up and re-used by others, which is great. We’d like to encourage more collaboration, but we've had trouble dedicating time to open project management, and our overall management process and our future plans are unclear. For example, we've experimented with so many different technologies over the years that our list of repositories gives little insight into where we're going next. There are also problems with how issues and pull requests have been managed: they often languish unanswered, waiting for us to get around to looking at them. This applies to the projects we lead, and also to the IIPC repositories and other projects we are involved in.

I wanted to block out some time to deal with these things promptly, but also to find a way of making it a bit more, well, fun. A bit more social. Some forum where we can chat about our process and plans without the formality of having to write things up.

Taking inspiration from Jason Scott's live-streamed CD-ripping sessions, we came up with the idea of something like Office Hours for Open Source: a kind of open video conference or live stream, where we'll share our process, discuss issues relating to open source projects and have a forum where anyone can ask questions about what we’re up to.

Who is this for?
All welcome, from lurkers to those brimming with burning questions. Just remember that being *kind* beats being right.

Furthermore, anyone else who manages open source projects like ours is also welcome to join and take the lead for a while! I can only cover the projects we’re leading, but there are many more that would be interesting to hear from.

When?
The plan is to launch the first Online Hours session on the 22nd of May, and then hold regular weekly slots every Tuesday from then on. We may not manage to run it every single week, but if it’s regular and frequent that should mean we can cope more easily with missing the odd one or two.

On the 22nd, we will run two sessions - one in the morning (for the west-of-GMT time-zones) and one in the evening (for the eastern half). Following that, we intend to switch between the two slots, making each a.m. and p.m. slot a fortnightly occurrence.

How?
The sessions will be webcast, with a Slack channel available for chat. See the IIPC Trello board for more information.

The IIPC (International Internet Preservation Consortium) have kindly agreed to help support this event and further Online Hours sessions. Running this initiative in a more open manner should raise the profile of our open source work both inside and outside of the IIPC, and encourage greater adoption of, and collaboration around, open source tools.

For full details, see the IIPC Trello Board card or ask a question in the NetPreserve Slack Channel #oh-sos (ask @NetPreserve to join the Slack).

See you there!

By Andrew Jackson, Web Archive Technical Lead, The British Library

 

04 May 2018

Star Wars in the Web Archive

May the fourth be with you!

It's Star Wars Day, and I imagine that you are curious to know which side has won the battle of the UK web space.

Using the trends in our SHINE dataset (.uk websites 1996-2013 collected by the Internet Archive), I first looked at the iconic match-up of Luke vs Darth.

[Image: SHINE trends graph, Darth Vader]

Bad news: evil seems to have won this round, mainly due to the popularity of Darth Vader costume mentions on retail websites.

How about a more general 'Light Side vs Dark Side'?

[Image: SHINE trends graph, light side vs dark side]

It appears that discussing the 'dark side' of many aspects of life is a lot more fun and interesting than the 'light side'.

How about just analysing the phrase 'may the force be with you'?

[Image: SHINE trends graph, 'may the force be with you']

This phrase doesn't seem to have been particularly popular on the UK web until it started to be used a lot on websites offering downloadable ringtones. Go figure.

Try using the trends feature on this dataset yourself here: www.webarchive.org.uk/shine/graph

Happy Star Wars Day!

by Jason Webber, Web Archive Engagement Manager, The British Library

@UKWebArchive

 

01 February 2018

A New Playback Tool for the UK Web Archive

We are delighted to announce that the UK Web Archive will be working with Rhizome to build a version of pywb (Python Wayback) that we hope will greatly improve the quality of playback for access to our archived content.

What is playback of a web archive?

When we archive the web, just downloading the content is not enough. Data can be copied from the web into an archive in a variety of ways, but to make this archive actually accessible takes more than just opening downloaded files in a web browser. Technical details of pages and scripts coming out of the archive need to be presented in a way that enables them to work just like the originals, although they aren’t located on their actual servers anymore. Today’s web users have come to expect interactive features and dynamic layouts on all types of websites. Faithfully reproducing these behaviors in the archive has become an increasingly complex challenge, requiring web archive playback software that is on par with the evolution of the web as a whole.
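To give a rough feel for what "presenting pages so they work like the originals" involves, here is a minimal Python sketch of one small piece of the problem: rewriting the links in an archived page so that they point back into the archive rather than out to the live web. It is an illustration only, and not how pywb or OpenWayback are actually implemented. The archive prefix, the function names and the sample page are all made up for this example, and real playback tools also have to deal with scripts, stylesheets, redirects and much more.

```python
import re
from urllib.parse import urljoin

# Hypothetical archive address, used purely for illustration.
ARCHIVE_PREFIX = "https://archive.example/wayback"

# Deliberately simple pattern: only double-quoted href/src attributes.
ATTR_PATTERN = re.compile(r'\b(href|src)="([^"]*)"')


def rewrite_url(url, page_url, timestamp):
    """Turn a link found on an archived page into an archive link."""
    # Resolve relative links against the page's original address first.
    absolute = urljoin(page_url, url)
    # Wayback-style layout: prefix / capture timestamp / original URL.
    return f"{ARCHIVE_PREFIX}/{timestamp}/{absolute}"


def rewrite_html(html, page_url, timestamp):
    """Rewrite every href/src attribute in a fragment of HTML."""
    def replace(match):
        attr, url = match.groups()
        return f'{attr}="{rewrite_url(url, page_url, timestamp)}"'
    return ATTR_PATTERN.sub(replace, html)


snippet = '<a href="/about">About</a> <img src="logo.png">'
rewritten = rewrite_html(
    snippet,
    page_url="http://example.co.uk/news/index.html",
    timestamp="20130401120000",
)
print(rewritten)
```

After rewriting, both the link and the image are requested from the archive at the chosen capture time, so the page no longer depends on the original server still being online.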

Why change?

Currently, we use the OpenWayback playback system, originally developed by the Internet Archive. But in more recent years, Rhizome have led the development of a new playback engine, called pywb (Python Wayback). This Python toolkit for accessing web archives is part of the Webrecorder project, and provides a modern and powerful alternative implementation that is being run as an open source project. This has led to rapid adoption: pywb is already being used by the Portuguese Web Archive, perma.cc, the UK National Archives, the UK Parliamentary Archive, and a number of others.

Open development
To meet our needs we will have to modify pywb. As strong believers in open source development, we will do all of this work in the open and, wherever appropriate, fold the improvements back into the core pywb project.

If all goes to plan, we expect to contribute the following back to pywb for others to use:

Other UKWA-specific changes, like theming, implementing our Legal Deposit restrictions, and deployment support, will be maintained separately.

Initially we will work with Rhizome to ensure our staff and curators can access our archived material via both pywb and OpenWayback. If the new playback tool performs as expected, we will move towards using pywb to support public access to all our web archives.