Introduction

News and views from the British Library’s web archiving team and guests. Posts about the public UK Web Archive and, since April 2013, about web archiving as part of non-print legal deposit. Editor-in-chief: Jason Webber.

25 September 2020

The World of Food and the UK Web Archive

By Helena Byrne, Curator of Web Archives at the British Library

Assorted sliced fruits in white ceramic bowl surrounded by more sliced fruits and some small muffins

A variety of food

Food is a subject that transcends culture, politics and leisure practices, so it has been a key part of the UK Web Archive (UKWA) ever since the archive was established in 2005.

Recipes, restaurant menus, food blogs and online reviews are just the start of the food-related online material that UKWA collects. Even protest and campaigning can be food-related: this summer, for instance, footballer Marcus Rashford highlighted the issue of child poverty and the lack of access to food, especially during the school holidays.

For the last three years the British Library has been running a series of events around food. Due to the coronavirus pandemic, this year's Food Season moved online with a series of talks over the autumn period.

The Food Season celebrates the British Library’s extensive food-related collections and explores the politics, pleasures and history of food. UKWA, which is a partnership of the six UK Legal Deposit Libraries, including the British Library, also has an extensive collection of food-related websites.

Food collections

In 2017, the Food Archive collection was established. This collection covers the following topics:

There are currently 333 websites or web pages in this collection. Some of the websites selected include Eat Like a Girl, the Good Grub Club and the Veggies Catering Campaign. Why not have a browse through the collection and nominate your favourite UK published food sites or restaurant websites to be included in the collection? Anyone can nominate a website by following this link: https://www.webarchive.org.uk/en/ukwa/info/nominate

Even though there is a dedicated collection about food, food also features as a subsection in a number of other collections. ‘Food and Drink’ is a subsection in both the Festivals and the Online Enthusiast Communities in the UK collections. In addition, individual food websites appear in several other collections. Websites related to food activism appear in both the Political Action and Communication collection and the (soccer) fan subsection of the Sport: Football collection, as numerous supporters’ clubs have organised to support their local food banks.

Social media is a very popular way to share food and micro-reviews of eateries; however, it is often challenging for us to archive. At present, Twitter is the only social media platform that we archive on a regular basis, and these captures are by no means comprehensive. We have experimented with other methods of archiving social media, but only on a selective basis.

How can you access these archived websites?

Under the Non-Print Legal Deposit Regulations 2013, we can archive UK published websites but are only able to make the archived version available to people outside the Legal Deposit Libraries Reading Rooms, if the website owner has given permission. The UK Legal Deposit Libraries are the British Library, National Library of Scotland, National Library of Wales, Bodleian Libraries, Cambridge University Library and Trinity College Dublin Library.

Websites in UKWA that have already had permission granted include the Cake Fest Edinburgh, the Lancashire Pork Pie Appreciation Society and the Food Research Collaboration. Examples of websites with onsite-only access include the Biscuit Appreciation Society, the UK Menu Archive and Fans Supporting Food Banks.

As the content of UKWA has mixed access, the message ‘Viewable only on Library premises’ will appear under the title of the website if you need to visit a Legal Deposit Library to view the content. If there is no message underneath then the archived version of the website should be available on your personal device.
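As a minimal sketch of the display rule described above, assume each catalogue record carries a single flag recording whether the site owner has granted permission. The `ArchivedSite` record and its `owner_permission` field are hypothetical and are not UKWA's actual data model; the two example sites and their access status are taken from this post.

```python
from dataclasses import dataclass
from typing import Optional

ONSITE_ONLY_MESSAGE = "Viewable only on Library premises"


@dataclass
class ArchivedSite:
    """Hypothetical catalogue record for an archived website."""
    title: str
    owner_permission: bool  # True if the website owner has granted open access


def access_message(site: ArchivedSite) -> Optional[str]:
    """Return the note shown under the title, or None when the archived
    copy should be viewable on a personal device."""
    if site.owner_permission:
        return None
    return ONSITE_ONLY_MESSAGE


for site in [
    ArchivedSite("Cake Fest Edinburgh", owner_permission=True),
    ArchivedSite("Biscuit Appreciation Society", owner_permission=False),
]:
    print(site.title, "->", access_message(site) or "open access")
```

Under NPLD the default is onsite-only, so the sketch treats open access as the exception that requires an explicit permission flag.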

Due to the coronavirus pandemic, the reading rooms were closed for a number of weeks but are starting to reopen. This blog post gives an overview of opening hours and how to book a visit at the six UK Legal Deposit Libraries:

https://blogs.bl.uk/webarchive/2020/09/ukwa-available-in-reading-rooms-again.html

We would especially like to see more food and drink nominations that reflect the multicultural nature of the UK and the many diaspora communities based here. Browse through what we have so far and please nominate more content here:

https://www.webarchive.org.uk/en/ukwa/info/nominate

Posted by Helena Byrne at 8:37 AM

Tags

Collections, Contemporary Britain, Crowdsourcing, Food and Drink, Humanities, Legal deposit, Web/Tech

Technorati Tags: Collection Building, Food, Food & Drink, Food Season, UK Legal Deposit, Web Archiving

17 September 2020

Arnhem75 - a special collection of websites added to the UK Web Archive

By Marja Kingma, Curator of Germanic Collections, the British Library.

Book cover of 75 Years Battle of Arnhem by Laurens van Aggelen

Introduction

The idea of creating a collection of websites about the Arnhem75 commemorations came to RAF Museum historian Harry Raffal and me while attending the seminar ‘The Arnhem Spirit - 75 years of Brits in Arnhem’, organised by the Dutch Embassy in London on 15 May 2019. The event was part of a programme in which the Netherlands, Britain and other former Allied countries commemorated Operation Market Garden, the Allied operation of September 1944 that included the battle for the bridge across the Rhine at Arnhem. The Allied forces consisted of British, American and Polish troops, with help from the Dutch resistance.

The Battle of Arnhem 1944 is of great significance to the UK and interest in it remains strong on both sides of the North Sea.

We wanted to create a lasting memory of these events and a special collection in the UK Web Archive on the subject seemed like a good idea.

What is included?

We kept the scope of the project quite narrow: only websites focusing on the commemorations that took place in Britain and the Netherlands in 2019 are included, along with a few websites covering the historical facts of the Battle to provide context.

So far, over 150 individual websites within the UK web domain have been identified, of which 64 were selected for the collection. Being limited to the UK web domain means that a site either has .uk in its domain name or, if it doesn’t, is hosted in the UK or owned by a UK organisation or an individual with a UK postal address.

Some of the websites selected for this collection include the 23 Parachute Field Ambulance, Airborne at the Bridge and Arnhem Oosterbeek War Cemetery.

How can you access these archived websites?

Under the Non-Print Legal Deposit Regulations 2013, we can archive UK websites but we are only able to make them available to people outside the UK Legal Deposit Libraries reading rooms, if the website owner has given permission. The UK Legal Deposit Libraries are the British Library, National Library of Scotland, National Library of Wales, Bodleian Libraries, Cambridge University Library and Trinity College Dublin Library.

For this collection you can view what has been selected through the UK Web Archive website but will need to visit a UK Legal Deposit Library reading room to view the archived content. The reading rooms across the Legal Deposit Libraries are starting to reopen now, with some restrictions, as you can read in this blog: https://blogs.bl.uk/webarchive/2020/09/ukwa-available-in-reading-rooms-again.html

How Can I Get Involved?

You can help expand this collection by sending us a URL you think may be eligible for inclusion in the Arnhem75 collection. Please go to https://www.webarchive.org.uk/en/ukwa/info/nominate to nominate a website and we’ll take it from there.

Occasionally, websites from non-UK domains can be included if they have a strong link to the UK and the website owners have given their permission to be included in the collection. Dutch organisations that were involved in the Arnhem75 commemorations are encouraged to get in touch.

We look forward to your suggestions!

Posted by Helena Byrne at 10:35 AM

Tags

Collections, Humanities, Legal deposit, Modern history, Research collaboration, Web/Tech

Technorati Tags: Arnhem 75, Operation Market Garden, UK Legal Deposit, UK NPLD, Web Archiving, World War II, WWII

10 September 2020

Launching the UK Web Archive 2020 Annual Domain Crawl

By Helena Byrne, Curator of Web Archives at the British Library

Today (10th September 2020) the UK Web Archive team will be pushing the big red button to kickstart the annual Domain Crawl of the UK webspace. The current coronavirus pandemic will no doubt feature strongly in this year’s crawl. This will complement the curated collection that the web archive teams across the UK Legal Deposit Libraries are contributing. The British Library along with the National Library of Scotland are also selecting websites for the International Internet Preservation Consortium (IIPC) Content Development Group (CDG) Novel Coronavirus (COVID-19) collection.

What we collect

The UK Web Archive has been archiving UK published websites on a selective basis since 2005 and in 2020 is celebrating #15YearsOfUKWA. Domain Crawl 2020 is the seventh to take place. It was not until the Non-Print Legal Deposit Regulations (NPLD) came into force in April 2013 that we were able to run a broad crawl over the UK webspace. This covers anything with a .uk or other UK geographic Top Level Domain (TLD), such as .scot, .cymru or .london. It also includes websites on other TLDs that have been registered in the UK or that have been manually selected.
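The scope rule above can be sketched as a simple check on a host name. This is an illustrative assumption only, not the Domain Crawl's actual scoping logic: the suffix list contains just .uk and the geographic TLDs named in this post, and the `registered_in_uk` and `manually_selected` parameters are hypothetical stand-ins for information that cannot be read from the host name itself.

```python
# Illustrative list only: .uk plus the UK geographic TLDs named in this post.
# This is NOT the official Domain Crawl scope definition.
UK_GEOGRAPHIC_SUFFIXES = (".uk", ".scot", ".cymru", ".london")


def in_domain_crawl_scope(host: str,
                          registered_in_uk: bool = False,
                          manually_selected: bool = False) -> bool:
    """Hypothetical scope check for a single host name."""
    # Normalise case and drop a trailing root dot ("www.bl.uk." -> "www.bl.uk").
    host = host.lower().rstrip(".")
    if host.endswith(UK_GEOGRAPHIC_SUFFIXES):
        return True
    # Hosts on other TLDs qualify only through evidence the host name
    # cannot provide, so the caller has to supply it.
    return registered_in_uk or manually_selected


print(in_domain_crawl_scope("www.bl.uk"))                         # True
print(in_domain_crawl_scope("example.com"))                       # False
print(in_domain_crawl_scope("example.com", registered_in_uk=True))  # True
```

Matching on a leading dot (".uk" rather than "uk") avoids accidentally accepting hosts such as "example.fakeuk".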

NPLD came into effect on the 6th April 2013 and the British Library hosted a special event to launch the first Domain Crawl. This was widely covered in the national press and you can still watch back a short video from the event on The Guardian website.

How much data is collected in the Domain Crawl?

The Domain Crawl usually runs for three months, starting at a different time each year to avoid seasonal biases. Roughly 5-10 million hosts (websites) are archived every year, but the amount of data collected varies from year to year. The way the data is collected and stored also changes over time: we compress the data we store, and as technology develops, the amount of data that can be compressed into one terabyte changes. Last year 63.7 TB of compressed data was collected, bringing the total collected during Domain Crawls from 2013 to 2019 to 477.62 TB.

When can I view this content?

Because of the enormous amount of data collected each year by the annual Domain Crawl and our Frequent Crawls, there is a significant lag between when content is archived and when it is made available through the UK Web Archive website. The Frequent Crawl data collected from 2013 to 2019 came to 250.34 TB, bringing the combined total to 727.96 TB of compressed data. To make searching easier, the website allows you to search across all the Selectively Crawled content from 2005 to 2013, the Frequent Crawl content from 2013 to 2017 and the Domain Crawl content from 2013 to 2015.
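The combined figure is simply the sum of the two 2013-2019 totals quoted in this post. A quick check, using Python's `decimal` module so that the sum is exact rather than subject to binary floating-point rounding:

```python
from decimal import Decimal

# Totals quoted in this post, in terabytes of compressed data (2013-2019).
domain_crawl_tb = Decimal("477.62")
frequent_crawl_tb = Decimal("250.34")

combined_tb = domain_crawl_tb + frequent_crawl_tb
print(combined_tb)  # 727.96
```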

Under the Non-Print Legal Deposit (NPLD) Regulations 2013, we can archive all UK published websites but we are only able to make them available to people outside the Legal Deposit Libraries Reading Rooms, if the website owner has given permission.

Due to the NPLD Regulations, access to the archived content is a mix of open and onsite access. The ‘Viewable only on Library premises’ message on individual records indicates that you have to visit one of the six UK Legal Deposit Libraries. The UK Legal Deposit Libraries are the British Library, National Library of Scotland, National Library of Wales, Bodleian Libraries, Cambridge University Library and Trinity College Dublin Library.

Follow the UK Web Archive on Twitter for the latest updates on the domain crawl and other web archiving activities!

Posted by Jason Webber at 11:19 AM

Tags

Contemporary Britain, Legal deposit, Web/Tech

08 September 2020

UKWA available in reading rooms again

By Jason Webber, Web Archive Engagement Manager, The British Library

Much of the UK Web Archive content is only available in the reading rooms of UK Legal Deposit Libraries as current legislation regulates access. All libraries were closed for many months during the COVID-19 lockdown, however, a phased reopening has begun.

Below is some basic information on the access currently available at each Legal Deposit Library, with links to more detail. Opening times were correct at the time of publishing; please check library websites for current opening times.

British Library

www.bl.uk/visit/opening-hours

London, St Pancras

Tuesday – Saturday 11.00 – 15.00

Boston Spa

Tuesday – Friday 11.00 – 15.00

You’ll need to pre-book online for whatever you would like to see at the Library. At the moment you can pre-book:

National Library of Scotland

www.nls.uk/using-the-library/opening-hours

Edinburgh reading rooms

Our Edinburgh reading rooms have reopened to existing and new library card holders, on a pre-booked basis only, with revised opening hours. Readers must book and preorder items 24 hours in advance.

General Reading Room and Special Collections Reading Room:

Tuesday-Saturday, 10.00-16.00

Kelvin Hall

We anticipate that the Library at Kelvin Hall in Glasgow, will reopen around mid-September.

National Library of Wales

www.library.wales/visit/before-your-visit/opening-times

The Reading Room is open to the public with a restricted service. You will have to book your place online before your visit. For more details on this and to read strict guidelines regarding the nature of the restricted service and what is expected of you go to Guidelines on re-opening.

(Reading Rooms only)

Monday - Friday: 10.00-12.30 and 13.30-16.00

Saturday: Closed

Bodleian Library

www.bodleian.ox.ac.uk/using/reading-rooms

The Bodleian Libraries have begun a phased reopening to staff, students and Bodleian Reader Card holders.

To help us keep you safe, and make sure we follow government and University guidelines, you'll need to book your visit in advance.

Weston Library (and several others)

Monday - Friday 10.00-16.00

Cambridge University Libraries

https://www.lib.cam.ac.uk/full-opening-hours

Cambridge University Library is now open for limited services from Monday-Friday. Book a visit to view non-borrowable material in the Main reading room or a Special Collections reading room. Please read more about our phased reopening of the UL and Faculty and Departmental Libraries.

Monday-Friday 10.15-15.45 for limited services.

Trinity College, Dublin

www.tcd.ie/library/opening-hours/

Library reading rooms are now open for current staff and students. Face coverings are required. "Click and Collect" items will now be delivered to Library buildings. Goldsmith Hall is no longer used for collections or returns.

Monday-Friday 09.30-17.00

Posted by Jason Webber at 12:47 PM

Tags

Web/Tech

25 August 2020

Cats vs Dogs on the Archived Web

By Helena Byrne, Curator of Web Archives at the British Library

Cats and dogs, two of the most popular pets in the world, have international days of celebration in August. The 8th August is International Cat Day and the 26th August is International Dog Day.

How popular are cats and dogs on the archived web?

Screenshot of the search results on Shine for Cat and Dog

One way to answer this question is to use the Shine Trends feature. Shine was developed as part of the Big UK Domain Data for the Arts and Humanities project funded by the AHRC. The data was acquired by JISC from the Internet Archive and includes all .uk websites in the Internet Archive web collection crawled between 1996 and April 2013. The collection comprises over 3.5 billion items (URLs, images and other documents) and has been full-text indexed by the UK Web Archive. Every word of every website in the collection can be searched for and analysed.

Taking the Shine graph at face value, overall it would seem that cats are more popular on the archived .uk domain than dogs.

The graph shows, for each year, the percentage of archived resources that match the search term. The largest peak on the graph therefore doesn’t necessarily mark the year with the most mentions; a year with a large raw count can sit lower on the graph simply because more data was archived that year. For ‘Cats vs Dogs’, the largest peak for ‘Cat’ is also its most popular year, while the most popular year for ‘Dog’ sits slightly below that term’s peak on the graph. In 2005, there were almost 14.2 million mentions of ‘cat’ out of 331 million resources archived, while in 2012, there were almost 13 million mentions of ‘Dog’ out of 464 million resources archived that year.
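The figures above show why raw counts and plotted percentages can disagree. Assuming the graph’s value for a year is the number of matching resources divided by the total resources archived that year (as described above), the two headline years work out as follows. The counts are the approximate (“almost”) figures quoted in this post, and `share_of_year` is a hypothetical helper, not part of Shine.

```python
def share_of_year(matches: float, total_resources: float) -> float:
    """Percentage of a year's archived resources that match a search term."""
    return 100 * matches / total_resources


# Approximate figures quoted in the post ("almost" 14.2 and 13 million).
cat_2005 = share_of_year(14.2e6, 331e6)
dog_2012 = share_of_year(13e6, 464e6)

print(f"cat, 2005: {cat_2005:.2f}%")  # cat, 2005: 4.29%
print(f"dog, 2012: {dog_2012:.2f}%")  # dog, 2012: 2.80%
```

The two raw counts are of a similar size, but because far more was archived in 2012, ‘Dog’ ends up as a noticeably smaller share of that year.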

It is not possible to view every archived resource attributed to the generated stats, but you can click on markers along the plotted graph and you will be supplied with a random sample of matching records for that year. The sample displays a sentence where the term appears, as well as a link out to the Internet Archive so that you can review the archived website.

When we review the random sample for ‘Cat’ generated for 2005, we can see that very few of the references are to our furry friends; instead, “Cat” mostly appears as an abbreviation for catalogue (for shopping online). This reflects changes in how the web was used, as online shopping became more popular during this period. Looking through some of the other samples, we can also see ‘CAT’ used as an acronym for various different systems.

On the other hand, when we look at the sample results for ‘Dog’ in 2012, most of the results are about the animal or related products such as dog food and dog accessories.

Possible big data project

After reviewing how the terms ‘Cat’ and ‘Dog’ are used, can we really say which of the two animals is more popular on the archived .uk domain?

One way to truly determine which family pet is more popular would be an in-depth analysis of the .uk domain. A project similar to ‘Mining the UK Web Archive for Semantic Change Detection’, run by the Alan Turing Institute, would provide more insight into which animal is more popular in this dataset.

This project identified words whose meaning has changed over time on the archived web. One example is ‘tweet’, which shifted from mainly describing the sound a bird makes to more often describing a message sent through the social media platform Twitter.

Pierpaolo Basile, a visiting researcher at the Alan Turing Institute, used the same data that is behind Shine in his research project ‘Detecting semantic shift in large corpora by exploiting temporal random indexing’. You can watch a recording of a presentation about this research on the Alan Turing Institute YouTube channel.

What cats and dogs websites are in the UK Web Archive?

The general UK Web Archive and a number of curated collections on the Topics and Themes page of the website feature many animal-related websites, and a lot of these focus on cats and dogs. Although archiving social media is very challenging, we do have a wide selection of Twitter accounts in the archive. These include many cat persona profiles, from library cats to political cats. Some of the political cats included in the archive are Larry the Cat from 10 Downing Street and Palmerston from the Foreign Office. We haven’t come across any similar UK dog persona profiles, so if you know of any, please nominate them to be included in the UK Web Archive. However, there are other Twitter profiles that collect images of dogs, such as Non-League Dogs. This profile is included in both the soccer section of our Sport: Football collection and our Online Enthusiast Communities in the UK collection.

Animal welfare websites are also well represented in our UK General Election series of collections dating from 2005 to 2019, as many publish political manifestos during the election period.

As mentioned in the International Owl Awareness Day blog post, the Online Enthusiast Communities in the UK curated collection has an Animal Related Hobbies subsection. Here you can find a number of cat and dog-related sites but we know there are many more out there. Why not nominate your favourite websites and forums?

How can you access these archived websites?

Under the Non-Print Legal Deposit Regulations 2013, we can archive UK websites but we are only able to make them available to people outside the Legal Deposit Libraries Reading Rooms, if the website owner has given permission. The UK Legal Deposit Libraries are the British Library, National Library of Scotland, National Library of Wales, Bodleian Libraries, Cambridge University Library and Trinity College Dublin Library. Some of the sites in the collection have already had permission granted, such as the Battersea Dogs & Cats Home, Cats Protection and Library Cat. Some examples of websites that are onsite-only access include Dogs Trust, Dog Forum and Purrs In Our Hearts Forum.

As the content of the UK Web Archive has mixed access, the message ‘Viewable only on Library premises’ will appear under the title if you need to visit a Legal Deposit Library to view the content. If there is no message underneath then the archived version of the website should be available on your personal device.

Get involved with preserving cats and dogs online with the UK Web Archive

The UK Web Archive aims to archive, preserve and give access to the UK web space. We endeavour to include important aspects of British culture and events that shape society. Animals and especially pets in the UK are an important aspect of our collective national culture and are represented in several collections across the UK Legal Deposit Libraries, including the UK Web Archive.

However, we can’t curate the whole of the UK web on our own; we need your help to ensure that information, discussion and creative output on this subject are preserved for future generations. Anyone can suggest UK websites to be included in the UK Web Archive by filling in our nominations form: https://www.webarchive.org.uk/en/ukwa/nominate

Browse through what we have so far and please nominate more content!

Posted by Helena Byrne at 3:12 PM

Tags

Collections, Contemporary Britain, Crowdsourcing, Social sciences, Web/Tech

Technorati Tags: Animals, Cats, Cats vs Dogs, Crowdsourcing, Dogs, Get Involved, Save a UK Website, Shine, UK Web Archive, Web Archiving

10 August 2020

Going for gold: exploring Olympic & Paralympic resources

By Helena Byrne, Curator of Web Archives, The British Library

Screenshot of the British Library website related to social science research and the Olympics/Paralympics during London 2012 https://www.webarchive.org.uk/wayback/en/archive/20120724080955/http://www.bl.uk/sportandsociety/index.html

Sunday 9th August 2020 would originally have been the day of the closing ceremony of the Tokyo 2020 Summer Olympics, and we would have been waiting for the start of the Paralympics. However, due to the coronavirus pandemic, most events, big and small, were cancelled, moved online or postponed until 2021. With Tokyo 2020 postponed until 2021, the symposium Documenting the Olympics & Paralympics, which was planned as a full-day face-to-face event, went online instead as a much shorter panel session, held via Zoom on 19th June 2020.

This was a collaboration between the British Library, the International Centre for Sports History and Culture (ICSHC) at De Montfort University, and the British Society of Sports History (BSSH).

The event was organised not only because 2020 was supposed to be an Olympic and Paralympic year, but also because the UK Web Archive team at the British Library were celebrating two significant anniversaries. It is 15 years since the UK Web Archive was founded. It is also 10 years since the International Internet Preservation Consortium (IIPC) started Olympic and Paralympic collaborative web archive collections.

Presentations:

Laura Alexandra Brown, Northumbria University - The heritage of the Games: Interpreting urban change in Olympic host cities

Heather Dichter, De Montfort University - Finding Olympic history in non-sport archives

Robert McNicol, Librarian, Wimbledon Lawn Tennis Museum - Researching the Olympics/Paralympics at Wimbledon

Helena Byrne, Curator of Web Archives, British Library - Preserving the Olympics/Paralympics online

Summary:

The presentations covered a broad mix of physical, digitised and born-digital resources. You can listen back to an audio recording of this symposium on the Sport in History Podcast, and the full abstracts and some of the PowerPoint slides are available on the British Library Research Repository. The official Twitter hashtag for the event was #ResearchingTheGames, where you can catch up with the online discussions.

Laura Alexandra Brown from Northumbria University discussed her experience of using archives in her research, which primarily relates to architectural design and reuse from the perspective of the Olympic Games.

Heather Dichter from De Montfort University discussed her experience of using non-sporting archives to research international sport and diplomacy. The presentation aimed to show researchers that valuable resources can also be found in non-sporting archives, and to help archivists support those researchers.

Robert McNicol, the Librarian at Wimbledon Lawn Tennis Museum, reviewed the history of Wimbledon and the Olympics and discussed the museum’s collection policy for past and future Olympic and Paralympic Games.

Helena Byrne, the Curator of Web Archives at the British Library, discussed the UK Web Archive collections related to the Olympics/Paralympics, the general sports collection policy and the ongoing collaboration with the International Internet Preservation Consortium (IIPC).

Next event:

We are still planning to hold a face-to-face event at the British Library in July 2021. This will be a full-day symposium with a social event planned after the presentations. The event is sponsored by the British Library, ICSHC at De Montfort University, BSSH and the School of Advanced Study.

We will closely monitor the guidance on coronavirus and social gatherings. Nevertheless, we are hopeful that by next summer planned events can go ahead.

For more details follow the BSSH website, social media, the International Centre for Sports History and Culture (ICSHC) Twitter, the UK Web Archive Twitter as well as the #ResearchingTheGames hashtag on Twitter. Joining details will be posted online in spring 2021.

Posted by Helena Byrne at 1:20 PM

Tags

Collections, Contemporary Britain, Olympics, Research collaboration, Sports, Web/Tech

Technorati Tags: Archives and Sport, British Society of Sports History (BSSH), International Centre for Sports History and Culture (ICSHC), Olympics, Online Events, Paralympics, Sport, Sports Archives, Sports History, Tokyo 2020, Web Archives

04 August 2020

Attending my first IIPC General Assembly

By Carlos Lelkes-Rarugal, Assistant Web Archivist, The British Library

For some, a General Assembly isn’t a well-understood thing; I for one wasn’t entirely sure what it was, other than a meeting of sorts. Typically, a General Assembly allows the representative members of an organisation to meet, usually once a year, to talk about activities, express opinions, make recommendations and discuss any other relevant news. More importantly, it allows members to reconnect.

I attended my first General Assembly in mid-June. The International Internet Preservation Consortium (IIPC) has been around for over 17 years, has held annual meetings for over a decade, and comprises members from across the world. The British Library is a founding member of the IIPC and is also a partner in the UK Web Archive, which is a collaboration between the six UK Legal Deposit Libraries. I have worked at the UK Web Archive for over three years, and this was the first time I attended an IIPC General Assembly.

How it came about

Every year, the IIPC hosts events, both virtual and in-person, that bring together IIPC members and non-members. Once a year, the General Assembly (GA) takes place (for IIPC members only), closely followed by the Web Archiving Conference (WAC), which is open to all. The GA and WAC are hosted by IIPC members, and the location alternates between different parts of the world.

IIPC General Assembly and Web Archiving Conferences 2007-2021 Map

If you haven’t attended a WAC, I highly recommend it: the hosting venue changes from year to year (New Zealand 2018, Croatia 2019 and Luxembourg 2021), and the variety of talks and workshops is a rich source of information for both web archiving practitioners and researchers.

The British Library’s web archiving team do try to send representatives, such as our technical lead (Andrew Jackson), our Lead Curator of Web Archives (Nicola Bingham), and our Engagement & Liaison Officer (Jason Webber).

In 2019, I was fortunate enough to attend the WAC, but I missed out on the GA because I had only signed up for the WAC; the GA is open to all members, not just their representatives. It seemed that 2020 would be missed too: the events were due to be held in Montreal in June but had to be cancelled due to the coronavirus pandemic. Planning for both the GA and the WAC falls to the hosting institution and the IIPC Programmes & Communications Officer, Olga Holownia, who is based at the British Library and is part of the UK Web Archive team. Unfortunately, many months of planning were made redundant when Covid-19 hit. Very early on, both events were rescheduled, but as things worsened and uncertainty around international travel loomed, the decision was taken to suspend and eventually cancel the GA and WAC. Olga was then able to rapidly organise a virtual alternative, albeit a somewhat trimmed one. Re-jigging the GA was still no easy feat, even with Olga’s experience of planning previous GAs and WACs.

One of the many positives of an online GA is accessibility: other than the set fees members pay to the IIPC, there was no additional cost to attend the Zoom call. It also means more members can take part, whereas previously travel costs might have been prohibitive and not all members would have sent representatives. This time around, to allow for the time differences between members’ countries, two online calls were organised to accommodate as many attendees as possible. Fortunately for me, this meant that I could attend one of the calls.

A photo of a full conference room at the British Library at the IIPC Web Archiving Annual Conference in 2017

WAC at the British Library in 2017, photo by Olga Holownia

What is the purpose of the GA?

Many members see it as a very good opportunity for networking. The agenda for the online GA was substantially shortened, as the call itself was only two hours long.

This was the agenda:

Introduction
PCO Report
IIPC Budget
New Consortium Agreement
Discretionary Programme Funding (DFP)
Tools Development Portfolio Update
Updates from Working Groups

The IIPC currently consists of 58 members, and membership is growing. The members share a desire to better understand the preservation of websites and to develop standards and tools, through collaboration on tool development and sustainability, transnational collections, information exchange, research initiatives, workshops, training and so on.

With so much occurring throughout the year, not just within the IIPC but within each member’s organisation, it can be quite difficult to keep atop of the main developments. The GA gives you an opportunity to:

Hear highlights of the work being done by the IIPC and its members
Learn about current and planned outreach
Find out about progress on Portfolios, Working Groups and the IIPC members involved in leading and running different initiatives
Follow developments in IIPC governance, such as the updated Consortium Agreement
Discover new opportunities for project funding and progress on past and currently funded projects
Review strategic goals, including tools development, preservation, training, research and more.

My highlights

2020 really should have been a more celebrated year in web archiving because many institutions reached significant milestones: the National Library of Spain celebrated 10 years, the UK Web Archive is 15 years old this year and the web archiving programme at the Library of Congress is now 20 years old. Though it’s a shame we aren’t able to gather and celebrate these achievements, we can still appreciate those milestones. Here are some of my highlights from the 2020 online GA:

Launched in February, the Content Development Group (CDG) Covid-19 collection has gained a lot of attention: not only have over 30 members contributed thousands of seeds, they have also been sharing information about their Covid-collecting activities. Understanding what type of content they collect, the tools they use and their collaborations with institutions and researchers will give the web archiving community and researchers a good idea of the different practices and approaches used when building rapid response collections.
The IIPC can offer funding support through its Discretionary Funding Programme.
The Training Working Group, founded in 2017, worked with the Digital Preservation Coalition to create training modules, which are now available on the IIPC website. Further modules will be developed in the future.
Research Engagement Guidelines are available on the IIPC website.
Bibliotheca Alexandrina (BA) and the National Library of New Zealand (NLNZ) are working together to bring to the web archiving community a tool for scalable web archive visualisation: LinkGate. This is an IIPC-funded project.
Another project that has wrapped up is Jupyter Notebooks for web archives, led by the British Library’s Andrew Jackson working with Tim Sherratt.
The Python Wayback (pywb) project is being actively developed to help institutions migrate away from older playback tools such as OpenWayback. The UK Web Archive adopted pywb last year, and it has greatly improved the playback of our archived websites.
The National Library of Australia Web Archive is developing a crawler that can re-crawl individual web pages of a given domain on a variable schedule, without having to crawl the entire domain. This adaptive re-scheduler, called Chronicrawl, looks very promising.

Final thoughts

It’s difficult to compare a virtual GA with a face-to-face one, as this was both the first virtual GA and the first GA of any kind that I’ve attended. IIPC members are active and collaborate through many different channels, but they rarely get a chance to see each other in person, so meeting face to face can’t be replicated or superseded by an entirely virtual assembly. However, the virtual meeting did allow for broader participation, and a lot of very interesting information was exchanged. I’m not sure what middle ground could be achieved, but the 2020 online GA was conceived in such a short period and pulled off so successfully that the format could be repeated and perhaps developed further. I can’t imagine it fully replacing face-to-face meetups, but it’s great to know that it can be done online. Given the current situation and the direct and indirect pressures caused by the outbreak, I feel fortunate that alternative methods of communication are being found and sustained. Many thanks again to Olga for making it all happen; I look forward to the next event.

Posted by Helena Byrne at 6:56 PM

Tags

Contemporary Britain, Web/Tech

Technorati Tags: Event Update, First Impressions, IIPC General Assembly 2020, International Internet Preservation Consortium (IIPC), UK Web Archive, Web Archiving

Twit twoo: International Owl Awareness Day 2020

By Helena Byrne, Curator of Web Archives, The British Library

An illustration of four owls perched on a branch with the moonlight behind them

British Library digitised image from page 271 of "Madeline Power [A novel]" https://www.flickr.com/photos/britishlibrary/11121066504

The 4th of August is International Owl Awareness Day. This is the perfect time to reflect on owl related content in the UK Web Archive.

There are five species of owl resident year-round in the UK: the Tawny Owl, Barn Owl, Long-eared Owl, Short-eared Owl and Little Owl. The Snowy Owl is also an occasional winter visitor to the Outer Hebrides, Shetland and the Cairngorms in Scotland.

Owls online

We wondered: which of these six owl species is the most popular on the archived .uk domain?

A graph showing how many mentions the six owl species have on the archived .uk web

To answer this question, the Shine graph may prove useful. Shine was developed as part of the Big UK Domain Data for the Arts and Humanities project, funded by the AHRC. The data was acquired by JISC from the Internet Archive and includes all .uk websites in the Internet Archive web collection crawled between 1996 and April 2013. The collection comprises over 3.5 billion items (URLs, images and other documents) and has been full-text indexed by the UK Web Archive. Every word of every website in the collection can be searched for and analysed.

The owl species referenced most in the Shine dataset is the Barn Owl. Although the curve in the graph peaks in 2011, the year with the most Barn Owl mentions was 2012. This is because the graph shows the percentage of each year’s archived resources that mention the term, and some years have more resources than others. In 2011, 66,034 of 288,809,412 archived resources (about 0.0229%) mentioned the Barn Owl, while in 2012 there were 94,990 of 463,367,189 resources (about 0.0205%). These numbers are too big to review manually, but by clicking on a single point on the graph, Shine will generate a random sample of up to 100 references to the search term. The sample displays a sentence where the term appears, as well as a link to the Internet Archive so that you can review the archived website.
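The difference between the peak of the curve and the year with the most mentions comes down to normalisation, and it can be illustrated with the two Barn Owl figures quoted above. The Python sketch below is not Shine’s code: it simply assumes the graph plots matching resources as a percentage of all resources archived that year, and the `sample_hits` helper is a hypothetical stand-in for Shine’s "random sample of up to 100" feature.

```python
import random

# Barn Owl figures from the Shine dataset, as quoted in this post:
# year -> (resources mentioning "Barn Owl", total resources archived that year)
barn_owl = {
    2011: (66_034, 288_809_412),
    2012: (94_990, 463_367_189),
}


def percentage(hits: int, total: int) -> float:
    """Share of a year's archived resources that mention the term, as a percentage."""
    return 100 * hits / total


# The graph's peak is the year with the highest share, not the highest count.
peak_by_share = max(barn_owl, key=lambda year: percentage(*barn_owl[year]))
peak_by_count = max(barn_owl, key=lambda year: barn_owl[year][0])


def sample_hits(hits, k=100, seed=None):
    """Hypothetical helper: return a random sample of up to k hits, without repeats."""
    rng = random.Random(seed)
    return rng.sample(hits, min(k, len(hits)))


for year, (hits, total) in sorted(barn_owl.items()):
    print(f"{year}: {hits:,} of {total:,} resources = {percentage(hits, total):.4f}%")
print("Peak by share:", peak_by_share)
print("Peak by count:", peak_by_count)
```

Running it prints a share of 0.0229% for 2011 and 0.0205% for 2012, so the curve peaks in 2011 even though 2012 has more matching resources in absolute terms.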

Get creative with owls at the British Library

Video created by Carlos Lelkes-Rarugal, using Tawny Owl hoots recorded by Richard Margoschis in Gloucestershire, England (BL ref 09647). British Library digitised image from page 272 of "The Works of Alfred Tennyson, etc"

Curious about what some of these owls sound like? Our Assistant Web Archivist, Carlos Lelkes-Rarugal, designed some short animated videos using recordings from the British Library Sound Archive and images from the British Library Flickr account. You can view these on the UK Web Archive, Digital Scholarship and the Sound Archive’s Wildlife Department Twitter accounts.

The title for this blog post was inspired by the sound made by the Tawny Owl. This and other sounds can be experienced in the Sound Archive at the British Library, which has over 2,500 recordings of owls from all over the world. You can hear a selection of these recordings on the British Library Sound & Vision blog.

The Digital Scholarship team have also put together a useful album of digitised illustrations of owls on the British Library Flickr account. Their latest blog post encourages you to use these images for various creative projects.

Get involved with preserving owls online with the UK Web Archive

The UK Web Archive aims to archive, preserve and give access to the UK web space. We endeavour to include important aspects of British culture and events that shape society. The biodiversity of the UK is an important aspect of our collective national culture and is represented in several British Library collections including the UK Web Archive.

We can’t, however, curate the whole of the UK web on our own; we need your help to ensure that information, discussion and creative output on this subject are preserved for future generations.

Anyone can suggest UK websites to be included in the UK Web Archive by filling in our nominations form: https://www.webarchive.org.uk/en/ukwa/nominate

We already have a curated collection, Online Enthusiast Communities in the UK, that features some owl-related websites in the Animal related hobbies subsection. Browse through what we have so far and please nominate more content!

Posted by Helena Byrne at 9:45 AM

Tags

Collections, Contemporary Britain, Crowdsourcing, Digital scholarship, Legal deposit, Sound and vision, Web/Tech

Technorati Tags: Bird Watching, Crowdsourcing, Digitised Illustrations, International Owl Awareness Day 2020, Nature Spotting, Owls, Shine, Web Archiving, Wildlife, Wildlife Sounds