Web archiving: how to fit it in? A workshop report
[A report by Nicola Johnson, Web Archivist at the British Library]
I attended a workshop, “How to fit in – integrating a web archiving program in your organization”, at the Bibliothèque nationale de France (BnF) in Paris, 26th – 30th November 2012. It was sponsored by the International Internet Preservation Consortium (IIPC).
The workshop was intended for curators, archivists and managers involved in (or about to embark on) web archiving at their institutions. The BnF has been archiving websites since late 1999 and has a vast amount of expertise. France was an early adopter of legal deposit for websites: legislation passed in August 2006 means that websites from the French national domain can be collected by the BnF for preservation and public use. I was particularly interested in the transition that they have made to this large-scale operation, as Legal Deposit legislation is expected in the UK this April and we will have the task of integrating large-scale archiving with our current selective undertaking.
Several IIPC member organisations attended the workshop, hosted in one of the four ‘towers of open books’ at the BnF’s main site. The François Mitterrand building was one of the grands projets of the former president and is one of the largest and most modern libraries in the world. Participants included the British Library and the national libraries of Germany, Slovenia, Estonia, Spain and the Netherlands. Also represented were the Bavarian State Library, the California Digital Library, the National Library and Archives of Quebec, the Bibliotheca Alexandrina and the Library of Congress. Participants brought a range of experience in web archiving, and their countries were at different stages of national legal deposit legislation.
A wide range of topics was covered, including the integration of web archiving in acquisition practices; the role of subject librarians in selecting websites; and how web collections should align with general collection development policies. As the business of web archiving involves several parts of a library, we also heard representatives of various departments at the BnF speak about their roles, including IT, conservation, legal deposit, collections co-ordination, and digital and bibliographic information. There were subject specialists from the music, literature and art departments, who spoke about their collection development policies and how to incentivise staff to select websites when they have a multitude of other duties to perform. Given my role as Web Archivist, I was particularly interested in the role of the 70 or so curators or “recommending officers” who select websites for the focussed crawls undertaken by the BnF.
A presentation was also made by the Internet Memory Foundation, a non-profit institution based in Amsterdam and Paris. The foundation provides a shared platform for institutions to collect websites and is archiving dozens of terabytes of data every month. It is also involved in various research projects with institutions and is developing a new crawler and architecture for web-scale crawling. Later in the week we also had the opportunity to visit the National Audiovisual Institute (INA), a repository containing 70 years of French radio programmes and 60 years of TV. The INA shares responsibility for collecting legal deposit online content with the BnF and began collecting broadcast-related websites in February 2009. It holds approximately 10,000 websites, employing multiple crawlers for different types of content. Access is available at six sites in France, although some material under open licence is available online.
Our hosts succeeded in creating an atmosphere that was relaxed and stimulating (see the pictures); a great many ideas were exchanged and the commonality of purpose among the participants was encouraging. I have returned to work with renewed vigour and positivity towards web archiving and, judging by the messages exchanged after the event, so have the other participants. Positive changes are being made in our respective institutions as a result of the workshop.
[Image of the BnF (Creative Commons BY-NC-SA) from Images et Voyages]