Big UK Domain Data for the Arts and Humanities: working with the archive of UK web space, 1996–2013
In January 2014, the Institute of Historical Research, University of London (in partnership with the British Library, the Oxford Internet Institute and Aarhus University) was awarded funding by the Arts and Humanities Research Council for a project to explore ways in which humanities researchers could engage with web archives. The main aims of ‘Big UK Domain Data for the Arts and Humanities’ were to highlight the value of web archives for research; to develop a theoretical and methodological framework for their analysis; to explore the ethical implications of this kind of big data research; to train researchers in the use of big data; and to inform collections development and access arrangements at the British Library.
For the past 15 months the project team have been working with 10 researchers, drawn from a range of arts and humanities disciplines, to address these issues and particularly to develop a prototype interface which will make the historical archive (1996–2013) accessible. The researchers came armed with a range of fascinating questions, from analysing Euro-scepticism on the web to studying the Ministry of Defence’s recruitment strategy, from examining the history of disability campaigning groups and charities online to looking at Beat literature in the contemporary imagination. The case studies that they have produced demonstrate some of the challenges posed by the archived web, but also its value and significance. They are available from the project website.
Along the way, the project has produced not only one of the largest full-text indexes of web archive (WARC) files in the world, but also a sophisticated interface which supports complex query building and gives researchers the ability to create and manipulate corpora derived from the larger dataset.
This interface is accessible as a beta version. It opens up a fascinating range of options now that you no longer need to know the URL of a vanished website in order to find it in the archive.
For those less familiar with the concept of web archives, we’ve also produced two short animations, ‘What is a Web Archive?’ and ‘What does the UK Web Archive collect?’. They’re both available under a CC-BY-NC-SA licence, so do please share!
Jane Winters
Professor of Digital History
Institute of Historical Research, School of Advanced Study, University of London
@jfwinters