Digital scholarship blog

Enabling innovative research with British Library digital collections


Tracking exciting developments at the intersection of libraries, scholarship and technology.

03 July 2015

Turning research questions into computational queries: outputs from the 'Enabling Complex Analysis of Large Scale Digital Collections' project


'Enabling Complex Analysis of Large Scale Digital Collections', a project funded by the Jisc Research Data Spring, empowers researchers to turn their research questions into computational queries and gathers social and technical requirements for infrastructures and services that allow computational exploration of big humanities data. Melissa Terras, Professor of Digital Humanities at UCL and Principal Investigator for the project, blogged in May about initial work to align our data - ALTO XML for 60k+ 17th, 18th and 19th century books - with the performance characteristics of UCL's High Performance Computing Facilities. We have been learning a huge amount about the complexities associated with redeploying architectures designed to work with scientific data (massive yet structured) to the processing of humanities data (not massive, but unstructured). As part of this learning, in June we ran two workshops to which we invited a small, hand-picked group of researchers (from doctoral candidates to mid-career scholars) with queries they wanted to ask of the data that couldn't be satisfied by the sort of search and discovery orientated graphical user interfaces typically served up to them.

The researchers were clustered into three groups by their interests, with one group looking for words/strings over time, a second for words/strings in context, and a third for patterns relating to non-textual elements. Each group rotated between three workstations. At one workstation James Hetherington worked with them to realise their questions as queries that returned useful derived data. At a second they collaborated with Martin Zaltz Austwick to explore and experiment with ways in which they could represent the data visually. And at a third workstation David Beavan captured their thoughts on the process (such as, does the time taken to wait for results to return impact on your interpretation of those results?), their sense of how computational queries could enrich their research, and their learning outcomes in terms of next steps.

Some very sensible best practices emerged from this work: the need to build multiple datasets (counts of books per year, words per year, pages per book, words per book) to normalise results against in different ways; the necessity of explaining and clearly documenting the decisions taken when processing the data (for example taking the earliest year found in the metadata for a given book as the publication year, even if we know that to be incorrect); and the value of having a fixed, definable chunk of data for researchers to work with and explain their results in relation to (and in turn for us, the risks associated with adding more data to the pot at a later date).
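To make two of these practices concrete, here is a minimal sketch in Python of normalising raw hit counts against words per year, using the earliest year found in a metadata date field as the publication year. It is an illustration only, not the project's own code: the function names, the year pattern and the three sample records are all hypothetical.

```python
import re
from collections import Counter

# Assumption: any four-digit number from 1500 to 1999 in the date field is a year.
YEAR_PATTERN = re.compile(r"\b(1[5-9]\d{2})\b")


def earliest_year(date_field):
    """Return the earliest four-digit year in a metadata date string, or None."""
    years = [int(y) for y in YEAR_PATTERN.findall(date_field)]
    return min(years) if years else None


def rate_per_thousand(hits_by_year, words_by_year):
    """Normalise raw hit counts to hits per 1000 words for each year."""
    return {
        year: 1000 * hits_by_year.get(year, 0) / total
        for year, total in sorted(words_by_year.items())
        if total > 0
    }


# Hypothetical records: (metadata date string, words in book, hits for a term).
books = [
    ("1848, 1852", 90000, 45),
    ("[1848]", 60000, 15),
    ("1851-53", 50000, 10),
]

words_by_year = Counter()
hits_by_year = Counter()
for date_field, word_count, hits in books:
    year = earliest_year(date_field)
    if year is None:
        continue  # Books with no recognisable year are left out of both totals.
    words_by_year[year] += word_count
    hits_by_year[year] += hits

rates = rate_per_thousand(hits_by_year, words_by_year)
print(rates)
```

Swapping `words_by_year` for a count of books or pages per year gives the other normalisations mentioned above. Keeping the year rule in one small, documented function is one way of recording the decision taken.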

Moreover, we have outputs on our GitHub repos that you can work with. We have queries (written in Python) that provide a framework from which you might search for words, phrases, or non-textual elements in this or comparable collections of digital text. We have data from searches across the whole collection on occurrences of disease-related words, on the contexts in which librarians appear, and on the location and relative size in the page of every non-textual element (ergo, in most cases, illustrations). And we have visualisations, with associated code and IPython Notebooks, of these results. These include a graph of disease references over time per 1000 words (an interactive version is available if you download this html and open it in your browser); a point map charting the size over time of circa 1 million figures (as a percentage of the size of the page they appear in); and, moving our macroscope closer, graphs that show the size of images across the length of single books, that map the illustrative 'heartbeat' of those books, alongside a hacky workflow for getting to that point.
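As a rough illustration of the kind of query described here, the sketch below reads a single ALTO XML page, reports each illustration's area as a percentage of the page area, and counts occurrences of a word. It is not taken from the project's repositories. It assumes the standard ALTO element and attribute names (`Page`, `Illustration`, `String`, `WIDTH`, `HEIGHT`, `CONTENT`) and that every page and illustration carries its dimensions; the sample page and the function names are invented for the example.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip any XML namespace so the code is not tied to one ALTO version."""
    return tag.rsplit("}", 1)[-1]


def illustration_shares(alto_xml):
    """Return each Illustration's area as a percentage of its page's area."""
    root = ET.fromstring(alto_xml)
    shares = []
    for page in root.iter():
        if local_name(page.tag) != "Page":
            continue
        page_area = float(page.get("WIDTH")) * float(page.get("HEIGHT"))
        for element in page.iter():
            if local_name(element.tag) == "Illustration":
                area = float(element.get("WIDTH")) * float(element.get("HEIGHT"))
                shares.append(100 * area / page_area)
    return shares


def count_word(alto_xml, word):
    """Count String elements matching a word, ignoring case and punctuation."""
    root = ET.fromstring(alto_xml)
    target = word.lower()
    return sum(
        1
        for element in root.iter()
        if local_name(element.tag) == "String"
        and element.get("CONTENT", "").lower().strip(".,;:!?\"'()") == target
    )


# Invented sample page, for illustration only.
SAMPLE_PAGE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout>
    <Page ID="P1" WIDTH="2000" HEIGHT="3000">
      <PrintSpace>
        <Illustration ID="I1" HPOS="500" VPOS="400" WIDTH="1000" HEIGHT="1500"/>
        <TextBlock ID="T1">
          <TextLine>
            <String CONTENT="The"/>
            <String CONTENT="librarian,"/>
            <String CONTENT="a"/>
            <String CONTENT="Librarian"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""

print(illustration_shares(SAMPLE_PAGE))      # [25.0]
print(count_word(SAMPLE_PAGE, "librarian"))  # 2
```

Run over every page of a book in order, the first function would give the per-book series of image sizes described above. Plotted against publication year across the collection, it would give something like the point map.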

The next step is to package these outputs up as 'recipe books' demonstrative of the steps needed to work with large and complex digital collections. We hope that the community - Systems Architects designing services, Research Software Engineers collaborating in humanities research, Humanists dabbling with data and code - can learn from these, build them into their workflows, and push forward our collective ability to make the best of these digital collections.

James Baker -- Curator, Digital Research -- @j_w_baker

19 June 2015

My Digital Rights: Where next for our Magna Carta for the digital age?


This is a guest post by Sarah Shaw, the Magna Carta: My Digital Rights Project Manager: 

What a week! For the last few days I’ve been knee deep in interviews, blogs and social media – all to promote our Magna Carta for the digital age. In the run up to the 800th anniversary of Magna Carta, we took the 522 clauses we’d received from school students around the world, and asked the public to vote for their favourites. On 15 June 2015 we revealed the results, and we have captured them as a snapshot of how people felt about the future of the Web 800 years after Magna Carta was sealed at Runnymede.


So what have we learnt? Well, 30,000 votes were cast, and 72% of those were votes in agreement with the clauses that had been submitted, indicating that the wider public agreed with many of the hopes presented by the young people who participated in the project. However, the main concern for young people was a need to feel safe and protected online, even if that resulted in curbing freedom of speech. The top 10 we captured on 15 June shows a set of very different concerns for the wider public. Over half of the top 10 results call for freedom of speech online, free from government and corporate censorship; online safety doesn’t make the cut.

So here they are, the top 10 as voted for by the public between 8 and 15 June 2015:

1. The Web we want will not let companies pay to control it, and not let governments restrict our right to information.

2. The Web we want will allow freedom of speech.

3. The Web we want will be free from government censors in all countries.

4. The Web we want will not allow any kind of government censorship.

5. The Web we want will be available for all those who wish to use it.

6. The Web we want will be free from censorship and mass surveillance.

7. The Web we want will allow equal access to knowledge, information and current news worldwide.

8. The Web we want will have freedom of speech.

9. The Web we want will not be censored by the government.

10. The Web we want will not sell our personal information and preference for money, and will make it clearer if the company/ Website intends to do so.

It was a mammoth task sorting through the 522 clauses to get them ready for the big vote. All of the clauses were uploaded verbatim. The only changes we made were to correct grammar and typos and to remove exact duplicates. That’s why we’ve ended up with two clauses in the top 10 that are quite similar – 'The Web we want will allow freedom of speech' and 'The Web we want will have freedom of speech'. The fact that both of these have ended up in the top 10 implies to me that this was a very hot topic for our voters at this point in time.

So what’s next, now that we have this top 10? Well, this was never intended to be a definitive set of rules for the future of the Web. So instead this will be kept online for the world to see, a snapshot of the public’s hopes for the Web as captured on the 800th anniversary of Magna Carta. You can continue to vote for your favourite clauses, and we hope that the top 10 will continue to evolve to meet the changing face of the Web. Our resources will stay online, allowing teachers to continue the debate in schools, and our fantastic exhibition is open until 1 September 2015.

So keep voting, keep thinking about what you want for the future of the Web and keep debating your online rights and responsibilities.


"Visual minutes" by Sandra Howgate, created for Magna Carta: My Digital Rights with the help of Year 9 girls from Maria Fidelis School at the Web We Want Festival, hosted by the Southbank Centre

16 June 2015

British Library Labs Competition 2015 - Winners Announced!


Posted by Mahendra Mahey, BL Labs Manager on behalf of Adam Crymble and Katrina Navickas.

Aquiles Alencar-Brayner, Digital Curator in the Digital Research team, announced the two winners of the third British Library Labs competition (2015) as part of his conference presentation on 'Digital Scholarship and its Impact on Latin American Studies' at the Seminar on the Acquisition of Latin American Library Materials (SALALM), held at Princeton University on 15 June 2015 at 18:30 BST.

Aquiles Alencar-Brayner at Princeton University

A judging panel made up of leaders in Digital Scholarship, some of whom sit on the British Library Labs advisory board (Melissa Terras at University College London, Andrew Prescott at King's College London, Tim Hitchcock at the University of Sussex, David De Roure at the University of Oxford and Bill Thompson from the BBC, with Louise Denoon from the State Library of New South Wales, Australia, as an observer), and members of the British Library's Digital Scholarship team met at the end of May to decide upon the two winners of this year's competition. After much deliberation, we can now proudly announce that the winners of the 2015 British Library Labs competition are 'Crowdsourcing Objects: repurposing the 1980s arcade console for scholarly image classification' and 'Political Meetings Mapper: bringing the British Library maps to life with the history of popular protest'.

'Crowdsourcing Objects': repurposing the 1980s arcade console for scholarly image classification

Dr Adam Crymble, Lecturer of Digital History, University of Hertfordshire

Twitter: @adam_crymble

Is the web a liability for those seeking our attention for scholarly means? If we put an arcade game devoted to crowdsourcing in a public place, would you focus your eyes, walk over, and play? 

Crowdsourcing arcades in public places - images courtesy of Adam Crymble

The great promise of crowdsourcing is that it captures a little effort from lots of people. To facilitate this, crowdsourcing websites are accessible from anywhere with a wi-fi connection. We tend to see the web as a great asset because it can, in theory, reach just about anyone. And yet, we all have moments of downtime that we could devote to the greater good, classifying images or transcribing text. But we don't. These sites compete for our attention with videos of cats and endless emails, which too often win out over the chance to transcribe something. 

This project experiments with ‘crowdsourcing objects’, to replace the ubiquity of the crowdsourcing website with the scarcity of a physical machine. Inspired by the 'maker' community and physical computing, this project takes the crowdsourcing experience off the web and puts it into a 1980s-style arcade game, replete with joysticks and shiny plastic buttons. This old interface, put to new uses, acknowledges that people increasingly associate their computers with work; by providing a digital experience that doesn't feel like a computer, we can tap into energy currently reserved for play.

This experiment will provide new knowledge about what motivates us to participate in crowd-generated data collection, drawing upon the increasingly important field of game studies within the digital humanities, as well as the powerful force of nostalgia for a generation of scholars who came of age with physical video game machines. In the process, this crowdsourcing object will encourage users to help build a catalogue of metadata related to the British Library’s one million image Flickr collection. By asking users to help classify a subset of the collection in ways that are easy for humans but difficult for computers, scholars will provide useful datasets for future research on visual culture. This new data will be made freely available to researchers, and will form the basis of future machine learning experiments in which the categorisations of the human gamers are used to ‘train’ machines to complete the classification process.

Dr Adam Crymble is a lecturer of Digital History at the University of Hertfordshire. His research looks at the history of migration and integration through an analysis of large sets of digitised records. These records make it possible to discern trends in migration and conflict that are impossible to pick up on through close reading alone.

Adam is also actively involved in teaching digital history. He is an editor of The Programming Historian, an open access, peer-reviewed monograph that provides introductory digital history lessons to those looking to learn new ways to engage with the past. He is one of the convenors of the Digital History seminar at the Institute of Historical Research in London. In 2008 he published How to Write a Zotero Translator: A Practical Beginners Guide for Humanists.

Political Meetings Mapper: bringing the British Library maps to life with the history of popular protest

Dr Katrina Navickas, Senior Lecturer in History, University of Hertfordshire

Twitter: @katrinanavickas

Protest is about space and place. Opposing political groups claim the uses of public spaces, meet in places that have long histories of protest or create their own sites as symbols of their identity. Newspapers are a main source of data on when and where historical protests and political meetings occurred, but historians have so far only been able to plot the locations of small numbers of political meetings manually.

Political Meetings Mapper - images courtesy of Katrina Navickas

Political Meetings Mapper will develop a tool to extract notices of meetings from historical newspapers and plot them on layers of historical maps in the British Library's collections. It will visualise the locations of political events in the crucial era of the 1830s and 1840s, when Chartism, the first and largest movement for democracy in Britain, held thousands of meetings and demonstrations to campaign for the vote. By plotting the meetings listed in the Chartist newspaper, The Northern Star, from 1838 to 1844, it hopes to discover new spatial patterns in where popular politics happened, and in so doing, help answer the questions of how and why it happened.

This project showcases the British Library's collections and combines them in a way not done before: the geo-referenced maps in the BL geo-referencer and Flickr Commons, the Ordnance Surveyors' drawings, and the 19th century newspaper collection. It will develop a tool for text-mining and geo-locating the records of political meetings, and enable anyone to access the maps and data on an interactive website. It will then aim to make the tool adaptable, so that scholars can eventually plot any form of event and spatial information using historical texts and maps. Geographers and urban planners have long used digital methods in their research: this project seeks to show that historians aren’t being left behind in these developments. It seeks to generate new questions as well as answers that will lead to new research not just in history and heritage but also in the sociology of social movements and urban planning.

Political Meetings Mapper will also demonstrate the relevance and legacy of the history of democracy for today’s society. Regions, towns, even streets will find a longer sense of their political heritage, enabling them to find out what meetings or events occurred in their area, and therefore encourage a continued engagement with politics among local communities. 


Dr Katrina Navickas is a Senior Lecturer in History at the University of Hertfordshire, and Director of the Centre for Regional and Local History. Her main research interests are in the history of popular politics and protest in late 18th and early-mid 19th century Britain, particularly in the North of England. She experiments with GIS and mapping protests and meetings.

She is also a co-investigator on a BA/Leverhulme-funded project led by Dr Robert Poole to recatalogue and digitise the Home Office disturbance papers at The National Archives. She has published 'Loyalism and Radicalism in Lancashire, 1798-1815' (Oxford University Press, 2009), with her second title, 'Protest and the Politics of Space and Place, 1789-1848', due for release by Manchester University Press towards the end of 2015.