THE BRITISH LIBRARY

Digital scholarship blog

Enabling innovative research with British Library digital collections

Introduction

Tracking exciting developments at the intersection of libraries, scholarship and technology.

29 January 2015

Picaguess: a prototype crowdsourcing app from the British Library Big Data Experiment

The British Library Big Data Experiment is an ongoing collaboration between British Library Digital Research and UCL Department of Computer Science (UCLCS), facilitated by UCL Centre for Digital Humanities (UCLDH), that enables and engages students in computer science with humanities research issues as part of their core assessed work.

All taught undergraduate and postgraduate programmes in UCLCS require students to undertake an industry exchange where they work in teams for an industry partner who acts as their client. Though UCLCS has experience of developing student projects in partnership with digital humanists, industry partners have tended to come from the financial or manufacturing sectors. The British Library Big Data Experiment is an umbrella for a series of activities where the British Library is the client for assessed UCLCS project work, allowing for a rolling, responsive programme of experimental design, development, and testing of infrastructure and systems. Those wanting to find out more about the British Library Big Data Experiment should look out for our forthcoming poster at DH2015 (which will - of course - be posted online).

The latest project to come out of this collaboration is PicaGuess, a web and Android application developed by Jonathan Lloyd, Meral Sahin, and Divya Surendran, all of whom are studying for an MSc at UCLCS. PicaGuess is an image guessing game that examines your play to help the British Library learn about our digital collections. It uses a Draw Something-like mechanic to enable structured linking between the one million Public Domain book illustrations the British Library released onto Flickr in December 2013. Initially the only information we had about these images was their size and the book and page on which they appeared. Over time the community has added tens of thousands of semantic tags to the collection, an effort that has proven enormously valuable in deepening our understanding of the collection (a dataset of all tags from December 2013 to December 2014 is available on Figshare for unrestricted use and reuse). However, this sort of free-text tagging can introduce problems and idiosyncrasies. PicaGuess aims to complement the Flickr tag data by using a set of category words to drive community tagging. These category words are of both a descriptive and a more abstract variety (so 'dark', 'tender', and 'labour' as well as the usual 'map', 'building', 'person') and underpin two distinct user interactions with the collection. These are:

  • creating sets of four images representative of a category word. The user chooses from three category words, then browses through the image set to find suitable images for that word.
  • guessing the category word that another user has assigned to a four-image set. The user is presented with the four images and a selection of letters (with some extra letters thrown in to make it more tricky) from which the category word can be spelt. If stuck, users can ask for a hint or give up altogether.
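For the curious, the letter selection in the guessing game could be sketched roughly as below. This is an illustration only, not the PicaGuess code (which is on GitHub): the function names `build_letter_tray` and `can_spell`, the choice of four decoy letters, and drawing the decoys uniformly from the alphabet are all assumptions made for the example.

```python
import random
import string


def build_letter_tray(word, extra=4, rng=None):
    """Return the letters of `word` plus `extra` decoy letters, shuffled.

    Hypothetical helper: the number of decoys and how they are chosen
    are assumptions, not details taken from PicaGuess itself.
    """
    rng = rng or random.Random()
    letters = list(word.upper())
    # Decoys are drawn uniformly at random from A-Z (an assumption).
    letters += rng.choices(string.ascii_uppercase, k=extra)
    rng.shuffle(letters)
    return letters


def can_spell(word, tray):
    """Check that `word` can be spelt using each tray letter at most once."""
    remaining = list(tray)
    for ch in word.upper():
        if ch not in remaining:
            return False
        remaining.remove(ch)
    return True


tray = build_letter_tray("labour", extra=4, rng=random.Random(2015))
print(tray, can_spell("labour", tray))
```

Because the tray always contains every letter of the category word, the puzzle is guaranteed to be solvable; the decoys only make it harder.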

At the back end, confidence scores for the relationship between a word and an image are updated based on the time it takes a user to solve a game, whether or not they need a hint to do so, and whether they fail to do so. Together then these two simple games provide a platform for the distributed determination of links between illustrations where the parameters can be controlled by the collection 'owner' and rich data accompany those links.
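To give a flavour of what such an update might look like, here is a minimal sketch. It is not the scoring used by PicaGuess: the `GuessOutcome` record, the 60-second time limit, the halving penalty for hints, and the moving-average update with a rate of 0.2 are all assumptions chosen to illustrate the three signals mentioned above (time taken, hint use, and failure).

```python
from dataclasses import dataclass


@dataclass
class GuessOutcome:
    """One play of the guessing game (hypothetical record)."""
    solved: bool
    seconds: float
    used_hint: bool


def evidence(outcome, time_limit=60.0):
    """Map a single play to a value between 0 and 1.

    Assumed rules: a failed guess counts as 0, a solved guess counts
    for at least 0.5 and more the faster it was, and needing a hint
    halves the value.
    """
    if not outcome.solved:
        return 0.0
    speed = max(0.0, 1.0 - outcome.seconds / time_limit)
    score = 0.5 + 0.5 * speed
    if outcome.used_hint:
        score *= 0.5
    return score


def update_confidence(current, outcome, rate=0.2):
    """Nudge the word-image confidence towards the new evidence."""
    return (1 - rate) * current + rate * evidence(outcome)


confidence = 0.5
for play in [GuessOutcome(True, 8.0, False), GuessOutcome(False, 60.0, True)]:
    confidence = update_confidence(confidence, play)
    print(round(confidence, 3))
```

A moving average of this kind keeps the score between 0 and 1 and lets recent plays count for more than old ones; other schemes (for example, simple counts of successes and failures) would serve equally well as an illustration.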

You can find PicaGuess at picaguess.herokuapp.com, from where you can download the associated Android application. All code from the project is available for inspection and reuse on GitHub and periodically we hope to release the confidence scores for use and reuse.

In line with the spirit of the British Library Big Data Experiment, PicaGuess is not an official British Library service but very much a prototype that we hope can be improved and refined over time. And so we encourage you to share, to comment (here or via an email to digitalresearch@bl.uk), and to build on the hard work, creativity, and achievements of Jonathan, Meral, and Divya.

Next up for the British Library Big Data Experiment is a machine learning project. More here when there is something to share.

James Baker, Curator, Digital Research

@j_w_baker

16 January 2015

Meet Rob Sherman in the Lines in the Ice gallery tomorrow

Tomorrow Rob Sherman, the British Library’s current interactive writer-in-residence, funded by CreativeWorks London and The Eccles Centre for American Studies, will be hanging out in the Lines in the Ice exhibition all day (well, during building opening hours from 9:30am to 5pm - we don't make him sleep here!), talking to visitors about his work and answering questions. So if you will be in London and are interested in Rob's project, then please do drop in; information about visiting the Library is here.

Rob Sherman standing next to his installation in the Lines in the Ice exhibition

During his residency, Rob has created original art, both physical and digital, in response to the exhibition, including a hand-bound book made with support from the Library's Collection Care department (you can read about the making of the book in their blog here). This book was artificially aged to look like a real traveller's journal from the mid-19th century and is on display in the gallery; you may have seen it and curiously looked through the pages!

Isaak Scinbank's diary being artificially aged with a little help from Library conservators

Also in the exhibition is a "digital cairn", which is a piratebox, i.e. a little web server which can only be accessed in the exhibition itself and which, when explored using a wireless device, will reveal some of the secrets about the travels of a man called Isaak Scinbank between the years 1852 and 1853 (note this is a fictional character created by Rob, but a character whose narrative is related to the real historical accounts of the lost Franklin expedition).

Rob Sherman's sketch of a cairn

If you can't make it to the Library to meet the man himself, then please do take a look at Rob's fascinating research blog at http://onmywifesback.tumblr.com/

Stella Wisdom

Curator, Digital Research

@miss_wisdom

08 January 2015

Help #bldigital to help you do better digital research

The Jisc Research Data Spring is a project that aims to find tools, software, and service solutions that will improve how researchers work, in particular how they use and manage data.

The British Library Digital Research team are confident that infrastructures that deliver flexible and scalable access to large digital collections as data can enable better research. Last year we spoke about this at Digital Humanities 2014 (Farquhar, Adam; Baker, James (2014): Interoperable Infrastructures for Digital Research: A proposed pathway for enabling transformation. figshare. dx.doi.org/10.6084/m9.figshare.1092550) and we continue to work with student teams at UCL Computer Science to experiment with platforms for access to and interrogation of British Library digital collections.

Building on these activities, we are involved in two initial project proposals for the Jisc Research Data Spring:

'Dissecting digital humanities data with biomedical tools' is a collaboration with the School of Social and Community Medicine, University of Bristol, and seeks to adapt DataSHIELD, originally developed to co-analyse numerical patient data from different sources without disclosing identity or sensitive information, to include a proof-of-concept for supporting a range of text analyses across datasets that present divergent challenges to access and interpretation. In short, whether the barrier to computational text analysis is ethical-legal, IP or licensing related, or just the physical size of the data, this project will be a step towards helping you work across those data and derive meaningful, high-level, comparative results from your analysis. For more on DataSHIELD see 'DataSHIELD: taking the analysis to the data, not the data to the analysis' (2014).

'Enabling complex analysis of large scale digital collections' is a collaboration with UCL Centre for Digital Humanities and UCL Research Computing that seeks to use the resources of the latter in combination with our data to investigate the needs and requirements of a service that would allow researchers to undertake complex searches of digital content. By enabling both the research community and the public to propose problems and take an active role in understanding how those problems are translated into complex queries that UCL Research Computing could perform on the data, the project aims to generate better understanding of the demands in the processing of large scale cultural data and inform us about user requirements in reusing, analysing, and facilitating searches of digital content.

For these discrete but complementary projects to become reality, we need your help. Only if you comment on and vote for the projects on IdeaScale (head to pages for 'Dissecting digital humanities data with biomedical tools' and 'Enabling complex analysis of large scale digital collections' respectively) can they advance to the next stage and perhaps - eventually - secure substantial funding.

James Baker

Curator, Digital Research

@j_w_baker