29 January 2015
PicaGuess: a prototype crowdsourcing app from the British Library Big Data Experiment
The British Library Big Data Experiment is an ongoing collaboration between British Library Digital Research and UCL Department of Computer Science (UCLCS), facilitated by UCL Centre for Digital Humanities (UCLDH), that enables and engages students in computer science with humanities research issues as part of their core assessed work.
All taught undergraduate and postgraduate programmes in UCLCS require students to undertake an industry exchange where they work in teams with an industry partner as their client. Though UCLCS has experience of developing student projects in partnership with digital humanists, industry partners have tended to come from the financial or manufacturing sectors. The British Library Big Data Experiment is an umbrella for a series of activities where the British Library is the client for assessed UCLCS project work, allowing for a rolling, responsive programme of experimental design, development, and testing of infrastructure and systems. Those wanting to find out more about the British Library Big Data Experiment should look out for our forthcoming poster at DH2015 (which will - of course - be posted online).
The latest project to come out of this collaboration is PicaGuess, a web and Android application developed by Jonathan Lloyd, Meral Sahin, and Divya Surendran, all of whom are studying for an MSc at UCLCS. PicaGuess is an image guessing game that examines your play to help the British Library learn about our digital collections. It uses a Draw Something-like mechanic to enable structured linking between the one million Public Domain book illustrations the British Library released onto Flickr in December 2013. Initially the only information we had about these images was their size and the book and page on which they appeared. Over time the community has added tens of thousands of semantic tags to the collection, an effort that has proven enormously valuable in deepening our understanding of the collection (a dataset of all tags from December 2013 to December 2014 is available on Figshare for unrestricted use and reuse). However, this sort of free-text tagging can introduce problems and idiosyncrasies. PicaGuess aims to complement the Flickr tag data by using a set of category words to drive community tagging. These category words are of both a descriptive and a more abstract variety (so 'dark', 'tender', and 'labour' as well as the usual 'map', 'building', 'person') and underpin two distinct user interactions with the collection. These are:
- creating sets of four images representative of a category word. The user chooses from three category words, then browses through the image set to find suitable images for that word.
- guessing the category word that another user has assigned to a four-image set. The user is presented with the four images and a selection of letters (with some extra letters thrown in to make it more tricky) from which the category word can be spelt. If stuck, users can ask for a hint or give up altogether.
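To make the guessing game's letter selection concrete, here is a minimal Python sketch of how such a rack of letters might be built. It is not taken from the PicaGuess code: the function names `build_letter_rack` and `can_spell`, the number of extra letters, and the use of random lowercase letters as decoys are all assumptions made for illustration.

```python
import random
import string
from collections import Counter


def build_letter_rack(word, extra=4, rng=None):
    """Return the letters of `word` plus `extra` random decoys, shuffled.

    Hypothetical helper: the decoy count and alphabet are illustrative.
    """
    rng = rng or random.Random()
    letters = list(word.lower())
    letters += [rng.choice(string.ascii_lowercase) for _ in range(extra)]
    rng.shuffle(letters)
    return letters


def can_spell(guess, rack):
    """Check that `guess` uses only letters available in `rack`."""
    needed = Counter(guess.lower())
    available = Counter(rack)
    return all(available[letter] >= count for letter, count in needed.items())


rack = build_letter_rack("tender", extra=4, rng=random.Random(0))
print(rack, can_spell("tender", rack))
```

Because the rack always contains every letter of the category word, the word can always be spelt from it; the decoys only make it harder to spot.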
At the back end, confidence scores for the relationship between a word and an image are updated based on the time it takes a user to solve a game, whether or not they need a hint to do so, and whether they fail to do so. Together, these two simple games provide a platform for the distributed determination of links between illustrations, where the parameters can be controlled by the collection 'owner' and rich data accompany those links.
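As a rough illustration of the kind of update described here, the following Python sketch nudges a word-image confidence score using the three signals mentioned: solve time, hint use, and failure. The actual PicaGuess formula is not described in this post, so the `GameResult` type, the 60-second time limit, the weights, and the moving-average update are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class GameResult:
    solved: bool      # did the player find the category word?
    seconds: float    # time taken to finish the game
    used_hint: bool   # did the player ask for a hint?


def evidence(result, time_limit=60.0):
    """Map one game to a value in [0, 1]; all weights are illustrative."""
    if not result.solved:
        return 0.0  # a failed guess counts against the link
    speed = max(0.0, 1.0 - result.seconds / time_limit)
    value = 0.5 + 0.5 * speed  # faster solves count for more
    if result.used_hint:
        value *= 0.5  # a hinted solve is weaker evidence
    return value


def update_confidence(current, result, rate=0.2):
    """Move the score a fraction `rate` of the way towards the new evidence."""
    return current + rate * (evidence(result) - current)


score = 0.5
score = update_confidence(score, GameResult(solved=True, seconds=12.0, used_hint=False))
score = update_confidence(score, GameResult(solved=False, seconds=60.0, used_hint=True))
print(round(score, 3))
```

In this sketch the same update would be applied to each of the four images in the set, and a collection 'owner' could tune `rate` and `time_limit` to control how quickly scores respond to play.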
You can find PicaGuess at picaguess.herokuapp.com, from where you can download the associated Android application. All code from the project is available for inspection and reuse on GitHub and periodically we hope to release the confidence scores for use and reuse.
In line with the spirit of the British Library Big Data Experiment, PicaGuess is not an official British Library service but very much a prototype that we hope can be improved and refined over time. And so we encourage you to share, to comment (here or via an email to [email protected]), and to build on the hard work, creativity, and achievements of Jonathan, Meral, and Divya.
Next up for the British Library Big Data Experiment is a machine learning project. More here when there is something to share.
James Baker, Curator, Digital Research
Great to see this kind of experimentation. I'm not sure what kind of feedback is useful on the prototype itself, but a few points:
* It's a shame that (in the web app at least) you can't just keep playing - each time you complete a puzzle you get thrown back to the initial screen meaning an extra click to 'Play the game' each time, rather than just getting another puzzle to complete
* The abstract concepts are often very hard to guess from pictures
* As soon as you know a category exists it becomes much easier to solve the puzzles. I suspect this is particularly true with the abstract concepts - once I know 'Tender' is a category I start looking for this in the letters available - thus in this case my ability to solve the puzzle and speed of solving the puzzle is much less strongly related to my interpretation of the pictures - for example I just had one where the category 'tender' was illustrated with four (pretty random) pictures featuring women. I would not classify these as relating to 'tender', but I got the answer quickly because I saw the letter combination
* Perhaps inevitable at this stage but in a very brief go at the game I saw the same puzzle multiple times - perhaps something needs to track what I've seen and not repeat it to me - at least within a single game