Digital scholarship blog

Enabling innovative research with British Library digital collections

4 posts from April 2015

29 April 2015

The British Library Machine Learning Experiment

The British Library Big Data Experiment is an ongoing collaboration between British Library Digital Research and UCL Department of Computer Science, facilitated by UCL Centre for Digital Humanities, that enables and engages students in computer science with humanities research and digital libraries as part of their core assessed work.

The experiment plays host to undergraduate and postgraduate student projects that provide the Digital Research team with an experimental test-bed for developing, exploring and exploiting technical infrastructure and digital content in ways that may benefit humanities researchers. It enables Computer Science students to develop skills in a new (and often foreign) domain, and it encourages critical thinking and questioning of their assumptions about the role of library and humanities scholars through real-world, complex projects that stretch and develop both their technical abilities and their understanding of user requirements. Further, having Computer Science students engage with Humanities scholars as a routine part of this work creates deeper mutual understanding of research needs and discipline-specific practices.

The 'big data' in question here is a collection of circa 68k Public Domain digitised volumes from the 16th to 19th centuries. The data contains both text derived from optical character recognition and over 1 million illustrations, of which little is known apart from the size of the images, the volume in which each appears, and the page on which it appears (for more on the dataset see Ben O'Steen 'A million first steps').

The latest output from the project - the British Library Machine Learning Experiment - is led by a BSc systems engineering module team (Durrant, Rafdi, Sarraf). Together the team designed a public service built around a range of open source services and software (MongoDB, Heroku, Node.js, Weka). This service indexes a subset of the 1 million image collection using tags generated by two public image recognition APIs (Alchemy and Imagga) and a bespoke algorithm. Confidence values are returned, and features are implemented that allow users not only to search for tags but also to browse by tag and by frequently co-occurring tags. The interface even allows a user to tag a random image themselves to see how quickly image recognition APIs can assign tags to images.
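The browse-by-tag and co-occurring-tag features described above could be sketched roughly as follows. This is a minimal Python illustration only, not the project's code (which is built on Node.js and MongoDB): the record layout, image ids, tags, confidence values, function names and the 0.5 confidence threshold are all assumptions invented for the example.

```python
from collections import Counter

# Hypothetical records: image id -> list of (tag, confidence) pairs.
# Ids, tags and confidence values are invented for illustration.
IMAGE_TAGS = {
    "img-001": [("church", 0.91), ("building", 0.84), ("tree", 0.40)],
    "img-002": [("church", 0.77), ("building", 0.69)],
    "img-003": [("bird", 0.88), ("tree", 0.73)],
    "img-004": [("church", 0.30), ("bird", 0.65), ("tree", 0.81)],
}


def build_tag_index(image_tags, min_confidence=0.5):
    """Map each tag to the ids of images carrying it at or above the threshold."""
    index = {}
    for image_id, tags in image_tags.items():
        for tag, confidence in tags:
            if confidence >= min_confidence:
                index.setdefault(tag, set()).add(image_id)
    return index


def co_occurring_tags(image_tags, tag, min_confidence=0.5):
    """Count how often other tags appear on the same image as `tag`."""
    counts = Counter()
    for tags in image_tags.values():
        # Keep only tags the recogniser was reasonably confident about.
        kept = {t for t, c in tags if c >= min_confidence}
        if tag in kept:
            counts.update(kept - {tag})
    return counts.most_common()


if __name__ == "__main__":
    index = build_tag_index(IMAGE_TAGS)
    print(sorted(index["church"]))
    print(co_occurring_tags(IMAGE_TAGS, "church"))
```

Filtering on confidence before indexing is one possible design choice: it keeps low-confidence guesses out of both the tag index and the co-occurrence counts, at the cost of hiding some correct but uncertain tags.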

The British Library Machine Learning Experiment can be found at http://blbigdata.herokuapp.com/. A video demonstration detailing the service functionality is embedded below. It is clear from using the experimental service that machine learning approaches to image recognition remain a maturing field. Nevertheless, as was underscored by a British Library Labs event last year on large scale image analysis (see my notes from the event), significant advances have been made in recent years. Searches of the British Library Machine Learning Experiment for the tags 'animal', 'bird', or 'church' confirm this trend.

Code from the British Library Machine Learning Experiment is available for reuse under an MIT licence. As this project is very much an experiment, we welcome your feedback via this blog, an email, or GitHub.

Rafdi, Muhammad; Sarraf, Ali; Durrant, James; Baker, James (2015). British Library Machine Learning Experiment. Zenodo. 10.5281/zenodo.17168

James Baker

Curator, Digital Research

@j_w_baker

---

Creative Commons Licence This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. Exceptions: embeds to and from external sources

17 April 2015

Looking back on Digital Conversations: A Web of Rights

In February, as part of our Digital Conversations series, we welcomed Dr Martin Paul Eve, Jim Killock, Professor John Naughton, and Dr Joss Wright to the British Library to discuss how the web has complicated, enhanced, and changed the rights of citizens, for better or for worse.

The ensuing discussion was wide-ranging, provocative, and highly stimulating, touching on themes that included digital reproduction and labour, the assumptions behind the internet, privacy and anonymity, the intersections between online and offline experience, and who the likely Barons would be were a digital Magna Carta to arise.

A recording of the panel and open discussion sections of the event is now available on Soundcloud. We thank the speakers for agreeing to share this under a Creative Commons Attribution licence. My live, partial, and incomplete notes from the event are on GitHub Gist.

Our next Digital Conversations event will take place on 21 May and will examine the state of the art with regard to Digital Music Analysis. Tickets can be booked via Eventbrite. These are free but demand is high, so book now to avoid disappointment.

James Baker

Curator, Digital Research

@j_w_baker

---

15 April 2015

Library Carpentry: call for volunteers, call for participants

There is demand in the library community for skills that enable librarians to automate tasks, integrate versioning into workflows, and to clean and manipulate data. Whilst Software Carpentry offers this training with a focus on the needs and requirements of research scientists, the needs and requirements of library professionals are different.

In February I asked what Library Carpentry might look like and how best to spend my Software Sustainability Institute Fellowship figuring it out. The response from you, the community, was wonderful: you discussed the idea on Twitter, commented on this blog, sent links to resources and further reading, and offered to discuss further on Skype and in person (I've put together a Zotero library capturing much of this).

Following on from these discussions I am delighted to announce that Library Carpentry will take place in November 2015 at the Centre for Information Science, City University London. Library Carpentry will take the form of four three-hour sessions, open to around 40 to 50 participants. Each session will be led by a session leader and will primarily involve participants working in groups from a worksheet, with troubleshooting handled by skilled volunteers. The programme is likely to cover counting and mining in the Unix shell, versioning with Git, and cleaning data with OpenRefine, though this is subject to change based on the interests of participants and their pace of working. Places are free but booking will be essential.

Full details are available at the Library Carpentry website.

I am also delighted to announce that two calls are now open: a Call for Participants and a Call for Volunteers.

Call for Participants

If you want to come along or bring a group to Library Carpentry, please register your interest by contacting James Baker or, if you use GitHub, by commenting on our Call for Participants thread.

Call for Volunteers

Library Carpentry needs skilled volunteers to lead sessions, support participants as they work through tasks, help answer questions participants may have, and contribute to lesson plans as they develop. If you are interested in being a volunteer trainer at Library Carpentry, contact James Baker or, if you use GitHub, comment on our Call for Volunteers thread.

Many thanks to Lyn Robinson and Ernesto Priego for agreeing to host Library Carpentry at the Centre for Information Science, City University London, to Software Carpentry for their advice and for allowing us to use the 'Carpentry' moniker, and to the Software Sustainability Institute for supporting this activity. The Software Sustainability Institute cultivates world-class research with software. The Institute is based at the universities of Edinburgh, Manchester, Southampton and Oxford.

James Baker

Curator, Digital Research

@j_w_baker

---

10 April 2015

Rob Sherman's End of Residency Seminar

All good things must come to an end, and sadly this month that includes Rob Sherman’s residency, attached to the Lines in the Ice exhibition and funded by CreativeWorks London & The Eccles Centre for American Studies. However, before we wave goodbye to Rob, we are having a free seminar at 2:30pm on Friday 17th April to celebrate what he has achieved. There will also be special guests, tea and cake. If you would like to come along, then please book a place here.

Rob has documented his residency at http://onmywifesback.tumblr.com/ and it has been an interesting exercise to reflect on what he has created since last August. Rob has made original art, both physical and digital, in response to the exhibition, including a book hand-bound with support from the Library's Collection Care department (you can read more about this in their blog). The book was artificially aged to look like a real traveller's journal from the mid-19th century, and it has been on display in the gallery, baffling, delighting and amusing visitors in equal measure!

Rob also crafted a "digital cairn" for the gallery. This is a PirateBox, i.e. a little web server which can only be accessed in the exhibition itself and which, when explored using a wireless device, reveals information about the travels of a man called Isaak Scinbank between the years 1852 and 1853 (a fictional character created by Rob, whose narrative is related to the real historical accounts of the lost Franklin expedition, which features in Lines in the Ice).

Components used by Rob to build the "digital cairn"

In addition to the diary and the digital cairn, Scinbank’s story is also told via a Twine game, available at http://www.bl.uk/eccles/onmywifesback/ (best accessed using the Chrome browser).

Portrait of Isaak Scinbank, by Rob Sherman

There have been previous events too. On 17th January Rob hosted a public drop-in session in the gallery, where he invited visitors to talk to him about his work; this was very popular, and it was one of the most visited days of the exhibition.

He also organised and performed at an evening event for the residency on 5th February, which included readings by Nancy Campbell, J.R. Carpenter and Kate Pullinger. In the audience was one of John Franklin’s descendants (unbeknown to Rob or myself); a week later this lady sent Rob a lovely handwritten letter telling him about the Franklin correspondence in her family’s archive. A magical end to what has been a very creative, ambitious and innovative project.