THE BRITISH LIBRARY

Digital scholarship blog

Enabling innovative research with British Library digital collections

Introduction

Tracking exciting developments at the intersection of libraries, scholarship and technology.

29 April 2015

The British Library Machine Learning Experiment


The British Library Big Data Experiment is an ongoing collaboration between British Library Digital Research and the UCL Department of Computer Science, facilitated by the UCL Centre for Digital Humanities, that engages computer science students with humanities research and digital libraries as part of their core assessed work.

The experiment plays host to undergraduate and postgraduate student projects that provide the Digital Research team with an experimental test-bed for developing, exploring and exploiting technical infrastructure and digital content in ways that may benefit humanities researchers. It also enables Computer Science students to develop skills in a new (and often unfamiliar) domain, and encourages them to think critically and question their assumptions about the role of libraries and humanities scholars through real-world, complex projects that stretch both their technical abilities and their understanding of user requirements. Further, having Computer Science students engage with humanities scholars as a routine part of this work creates a deeper mutual understanding of research needs and discipline-specific practices.

The 'big data' in question here is a collection of circa 68k Public Domain digitised volumes from the 16th to 19th centuries. The data contains both text derived by optical character recognition (OCR) and over 1 million illustrations, about which little is known beyond the size of each image and the volume and page on which it appears (for more on the dataset see Ben O'Steen's post 'A million first steps').

The latest output from the project, the British Library Machine Learning Experiment, is led by a BSc systems engineering module team (Durrant, Rafdi, Sarraf). Together the team designed a public service built on a range of open source services and software (MongoDB, Heroku, Node.js, Weka). The service indexes a subset of the 1 million image collection using tags generated by two public image recognition APIs (Alchemy and Imagga) and a bespoke algorithm. Confidence values are returned with the tags, and the service lets users not only search for tags but also browse by tag and by frequently co-occurring tags. The interface even allows users to tag a random image themselves to see how quickly image recognition APIs can assign tags to images.
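To make the idea of browsing by tag and by co-occurring tags concrete, here is a minimal Python sketch of one way such an index could work. It assumes each image arrives with a list of (tag, confidence) pairs from a recognition API and that each image is indexed once. The class name `TagIndex`, the 0.5 confidence cut-off and the image ids are hypothetical; the experiment itself was built with Node.js and MongoDB, and its released code may work quite differently.

```python
from collections import Counter, defaultdict
from itertools import combinations


class TagIndex:
    """Hypothetical in-memory tag index (not the project's actual MongoDB schema)."""

    def __init__(self, min_confidence=0.5):
        # Assumed cut-off: tags returned below this confidence are ignored.
        self.min_confidence = min_confidence
        self.images_by_tag = defaultdict(set)     # tag -> set of image ids
        self.cooccurrence = defaultdict(Counter)  # tag -> counts of other tags seen with it

    def add_image(self, image_id, tags):
        """Index one image from (tag, confidence) pairs; assumes each image is added once."""
        kept = sorted({tag for tag, conf in tags if conf >= self.min_confidence})
        for tag in kept:
            self.images_by_tag[tag].add(image_id)
        # Every unordered pair of kept tags on the same image counts once.
        for a, b in combinations(kept, 2):
            self.cooccurrence[a][b] += 1
            self.cooccurrence[b][a] += 1

    def search(self, tag):
        """Return the ids of images carrying `tag`, sorted for stable output."""
        return sorted(self.images_by_tag.get(tag, set()))

    def related_tags(self, tag, n=5):
        """Return up to n (tag, count) pairs that most often co-occur with `tag`."""
        return self.cooccurrence.get(tag, Counter()).most_common(n)


# Usage with made-up tags and confidence values:
index = TagIndex(min_confidence=0.5)
index.add_image("img-001", [("bird", 0.92), ("animal", 0.88), ("tree", 0.31)])
index.add_image("img-002", [("church", 0.75), ("building", 0.81)])
index.add_image("img-003", [("animal", 0.64), ("bird", 0.55), ("building", 0.52)])

print(index.search("bird"))          # ['img-001', 'img-003']
print(index.related_tags("animal"))  # [('bird', 2), ('building', 1)]
```

Counting each pair of tags once per image is just one simple notion of co-occurrence; weighting pairs by their confidence values would be a reasonable alternative.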


The British Library Machine Learning Experiment can be found at http://blbigdata.herokuapp.com/. A video demonstration of the service's functionality is also available. It is clear from using the experimental service that machine learning approaches to image recognition remain a maturing field. Nevertheless, as was underscored by a British Library Labs event last year on large-scale image analysis (see my notes from the event), significant advances have been made in recent years. Searches of the British Library Machine Learning Experiment for the tags 'animal', 'bird', or 'church' confirm this trend.

Code from the British Library Machine Learning Experiment is available for reuse under an MIT licence. As this project is very much an experiment, we welcome your feedback via this blog, email, or GitHub.

Rafdi, Muhammad; Sarraf, Ali; Durrant, James; Baker, James (2015). British Library Machine Learning Experiment. Zenodo. doi:10.5281/zenodo.17168

James Baker

Curator, Digital Research

@j_w_baker

---

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. Exceptions: embeds to and from external sources

17 April 2015

Looking back on Digital Conversations: A Web of Rights


In February, as part of our Digital Conversations series, we welcomed Dr Martin Paul Eve, Jim Killock, Professor John Naughton, and Dr Joss Wright to the British Library to discuss how the web has complicated, enhanced, and changed the rights of citizens, for better or for worse.


The ensuing discussion was wide-ranging, provocative, and highly stimulating, touching on themes that included digital reproduction and labour, the assumptions behind the internet, privacy and anonymity, the intersections between online and offline experience, and who the likely Barons would be were a digital Magna Carta to arise.


Recordings of the panel and open discussion sections of the event are now available on SoundCloud. We thank the speakers for agreeing to share these under a Creative Commons Attribution licence. My live, partial, and incomplete notes from the event are on GitHub Gist.

Our next Digital Conversations event will take place on 21 May and will examine the state of the art with regard to Digital Music Analysis. Tickets can be booked via Eventbrite. These are free but demand is high, so book now to avoid disappointment.

James Baker

Curator, Digital Research

@j_w_baker

---


15 April 2015

Library Carpentry: call for volunteers, call for participants


There is demand in the library community for skills that enable librarians to automate tasks, integrate versioning into workflows, and clean and manipulate data. Whilst Software Carpentry offers this training with a focus on the needs and requirements of research scientists, the needs and requirements of library professionals are different.

In February I asked what Library Carpentry might look like and how best to spend my Software Sustainability Institute Fellowship figuring it out. The response from you, the community, was wonderful: you discussed the idea on Twitter, commented on this blog, sent links to resources and further reading, and offered to discuss further on Skype and in person (I've put together a Zotero library capturing much of this).

Following on from these discussions I am delighted to announce that Library Carpentry will take place in November 2015 at the Centre for Information Science, City University London. Library Carpentry will take the form of four three-hour sessions, open to around 40 to 50 participants. Each session will be led by a session leader and will primarily involve participants working in groups from a worksheet, with troubleshooting handled by skilled volunteers. The programme is likely to cover counting and mining data in the Unix shell, versioning with Git, and cleaning data with OpenRefine, though this is subject to change based on the interests of participants and their pace of working. Places are free but booking will be essential.

Full details are available at the Library Carpentry website.

I am also delighted to announce that two calls are now open: a Call for Participants and a Call for Volunteers.

Call for Participants

If you want to come along or bring a group to Library Carpentry, please register your interest by contacting James Baker or, if you use GitHub, by commenting on our Call for Participants thread.

Call for Volunteers

Library Carpentry needs skilled volunteers to lead sessions, support participants as they work through tasks, help answer questions participants may have, and contribute to lesson plans as they develop. If you are interested in being a volunteer trainer at Library Carpentry, contact James Baker or, if you use GitHub, comment on our Call for Volunteers thread.

Many thanks to Lyn Robinson and Ernesto Priego for agreeing to host Library Carpentry at the Centre for Information Science, City University London, to Software Carpentry for their advice and for allowing us to use the 'Carpentry' moniker, and to the Software Sustainability Institute for supporting this activity. The Software Sustainability Institute cultivates world-class research with software. The Institute is based at the universities of Edinburgh, Manchester, Southampton and Oxford.

James Baker

Curator, Digital Research

@j_w_baker

---
