Digital scholarship blog

Enabling innovative research with British Library digital collections


Tracking exciting developments at the intersection of libraries, scholarship and technology.

27 May 2015

Digital Conversations @ BL: Digital Music Analysis


Last week the BL Digital Research team organised another Digital Conversations event to discuss research projects and trends in digital music analysis. The theme couldn’t have been more timely, as we had just heard the news that the Library has been awarded a £9.5M grant from the Heritage Lottery Fund, as part of the BL’s Save Our Sounds campaign, to digitise and provide access to 500,000 rare, unique and at-risk sound recordings from our Sound Archive and other key audio collections in the UK.

Dr. Tillman Weyde kicked off the event by presenting findings from the Digital Music Lab, an AHRC-funded project that is developing new software infrastructure to help musicologists interrogate large collections of audio files, and to compare and interpret the results using new methodological approaches to musicology research. By analysing thousands of sound recordings and metadata from the BL, CHARM and I Like Music datasets, researchers can now discover patterns shared by specific musical genres, compare relationships between different musical styles, and visualise changes in tonality, pitch and tempo, both across a variety of genres and within a single piece recorded by various artists at different times and in different locations. One outcome of the project is an open Web interface that shows the general public the various ways in which musical genres can be compared according to specific musical parameters.

Aquiles Alencar-Brayner introducing the speakers

Prof. David Rowland and Dr. Simon Brown spoke about the Listening Experience Database (LED) project, which is creating a database of transcribed personal accounts, mainly from manuscripts and printed sources, describing public responses to music. The LED database is a successful example of the importance of crowdsourcing for collecting and generating new data. So far the project has received 10,000 entries from the public, and the researchers involved are interested in expanding the community of contributors so as to add more information to the database. If you are interested in contributing to this project on a more regular basis, or in learning more about the contribution process generally, please contact the project team by email.

Prof. Mark Plumbley spoke about the ESRC-funded project “Musical Audio Repurposing using Source Separation”, led by Queen Mary University, London. The aim of the project is to develop new methods for musical audio source separation, focussing on soloing and remixing of content to be generated during the project. Researchers involved in the project will also develop a software infrastructure to identify and extract different sounds from a single recording, for example separating each instrument in an orchestral recording or extracting different sounds from environmental and wildlife audio files. This software will become available to researchers by the end of the project in 2017.

Our colleague, Dr. Sandra Tuppen, discussed the Big Data History of Music, another AHRC-funded project, which involves the British Library in partnership with Royal Holloway and aims to bring together the world’s biggest datasets on published sheet music, music manuscripts and classical concerts (in excess of 5 million records). Through statistical analysis, manipulation and visualisation of this data, the project will develop new methods for researching music history, linking information from various library catalogues to analyse long-term patterns in music trends, dissemination and popularity, the development of musical taste, performances, and relationships and influences between composers since the 15th century. As Sandra remarked, humans create catalogues, and catalogues (as well as humans) change over time, hence the importance for today’s researchers of understanding how early music data has been collected and described over the last seven centuries. The BL catalogue of printed music used for the Big Data History of Music is available for download and re-use under a CC0 licence at the British Library open data page.

The last speaker of the evening, Dr Erinma Ochu, discussed Hookedonmusic, a project in which she has been involved that aims to collect information on what makes a tune catchy for the general public. The data used for the project comes from a crowdsourcing activity: a web-based game presents music extracts to the player, who decides which tunes are most associated with memories of past experiences. So far 175,000 people have played the Hookedonmusic game, helping to build the research database of musical memory. Amongst many interesting and multifaceted results (did you know that the catchiest tune since the 1950s, according to the information provided by the players, is Wannabe by the Spice Girls?), Hookedonmusic is helping researchers to better understand how long-term memory is triggered in Alzheimer’s patients through the connection between life events and the music associated with them, so as to support the treatment of individuals suffering from memory loss. Have a go at the game and bring back the good moments you lived through music.

The event, chaired by Prof. Stephen Cotrell, Head of the Music Department at City University London, raised interesting points for debate with the audience. The main message of the evening, at least from my perspective, was that the interdisciplinary work these projects promote by bringing together musicologists, computer scientists, engineers, archivists and content curators is an essential step in demonstrating how important digital scholarship is for today’s researchers, no matter what discipline we work in!


Aquiles Alencar-Brayner

Curator, Digital Research


29 April 2015

The British Library Machine Learning Experiment


The British Library Big Data Experiment is an ongoing collaboration between British Library Digital Research and UCL Department of Computer Science, facilitated by UCL Centre for Digital Humanities, that engages computer science students with humanities research and digital libraries as part of their core assessed work.

The experiment plays host to undergraduate and postgraduate student projects that provide the Digital Research team with an experimental test-bed for developing, exploring and exploiting technical infrastructure and digital content in ways that may benefit humanities researchers. Enabling Computer Science students to develop skills in a new (and often foreign) domain encourages critical thinking and questioning of their assumptions about the role of libraries and humanities scholars, through real-world, complex projects that stretch and develop both their technical abilities and their understanding of user requirements. Further, having Computer Science students engage with Humanities scholars as a routine part of this work creates deeper mutual understanding of research needs and discipline-specific practices.

The 'big data' in question here is a collection of circa 68k Public Domain digitised volumes from the 16th to 19th centuries. The data contains both text derived from optical character recognition and over 1 million illustrations, of which little is known apart from the size of the images, the volume in which they appear and the page on which they appear (for more on the dataset see Ben O'Steen 'A million first steps').

The latest output from the project, the British Library Machine Learning Experiment, is led by a BSc systems engineering module team (Durrant, Rafdi, Sarraf). Together the team designed a public service built around a range of open source services and software (MongoDB, Heroku, Node.js, Weka). This service indexes a subset of the 1 million image collection using tags generated by two public image recognition APIs (Alchemy and Imagga) and a bespoke algorithm. Confidence values are returned, and features are implemented that allow users not only to search for tags but also to browse by tag and by frequently co-occurring tags. The interface even allows a user to tag a random image themselves to see how quickly image recognition APIs can assign tags to images.
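To illustrate the idea of browsing by tag and by frequently co-occurring tags, here is a minimal sketch in Python. It is not the team's code (the service itself is built on Node.js and MongoDB); the image identifiers, tags, confidence values, the 0.5 confidence cut-off and the function names are all assumptions made up for this example.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical data: image id -> {tag: confidence reported by a tagging API}.
IMAGE_TAGS = {
    "img-001": {"church": 0.91, "building": 0.84, "tree": 0.40},
    "img-002": {"bird": 0.88, "animal": 0.93, "tree": 0.55},
    "img-003": {"church": 0.77, "building": 0.69},
    "img-004": {"animal": 0.81, "bird": 0.30},
}


def build_index(image_tags, min_confidence=0.5):
    """Map each tag to (image id, confidence) pairs, most confident first."""
    index = defaultdict(list)
    for image_id, tags in image_tags.items():
        for tag, confidence in tags.items():
            if confidence >= min_confidence:
                index[tag].append((image_id, confidence))
    for pairs in index.values():
        pairs.sort(key=lambda pair: pair[1], reverse=True)
    return dict(index)


def co_occurrence(image_tags, min_confidence=0.5):
    """Count how often each pair of tags appears on the same image."""
    counts = Counter()
    for tags in image_tags.values():
        kept = sorted(t for t, c in tags.items() if c >= min_confidence)
        counts.update(combinations(kept, 2))
    return counts


def related_tags(counts, tag):
    """List the tags that co-occur with `tag`, most frequent first."""
    related = Counter()
    for (first, second), n in counts.items():
        if first == tag:
            related[second] += n
        elif second == tag:
            related[first] += n
    return related.most_common()


if __name__ == "__main__":
    index = build_index(IMAGE_TAGS)
    counts = co_occurrence(IMAGE_TAGS)
    print(index["church"])
    print(related_tags(counts, "church"))
```

Dropping low-confidence tags before counting pairs is one possible design choice; a real service might instead keep every tag and weight each pair by confidence.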


The British Library Machine Learning Experiment is available online, and a video demonstration detailing the service functionality accompanies this post. It is clear from using the experimental service that machine learning approaches to image recognition remain a maturing field. Nevertheless, as was underscored by a British Library Labs event last year on large scale image analysis (see my notes from the event), significant advances have been made in recent years. Searches of the British Library Machine Learning Experiment for the tags 'animal', 'bird', or 'church' confirm this trend.

Code from the British Library Machine Learning Experiment is available for reuse under an MIT licence. As this project is very much an experiment, we welcome your feedback via this blog, an email, or GitHub.

Rafdi, Muhammad; Sarraf, Ali; Durrant, James; Baker, James (2015). British Library Machine Learning Experiment. Zenodo. doi:10.5281/zenodo.17168

James Baker

Curator, Digital Research



Creative Commons Licence This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. Exceptions: embeds to and from external sources

17 April 2015

Looking back on Digital Conversations: A Web of Rights


In February, as part of our Digital Conversations series, we welcomed Dr Martin Paul Eve, Jim Killock, Professor John Naughton, and Dr Joss Wright to the British Library to discuss how the web has complicated, enhanced, and changed the rights of citizens, for better or for worse.


The ensuing discussion was wide-ranging, provocative, and highly stimulating, touching on themes that included digital reproduction and labour, the assumptions behind the internet, privacy and anonymity, the intersections between online and offline experience, and who the likely barons would be were a digital Magna Carta to arise.


Recordings of the panel and open discussion sections of the event are now available on Soundcloud. We thank the speakers for agreeing to share these under a Creative Commons Attribution licence. My live, partial, and incomplete notes from the event are on GitHub Gist.

Our next Digital Conversations event will take place on 21 May and will examine the state of the art with regard to Digital Music Analysis. Tickets can be booked via Eventbrite. These are free but demand is high, so book now to avoid disappointment.

James Baker

Curator, Digital Research


