Digital scholarship blog

Enabling innovative research with British Library digital collections

207 posts categorized "Experiments"

11 July 2014

The British Library Big Data Experiment project update

In this post, the British Library Big Data Experiment team reflect on their work in the first six weeks of the project. For more information on this collaboration between the British Library Digital Research team, University College London Computer Science, and University College London Centre for Digital Humanities see our kickoff post.

Since the project began in early June we have had an interesting time coming to terms with the typical workflow of a researcher in the arts and humanities. A key task towards this goal was conducting a focus group, where we learnt a variety of things. For instance, given research conventions within the field, we were surprised to discover that researchers are willing to use modern computing tools such as text analysis. During the focus group researchers expressed views and ideas which had never occurred to us, such as “each instance of a book is a different object, it is unique because one specific copy is particular,” and “the person who composes the content can be different from the person who actually writes it.” Having the researchers’ perspective conveyed to us in such a way was invaluable. It was also useful to learn how they would improve existing search systems: one said “I would like intelligent suggestions” and another felt “feedback on which collection has been searched would be particularly useful.” Overall the focus group was an essential learning exercise for getting this project off the ground.

We have also spent some time interrogating the British Library’s data and gained an appreciation for its variety, volume, velocity and veracity. This presents an interesting challenge because it cannot be resolved using familiar database software systems. The data we have begun working with is quite diverse. It was created during the digitisation of approximately 40,000 titles (equating to approximately 65,000 volumes), which had until recently been challenging for researchers and the public to access. Now, all of the metadata, data and scans within the collection are dedicated to the public domain for unrestricted use.

The team have taken the opportunity to consult with key stakeholders and leading academics in the field. All of this has set us up very nicely to begin development work. In the coming weeks, we hope to build a powerful and intuitive service which will enable arts and humanities researchers to better interact with the British Library’s digitised collection of public domain books, thereby enabling them to access the data in a more meaningful way.

Nektaria Stavrou (Team Leader and MSc Software Systems Engineering, University College London), Stelios Georgiou (Testing Director and MSc Software Systems Engineering, UCL), Wendy Wong (MSc Computer Science, UCL), Stefan P. Alborzpour (MSc Computer Science, UCL)

16 June 2014

Images Online: a selection

Some months back we released digital images of 430 objects from our collections into the public domain via Flickr Commons. Well short of the million we made available in December, these images have the benefit of being more precisely described at an individual level. Indeed they derive from Images Online, an extensive repository of high quality imagery that reflects the vast size and diversity of the British Library collections.

The World before the Deluge (1865)

The selection covers a range of topics, including images from the 12th century Topographia Hibernica, the signature of William Shakespeare from a Blackfriars mortgage-deed, engravings after drawings made in the countries Captain Cook visited during his first voyage to the South Pacific, illustrations from The Legend of Sleepy Hollow, drawings depicting eighteenth century Hungarian and Saxon dress, Georgian caricatures by Thomas Rowlandson, and many more.

These digital images are freely available for unrestricted use and reuse, and are of an ideal quality for blogging, teaching, and creative work. We know that the community have already been viewing and sharing the collection, but if you've put the images to good use do let us know by emailing [email protected] or adding your work to our growing wiki of British Library Public Domain projects.

The Penitential and other Psalms (circa 1509-1546)

Seal from letter of Lord Nelson (1801)

James Baker

Curator, Digital Research

@j_w_baker

13 June 2014

Victorian Meme Machine

Posted on behalf of Bob Nicholson (a more detailed explanation of his winning entry to the British Library Labs competition for 2014)

Introducing the Victorian Meme Machine

What would it take to make a Victorian joke funny again?

Nothing short of a miracle, you might think. After all, there are few things worse than a worn-out joke. Some provoke a laugh, and the best are retold to friends, but even the most delectable gags are soon discarded. While the great works of Victorian art and literature have been preserved and celebrated by successive generations, even the period’s most popular jokes have now been lost or forgotten.

Fortunately, thousands of these endangered jests have been preserved within the British Library’s digital collections. I applied to this year’s Labs Competition because I wanted to find these forgotten gags and bring them back to life. Over the next couple of months we’re going to be working together on a new digital project – the ‘Victorian Meme Machine’ [VMM].

The Victorian Meme Machine (VMM)

The VMM will create an extensive database of Victorian jokes that will be available for use by both researchers and members of the public. It will analyse jokes and semi-automatically pair them with an appropriate image (or series of images) drawn from the British Library’s digital collections and other participating archives. Users will be able to re-generate the pairings until they discover a good match (or a humorously bizarre one) – at this point, the new ‘meme’ will be saved to a public gallery and distributed via social media. The project will monitor which memes go viral and fine-tune the VMM in response to popular tastes. Hopefully, over time, it’ll develop a good sense of humour!

Let’s take a closer look at how it’ll work. Here’s a simple, two-line joke taken from a late-Victorian newspaper:

Chicago Woman: How much do you charge for a divorce?
Chicago Lawyer: One hundred dollars, ma’am, or six for 500dols.

Users will then be invited to give the joke descriptive tags, highlight key words, and describe its structure. Here’s an example of how our sample joke might be encoded:

Encoding a joke for the VMM

This will give the VMM all the data it needs to pair the joke with an appropriate image. In this first example, the joke has been paired with an image featuring a woman talking to a lawyer and presented in the form of a caption:

Joke paired with an image to create a meme.

We also hope to present the jokes in other formats, such as speech bubbles: 

Joke represented as speech bubbles.

Each of these images is a close match for the joke – both feature women speaking to men who appear to be lawyers. However, if we loosen these requirements slightly then the pairings begin to take on a new (and sometimes rather bizarre) light:

A selection of representations of the joke.
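The pairing idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the VMM's actual implementation: the tag sets, the image names and the `min_overlap` threshold are all hypothetical, and matching on shared tags is just one plausible way of doing it.

```python
# Hypothetical sketch: rank images by how many descriptive tags they share
# with a joke. All tags and image names below are invented for illustration.

def pair_joke_with_images(joke_tags, images, min_overlap=2):
    """Return names of images sharing at least `min_overlap` tags with the
    joke, best matches first."""
    joke_tags = set(joke_tags)
    scored = []
    for name, tags in images.items():
        overlap = len(joke_tags & set(tags))
        if overlap >= min_overlap:
            scored.append((overlap, name))
    # Sort by overlap (descending), then by name so the order is stable.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored]


joke_tags = {"woman", "lawyer", "divorce", "conversation"}
images = {
    "woman_and_lawyer": {"woman", "lawyer", "conversation", "office"},
    "woman_and_clerk": {"woman", "clerk", "conversation"},
    "two_cats": {"cat", "conversation"},
}

# A strict threshold keeps only close matches ...
strict = pair_joke_with_images(joke_tags, images, min_overlap=3)
# ... while a looser one lets the more bizarre pairings through.
loose = pair_joke_with_images(joke_tags, images, min_overlap=1)
```

Lowering `min_overlap` is the code equivalent of loosening the requirements: more images qualify, and the odder ones appear further down the list.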

These are just some early examples of what the VMM might offer. When the database is ready, we’ll invite the public to explore other ways of creatively re-using the jokes. Together, I hope we’ll be able to resurrect some of these long-dead specimens of Victorian humour and let them live again – if only for a day.

Dr Bob Nicholson
Lecturer in History, Edge Hill University
Winner of British Library Labs Competition 2014

Bob Nicholson is a lecturer in history specialising in nineteenth-century Britain and America, with a particular focus on journalism, popular culture, jokes, and transatlantic relations. Bob has been exploring representations of the United States, and the circulation of its popular culture, in Victorian newspapers and periodicals. He is a keen proponent of the Digital Humanities and likes to experiment with the new possibilities offered to both researchers and teachers by digital tools and archives. He has written for The Guardian, had his research covered by The Times, and was shortlisted by the British Broadcasting Corporation (BBC) and Arts and Humanities Research Council (AHRC) in their first search for New Generation Thinkers (2011).

@DigiVictorian

www.DigitalVictorianist.com

12 June 2014

British Library Labs Competition 2014 - Winners Announced!

Stella Wisdom, Digital Curator in the Digital Research team, announced the two winners of the second British Library Labs competition (2014) as part of her opening keynote speech at the European Library Automation Group (ELAG): Lingering Gold conference at the University of Bath on 11 June 2014.

A judging panel made up of leaders in Digital Scholarship, some of whom sit on the British Library Labs advisory board (Claire Warwick and Melissa Terras at University College London, Andrew Prescott at King's College London, Tim Hitchcock at the University of Sussex, David De Roure at the University of Oxford and Bill Thompson from the BBC), and members of the British Library's Digital Scholarship team met at the end of May to decide upon two winners of this year's competition. After much deliberation, we can now proudly announce that the winners of the 2014 British Library Labs competition are the 'Victorian Meme Machine' and the 'Text to Image Linking Tool'.

Victorian Meme Machine

Bob Nicholson of Edge Hill University
Twitter: @DigiVictorian Web: http://www.digitalvictorianist.com/

What would it take to make a Victorian joke funny again?

 
Video explaining the Victorian Meme Machine

While the great works of Victorian art and literature have been preserved and celebrated by successive generations, even the period’s most popular jokes have now been lost or forgotten. Fortunately, thousands of these endangered jests have been preserved within the British Library’s digital collections. This project aims to find these forgotten jokes and bring them back to life.

Victorian Meme Machine

The ‘Victorian Meme Machine’ [VMM] will create an extensive database of Victorian jokes that will be available for use by other scholars. It will analyse jokes and semi-automatically pair them with an appropriate image (or series of images) drawn from the British Library’s digital collections and other participating archives. Users will be able to re-generate the pairings until they discover a good match (or a humorously bizarre one) – at this point, the new ‘meme’ will be saved to a public gallery and distributed via social media. The project will monitor which memes go viral and fine-tune the VMM in response to popular tastes.

Bob Nicholson is a lecturer in history specialising in nineteenth-century Britain and America, focusing on journalism, popular culture, jokes, and transatlantic relations. Bob has been exploring representations of the United States, and the circulation of its popular culture, in Victorian newspapers and periodicals. He is a keen proponent of the Digital Humanities and has written for The Guardian, had his research covered by The Times, and was shortlisted by the British Broadcasting Corporation (BBC) and Arts and Humanities Research Council (AHRC) in their first search for New Generation Thinkers (2011).

Text to Image Linking Tool (TILT)

Desmond Schmidt and Anna Gerber of the University of Queensland
Twitter account: @bltilt and @AnnaGerber

 
Video of Desmond and Anna explaining TILT

In order to make old printed books and manuscripts accessible to a Web audience, it is essential to display the page image / facsimile of the original document next to its transcription. This allows the user to comment on the text, and to read it clearly, but because original documents are often hard to read, or have different line-breaks than text on a computer screen, it is easy to get lost trying to match up words in the document with words in the transcription. To overcome this, the team are developing semi-automatic methods to generate links that highlight corresponding parts of the page image and the text.

Visualising manuscript regions to enable linking to transcriptions

More information about TILT can be found at http://dh2013.unl.edu/abstracts/ab-112.html

Anna Gerber is a software developer and technical project manager specialising in Digital Humanities projects at the University of Queensland’s ITEE (Information Technology and Electrical Engineering) eResearch group. Anna was the senior software engineer for the AustESE project, developing eResearch tools to support the collaborative authoring and management of electronic scholarly editions. She is a contributor to the W3C (World Wide Web Consortium) Community Group for Open Annotation and was a co-principal investigator on the Open Annotation Collaboration project.

Desmond Schmidt has degrees in classical Greek papyrology from the University of Cambridge, UK, and in Information Technology from the University of Queensland, Australia. He has worked in the software industry, in information security, on the Vienna Edition of Ludwig Wittgenstein, on Leximancer, a concept-mining tool, and on the AustESE (Australian Electronic Scholarly Editing) project. He is currently a Research Scientist at the Institute for Future Environments, Queensland University of Technology.

 

The winners will work with the British Library for the next 5-6 months on their ideas and their work will be showcased at the British Library Conference Centre on Monday November 3rd 2014, whereupon a first prize of £3,000 and second prize of £1,000 will be awarded.

Finally, we would like to thank all those who entered the competition this year. We hope to continue to run events and organise meetings where we can support researchers who would like to use the Library's digital content for their scholarly work.

We will be blogging about each of the projects over the next few months and you can track progress by following us on @BL_Labs

Mahendra Mahey @mahendra_mahey

13 May 2014

Crowdsourcing Comic Art

This month Comics Unmasked: Art and Anarchy in the UK opened at the British Library, a major exhibition celebrating the UK's rich heritage of mainstream and underground comics and comic art. Though much of the exhibition focuses on work produced by recent icons of the genre - Neil Gaiman, Alan Moore, Posy Simmonds - the British Library collections contain a wealth of early work from artists both iconic (James Gillray, George Cruikshank) and those whose work is unknown, forgotten, and unattributed.

As we are a library, much of this work is hidden away inside books, making it hard to find. This is where we need you.

Last year we released a collection of over a million images from the British Library's 18th, 19th, and 20th century digitised book collections into the Public Domain for unrestricted use and reuse (for more info see 'A million first steps'). As we used automated processes to clip these images from each digitised book, we knew very little about them apart from the title of the books themselves. Since then members of the public have added over 80,000 tags to these images, thereby aiding discovery of and research using the collection. As a result, certain patterns have been identified: there are many portraits, there are many maps, there are many beautiful decorative flourishes. But there is also a wealth of comic art in the collection: including reproductions of and homages to Georgian satire, gentle late-19th-century humorous illustration, picture puzzles, political drama, and early-Victorian cat memes.

We have collected these together in Flickr under the tag 'comic_art' but we suspect there are many more hidden comic treasures to be found.

This is where you come in. All we ask is that you head to the British Library Flickr page, enter some creative search terms in the search box (remember to select 'The British Library's Photostream' from the dropdown, or alternatively enter the URL https://www.flickr.com/search/?w=12403504@N02&q=YOURSEARCHTERMHERE into your web browser), browse the collection, tag any humorous, funny, satirical, ribald, or comic art you find with the tag 'comic_art', and share your finds via your preferred social network.
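If you would rather generate a batch of search links than type them by hand, the URL pattern above can be filled in with a short Python snippet. This is a minimal sketch that assumes the URL pattern quoted in this post is still accepted by Flickr; the function name and the example search terms are our own inventions.

```python
from urllib.parse import quote_plus

# URL pattern quoted in the post; 12403504@N02 is the British Library's
# Flickr account identifier as given there.
BL_FLICKR_SEARCH = "https://www.flickr.com/search/?w=12403504@N02&q="


def bl_flickr_search_url(term):
    """Build a search URL for the British Library photostream.

    `quote_plus` escapes spaces and punctuation so the term is safe to
    place in the query string.
    """
    return BL_FLICKR_SEARCH + quote_plus(term)


# Example search terms (invented for illustration).
for term in ["caricature", "comic cat", "punch & judy"]:
    print(bl_flickr_search_url(term))
```

Each printed line can be pasted straight into a browser.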

Update 14 May. We have two sets that refresh daily: 'Illustrations needing tags!' and 'Unseen Illustrations'. These sets represent the least tagged and least seen of the 1 million images. One approach would be to pick through those each day in search of comic art!

Before Comics Unmasked closes we'll collect them all together as a set and report back on the fruits of your labour. Your efforts will help us unlock the secrets of the collection for the benefit of all, so we look forward to seeing what you find!

James Baker

Curator, Digital Research

@j_w_baker

02 April 2014

Unconferencing; or a digital scholarship training experiment

The British Library's Digital Scholarship Training programme aims to provide library colleagues with the skills and knowledge to best exploit the digital transformations taking place around us, both in and outside the research community.

To close the third semester of this programme we in Digital Research decided to embark on something of an experiment: to transform our one-day 'What is Digital Scholarship?' course (one of the sixteen one-day courses we offer) into a staff-only unconference on Digital Scholarship and Working Innovatively with Digital Collections.

For those out of the unconference loop, an 'Unconference' brings together delegates under a particular theme, but the schedule for the day is entirely created by the attendees. Anyone can propose a session beforehand or on the morning of the event. The day begins with all who have proposed sessions having an opportunity to briefly pitch their ideas. A vote is then cast (at our event, three votes each, cast as ticks/crosses/marks placed next to a session name on a flip-chart, so relatively anonymous!) and the sessions with the most votes form the final schedule for the day (as we couldn't guarantee that there would be room for every proposed session, we asked colleagues not to over-prepare!). Pitchers then act as facilitators for their sessions. Not all participants are required to propose a session - most, in fact, come along for the ride!
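For anyone planning a similar event, the voting step can be sketched as a simple tally. The snippet below is an illustration only: we counted marks on a flip-chart rather than running any code, and the ballots, the number of slots and the alphabetical tie-break are assumptions made for the example.

```python
from collections import Counter


def build_schedule(ballots, slots, votes_per_person=3):
    """Tally votes and return the `slots` most popular sessions.

    Each ballot is the list of sessions one attendee marked; only the first
    `votes_per_person` marks are counted. Ties are broken alphabetically,
    which is an arbitrary choice made for this sketch.
    """
    tally = Counter()
    for ballot in ballots:
        for session in ballot[:votes_per_person]:
            tally[session] += 1
    ranked = sorted(tally.items(), key=lambda item: (-item[1], item[0]))
    return [session for session, _ in ranked[:slots]]


# Invented ballots for illustration.
ballots = [
    ["Editing Wikipedia", "Open licensing", "Web archiving"],
    ["Editing Wikipedia", "Open licensing", "Local history"],
    # A fourth mark is ignored under the three-votes-each rule.
    ["Editing Wikipedia", "Open licensing", "Web archiving", "Local history"],
]

schedule = build_schedule(ballots, slots=3)
```

With these ballots, 'Local history' receives a single counted vote and misses out on one of the three slots.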

As digital scholarship and innovation are happening across the British Library, we wanted to give colleagues the opportunity to share their interests and skills with others. We suggested that sessions could be on anything related to digital scholarship and innovation with digital cultural heritage collections in the broadest sense, and to avoid the event amounting to little more than a series of talks we asked colleagues to fit their proposal into one or more of the following categories:

  • Talk ...such as, a presentation on a digital project.
  • Make ...such as, a session where attendees collaboratively build or work on something like tagging or geo-referencing a collection.
  • Teach ...such as, a session where you show a group of people how to do something, such as how to update Wikipedia articles.
  • Play ...such as, a discussion aimed at generating fresh, creative ideas for innovating with our digital collections or services.

We then put together a skeleton schedule (Pitching and Voting session 10-11.15 - First sessions 11.30-12.30 - Break 12.30-13.30 - Second sessions 13.30-14.30 - Third sessions 14.45-15.45 - Wrap-up, discussion & reflection 16.00-16.30) and put out a call for contributions via various internal channels.

On the day, every proposed session passed a threshold of interest and we hosted ten sessions across three rooms. These ranged from a discussion of open licensing, an introduction to editing Wikipedia, a talk about Chinese social media (who knew our one million Flickr images are blocked by the Great Firewall of China?), and a workshop on creative reuses of Europeana content, to an update on our web archiving and associated access activities, an informal survey on how we might engage with local history communities, and a lively session around what access to our digital content should and could look like in an ideal world.

As should now be clear, having no schedule did not equate to little organisation, to anarchy. Rather, a carefully constructed framework needed to be built around the day to ensure everything ran smoothly. And as it turned out, the event was as creative, provocative, and fun as we had hoped, as well as being enormously productive. In particular, what emerged was clear feedback regarding how the Digital Research team can develop our future training provision in line with staff needs, including not only a sense that colleagues valued a varied and creative programme but also ideas on how best to introduce colleagues to digital scholarship. And so the unconference will return, but likely as an external event embedded within our otherwise staff-only programme. A few eager twitterers have expressed an interest in this already, but if you'd like to collaborate with us on an unconference around 'Digital Scholarship and Working Innovatively with Digital Collections' (working title) sometime later in 2014 then get in touch via email ([email protected]) or Twitter (@j_w_baker or @ndalyrose). We'd love to hear from you.

James Baker

Curator, Digital Research

@j_w_baker

27 March 2014

Tracking Public Domain Re-use in the Wild

We folks over at #bldigital are excited to be partnering with the Technology Strategy Board and IC tomorrow on their next Digital Innovation Contest. £25K is up for grabs to encourage digital innovation in data.

Our challenge, should you choose to accept it, is to encourage and establish the necessary feedback loop for tracking and measuring the use and impact of the public domain content we make available online.

The Library seeks to enrich the cultural life of the nation and stimulate economic and social growth, and the release of 1 million images and counting into the public domain and on to Flickr Commons is one way in which we hope to fulfil that mission.

 

But what happens when this content is released into the wild? Once online, we have little way of following that content as it is re-used, which makes it difficult to measure the creative and economic benefits of having done so.  

At the moment we try to capture innovative re-use as best we can manually, primarily by scouring social media channels for mentions, but this is neither sustainable nor scalable and we know there is much inspired activity we are missing.

Colouring-In Pages for Children by Zoe Toft

What we need is an innovative solution for enabling the sharing of this content on platforms which are popular but outside of our control, while retaining the ability to see how it has eventually been remixed and reused.

A formidable challenge we know, and one which is shared by anyone putting content online today. Think you might be able to crack it? Visit IC Tomorrow for more details on the call. Closing date is noon on May 7, 2014.

@ndalyrose

07 February 2014

The Metadata Quest

Posted on behalf of Sara Wingate Gray, originally posted here: http://artefacto.org.uk/content/metadata-quest-part-1

Recently, we've been involved in an exciting new project, which comes out of some exploratory work we produced during the British Library Labs May 2013 hack event as part of their inaugural 2013 digital collections competition. Here at artefacto, we were particularly excited when BL Labs launched, in March last year, not least because we'd been following the pioneering digital and creative libraryings of Harvard Library Lab for several years, alongside the more recent developments of the Digital Public Library of America and the wonderful work that the New York Public Library has been getting up to (check out their historical menus project for a start). What's exciting about all these developments (and there are so many we could list in this vein: Europeana ... in fact, have a list, courtesy of The Open Knowledge Foundation) is the opening up of public access to this "digital reserve of knowledge", and the potential it brings, in the case of BL Labs, for instance, "to create new narratives from the British Library’s vast incredible digital collections from 19th Century books to archived websites and wildlife sounds to manuscripts to name but a few examples."1

For us, what's also intriguing, in this new world collision and collection of objects, and people, in digital space, is how we might go about piecing together, jigsaw-like, the underlying narratives which sit within: how do we help reveal and "unlock" each object's own story?

The May 2013 hackday event at BL Labs gave us the opportunity (and the excuse) to explore this question: with access to their 68,000 digitised volumes of text (from the 19th Century Books collection), sounds (e.g. the archive of Resonance FM, Survey of English Dialects), Ordnance Survey maps, and much much more, it promised a veritable feast of digital content and, importantly, metadata to get our hands on.

That metadata is finally a hot topic of discussion worldwide is not only music to the ears of all librarians out there (well, ok, not all of you guys, but you're the groundswell folks!) but it also means we don't have to give you a definition. Except we probably do, since all this consorting with the NSA is frankly giving metadata a bad rep right now (and no, Guardian newspaper, metadata is not just "information generated as you use technology"). Wikipedia provides the very vaguely straightforward term "data about data" as a definition, while Zeng and Qin (2008, p.7) note that "[b]roadly speaking, metadata encapsulates the information that describes any document or object in both digital and traditional formats."2 In the context of any of the British Library's digital content, for instance, this could mean information about a painting's date and artist, a map's geographic range, or a sound's physical placing (to name just a few instances or rather, metadata elements: take a look at The Library of Congress's sample of metadata for an 1864 letter from Alexander Melville Bell to Alexander Graham Bell if you really want to explore metadata in more detail).

Essentially, what excites us about metadata, is that by harnessing it in different ways, new surfaces and territories can suddenly open up in a digital object's narrative; by making explicit, textually and visually, an object's creation space, or time, new threads of connections are discovered and yarns newly spun.

The result of our brief two days at the BL Labs event was a quick build of an experimental version of our imagined platform, where digital content could sit waiting to be explored in these ways, through people navigating and thinking about these facets, and although not ultimately winning the Labs competition, we got some great feedback which suggested we should continue with our project and idea. We named it Curatorial and went on our merry way, content in our imaginings. It was great, therefore, to (some months later) find ourselves participating in the Data Tales project, which gave us the opportunity to further develop the platform: under the AHRC's Digital Transformations Network Data – Asset – Method: Harnessing the Infinite Archive network, we've been able to spend time re-building and imagining further what is possible, and what stories can be told, when metadata, objects and people digitally collide. Partnering with the Horizon Digital Economy Research Institute based at the University of Nottingham, Loughborough University, and the British Library for the Data Tales project has meant a great team experiment, and we presented the first results of our work together at a workshop at the British Library (January 24th, 2014).

One of the first ports of call when approaching this project was: how can we get our hands on the metadata we want? What types of collections (and their 'owners' or 'content holders'?) are out there? Our previous BL Labs experience grappling with the vast range of data types available from the British Library was really helpful, not least for priming us for detective work (what format is that geolocation data in exactly?), and so our sleuthing, and structuring, commenced.

"Why does a man need to tell stories to others and himself? It is a way by which the mind uses fantasy to structure the chaos of the original experience. Complex and unpredictable, the vivid experience always lacks what fiction can provide: a closed time, a hierarchy of events, the value of people, effects and causes, the connections under the actions."3

This is where the quest for metadata begins.

Why not come and see us speak at Making the Most of Metadata, on Wednesday 12 February 2014 at the British Library?

TBC.

1. http://britishlibrary.typepad.co.uk/digital-scholarship/2013/03/bl-labs-launch-event.html

2. Zeng, M. L. and Qin, J. 2008. Metadata. New York: Neal Schuman Publishers.

3. Vargas Llosa, M. 1997. The Truth of Lies. In Making Waves. New York: Farrar, Straus and Giroux.
