Digital scholarship blog: December 2013

Enabling innovative research with British Library digital collections

2 posts from December 2013

12 December 2013

A million first steps

We have released over a million images onto Flickr Commons for anyone to use, remix and repurpose. These images were taken from the pages of 17th, 18th and 19th century books digitised by Microsoft, who then generously gifted the scanned images to us, allowing us to release them back into the Public Domain.

The images themselves cover a startling mix of subjects: there are maps, geological diagrams, beautiful and colourful illustrations, comical satire, illuminated and decorative letters, landscapes, wall-paintings and so much more that even we are not aware of.

Which brings me to the point of this release. We are looking for new, inventive ways to navigate, find and display these 'unseen illustrations'. The images were plucked from the pages as part of the 'Mechanical Curator', a creation of the British Library Labs project. Each image is individually addressable, online, and Flickr provides an API to access it and the image's associated description.
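As a rough illustration of what that access could look like, here is a minimal sketch that builds a request URL for Flickr's flickr.photos.getInfo method, which returns a photo's title, description and tags. The photo id and API key are placeholders (assumptions, not real identifiers), and the sketch only constructs the URL rather than calling the service.

```python
from urllib.parse import urlencode

# Flickr's REST endpoint; flickr.photos.getInfo is the API method
# for fetching a single photo's title, description and tags.
FLICKR_REST = "https://api.flickr.com/services/rest/"


def photo_info_url(photo_id: str, api_key: str) -> str:
    """Build the request URL for one photo's metadata (no network call)."""
    params = {
        "method": "flickr.photos.getInfo",
        "api_key": api_key,      # assumption: your own Flickr API key
        "photo_id": photo_id,    # assumption: a Flickr photo id
        "format": "json",
        "nojsoncallback": "1",   # ask for plain JSON rather than JSONP
    }
    return f"{FLICKR_REST}?{urlencode(params)}"


if __name__ == "__main__":
    # Placeholder values only; substitute a real photo id and key.
    print(photo_info_url("PHOTO_ID", "YOUR_API_KEY"))
```

Fetching the resulting URL with any HTTP client should return a JSON document describing the image, provided the key and photo id are valid.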

We may know which book, volume and page an image was drawn from, but we know nothing about a given image (aside from any educated guesses we might make based on the subject matter of the book, of course). Consider the image below. The title of the work may suggest the thematic subject matter of any illustrations in the book, but it doesn't suggest how colourful and arresting these images are.

See more from this book: "Historia de las Indias de Nueva-España y islas de Tierra Firme..." (1867)

Next steps

We plan to launch a crowdsourcing application at the beginning of next year, to help describe what the images portray. Our intention is to use this data to train automated classifiers that will run against the whole of the content. The data from this will be as openly licensed as is sensible (given the nature of crowdsourcing) and the code, as always, will be under an open licence.

The manifests of images, with descriptions of the works that they were taken from, are available on GitHub and are also released under a public-domain 'licence'. This set of metadata being on GitHub should indicate that we fully intend people to work with it, to adapt it, and to push back improvements that should help others work with this release.
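To give a flavour of the kind of work the manifests invite, here is a small sketch that counts images per publication year. It assumes a tab-separated manifest with a header row and a "date" column; both the layout and the column names are hypothetical, so check the actual files before relying on them. The sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter


def images_per_year(manifest_text: str, year_column: str = "date") -> Counter:
    """Count manifest rows per publication year.

    Assumes tab-separated text with a header row; the column name
    is a hypothetical default, not the confirmed manifest schema.
    """
    reader = csv.DictReader(io.StringIO(manifest_text), delimiter="\t")
    counts = Counter()
    for row in reader:
        year = (row.get(year_column) or "").strip()
        if year:  # skip rows with no recorded year
            counts[year] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample rows, for illustration only.
    sample = "book_id\tdate\nBOOK_A\t1764\nBOOK_B\t1764\nBOOK_C\t1864\n"
    for year, n in sorted(images_per_year(sample).items()):
        print(year, n)
```

The same pattern extends to grouping by book or author once the real column names are known.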

There are very few datasets of this nature free for any use, and by putting it online we hope to stimulate and support research concerning printed illustrations, maps and other material not currently studied. The images are derived from just 65,000 volumes, while the library holds many millions of items.

If you need help or would like to collaborate with us, please contact us by email or on Twitter (or me personally, on any technical aspects).

The Initial Layout

The images have been tagged to aid browsing and to provide new views on the works themselves. They are tagged by publication year (e.g. 1764, 1864, 1884), by book (e.g. 003927270, 000149253), by author (e.g. Charles Dickens) and by other means.
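For anyone who wants to browse by tag programmatically, the following sketch builds a request URL for Flickr's flickr.photos.search method, restricted to one account and a set of tags. The account id and API key are placeholders, and the exact tag strings used on Flickr are an assumption here: the year and book identifier below are simply the examples quoted above.

```python
from urllib.parse import urlencode

FLICKR_REST = "https://api.flickr.com/services/rest/"


def tag_search_url(tags, api_key, user_id, match_all=False):
    """Build a flickr.photos.search URL for the given tags (no network call)."""
    params = {
        "method": "flickr.photos.search",
        "api_key": api_key,            # assumption: your own Flickr API key
        "user_id": user_id,            # assumption: the account to search within
        "tags": ",".join(tags),        # the API takes a comma-delimited tag list
        "tag_mode": "all" if match_all else "any",
        "format": "json",
        "nojsoncallback": "1",         # plain JSON rather than JSONP
    }
    return f"{FLICKR_REST}?{urlencode(params)}"


if __name__ == "__main__":
    # Tag values are the examples from the post; the ids are placeholders.
    print(tag_search_url(["1764", "003927270"], "YOUR_API_KEY", "ACCOUNT_ID"))
```

With match_all=False the search returns images carrying any of the tags; setting it to True narrows the results to images carrying all of them.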

This structure is helpful but we can do better! We want to collaborate with researchers and anyone else with a good idea for how to mark up, classify and explore this set, with the aim of improving the data and of improving and adding to the tagging. We are looking to crowdsource information about what is depicted in the images themselves, as well as using analytical methods to interpret them as a whole.

We are very interested to hear what ideas and projects people use these images for and we would ideally like to collaborate with those who have been inspired to explore them.

Finally, while they have been released into the public domain, we would like to direct you to a post by Dan Cohen titled "CC0 (+BY)". There is no obligation for you to attribute anything to us, but we'd appreciate it. The dataset will develop over time, and will improve after all!

Some examples

"Manners and Customs of the ancient Egyptians, ... Illustrated by drawings, etc. 3 vol. (A second series of the Manners and Customs of the Ancient Egyptians. 3 vol.)" by WILKINSON, John Gardner - Sir

"The United States of America. A study of the American Commonwealth, its natural resources, people, industries, manufactures, commerce, and its work in literature, science, education and self-government. [By various authors.] Edited by N. S. Shaler ... With many illustrations" by SHALER, Nathaniel Southgate.

"Comic History of Greece from the earliest times to the death of Alexander the Great ... Illustrated, etc" by SNYDER, Charles M.

"The Coming of Father Christmas" by MANNING, Eliza F.

"The Casquet of Literature, being a selection of prose and poetry from the works of the most admired authors. Edited with biographical and literary notes by C. Gibbon ... and M. E. Christie. Illustrated from original drawings by eminent artists" by GIBBON, Charles - Esq., and CHRISTIE (Mary Elizabeth) Miss

Posted by Ben O'Steen at 12:50 PM

Tags

BL Labs, Data, Experiments

11 December 2013

open, Open and ‘Open’: conflations of openness in public discourse

What do we mean when we use the word 'open'? And how does context change how open that open is?

It often helps when things are open… Open photograph courtesy of Flickr user loop_oh / Creative Commons Licensed

During a recent event at the Institute of Historical Research, a single speaker used the phrase 'and please don't tweet this' on three separate occasions. The phrase was used in both a negative and a positive sense: negative because something many in the room were doing was being prohibited; positive because we knew we were going to be told something in the spirit of open discourse. This then was a particular type of openness: a closed openness common to environments where face-to-face discussion among peers takes place. In the carefully curated openness of networked arenas such as Twitter, the three things that were preceded by 'and please don't tweet this' would never have been said by the speaker. But when that networked openness and the face-to-face openness collided, the speaker felt it necessary to bound our use of the former, to point out that the two were not the same.

But then even when things are open, walls can be useful… Open photograph courtesy of Flickr user loop_oh / Creative Commons Licensed

Now I should add, this was no 'How dare you tweet my talk!' debacle. On this occasion it was assumed the openness in the room would be networked out, and hence the need for the - polite - 'and please don't tweet this' caveat. And yet it set me thinking, for much of what we mean by open is, it seems to me, inherently contextual. This is hardly revelatory: language is inherently contextual. But I think the point is worth elaborating, even if there is plenty of relevant literature I'm failing to mention. Put simply, an open access publication is not an open invitation to snoop around the entire process by which the publication was made (though I'd like to see historians, for example, publishing their data with their papers: the days of clinging onto data beyond the point of publication, and therefore not allowing another scholar to replicate the results in the publication, are surely numbered). Similarly, an open access policy, whether a publisher's or a research council's, is open to all in theory, but in practice only if authors are willing and able to negotiate an often substantial financial hurdle. During the course of a talk, panel or conference discussion, a hashtag-enabled conversation on Twitter (often referred to as a backchannel) is open to all with access to the web but is conducted with a knowledge that not everyone at the event is there or will be there - tellingly those conversations often don't, and are often assumed not to need to, make it into the room, into the other available venue for scholarly discourse. And open data, even data shared under permissive open licences, is often made open with caveats that an individual's, a group's or an institution's generosity is somehow acknowledged, respected or replicated: what Dan Cohen has recently called 'CC0 (+BY)'.

I guess much depends on how you slice the cake… Open Access Week photograph courtesy of Flickr user slubdresden / Creative Commons Licensed

I'm very much in favour of the latter approach, as is the British Library. Indeed very soon we'll be pushing out a large amount of digitised stuff under a Public Domain declaration. And although we'd like folks to tell us how they use that stuff, they won't have to: legally, morally and institutionally we will have waived any right to demand such information. This is very open, and yet on reflection I am conscious that the content is the only open part of this activity: we won't be making fully open our communications with external partners, our internal discussions over standards, structures and releases, and our plans to capture data on the use of all this digital stuff. And while some of these communications, discussions and plans will no doubt be blogged about, will be made open via a particular form of publication, there is a reasonable expectation that much of this information will not be made open. Or at least not published openly, in favour of talking about it openly at seminars or conferences, in open and yet closed face-to-face environments occupied by our peers: peers who'll not only value this information but are likely to know, by way of a speaker's intonation or their own intuition, what is and isn't sayable in other open environments. Indeed some of you may already know what this stuff is, and because you've talked to us about it you'll appreciate why I'm not spilling the beans just yet.

'And please don't tweet this' comes to mind.

@j_w_baker

Posted by James Baker at 12:50 PM

Tags

Data, Events