Digital scholarship blog

Enabling innovative research with British Library digital collections

12 December 2013

A million first steps

We have released over a million images onto Flickr Commons for anyone to use, remix and repurpose. These images were taken from the pages of 17th, 18th and 19th century books digitised by Microsoft, who then generously gifted the scanned images to us, allowing us to release them back into the Public Domain.

The images themselves cover a startling mix of subjects: there are maps, geological diagrams, beautiful illustrations, comical satire, illuminated and decorative letters, colourful illustrations, landscapes, wall-paintings and so much more that even we are not aware of.

Which brings me to the point of this release. We are looking for new, inventive ways to navigate, find and display these 'unseen illustrations'. The images were plucked from the pages as part of the 'Mechanical Curator', a creation of the British Library Labs project. Each image is individually addressable online, and Flickr provides an API to access it and the image's associated description.
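As a rough illustration of what that access could look like, the sketch below builds a request URL for Flickr's REST method flickr.photos.getInfo and pulls the title and description out of a JSON reply. The endpoint URL, the API key placeholder, the helper names and the trimmed sample reply are assumptions made for illustration and are not part of this release. The photo ID is read off an image filename shown in this post, assuming the usual Flickr id_secret naming.

```python
import json
from urllib.parse import urlencode

# Assumed REST endpoint; confirm against the current Flickr API documentation.
FLICKR_REST_ENDPOINT = "https://api.flickr.com/services/rest/"


def photo_info_url(photo_id, api_key):
    """Build a flickr.photos.getInfo request URL (no network call is made here)."""
    params = {
        "method": "flickr.photos.getInfo",
        "api_key": api_key,      # your own key; "YOUR_API_KEY" below is a placeholder
        "photo_id": photo_id,
        "format": "json",
        "nojsoncallback": "1",   # ask for plain JSON rather than JSONP
    }
    return FLICKR_REST_ENDPOINT + "?" + urlencode(params)


def title_and_description(reply_text):
    """Extract the title and description from a getInfo JSON reply."""
    reply = json.loads(reply_text)
    if reply.get("stat") != "ok":
        raise ValueError("Flickr API reported an error: %r" % reply.get("message"))
    photo = reply["photo"]
    return photo["title"]["_content"], photo["description"]["_content"]


# Hypothetical, heavily trimmed reply used only to exercise the parser.
SAMPLE_REPLY = json.dumps({
    "stat": "ok",
    "photo": {
        "id": "11075039705",
        "title": {"_content": "Placeholder title"},
        "description": {"_content": "Placeholder description"},
    },
})

if __name__ == "__main__":
    print(photo_info_url("11075039705", "YOUR_API_KEY"))
    print(title_and_description(SAMPLE_REPLY))
```

Fetching the URL with any HTTP client and passing the response body to title_and_description should give the description text for that image, provided the key is valid.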

We may know which book, volume and page an image was drawn from, but we know nothing about a given image. Consider the image below. The title of the work may suggest the thematic subject matter of any illustrations in the book, but it doesn't suggest how colourful and arresting these images are.

(Aside from any educated guesses we might make based on the subject matter of the book, of course.)

11075039705_36900f9312

See more from this book: "Historia de las Indias de Nueva-España y islas de Tierra Firme..." (1867)

Next steps

We plan to launch a crowdsourcing application at the beginning of next year, to help describe what the images portray. Our intention is to use this data to train automated classifiers that will run against the whole of the content. The data from this will be as openly licensed as is sensible (given the nature of crowdsourcing) and the code, as always, will be under an open licence.

The manifests of images, with descriptions of the works that they were taken from, are available on GitHub and are also released under a public-domain 'licence'. This set of metadata being on GitHub should indicate that we fully intend people to work with it, to adapt it, and to push back improvements that should help others work with this release.

There are very few datasets of this nature free for any use, and by putting it online we hope to stimulate and support research concerning printed illustrations, maps and other material not currently studied. Bear in mind that the images are derived from just 65,000 volumes, and that the library holds many millions of items.

If you need help or would like to collaborate with us, please contact us by email or on Twitter (or me personally, on any technical aspects).

The Initial Layout

The images have been tagged to aid browsing and to provide new views on the works themselves. They are tagged by publication year (eg 1764, 1864, 1884), by book (eg 003927270000149253), by author (eg Charles Dickens) and by other means.
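One way to start working with these tags is to sort them into rough kinds before doing anything else. The sketch below is a heuristic under stated assumptions, not a description of how the tags were generated: it treats a four-digit tag between 1600 and 1899 as a publication year, any longer all-digit tag as a book identifier, and everything else (author names included) as "other". The function names are made up for this example.

```python
import re
from collections import defaultdict


def classify_tag(tag):
    """Guess the kind of a tag from its shape alone (heuristic, see assumptions above)."""
    tag = tag.strip()
    # Assumption: publication years fall in the 17th to 19th centuries.
    if re.fullmatch(r"1[6-8]\d{2}", tag):
        return "year"
    # Assumption: book identifiers are long runs of digits.
    if re.fullmatch(r"\d{5,}", tag):
        return "book"
    return "other"


def group_tags(tags):
    """Group tags by their guessed kind, keeping the input order within each group."""
    groups = defaultdict(list)
    for tag in tags:
        groups[classify_tag(tag)].append(tag)
    return dict(groups)


if __name__ == "__main__":
    print(group_tags(["1764", "003927270000149253", "Charles Dickens"]))
```

A shape-based guess like this will misfile some tags, so it is best treated as a first pass to be checked against the manifests.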

This structure is helpful but we can do better! We want to collaborate with researchers and anyone else with a good idea for how to mark up, classify and explore this set, with an aim to improve the data and to improve and add to the tagging. We are looking to crowdsource information about what is depicted in the images themselves, as well as using analytical methods to interpret them as a whole.

We are very interested to hear what ideas and projects people use these images for and we would ideally like to collaborate with those who have been inspired to explore them.

Finally, while they have been released into the public domain, we would like to direct you to a post by Dan Cohen titled "CC0 (+BY)". There is no obligation for you to attribute anything to us, but we'd appreciate it. The dataset will develop over time, and will improve after all!

Some examples

11223149846_449d526f31_z

"Manners and Customs of the ancient Egyptians, ... Illustrated by drawings, etc. 3 vol. (A second series of the Manners and Customs of the Ancient Egyptians. 3 vol.)" by WILKINSON, John Gardner - Sir

11305478975_8d6c506459

"The United States of America. A study of the American Commonwealth, its natural resources, people, industries, manufactures, commerce, and its work in literature, science, education and self-government. [By various authors.] Edited by N. S. Shaler ... With many illustrations" by SHALER, Nathaniel Southgate.

11307227433_e5bb52c3ba_z

"Comic History of Greece from the earliest times to the death of Alexander the Great ... Illustrated, etc" by SNYDER, Charles M.

11228106243_cfaba62d0f_z

"The Coming of Father Christmas" by MANNING, Eliza F.

11232670175_86031d436a_z

"The Casquet of Literature, being a selection of prose and poetry from the works of the most admired authors. Edited with biographical and literary notes by C. Gibbon ... and M. E. Christie. Illustrated from original drawings by eminent artists" by GIBBON, Charles - Esq., and CHRISTIE (Mary Elizabeth) Miss

Comments

I used PhotoShop's gallery option to create a set of linked thumbnail images of the pages of BL MS Harley 4431 in a joint project with the British Library. You can see more at our G+ Social Media sharing site at this address

http://goo.gl/x3axvI

Many thanks for any comments on our project. We will be giving a paper on this at the Medieval manuscript conference in Kalamazoo next year so would be interested in hearing from you on MS presentation.

This is a splendid initiative, much to be applauded. I have played a little with the Flickr photo stream, and reported on my experiences here: http://landscapelover.wordpress.com/2013/12/19/a-million-first-steps/
As a landscape historian, I would be delighted to collaborate with the Library on ways of exploring and improving the dataset.

What an incredible resource this is. Thank you all so much for making it available!

You might be interested in a project at Wikimedia Commons (the image-bank arm of Wikipedia) to make a subject and place index to the books in the collection, at
https://commons.wikimedia.org/wiki/Commons:British_Library/Mechanical_Curator_collection/Synoptic_index

It's early days yet -- as of December 24 only about 1200 of the titles have been indexed, corresponding to about 12% of the images in the Collection, so the listings at the moment are still very very incomplete -- but with luck it will grow and grow over the next days and weeks.

Commons also has a full list up of all the titles that were scanned to create the collection, at

https://commons.wikimedia.org/wiki/Commons:British_Library/Mechanical_Curator_collection/Full_list_of_books
