THE BRITISH LIBRARY

Digital scholarship blog

7 posts from June 2014

30 June 2014

Data Driven?

The accumulation of data is driving change in research libraries, but how do we shape that change and bring everyone we need to on-board with the implications? Such considerations were themes of the recent Data Driven: Digital Humanities in the Library conference hosted by the College of Charleston.

Members of the Digital Research team attended the event with three aims: to report on our activities (in particular our Digital Scholarship Training Programme), to discuss how we see our role in the ecosystem of digital research in the arts, humanities, and social sciences, and to gain insight on comparable work and agenda setting taking place in North American research libraries.

Aquiles telling it how it is.

My notes from the event are available on GitHub Gist, presentations from our talks are on Slideshare (Aquiles Alencar-Brayner, 'Digital Scholarship Training Programme at the British Library'; James Baker, 'Mind the gap! Posing problems to unify research with digital research' and 'Making the unfamiliar familiar. Reflections on training digital scholarship in a library filling up with data') and a volume containing many of the papers will appear in 2015, published by Purdue University Press.

Given the flurry of available and forthcoming content, this isn't the time or place for a conference write-up, but for reflection on some takeaways (even if the folks from Davidson College got there first), on topics likely to inform our own practice and thinking in the coming months.

Our glorious setting for all this thinking

On roles and responsibilities, it was clear that many of the librarians in attendance were performing a role whose parameters could extend from promoting and publishing digital collections to training students and colleagues in digital research methods, from researching what digital technologies mean for research and pedagogy to advocating for open scholarship, from digital preservation activities to curating acquisitions of mixed media and born-digital collections, from creating physical maker spaces to networked data sandpits. Some undertaking work in this space were called Digital Humanities librarians, others were not, but all seemed to operate with the notion of the DH librarian as a framework for responding to their local situation; to innovate, to caution, and to enrich how their institution responds to the digital transformations taking place around it.

If this notion of the DH librarian bubbled beneath the surface, the value of embedding librarians into research and research-led teaching stood front and centre. The North American higher education context is important to emphasise here - one where large research libraries containing notable research-active information professionals proliferate. Trevor Muñoz set the tone here, positioning librarians as collaborators in rather than supporters of research activity, arguing in favour of a DH librarianship resistant to notions of administrative and programmatic service, and teasing at the key points of connection - evident in the history of librarianship pre-dating the digital - between core library work and humanistic scholarship (Trevor Muñoz, 'Data Driven but How Do We Steer This Thing'). Notable here too were talks by Jolanda-Pieta van Arnhem and Benjamin Fraser (College of Charleston) on collaborative teaching of urban cultural studies between library and faculty, by Harriet Green (University of Illinois at Urbana-Champaign) on faculty-library partnership in teaching digital literacy, and by Liz Milewicz (Duke University) on the challenges and opportunities of curating DH research projects and experiments (such as the Haiti Lab).

In sharing and reflecting on how we achieve and fail in this way, and in venues such as Data Driven, we are confronted with why the work we seek to describe happened in the first place. This latter area is an aspect of library practice that Muñoz argues is worthy of more consideration, of couching in something grounded and theorised. For we need only to consider an historical perspective longer than reflections from one project to the next to gain (at least in part) that grounding, and here both Muñoz's paper and the chatter around Data Driven connected fruitfully with a surge of recent scholarship and comment on enriching DH (and avoiding its eternal September - Nowviskie 2010) through histories of librarianship, of both the humanities (Bod 2013) and humanities disciplines (Turkel, Muhammedi, Start 2014), and of DH itself (Nyhan, Flinn, Welsh 2013; Terras 2014). As Willard McCarty writes, DH needs 'to begin remembering what our predecessors did and did not do, and the conditions under which they worked, so as to fashion stories for our future.' (McCarty 2014; see also a recent interjection on the Humanist group (28.140) responding to a statement by the Cambridge Centre for Digital Knowledge.) For research librarians to be part of that future, to flourish as collaborators in research, we need to allow ourselves to be driven not by data but rather - with an eye to our pasts and the pasts of those with whom we work - by considerations around why and why not data.

James Baker

Curator Digital Research

@j_w_baker

17 June 2014

Open Source Digital Forensics Adapted for Archival and Memory Institutions

 
 
BitCurator Version 0.9.12 is available
 
Not far to Version 1.0!
 
 

There have been "dramatic changes in the status of digital forensics within LAMs (Libraries, Archives and Museums) in just a few years". This is a conclusion of a wide-ranging white paper released by the BitCurator Project: From Bitstreams to Heritage: Putting Digital Forensics into Practice in Collecting Institutions.

The BitCurator project is funded by the Andrew W. Mellon Foundation and is jointly led by the School of Information and Library Science at the University of North Carolina, Chapel Hill (SILS) and the Maryland Institute for Technology in the Humanities (MITH). Cal Lee (SILS) is Principal Investigator and Matthew Kirschenbaum (MITH) is Co-Principal Investigator.

It has been running since 2011 and is aimed at assessing and incorporating techniques and tools designed for digital forensics into the workflows of archival and collection institutions.

BitCurator is built on a number of free and open source forensic tools including bulk_extractor and fiwalk, and makes extensive use of DFXML (Digital Forensics XML). Features include pre-imaging data triage, forensic disk imaging, filesystem analysis and reporting, identification of private and individually identifying information, and the export of technical and other metadata. The overall design is modular, which means that tools can be replaced gracefully.
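The per-file metadata captured in a DFXML report can be read with standard XML tooling. Here is a minimal sketch; the sample document and element names are illustrative, loosely modelled on the fileobject records fiwalk emits, so adjust them to your actual output:

```python
import xml.etree.ElementTree as ET

# Illustrative DFXML fragment: one fileobject per file recovered
# from a disk image (real fiwalk output carries many more fields,
# such as hashes and timestamps).
SAMPLE = """<?xml version="1.0"?>
<dfxml>
  <volume>
    <fileobject>
      <filename>letters/draft1.doc</filename>
      <filesize>24576</filesize>
    </fileobject>
    <fileobject>
      <filename>letters/draft2.doc</filename>
      <filesize>25088</filesize>
    </fileobject>
  </volume>
</dfxml>"""

def list_files(dfxml_text):
    """Return (filename, size) pairs for every fileobject in the report."""
    root = ET.fromstring(dfxml_text)
    files = []
    for fo in root.iter("fileobject"):
        name = fo.findtext("filename")
        size = int(fo.findtext("filesize", "0"))
        files.append((name, size))
    return files

for name, size in list_files(SAMPLE):
    print(name, size)
```

Because the report is plain XML, the same approach extends to whichever metadata fields an institution's workflow needs to extract.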

The British Library has been at the forefront of the adoption of digital forensics for curatorial purposes, and has been participating in the BitCurator project through membership of the Professional Experts Panel (PEP) since 2011.

The white paper goes on to state: "Many institutions now acknowledge that procedures and practices for curation of born-digital materials should involve forensic tools and methods. There is growing recognition, for example, of the value of creating forensic disk images". 

Much of the credit for this recognition goes to the BitCurator team who have been strongly advocating the use of digital forensics to collecting institutions while at the same time disseminating tools and practices. The BitCurator website and wiki - richly populated with tutorials and guidelines (including screencasts) - have been advanced by Amanda Visconti while Porter Olsen has been leading community engagement at institutions throughout the USA. 

Porter is currently visiting the UK to give talks and hands-on demonstrations at several institutions, including the British Library under the auspices of Digital Scholarship.



The development and consolidation of the software has been led technically by Kam Woods, while Alexandra Chassanoff has been addressing metadata requirements and aspects of the workflow. BitCurator is now at version 0.9.12 and is close to a full version 1.0.

The software has been tested and explored at the British Library (in part to provide feedback) and is incorporated in the ongoing draft workflow for handling personal digital archives at the British Library. 






Three highlights of the recent Professional Experts Panel meeting at the Maryland Institute for Technology in the Humanities, 30-31 May 2014: 



(1) There was an excellent talk by Matthew Kirschenbaum on the labels of floppy disks in the John Updike archive during which he discussed the scholarly interest of the writings and amendments on the exterior of floppy disks. 



(2) There was also a very interesting presentation by Jürgen Enge of the University of Applied Science and Art in Germany (Zentrum für Information, Medien und Technologie, Hochschule für angewandte Wissenschaft und Kunst) discussing some digital curation work with the German Literature Archive at Marbach, Germany (Deutsches Literaturarchiv Marbach) concerning digital acquisition, interpretation and access to floppy disks. A related paper can be found in a nestor publication: Beiträge des Workshops "Digitale Langzeitarchivierung" auf der Informatik 2013.



(3) The third highlight was a presentation by Courtney Mumma of Artefactual Systems, the group that is behind Archivematica which is also an open source and free software package aimed at digital preservation and the archival profession. Courtney was a member of the Digital Records Forensics project at the University of British Columbia and visited the eMSS Lab, the Digital Preservation Team and the Sound Archive technical team at the British Library some months ago.


Archivematica has adopted the microservices strategy for a fully integrated system and is collaborating with BitCurator having already incorporated digital forensic tools. The approach is very promising and reminds me of scientific workflow systems which not only outline the workflow but execute it step by step.


The PEP meeting concluded with a discussion led by Cal Lee about emerging plans for a BitCurator Consortium. Watch this Space!


Useful background papers are:



Adapting existing technologies for digitally archiving personal lives: digital forensics, ancestral computing, and evolutionary perspectives and tools from the iPRES 2008 Conference at the British Library

Digital forensics and born-digital content in cultural heritage collections from the Council on Library and Information Resources



Digital forensics and preservation, a technology watch paper from the Digital Preservation Coalition



Of special interest for its emphasis on disk images and Digital Forensics XML (DFXML) is the paper: 



Extending digital repository architectures to support disk image preservation and access from the Joint Conference on Digital Libraries



Another influential paper focusses on authenticity and provenance of evidence in the context of digital records:



Digital record forensics: a new science and academic program for forensic readiness in the Journal of Digital Forensics, Security and Law

For a brief history of digital forensics and an outline of the interests that overlap with collecting institutions:

Shared perspectives, common challenges: a history of digital forensics & ancestral computing for digital heritage, a paper published by UNESCO Memory of the World Conference at Vancouver

Jeremy Leighton John @emsscurator


16 June 2014

Images Online: a selection

Some months back we released digital images of 430 objects from our collections into the public domain via Flickr Commons. Well short of the million we made available in December, these images have the benefit of being more precisely described at an individual level. Indeed they derive from Images Online, an extensive repository of high quality imagery that reflects the vast size and diversity of the British Library collections.


The World before the Deluge (1865)

The selection covers a range of topics, including images from the 12th century Topographia Hibernica, the signature of William Shakespeare from a Blackfriars mortgage-deed, engravings after drawings made in the countries Captain Cook visited during his first voyage to the South Pacific, illustrations from The Legend of Sleepy Hollow, drawings depicting eighteenth century Hungarian and Saxon dress, Georgian caricatures by Thomas Rowlandson, and many more.

These digital images are freely available for unrestricted use and reuse, and are of an ideal quality for blogging, teaching, and creative work. We know that the community have already been viewing and sharing the collection, but if you've put the images to good use do let us know by emailing digitalresearch@bl.uk or adding your work to our growing wiki of British Library Public Domain projects.


The Penitential and other Psalms (circa 1509-1546)


Seal from letter of Lord Nelson (1801)

James Baker

Curator, Digital Research

@j_w_baker

13 June 2014

Text to Image Linking Tool (TILT)

This is a detailed description of the Text to Image Linking Tool (TILT), one of the winners of the British Library Labs competition 2014. It has been reposted on the Digital Scholarship blog on behalf of Desmond Schmidt and Anna Gerber, University of Queensland.

TILT is born again

This is a fresh start for the text-to-image linking tool (TILT). TILT is a tool for linking areas on a page-image taken from an old book, be it manuscript or print, to a clear transcription of its contents. As we rely more and more on the Web there is a danger that we will leave behind the great achievements of our ancestors in written form over the past 4,000 years. On the Web what happens to all those printed books, handwritten manuscripts on paper, vellum, papyrus, stone, or clay tablets etc.? Can we only see and study them by actually visiting a library or museum? Or is there some way that they can come to us, so they can be properly searched and studied, commented on and examined by anyone with a computer and an Internet link?
 
So how do we go about that? Haven't Google and others already put lots of old books onto the Web by scanning images of pages and their contents using OCR (optical character recognition)? Sure they have, and I don't mean to play down the significance of that, but for objects of greater than usual interest you need a lot more than mere page-images and unchecked OCR of its contents. For a start you can't OCR manuscripts, or not well enough at least. And OCR of even old printed books produces lots of errors. Laying the text directly on top of the page-images means that you can't see the transcription to verify its accuracy. Although you can search it you can't comment on it, format or edit it. And in an electronic world, where we expect so much more of a Web page than for it merely to sit there dumbly to be stared at, the first step in making the content more useful and interactive is to separate the transcription from the page-images.

Page-image and content side by side

Page images are useful because they show the true nature of the original artefact. Not so for transcriptions. These are composed of mere symbols that, by convention, were chosen to represent the contents of writing. You can't use just text on a line to represent complex mathematical formulae, drawings or wood-cuts, the typography, layout, or the underlying medium. So you still need an image of the original to provide supplementary information, and not least because you might want to verify that the transcription is a true representation of it. So the only practical way to do this is to put the transcription next to the image.
 
Now the problems start. One of the principles of HCI (human-computer interaction) design is that you have to minimise the effort or ‘excise’ as the user goes about doing his or her tasks. And putting the text next to the image creates a host of problems that increase excise dramatically.
 
As the user scrolls down the transcription, reading it, at some point the page-image will need refreshing. And likewise if the user moves on to another page image, the transcription will have to move down also. So some linkage between the two is already needed even at the page-level of granularity.
 
And if the text is reformatted for the screen, perhaps on a small device like a tablet or a mobile phone, the line-breaks will be different from the original. So even if the printed text is perfectly clear, it won't be clear, as you read the transcription, where the corresponding part of the image is. You may say that this is easily solved by enforcing line-breaks exactly as they are in the original. But if you do that and the lines don't fit in the available width – and remember that half the screen is already taken up with the page-image – then the ends of each enforced line must wrap around onto the next line, or else they will become invisible off to the right. Either way it is pretty ugly and not at all readable. And consider also that the line height, or distance between lines in the transcription can never match that of the page-image. So at best you'll struggle to align even one line at a time in both halves of the display.


So what's the answer? It is, as several others have already pointed out, to link the transcription to the page-image at the word-level. As the user moves the mouse over, or taps on, a word in the image or in the transcription the corresponding word can be highlighted in the other half of the display, even when the word is split over a line. And if needed the transcription can be scrolled up or down so that it automatically aligns with the word on the page. And now the ‘excise’ drops back to a low level.
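As a rough illustration of the data word-level linking needs (the identifiers and coordinates below are invented for the example, not TILT's actual format), each word in the transcription can carry an id mapped to a rectangular region on the page image, which a viewer highlights when the word is selected:

```python
from dataclasses import dataclass

@dataclass
class Box:
    """A rectangular region on the page image, in pixels."""
    x: int
    y: int
    w: int
    h: int

# Hypothetical link table: transcription word id -> image region.
# A word split over a line break would simply map to two boxes.
links = {
    "w1": Box(120, 300, 64, 22),   # "Chicago"
    "w2": Box(190, 300, 58, 22),   # "Woman:"
}

def region_for(word_id):
    """Return the image region to highlight for a transcription word,
    or None if the word has no link yet."""
    return links.get(word_id)

box = region_for("w1")
print(box.x, box.y, box.w, box.h)
```

The hard part, as the next section explains, is not storing or using these links but generating the thousands of them each book requires.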


Making it practical

The technology already exists to make these links, but the problem is, how? Creating them by hand is incredibly time-consuming and also very dull work. So automation is the key to making it work in practice. The idea of TILT is to make this task as easy and fast as possible, so we can create hundreds or thousands of such text-to-image linked pages at low cost, and make all this material truly accessible and usable. The old TILT was written at great speed for a conference in 2013. What it did well was outline how the process could be automated, but it had a number of drawbacks that can, now they are understood properly, be remedied in the next version. So this blog is to be a record of our attempts to make TILT into a practical tool. The British Library Labs ran a competition recently and we were one of two winners. They are providing us with support, materials and some publicity for the project. We aim to have it finished in demonstrable and usable form by October 2014.

Twitter: @bltilt

Blog: http://bltilt.blogspot.co.uk/

Desmond Schmidt has degrees in classical Greek papyrology from the University of Cambridge, UK, and in Information Technology from the University of Queensland, Australia. He has worked in the software industry, in information security, on the Vienna Edition of Ludwig Wittgenstein, on Leximancer, a concept-mining tool, and on the AustESE (Australian electronic scholarly editing) project at the University of Queensland. He is currently a Research Scientist at the Institute for Future Environments, Queensland University of Technology.

 

Anna Gerber is a full-stack developer and technical project manager specialising in digital humanities projects in the University of Queensland’s ITEE eResearch group. Anna was the senior software engineer for the AustESE project, developing eResearch tools to support the collaborative authoring and management of electronic scholarly editions. She is a contributor to the W3C (World Wide Web Consortium) Community Group for Open Annotation and was a co-principal investigator on the Open Annotation Collaboration project. In her spare time, Anna is an avid maker who enjoys tinkering with wearables, DIY robots and 3D printers.

Victorian Meme Machine

Posted on behalf of Bob Nicholson (a more detailed explanation of his winning entry to the British Library Labs competition for 2014)

Introducing the Victorian Meme Machine

What would it take to make a Victorian joke funny again?

Nothing short of a miracle, you might think. After all, there are few things worse than a worn-out joke. Some provoke a laugh, and the best are retold to friends, but even the most delectable gags are soon discarded. While the great works of Victorian art and literature have been preserved and celebrated by successive generations, even the period’s most popular jokes have now been lost or forgotten.

Fortunately, thousands of these endangered jests have been preserved within the British Library’s digital collections. I applied to this year’s Labs Competition because I wanted to find these forgotten gags and bring them back to life. Over the next couple of months we’re going to be working together on a new digital project – the ‘Victorian Meme Machine’ [VMM].

The Victorian Meme Machine (VMM)

The VMM will create an extensive database of Victorian jokes that will be available for use by both researchers and members of the public. It will analyse jokes and semi-automatically pair them with an appropriate image (or series of images) drawn from the British Library’s digital collections and other participating archives. Users will be able to re-generate the pairings until they discover a good match (or a humorously bizarre one) – at this point, the new ‘meme’ will be saved to a public gallery and distributed via social media. The project will monitor which memes go viral and fine-tune the VMM in response to popular tastes. Hopefully, over time, it’ll develop a good sense of humour!

Let’s take a closer look at how it’ll work. Here’s a simple, two-line joke taken from a late-Victorian newspaper:

Chicago Woman: How much do you charge for a divorce?
Chicago Lawyer: One hundred dollars, ma’am, or six for 500 dols.

Users will then be invited to give the joke descriptive tags, highlight key words, and describe its structure. Here’s an example of how our sample joke might be encoded:

Encoding a joke for the VMM

This will give the VMM all the data it needs to pair the joke with an appropriate image. In this first example, the joke has been paired with an image featuring a woman talking to a lawyer and presented in the form of a caption:

Joke paired with an image to create a meme.

We also hope to present the jokes in other formats, such as speech bubbles: 

Joke represented as speech bubbles.

Each of these images is a close match for the joke – both feature women speaking to men who appear to be lawyers. However, if we loosen these requirements slightly then the pairings begin to take on a new (and sometimes rather bizarre) light:

A selection of representations of the joke.
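To make the pairing process concrete, here is a minimal sketch of how tag-based matching of this kind might work. The `score_images` function, the tag names, and the `min_overlap` threshold are all illustrative assumptions, not the VMM's actual implementation: jokes and candidate images each carry descriptive tags, images are ranked by how many tags they share with the joke, and lowering the threshold "loosens" the match to admit more bizarre pairings.

```python
def score_images(joke_tags, images, min_overlap=1):
    """Rank candidate images by tag overlap with the joke.

    images is a list of (image_id, tags) pairs; results with fewer
    than min_overlap shared tags are dropped. Lowering min_overlap
    loosens the match and admits stranger pairings.
    """
    joke_tags = set(joke_tags)
    scored = []
    for image_id, tags in images:
        overlap = len(joke_tags & set(tags))
        if overlap >= min_overlap:
            scored.append((overlap, image_id))
    # Highest overlap first; ties broken alphabetically by id
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [image_id for _, image_id in scored]

joke = ["woman", "lawyer", "divorce", "money"]
images = [
    ("woman_and_lawyer", ["woman", "lawyer", "office"]),
    ("courtroom_scene", ["lawyer", "judge"]),
    ("seaside_donkey", ["donkey", "beach"]),
]
print(score_images(joke, images))     # strict: lawyer-themed images only
print(score_images(joke, images, 0))  # loosened: every image qualifies
```

With the strict default only the two lawyer-themed images qualify; with the threshold at zero, the seaside donkey gets its chance too, which is roughly the "humorously bizarre" mode described above.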

These are just some early examples of what the VMM might offer. When the database is ready, we’ll invite the public to explore other ways of creatively re-using the jokes. Together, I hope we’ll be able to resurrect some of these long-dead specimens of Victorian humour and let them live again – if only for a day.

Dr Bob Nicholson
Lecturer in History, Edge Hill University
Winner of British Library Labs Competition 2014


Bob Nicholson is a lecturer in history specialising in nineteenth-century Britain and America, with a particular focus on journalism, popular culture, jokes, and transatlantic relations. Bob has been exploring representations of the United States, and the circulation of its popular culture, in Victorian newspapers and periodicals. He is a keen proponent of the Digital Humanities and likes to experiment with the new possibilities offered to both researchers and teachers by digital tools and archives. He has written for The Guardian, had his research covered by The Times, and was shortlisted by the British Broadcasting Corporation (BBC) and Arts and Humanities Research Council (AHRC) in their first search for New Generation Thinkers (2011).

@DigiVictorian

www.DigitalVictorianist.com

12 June 2014

British Library Labs Competition 2014 - Winners Announced!

Stella Wisdom, Digital Curator in the Digital Research team, announced the two winners of the second British Library Labs competition as part of her opening keynote speech at the European Library Automation Group (ELAG): Lingering Gold conference at the University of Bath on 11 June 2014.

A judging panel made up of leaders in Digital Scholarship, some of whom sit on the British Library Labs advisory board (Claire Warwick and Melissa Terras at University College London, Andrew Prescott at King's College London, Tim Hitchcock at the University of Sussex, David De Roure at the University of Oxford, and Bill Thompson from the BBC), and members of the British Library's Digital Scholarship team met at the end of May to decide upon two winners of this year's competition. After much deliberation, we can now proudly announce that the winners of the 2014 British Library Labs competition are the 'Victorian Meme Machine' and the 'Text to Image Linking Tool'.

Victorian Meme Machine

Bob Nicholson of Edge Hill University
Twitter: @DigiVictorian Web: http://www.digitalvictorianist.com/

What would it take to make a Victorian joke funny again?

 
Video explaining the Victorian Meme Machine

While the great works of Victorian art and literature have been preserved and celebrated by successive generations, even the period’s most popular jokes have now been lost or forgotten. Fortunately, thousands of these endangered jests have been preserved within the British Library’s digital collections. This project aims to find these forgotten jokes and bring them back to life.

Victorian Meme Machine

The ‘Victorian Meme Machine’ [VMM] will create an extensive database of Victorian jokes that will be available for use by other scholars. It will analyse jokes and semi-automatically pair them with an appropriate image (or series of images) drawn from the British Library’s digital collections and other participating archives. Users will be able to re-generate the pairings until they discover a good match (or a humorously bizarre one) – at this point, the new ‘meme’ will be saved to a public gallery and distributed via social media. The project will monitor which memes go viral and fine-tune the VMM in response to popular tastes.

Bob Nicholson is a lecturer in history specialising in nineteenth-century Britain and America, focusing on journalism, popular culture, jokes, and transatlantic relations. Bob has been exploring representations of the United States, and the circulation of its popular culture, in Victorian newspapers and periodicals. He is a keen proponent of the Digital Humanities and has written for The Guardian, had his research covered by The Times, and was shortlisted by the British Broadcasting Corporation (BBC) and Arts and Humanities Research Council (AHRC) in their first search for New Generation Thinkers (2011).

Text to Image Linking Tool (TILT)

Desmond Schmidt and Anna Gerber of the University of Queensland
Twitter: @bltilt and @AnnaGerber

 
Video of Desmond and Anna explaining TILT

In order to make old printed books and manuscripts accessible to a Web audience, it is essential to display the page image or facsimile of the original document next to its transcription. This allows the user to read the text clearly and to comment on it, but because original documents are often hard to read, or have different line-breaks from the text on a computer screen, it is easy to get lost trying to match up words in the document with words in the transcription. To overcome this, the team are developing semi-automatic methods for generating links that highlight corresponding parts of the page image and the text.

Visualising manuscript regions to enable linking to transcriptions

More information about TILT can be found at http://dh2013.unl.edu/abstracts/ab-112.html
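The core data the project needs is a set of links tying each transcribed word to a rectangular region on the page image, so that clicking a word can highlight its facsimile region and vice versa. The sketch below is a hedged illustration of that idea only: the `Link` structure and the naive left-to-right layout are invented for this example, not TILT's actual algorithm, which uses image analysis to find true word shapes.

```python
from dataclasses import dataclass

@dataclass
class Link:
    word: str    # token from the transcription
    x: int       # region's left edge on the page image (pixels)
    y: int       # region's top edge
    width: int
    height: int

def naive_links(line_text, x0, y0, line_width, line_height):
    """Spread a transcribed line's words across one line of the image.

    Allocates horizontal space proportional to word length; a real
    system would detect word boundaries in the image itself.
    """
    words = line_text.split()
    total_chars = sum(len(w) for w in words) or 1
    links, x = [], x0
    for word in words:
        w = round(line_width * len(word) / total_chars)
        links.append(Link(word, x, y0, w, line_height))
        x += w
    return links

for link in naive_links("How much do you charge", 100, 40, 500, 30):
    print(link)
```

Even this crude proportional guess gives a user interface something to highlight; the semi-automatic part of the real project is in correcting such guesses against the actual page image.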

Anna Gerber is a software developer and technical project manager specialising in Digital Humanities projects at the University of Queensland’s ITEE (Information Technology and Electrical Engineering) eResearch group. Anna was the senior software engineer for the AustESE project, developing eResearch tools to support the collaborative authoring and management of electronic scholarly editions. She is a contributor to the W3C (World Wide Web Consortium) Community Group for Open Annotation and was a co-principal investigator on the Open Annotation Collaboration project.

Desmond Schmidt has degrees in classical Greek papyrology from the University of Cambridge, UK, and in Information Technology from the University of Queensland, Australia. He has worked in the software industry, in information security, on the Vienna Edition of Ludwig Wittgenstein, on Leximancer, a concept-mining tool, and on the AustESE (Australian Electronic Scholarly Editing) project. He is currently a Research Scientist at the Institute for Future Environments, Queensland University of Technology.

 

The winners will work with the British Library for the next 5-6 months on their ideas, and their work will be showcased at the British Library Conference Centre on Monday 3 November 2014, when a first prize of £3,000 and a second prize of £1,000 will be awarded.

Finally, we would like to thank everyone who entered the competition this year, and we hope to continue to run events and organise meetings where we can support researchers who would like to use the Library's digital content in their scholarly work.

We will be blogging about each of the projects over the next few months, and you can track progress by following @BL_Labs on Twitter.

Mahendra Mahey @mahendra_mahey

06 June 2014

The British Library Big Data Experiment

This week the British Library Big Data Experiment began: a collaboration between the British Library Digital Research team, UCL Computer Science, and the UCL Centre for Digital Humanities that will experiment with approaches to opening up the British Library's digital collections, particularly to benefit those undertaking research in the arts and humanities.

The experiment itself will see a team of UCL Computer Science and Software Systems Engineering students work between now and September on an MSc project that seeks to develop experimental platforms for access to and interrogation of British Library public domain digital collections using the Microsoft Azure cloud infrastructure. Their brief is to design a research-oriented front end with adaptors and facades and to construct implementations of Azure APIs that are functionally scalable to the datasets provided. Features of this public front end might include recommender and similarity engines, machine learning interfaces, statistics integration, and the ability to bundle mixed subsets of digital resources for download.

P6050686 - Copy
Adam Farquhar, Head of Digital Scholarship, welcomes the student team to the British Library

The project team assembled through a process of self-selection and each member has their own reasons for being involved. For Stefan P. Alborzpour, MSc Computer Science, "the opportunity to plough through vast quantities of digitised historical content using intelligent systems presents an exciting challenge.  This shall benefit not only academics but enrich the wider community and I am thrilled at the prospect of contributing to this project." Stelios Georgiou, Testing Director and MSc Software Systems Engineering, is "eager to work for a globally renowned hub of information and knowledge, such as the British Library, because it will provide me with the opportunity to develop my skills in a challenging infrastructure setting." Wendy Wong, MSc Computer Science, is all about the data. "Big Data is such a modern and growing field now," she said, "to integrate it with century old works just shows how far the technological age has come, and this in itself I find very exciting." Finally, Nectaria Stavrou, Team Leader and MSc Software Systems Engineering, sees effecting change and the challenge of the project as the biggest draw: "Handling a mass amount of information and mining into that can be considered a big challenge, but yet powerful enough to advance humanity a step further. The British Library is definitely one of the greatest libraries globally and now is the time for that big move that will allow researchers and other people to find the gold, the information they seek faster.  And I am excited that I will be a small part of this significant change."

P6050685 - Copy

The first stage of the project will involve bringing the team up to speed with research patterns in the arts and humanities and the demands these researchers place on the British Library as a digital library. Armed with a clear sense of user need, the students will then go on to grapple with the data from our Public Domain Microsoft Books collection (for more details see 'A million first steps' and the British Library Public Domain wiki), before building a public-facing experimental interface to the collection that you'll be able to use in your research.

This project is intended as the first stage of a long term collaboration that will see UCL Computer Science students using British Library open data and public domain digital collections to develop experimental services, tools, and infrastructures with support from the Digital Research Team.

James Baker

Curator, Digital Research

@j_w_baker