Digital scholarship blog

Enabling innovative research with British Library digital collections

5 posts from November 2013

26 November 2013

Interactive Narratives @ British Library

Earlier this month the British Library hosted the second public event in our Digital Conversations Series. Run by the Digital Research team, this series aims to bring together diverse panels to speak around, with help from the general public, a particular topic or controversy related to Digital Scholarship in the broadest sense.

Photograph by Stella Wisdom. No known copyright restrictions.

On this occasion, the topic was Interactive Narratives and our panel - consisting of the author Iain Pears, the writer/designer Rob Sherman, and, from the academic community, Andrew Burn and Joanne Shattock - kicked off a lively discussion on the past, present and future of interactive stories and storytelling. Topics that came up included:

  • Reader choice as both a problem and an enabler.
  • Fragmenting and re-organising narratives.
  • Physical/digital worlds as story-telling devices.
  • Reading experience as non-author intentioned interactivity.
  • Pre-digital interactivity.

Our thanks go out to all the speakers and to the (sell-out!) crowd. For those who were unable to make it and want to know more, a recording from the event is now available as a podcast.

The event in full swing. Photograph by Stella Wisdom. No known copyright restrictions.

The third event in the series will take place on 27 February 2014, and is provisionally entitled "Data Visualisation: is ugly the new smart?". Full details will be announced shortly.

25 November 2013

Mixing the Library: Information Interaction & the DJ - Origins

Posted on behalf of Dan Norton - British Library Labs 2013 Competition Winner

Following the completion of my PhD at the University of Dundee, I spent a period of time as Artist in Residence at Hangar Centre for Art and Research, Barcelona. There, I collaborated closely with the Department of Library Science and Documentation at the University of Barcelona to explore the potential value of the DJ’s model of information interaction in the field of Library Science, particularly in the use of digital collections.

It was with this background that I decided to enter the British Library Labs 2013 competition after participating in an online meeting:

British Library Labs Virtual Event (Google hangout), 17 May 2013

http://www.youtube.com/watch?v=RFt0NvbTFHs

My project idea was to apply the DJ's way of working with music, their model of information interaction (developed as part of my doctoral study entitled "Mixing the Library: Information Interaction and the DJ"), in a prototype for working with multiple data-representations from the British Library digital collections. This practice-led investigation would be used to describe the interface requirements for collecting, enriching, mixing/linking, and visualizing information from large digital libraries.

The prototype would be built on previous work and would attempt to combine the two essential interface features of the DJ's model: continual visual presence of the collection as a writeable menu system; and a "mixing screen" that allows two (or more) data fragments from the collection to be presented, combined, and linked. These simple interface requirements would be used to build sequences through an archive by linking articles, and later be evaluated and extended as a tool for developing semantically rich data, to be incorporated into the RCUK SerenA project (serena.ac.uk) of which my doctoral study was a part.  

The tool would, I hope, operate as a powerful scholarly interface for learning and development in digital collections, for sharing annotated, semantically rich data, and for communicative exchange of findings. A screencast demonstrating the interface is available here:

Prototype Mixing Interface

 

http://ablab.org/BLLabs/test/index2.html

I think it is important to give some background and explain why I think the DJ’s model of Information Interaction is valuable for working with multiple data types (images, video, sound, text etc.).

Background

'The significance of the 'information explosion' may lie not in an explosion of quantity per se, but in an incalculably greater combinatorial explosion of unnoticed and unintended logical connections.' (Swanson 1996).

 

Dan Norton DJing
Wakanegra
Sonidero and Greenpoint Reggae support Mungos HiFi, Palma. 2012

Creativity in the DJ’s system is apparently simple. It is entirely reliant upon bringing together sequences of material and exploring the possible combinations. It can do this simply, as a sequence of tracks one after another, or in more complicated ways which involve overlayering, sampling, and mashups.

The DJ’s creativity enters the system in two fundamental information behaviours: selecting and mixing. With these two behaviours alone personal expression and intent can enter an activity that always reuses stored content.

Selecting is the principal creative behaviour. It reduces information volume, builds useful groups of related material, and in the live event is done in response to feedback and personal ideas.

Mixing is the second creative information behaviour. It combines the material and explores the connection between articles. Mixing formulates previously unnoticed connections from within the archive.

A Model of Learning

Learning is intrinsic to the DJ’s model of interaction. Learning and memory develop through retrieval, organisation, classification, and addition of metadata. The association of human memory with the digital image of the collection (digital memory) creates a system in which the DJ can work in a creative flow with information, moving between idea and information.

The model incorporates retrieval and learning, with creative development and a publication workflow. Newly created texts are directly tested in a field of listeners, and may also be published and released as informational resources. The model is described in the image below.

 

DJ's Model of Information Interaction (Norton, 2013)


The next few blogposts will discuss my experiences of using the British Library’s collections, working with Labs to develop the first functioning iteration of the interface and the future developments of the work.

 

20 November 2013

The georeferencer is back!

Coinciding nicely with GIS Day 2013, the British Library's Lead Curator of Digital Mapping, Kimberly Kowal, has just released online the biggest batch yet of digitised maps needing georeferencing: 2,700 of them!

She explains, "We're asking the public to help 'place' them, i.e. identify their locations by assigning points using modern mapping. It can be a challenge, but it is an opportunity to discover familiar areas as they existed around one hundred years ago."

Read more about it over on the British Library Maps & Views Blog.

Or just dive right in and get georeferencing!

 

 

 

19 November 2013

Digital Research in the wild

I've been on the road recently, first at a Nineteenth Century Periodicals Day held at Liverpool John Moores University and second at the Centre for Eighteenth Century Studies (CECS) at the University of York. Sandwiched between these excursions to the north was the Transforming Research through Digital Scholarship event here at the British Library. This event was in part a showcase of the first BL Labs competition winners, and in part a showcase of recent major projects funded under the AHRC Digital Transformations Theme. As the title suggests, the event contained plenty of digitally -enabled, -driven, -focused humanities researchers discussing all things digital humanities and Digital Humanities: an inspiring day at which I met plenty of new faces, and - from the buzz in the room - plenty of high quality networking was to be had. But it is the first two events I want to write about here briefly, because it was in Liverpool and York that I encountered that tricky, fascinating, and ever rewarding middle ground between traditional humanities scholarship and digital research methods: a middle ground we digital folks would do well to remain ever in touch with should we wish our work to be understood and critiqued by the profession at large.

At Liverpool I was introduced to the Punch Contribution Ledgers project and the Writing Lives project, and at CECS, among other things, I heard a little about their nascent Digital Humanities Forum. It is fair to say that at both institutions there are folks dipping their toes into digital research, whether through online publication and transcription, research blogging and public history, or discussions around training, collaboration and resource consolidation. I was delighted to give talks at both, at Liverpool on Research in the Digital Age (slides with link to text/notes) and at CECS on quantitative analysis of late-Georgian satirical printing. In both cases it proved to be of enormous benefit that I am someone whose explorations of historical phenomena focus on the eighteenth and nineteenth centuries and draw on periodicals of various kinds: in short, I was able to speak around the digital from a position of mutual research interest. And so the rich discussions that were had have given me plenty to chew on, the highlights of which are worth repeating as a reminder that out there in the research community, many of the more 'basic' considerations around digital research are still being debated and resolved at a local level (I put basic within apostrophes here to stress that I neither wish to condescend nor offend: the basics of any method or approach are always worth returning to, fighting over, and being exposed to fresh eyes). So, onto those themes.

 

No, not that Data… Data photograph courtesy of Flickr user puntxote / Creative Commons Licensed

Data

How to get at data, how to develop data, and how to interpret data were common themes. That we at the British Library could provide certain large datasets with relative ease pleased people; that we are not able to deliver these datasets online, however, is clearly a barrier to engagement with the curious researcher. Discussions of wrangling data turned towards the perceived complexity of such activities, how to get the most from initial and often time-consuming forays into tool use, and some terminological barriers: what, for example, .csv and .tsv are compared to .xls and why content holders see them as preferred data formats (when they're not using even less understood .xml and RDF schemas). Clearly coordinated training, outreach and myth busting in this area is needed. Finally, colleagues in the sector seem aware that a decline in quantitative approaches to historical phenomena over the last two decades has left a gap in their skillset just when those skills are most needed, and a lack of practical experience of validating research based on small data alongside research based on large data (a forthcoming event at NYU looks set to add something valuable to what is an ongoing debate in the digital research community). So again a training need, though this time perhaps a coordinated discussion is needed around the skills that are embedded into humanities graduate training programmes: do humanists, for example, want to push for the introduction of mandatory modules in quantitative analysis, as many social science departments do for their new MSc and PhD students?
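As a small piece of myth busting on the .csv/.tsv question: both are plain text tables that differ only in the character separating the columns, which is part of why they are easier to process (and outlive software versions better) than a binary spreadsheet format. The sketch below uses Python's standard csv module; the two catalogue rows are invented purely for illustration and are not British Library data.

```python
import csv
import io

# Hypothetical catalogue rows, invented for illustration only.
# In the CSV version a title containing a comma has to be quoted;
# in the TSV version the tab separator makes that unnecessary.
CSV_TEXT = 'title,year\n"Travels in Spain, 1845",1846\nA Tour of Wales,1851\n'
TSV_TEXT = "title\tyear\nTravels in Spain, 1845\t1846\nA Tour of Wales\t1851\n"


def read_rows(text, delimiter):
    """Parse delimited plain text into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


csv_rows = read_rows(CSV_TEXT, ",")
tsv_rows = read_rows(TSV_TEXT, "\t")

# Both formats carry exactly the same table.
print(csv_rows == tsv_rows)
```

The same reader handles either file once it is told the delimiter, so nothing beyond a text editor or a few lines of script is needed to open the data.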

 

A one-stop shop in need of an audience...  Mega One-Stop Shop at Georgia State University photograph courtesy of Flickr user sylvar / Creative Commons Licensed

Resources

A common concern is where someone new to digital research should go for information, tips, tools. As many of us know, this 'one stop shop' approach has a delicate history: one of the many lessons that can be drawn from an infrastructure project such as Project Bamboo, a well intentioned attempt to draw together knowledge and expertise, is that we don't want to repeat such projects (for a useful summary of why, see Stephen Ramsay on Quinn Dombrowski’s paper at DH 2013). It is the nature of the web that makes this aggregation so tricky, for - in a sense - the aggregation tool is already there: it is called Google (or Bing, or Yahoo, or to a lesser extent - because we really need complex search algorithms for knowledge aggregation - DuckDuckGo). Savvy, well targeted searches combined with a little conversation (either in person or on DHers' favourite network, Twitter) can reap huge rewards. But perhaps this can only work for the individual researcher playing around on their own terms. How does this scale to group learning? A comment I heard more than once on my travels was that seemingly introductory volumes - in which we can include Digital Humanities in Practice (2012), Understanding Digital Humanities (2012), the forthcoming Defining Digital Humanities (2013) - don't quite fit the bill: for they either speak around disparate research topics, are technical in their delivery, or are theoretically inclined. This isn't - I think - a criticism of these volumes per se, but rather a criticism of the lack of alternatives the newcomer has. Perhaps the volume envisaged - a reader that combines an introduction to key concepts with some basic tips/tutorials and examples of the potential and scope for digital research - will never happen: I certainly worry about tips and tutorials suffering from link rot, tool use decline, obsolescence.
But projects such as The Historian's Macroscope and the approachable ethos of Journal of Digital Humanities suggest this problem is being worked on. How far these projects go toward becoming the 'one stop shop' there is an undoubted appetite for remains to be seen. Perhaps it will be through subject specificity that the best gains will be made: if The Historian's Macroscope can prove as useful and slow-burning as Hudson’s History by Numbers (2000) has, that will be worth celebrating. And as The Historian's Macroscope is being written openly and is inviting contributions to an open peer review process, it behoves us - the community - to do our bit to make it everything those on the fringes want and need.

 

Data data everywhere, but what doth it mean?... Data Visualization of Street Trees photograph courtesy of Flickr user Intel Free Press / Creative Commons Licensed

Tool and content integration

Building tools into the digitised and born-digital content we make available seems vital to future gains in research and the integration of digital research into wider humanities research practices. Of course there are problems with this: bolting, for example, a georeferencer onto an interface for OCRd newspapers is shaping rather than enabling research - it is promoting one method over another, it is inviting outdatedness (after all, who is going to keep updating the interface and how?), and it is obscuring the fuzziness of the OCRd data. Moreover, if 'know your data!' is a mantra we seek to promote, integrating data and tools really shouldn't be the way forward. But, in spite of what I’ve just said, I think we have to. Researchers appear to want this integration, primarily - it seems - because they know they could do more, have learnt from years of using ECCO and the digitised Burney Newspapers interfaces that poor OCR they are unable to see drives researchers toward awkward compromises, and are interested in finding new ways into the kind of digital research methods out there: to test whether an approach would be of use to them before embarking on getting hold of the 'raw' data, wrangling it, processing it, and the like. Voyant Tools goes some way to offering this for text analysis - but of course you need to find the text to put in it (that is unless the Old Bailey Online happens to form the core of your research) - and Locating London's Past for geospatial work, but beyond that, and apart from creating researcher-unfriendly APIs, content holding heritage institutions (and we are as guilty here as anyone else...) are only just waking up to this need. Perhaps the API that sits around the Digital Public Library of America, which allows tools (such as the Serendip-o-matic) to be built on top of and integrated within the data it federates, will offer a model for how we can proceed in this area. Only time will tell.

 

You, replaced... Machine image courtesy of Flickr user johnlaudun / Creative Commons Licensed

What is DH and what is it doing in my house?

I don't wish to go into all the permutations of this (especially not the 'what is DH' bit, my current default is to point people at this), but I've sensed for some time that self-identified historians, literary scholars, or similar, are having problems with the catch-all nature of DH, the breadth of the big tent, and claims to disciplinarity those within the tent might have. I don't think this stems from fearful conservatism; rather I sense that the historians I speak to see the value of digital methods and want to see more hybrid digital work they could get their teeth into: work by, say, historians, using digital tools and methods as an approach, as part of their toolkit, not as all of their toolkit. The Digital History seminar at the IHR is doing good work in this area, though perhaps is struggling to pull in the non-digital subject specific attendees it needs to make significant gains, but I sense some expansion is needed: some hearts and minds outreach work, some domain specific examples becoming mainstream (I have high hopes for Bob Shoemaker and Tim Hitchcock's forthcoming book in this regard), and some myth busting. DH certainly isn't here to kill pets, snatch babies, or wantonly destroy disciplines. These simple messages need to be constantly reiterated even as we celebrate our successes.

@j_w_baker

08 November 2013

The Sample Generator - Part 1: Origins

Posted on behalf of Pieter Francois.

Imagine being asked to describe the tool you always wanted when you were writing your PhD.

Imagine being asked (without having to worry too much about technical implementations) to make a case for a digital tool that would have:

  • saved you enormous time
  • allowed you to expand drastically the number of sources to study
  • allowed you to ask new and more relevant research questions 

Which digital tool would you choose?
What functionality seems crucial to you but is surprisingly lacking in your research area? 

It was with this frame of mind that I decided to enter the 2013 British Library Labs competition with the idea to create a Sample Generator, i.e. a tool which is able to give me an unbiased sample of texts based on my search criteria. Being one of the chosen winners provided me with an opportunity to put together a small team of people from both within and outside the British Library to make it a reality.

When studying the world of nineteenth-century travel for my PhD I used the collections of the British Library extensively. Being able to look for relevant material in roughly 1.8 million records is a researcher's dream. It can also be a curse.


Snapshot of catalogue window with search word "travel"

How did the material I decided to look at fit into the overall holdings? Sure, my catalogue searches did produce plenty of relevant material, but how representative was the material I looked at of the overall nineteenth-century publication landscape? Even when assuming the British Library holdings are as good a proxy as any for the entire nineteenth-century British publication landscape, this is a very difficult question to answer. Historians and literary scholars have designed many clever methodological constructs to tackle such issues of representativity, to tackle potential biases of the studied sources and to deal with gaps in their source material. Yet very few attempts have been made to deal with these issues in a systematic way.

The ever growing availability of large digital collections has changed the scale of this issue, but it did not change its nature. For example, the wonderful digital '19th Century books' collection of the British Library provides you access to approximately fifty thousand books in digital form, and to the enthusiast of text and sentiment mining or scholars interested in combining distant and close reading its potential is phenomenal. However, the impressive size of the collection does not deal with the crucial questions:

How do these books relate to the approximately 1.8 million nineteenth-century records the library holds?

How do the digitized books of the '19th Century books' collection fit into the overall nineteenth-century publication landscape?

Large numbers can create a false sense of completeness.

The Sample Generator provides researchers with a way to understand more fully the relation between the studied sources and the overall holdings of the British Library. Whereas a traditional title word search in the British Library Integrated Catalogue generates an often long list of hits, the Sample Generator allows you, with a few additional clicks, to generate structured unbiased samples from this list. The key innovation is that these samples mimic the rise and fall in popularity of the searched terms over the nineteenth century as it is found in the entire British Library holdings for this period.
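One way to picture a sample that "mimics the rise and fall" of a search term is proportional stratified sampling by publication year: each year contributes to the sample in proportion to its share of the catalogue hits. The sketch below is only an illustration of that general technique, not the Sample Generator's actual implementation; the function name, the record layout (a dictionary with a "year" key) and the use of largest-remainder rounding are all assumptions.

```python
import random
from collections import defaultdict


def stratified_sample(records, sample_size, seed=0):
    """Draw a random sample whose per-year counts follow each year's share of the hits.

    `records` is a list of dicts with a "year" key (a hypothetical layout).
    """
    total = len(records)
    if not 0 <= sample_size <= total:
        raise ValueError("sample_size must be between 0 and the number of records")

    # Group the catalogue hits by publication year.
    by_year = defaultdict(list)
    for record in records:
        by_year[record["year"]].append(record)

    # Largest-remainder allocation: give each year the whole-number part of
    # its proportional quota, then hand the remaining places to the years
    # with the largest leftover fractions.
    quotas = {}
    remainders = []
    for year, group in by_year.items():
        quotas[year] = (sample_size * len(group)) // total
        remainders.append(((sample_size * len(group)) % total, year))
    leftover = sample_size - sum(quotas.values())
    for _, year in sorted(remainders, reverse=True)[:leftover]:
        quotas[year] += 1

    # Within each year, pick records uniformly at random (seeded, so repeatable).
    rng = random.Random(seed)
    sample = []
    for year in sorted(by_year):
        sample.extend(rng.sample(by_year[year], quotas[year]))
    return sample
```

With 60 hits from one year, 30 from a second and 10 from a third, a sample of 10 drawn this way contains 6, 3 and 1 records respectively. Changing the seed yields a different sample with the same shape, which is one way to get several samples from the same search criteria.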

Depending on the amount of research time available it is possible to change the sample size (or, for cross-validation purposes, to create several samples based on the same search criteria). Furthermore, as the Sample Generator not only works with the catalogue data (metadata) of all nineteenth-century books the British Library holds, but also keeps a special focus on the metadata of the digital '19th Century book' collection (see http://britishlibrary19c.tumblr.com/ for a representative sample), it is possible to create samples of only digitized texts. These samples can then be further queried by using advanced text analysis and data mining tools (e.g. geo-tagging). As all the samples generated by the various searches will be stored with a unique URL, the samples become citable and they can be shared with peers and be more easily used in collaborative research.
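To make the idea of a citable sample concrete, here is a minimal sketch of one way a stable address could be derived from the parameters that define a sample. It is not a description of how the Sample Generator stores its samples: the base URL, the function name and the parameter names are hypothetical.

```python
import hashlib
import json

# Hypothetical base address, used for illustration only.
BASE_URL = "https://example.org/samples/"


def sample_url(criteria, sample_size, seed):
    """Derive a repeatable URL from everything that defines a sample."""
    # Serialise with sorted keys so the same parameters always give the
    # same text, whatever order they were supplied in.
    payload = json.dumps(
        {"criteria": criteria, "sample_size": sample_size, "seed": seed},
        sort_keys=True,
    )
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]
    return BASE_URL + digest


url = sample_url({"title_word": "travel", "digitized_only": True}, 100, seed=1)
print(url)
```

Because the address depends only on the search criteria, the sample size and the random seed, a peer given the same three values would arrive at the same address, and a second sample drawn with a different seed would get its own.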

Whereas in this phase of the project the Sample Generator has only been tried out on the nineteenth-century holdings of the British Library and on the digital '19th Century book' collection, its application is nearly universal. The Sample Generator can be implemented on any catalogue (or even bibliography) and, if relevant, links can be made to one or more digital collections.

Adding such a link with a digital collection allows users to make a different type of claim. For example, the finding 'I observed trend X in digital collection Y' is replaced by the finding 'I observed trend X in a structured unbiased sample Y which is representative of the entire catalogue/bibliography Z'. This adds an important functionality to the increasing number of large digital collections as it removes the inherent, yet often poorly documented, biases of the digitization process (although it introduces the curatorial biases of the much larger collections, which are fortunately usually better documented and understood as generations of scholars have come to terms with these).

Finally, the Sample Generator is a great hypothesis testing tool. Its use allows scholars to cover a lot of ground fairly quickly by testing a range of hypotheses and ideas on relatively small sample sizes. This allows for a creative, yet structured and well documented, going back and forth between the conceptual drawing board and the data. Whereas such a structured dialogue is fundamental in the natural and social sciences, it is largely lacking in the humanities, where this dialogue between ideas and data has tended to happen in a more haphazard fashion.

The past four months were spent on turning this general idea (which at times felt overly ambitious) into reality. We faced several challenges: for example, the catalogue data was incomplete and inconsistent. Furthermore, I firmly believed that it was essential to accompany this tool with some case studies highlighting its transformative potential. Given the amount of labour and the range of skill sets necessary to complete both tasks, the project had to be team based. Without both the time and intellectual contributions of Mahendra Mahey, Ben O'Steen, Ed Turner and Justin Lane the Sample Generator would still simply be the digital tool I always wanted to have.

Pieter Francois is one of the winners of the 2013 British Library Labs competition. He works at the Institute of Cognitive and Evolutionary Anthropology, University of Oxford, where he specializes in longitudinal analysis of archaeological and historical data.

The next blogposts of this short series, written by various members of the team, will focus on how to use the Sample Generator, on explaining the technical nuts and bolts at the back end of the tool, and on recounting the experiences of collecting the necessary data to test drive the tool.