Digital Research in the wild
I've been on the road recently, first at a Nineteenth Century Periodicals Day held at Liverpool John Moores University and second at the Centre for Eighteenth Century Studies (CECS) at the University of York. Sandwiched between these excursions to the north was the Transforming Research through Digital Scholarship event here at the British Library. This event was in part a showcase of the first BL Labs competition winners and in part a showcase of recent major projects funded under the AHRC Digital Transformations Theme. As the title suggests, the event contained plenty of digitally -enabled, -driven, -focused humanities researchers discussing all things digital humanities and Digital Humanities: an inspiring day at which I met plenty of new faces, and - from the buzz in the room - plenty of high quality networking was to be had. But it is the first two events I want to write about here briefly, because it was in Liverpool and York that I encountered that tricky, fascinating, and ever rewarding middle ground between traditional humanities scholarship and digital research methods: a middle ground we digital folks would do well to remain ever in touch with should we wish our work to be understood and critiqued by the profession at large.
At Liverpool I was introduced to the Punch Contribution Ledgers project and the Writing Lives project, and at CECS, among other things, I heard a little about their nascent Digital Humanities Forum. It is fair to say that at both institutions there are folks dipping their toes into digital research, whether through online publication and transcription, research blogging and public history, or discussions around training, collaboration and resource consolidation. I was delighted to give talks at both, at Liverpool on Research in the Digital Age (slides with link to text/notes) and at CECS on quantitative analysis of late-Georgian satirical printing. In both cases it proved to be of enormous benefit that I am someone whose explorations of historical phenomena focus on the eighteenth and nineteenth centuries and draw on periodicals of various kinds: in short, I was able to speak around the digital from a position of mutual research interest. And so the rich discussions that were had have given me plenty to chew on, the highlights of which are worth repeating as a reminder that out there in the research community, many of the more 'basic' considerations around digital research are still being debated and resolved at a local level (I put basic within apostrophes here to stress that I neither wish to condescend nor offend: the basics of any method or approach are always worth returning to, fighting over, and being exposed to fresh eyes). So, onto those themes.
How to get at data, how to develop data, and how to interpret data were common themes. That we at the British Library could provide certain large datasets with relative ease pleased people; that we are not able to deliver these datasets online, however, is clearly a barrier to engagement with the curious researcher. Discussions of wrangling data turned towards the perceived complexity of such activities, how to get the most from initial and often time-consuming forays into tool use, and some terminological barriers: what, for example, .csv and .tsv are compared to .xls and why content holders see them as preferred data formats (when they're not using even less understood .xml and RDF schemas). Clearly coordinated training, outreach and myth busting in this area are needed. Finally, colleagues in the sector seem aware that a decline in quantitative approaches to historical phenomena over the last two decades has left a gap in their skillset just when those skills are most needed, and a lack of practical experience of validating research based on small data alongside research based on large data (a forthcoming event at NYU looks set to add something valuable to what is an ongoing debate in the digital research community). So again a training need, though this time perhaps a coordinated discussion is needed around the skills that are embedded into humanities graduate training programmes: do humanists, for example, want to push for the introduction of mandatory modules in quantitative analysis, as many social science departments do for their new MSc and PhD students?
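By way of a little myth busting on the .csv and .tsv question: both are simply plain text tables that any text editor can open, with commas or tabs marking the column breaks, which is a large part of their appeal over a spreadsheet format such as .xls. The sketch below is a minimal illustration using Python's standard csv module. The sample records are invented for the purpose and are not drawn from any British Library dataset, and the helper name read_rows is my own.

```python
import csv
import io

# Invented sample records, purely for illustration (not a real dataset).
# The CSV version must quote the second title because it contains a comma.
CSV_TEXT = 'title,year,pages\nExample Gazette,1795,4\n"Sample Review, The",1820,16\n'
# The TSV version separates columns with tabs, so the comma needs no quoting.
TSV_TEXT = "title\tyear\tpages\nExample Gazette\t1795\t4\nSample Review, The\t1820\t16\n"


def read_rows(text, delimiter):
    """Parse delimited plain text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


csv_rows = read_rows(CSV_TEXT, ",")
tsv_rows = read_rows(TSV_TEXT, "\t")

# Both texts describe the same table; only the separator character differs.
print(csv_rows == tsv_rows)
for row in csv_rows:
    print(row["title"], row["year"])
```

Note that every value, including the year, comes back as text: deciding what is a number, a date or a name is left to the researcher, which is one more reason to know your data.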
A common concern is where someone new to digital research should go for information, tips, tools. As many of us know, this 'one stop shop' approach has a delicate history: one of the many lessons that can be drawn from an infrastructure project such as Project Bamboo, a well intentioned attempt to draw together knowledge and expertise, is that we don't want to repeat such projects (for a useful summary of why, see Stephen Ramsay on Quinn Dombrowski’s paper at DH 2013). It is the nature of the web that makes this aggregation so tricky, for - in a sense - the aggregation tool is already there: it is called Google (or Bing, or Yahoo, or to a lesser extent - because we really need complex search algorithms for knowledge aggregation - DuckDuckGo). Savvy, well targeted searches combined with a little conversation (either in person or on DHers' favourite network, Twitter) can reap huge rewards. But perhaps this can only work for the individual researcher playing around on their own terms. How does this scale to group learning? A comment I heard more than once on my travels was that seemingly introductory volumes - in which we can include Digital Humanities in Practice (2012), Understanding Digital Humanities (2012), the forthcoming Defining Digital Humanities (2013) - don't quite fit the bill: for they either speak around disparate research topics, are technical in their delivery, or are theoretically inclined. This isn't - I think - a criticism of these volumes per se, but rather a criticism of the lack of alternatives the newcomer has. Perhaps the volume envisaged - a reader that combines an introduction to key concepts with some basic tips/tutorials and examples of the potential and scope for digital research - will never happen: I certainly worry about tips and tutorials suffering from link rot, tool use decline, obsolescence.
But projects such as The Historian's Macroscope and the approachable ethos of the Journal of Digital Humanities suggest this problem is being worked on. How far these projects go toward becoming the 'one stop shop' for which there is an undoubted appetite remains to be seen. Perhaps it will be through subject specificity that the best gains will be made: if The Historian's Macroscope can prove as useful and slow-burning as Hudson’s History by Numbers (2000) has, that will be worth celebrating. And as The Historian's Macroscope is being written openly and is inviting contributions to an open peer review process, it behoves us - the community - to do our bit to make it everything those on the fringes want and need.
Tool and content integration
Building tools into the digitised and born-digital content we make available seems vital to future gains in research and the integration of digital research into wider humanities research practices. Of course there are problems with this: bolting, for example, a georeferencer onto an interface for OCRd newspapers is shaping rather than enabling research - it is promoting one method over another, it is inviting outdatedness (after all, who is going to keep updating the interface and how?), and it is obscuring the fuzziness of the OCRd data. Moreover, if 'know your data!' is a mantra we seek to promote, integrating data and tools really shouldn't be the way forward. But, in spite of what I’ve just said, I think we have to. Researchers appear to want this integration, primarily - it seems - because they know they could do more, have learnt from years using ECCO and the digitised Burney Newspapers interfaces that poor OCR they are unable to see drives researchers toward awkward compromises, and are interested in finding new ways into the kind of digital research methods out there: to test whether an approach would be of use to them before embarking on getting hold of the 'raw' data, wrangling it, processing it, and the like. Voyant Tools goes some way to offering this for text analysis - but of course you need to find the text to put in it (that is unless the Old Bailey Online happens to form the core of your research) - and Locating London's Past does the same for geospatial work, but beyond that, and apart from creating researcher-unfriendly APIs, content-holding heritage institutions (and we are as guilty here as anyone else...) are only just waking up to this need. Perhaps the API that sits around the Digital Public Library of America, which allows tools (such as the Serendip-o-matic) to be built on top of and integrated within the data it federates, will offer a model for how we can proceed in this area. Only time will tell.
What is DH and what is it doing in my house?
I don't wish to go into all the permutations of this (especially not the 'what is DH' bit, my current default is to point people at this), but I've sensed for some time that self-identified historians, literary scholars, or similar, are having problems with the catch-all nature of DH, the breadth of the big tent, and the claims to disciplinarity those within the tent might have. I don't think this stems from fearful conservatism; rather I sense that the historians I speak to see the value of digital methods and want to see more hybrid digital work they could get their teeth into: work by, say, historians, using digital tools and methods as an approach, as part of their toolkit, not as all of their toolkit. The Digital History seminar at the IHR is doing good work in this area, though it is perhaps struggling to pull in the non-digital subject specific attendees it needs to make significant gains, but I sense some expansion is needed: some hearts and minds outreach work, some domain specific examples becoming mainstream (I have high hopes for Bob Shoemaker and Tim Hitchcock's forthcoming book in this regard), and some myth busting. DH certainly isn't here to kill pets, snatch babies, wantonly destroy disciplines. These simple messages need to be constantly reiterated even as we celebrate our successes.