Why we are big on (big)data

Big data. Yes, that again. It is tempting to think we in the humanities are a bit behind the times in getting excited right now about using large datasets to answer research questions. After all, our colleagues in the sciences have been using big(ger) data for ages, and economic-minded social historians were doing quantitative work with lots of stuff in the 60s and 70s. Yet I think the key is that we humanists mean something different by big data: collections of stuff too big to deal with on our own (that is, without computers, both for processing and memorising) and too big to deal with without changing the ways in which we do what we do. Our big data then is not the same size as the big data produced by the LHC, but I suspect [citation needed!] that the shift in scale in each field is equivalently big.

So big data – or, to paraquote Torsten Reimer, "thinking big with data" – has lurched to the forefront of humanistic inquiry. There are dissenters. Some just don't like the buzzword. Others see big data digital humanities (DH) as a by-product not of good research but of the managerial humanities, of a chase for money as opposed to a chase for answers. The latter I find especially troubling. Arts and humanities funders have latched onto big data DH not (IMO) because it offers results, outputs, stuff, but because it offers a genuine opportunity to do things with stuff differently: not just to ask a slightly different set of questions, but to consider a different way of formulating those questions in the first place.

Of course such semantic tricks are nothing new either. Novelty will always claim the earth (especially when there are juicy funding grants involved). But believe it or not, big data may actually be different, for it promises, like nothing before, a genuine cross-disciplinary endeavour. There is a simple reason for this: we can't be at the same time everyday historians, historians of the big picture, historians of the small picture, quantitative historians, qualitative historians, programming historians, statisticians with an eye for history, and programmers with a sense of how to interrogate past phenomena (I’ll spare you the full list of possible permutations, skillsets, et cetera). But we can build teams of people who as a sum of their parts approach just that (or even a sum of collective parts we have yet to imagine). And with genuine teams comes a potential for work with genuine novelty.

A scientist friend said to me recently that 'what you guys consider a thank you in a footnote we call a third author'. Those 'thank yous' often go to people we consider part of our team: colleagues, PhD students, RAs, postdocs. But those people are rarely considered co-authors. Big data changes that. If I am unable to work alone to 'read' the data, I can neither work alone to interpret that data nor publish my results from it alone. And as my co-authors will have to come from the fringes of my discipline and beyond in order for us to get a handle on the big data, the decision to use big data will in turn disrupt what it means to be a historian, literary scholar and so on, and what it means to produce outputs in those fields.

After all that evangelising, you won't be surprised to learn that I write this whilst returning from a workshop on big data in the arts and humanities (our slides on British Library digital content here). I don't think it is letting the cat out of the bag too much to say that at that event the organisers, the AHRC, confirmed that next week they will be announcing a major funding call centring on big data. From the discussions today with researchers and colleagues in the sector, I can't wait to see how this investment changes humanistic inquiry.

@j_w_baker

Digital scholarship blog
