Thirty months ago I joined the British Library Digital Research Team. In that time we (often with the folks from British Library Labs) have achieved a huge amount, not least putting over one million public domain images on Flickr, developing our internal training provision, and repurposing British Library collections to enrich the education and outlook of computer science and game design students. This week I say goodbye.
The Digital Research Team was created in 2010 with a broad mission that covers everything from enabling computational analysis of large-scale digitised collections and creative reuse of openly licensed collections to advocacy of clear data citation and digital skills training. I have often summed up our role by saying that we are here to ensure that the British Library's digital collections are used in ways that go beyond looking at them on a webpage: an open, data- and creativity-orientated approach that is at the forefront of the British Library's vision.
I came to the team from academia and a background in studying long eighteenth-century satirical prints. My data was small and my perspectives narrow, but my eyes, ears, and mind were open. And they needed to be, for in my first month in the job the British Library celebrated enhanced powers to collect non-print materials published in the UK. In effect this meant that this library of around 170 million things had the power to collect the UK web domain. Since then the library has collected over 2 billion web pages, fundamentally changing our collection profile (see the UK Web Archive blog for more) and making the British Library a place full of data as much as books. Even the beloved manuscript, I soon learnt, was not 'safe' from the bitstream: also changing our collection profile was the small but growing volume of floppy disks, CD-ROMs, hard drives, and email archives that are the archives of life in the 'Information Age'. And these personal digital archives are more than just collections of 'proper' born-digital documents typed up on personal computers. They include software, browser caches, spam, and downloads folders; in fact they include every bit on every disk: captures of whole computing environments that can be booted up to offer an experiential window into a person's interaction with their machine.
I say 'can', but in most cases they aren't. For as unpublished material these archives, like their paper counterparts, can only be made available to readers once we are sure we have complied with things like the Data Protection Act, a time-consuming process that requires people to examine each and every digital object. This clash of possibilities speaks to two overarching themes of my thirty months with the Digital Research Team. The first is the gap that often appears between well-thought-out established practice and the demands of large and/or complex digital collections: in the case of born-digital manuscript collections, responsibilities to both readers and depositors compete when faced with hundreds of thousands of files. The second is the important - but often forgotten - role of decisions made by people in the creation, management, and marshalling of large and/or complex digital collections. This role may be self-evident. But data does tend to flatten and depersonalise. And interfaces to data tend to emphasise those qualities in their haste to ensure that experiences are smooth, that tensions recede from view. As someone trained to trace the provenance of evidence and to examine the role of agency and power in humanistic phenomena, I see it as important to put the personal back into our use of data. Why? Well, when you search Explore the British Library and Google Books you don't just search databases of 56 million things and over 30 million books respectively; rather, you search accumulations of human labour, expertise, and decision making shaped (and constrained) by local, temporal, and organisational priorities and worldviews. When you browse Wikipedia, Wikimedia Commons, or Wikisource you rely on the product of human labour mediated through community guidelines and practices that - perhaps inevitably - introduce prejudices.
When you use any computational process to take data in and push data out, the bit in the middle isn't the work of a machine but the work of people instructing a machine, people - as Mia Ridge, Ramon Amaro and the Software Sustainability Institute, among others, remind us - with opinions, perspectives, fears, and dreams. And when you seek solace in a standard, you seek solace in something that, as a product of human agency, can never be wholly neutral.
This may all sound a bit negative. But my point is that many of the achievements of the Digital Research Team stem from this sort of thinking, an approach that is deeply critical of techno-evangelist perspectives on the role of digital collections, methods, and approaches in society and culture. We don't assume that digital technology is the solution but rather that an approach that sees people using digital technology is one solution among many possible solutions. My job over the last thirty months has been to collaborate with amazing people both inside and outside the British Library to choose the right solutions. As I move to a new position outside the British Library, I look forward to seeing the fruits of these and future decisions appear on the Digital Scholarship Blog.
James Baker -- Curator, Digital Research -- @j_w_baker