25 March 2015
Enabling Complex Analysis of Large Scale Digital Collections
Jisc have announced the projects that have been funded through their Research Data Spring programme. One of those chosen is 'Enabling Complex Analysis of Large Scale Digital Collections', a project led by Melissa Terras (Professor of Digital Humanities, UCL) in collaboration with the British Library Digital Research team.
Research Data Spring aims to find new technical tools, software, and service solutions which will improve researchers’ workflows and the use and management of their data. Following an invitational sandpit event in Birmingham last month aimed at encouraging co-design, 'Enabling Complex Analysis of Large Scale Digital Collections' was chosen from over 40 proposed projects to proceed to a three-month development phase.
Our rationale for the project is that lots of money has been spent digitising heritage collections and that - as well as being objects that can be presented online for research and public use and reuse - digitised heritage collections are data. The problem, of course, is that scholars without computational training often don't know what to ask of large quantities of data, commonly lack access to high performance computing facilities, and struggle to find the exemplar workflows they need. As a consequence, support from content providers for this category of work is regularly ad hoc, and substantial investment in it is difficult to justify.
'Enabling Complex Analysis of Large Scale Digital Collections' aims to address this fundamental problem by extending research data management processes in order to enable novel research and a deeper understanding of emerging research needs. In the initial three-month pilot period we will index a collection of circa 60,000 public domain digitised books (see 'A Million First Steps') at UCL Research IT Services and work with a small number of researchers to turn their research questions into computational analysis. The outputs from each research scenario - including derived data, queries, documentation, and indicative visualisations - will be made available as citeable, CC-BY workflow packages suitable for teaching, self-learning, and reuse. Moreover, these workflows will deepen understanding of complex, poorly structured, and heterogeneous humanities data and of the questions researchers could ask of that data, highlighting through use cases the potential for process and service development in the cultural sector. Details of the proposed work for after the initial three-month phase are on the Figshare document embedded above.
We are also delighted that two other projects with British Library involvement have been funded through Research Data Spring. 'Unlocking the UK's thesis data through persistent identifiers' will investigate integrating ORCID personal identifiers and DataCite DOIs into our unique and ever-growing UK thesis collection. 'Methods for Accessing Sensitive Data', otherwise known as AMASED, will adapt and implement DataSHIELD technology in order to (legally) circumvent key copyright, licensing, and privacy obstacles that prevent analysis of digital datasets in the humanities and academic publishing. The British Library will supply the same circa 60,000 public domain digitised books to this project to test the extension of DataSHIELD to textual data.
James Baker
Curator, Digital Research
---
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. Exceptions: embeds to and from external sources