THE BRITISH LIBRARY

Social Science blog

14 November 2012

What can Public Open Data and Academia Learn From Each Other?

This blog post discusses the links that can be made between the tools and methods used in academia and those used in public open data, and how the two worlds can learn from each other in order to promote greater use and re-use of data, track impact and make information more widely available.

Professor Nigel Shadbolt recently visited the Library to talk to staff about the benefits of releasing public data into the wild. He didn't need to convince me: as a public sector researcher prior to joining the Library, I fought many licensing and cost battles to get my hands on the data needed for my research projects. This post isn't about making the case for opening up public data; that case has been made many times and has yielded numerous important benefits. Having worked in creating, using and disseminating both public and academic data, I think there are tools and methodologies that each area can learn from the other.

Because of policies like the Research Excellence Framework, there is a big focus in academia on developing methods to measure the impact of funded research and on promoting the re-use of the data created in that research. Tools like impactstory, which allow researchers to gauge the impact of their research outputs, are already being built on open infrastructure supported by the British Library and our partners in science and publishing.

These tools work by assigning a digital object identifier (DOI) to a research output (datasets, papers etc.), which can then be tracked through to citations. ORCID is a new system that allows researchers to register a unique identifier for themselves and attach their research outputs to it via DOIs and other identifiers. These identifiers help link researchers to their outputs, and membership is open to any researcher: academic, public sector, open data hacker etc.
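To make the DOI-to-citation link concrete, here is a minimal sketch of how a dataset record with a DOI can be turned into the citation form DataCite recommends (Creator (PublicationYear): Title. Publisher. Identifier). All the field values below are invented for illustration, not real records.

```python
# Minimal sketch: formatting a DataCite-style citation for a dataset.
# The creator field is where a researcher's ORCID iD could be linked in.

def format_citation(record):
    """Build a citation string in the recommended DataCite form:
    Creator (PublicationYear): Title. Publisher. Identifier."""
    return "{creator} ({year}): {title}. {publisher}. https://doi.org/{doi}".format(**record)

record = {
    "creator": "Example Researcher",   # hypothetical; could carry an ORCID iD
    "year": 2012,
    "title": "Example Survey Dataset",
    "publisher": "Example Data Centre",
    "doi": "10.1234/example.5678",     # hypothetical DOI
}

print(format_citation(record))
```

Because the DOI is part of the citation string, anything that indexes citations can later be searched for that DOI, which is what makes the tracking described above possible.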

Currently it seems to me that the impact of open data is judged by the number of visible applications that have been developed and the headline findings published in the media. However, imagine if a dataset posted on data.gov.uk had one of these trackable DOIs attached to it, and open data users and researchers cited the data using that DOI: a public sector data creator, or an organisation as a whole, could then track this, see what their data was being used for and what impact it was having. This could have potentially huge benefits for encouraging the sharing and re-use of public data, and could help provide evidence to support the collection and maintenance of certain types of well-used data.
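The tracking idea above can be sketched in a few lines: if citations of open data carried DOIs, a data publisher could simply count re-uses per dataset. The citation list here is made up for illustration; in practice it would come from a citation index.

```python
# Sketch of DOI-based re-use tracking: count how often each dataset DOI
# is cited. The records below are invented examples.
from collections import Counter

citations = [
    {"cited_doi": "10.1234/dataset.1", "source": "journal article"},
    {"cited_doi": "10.1234/dataset.1", "source": "news story"},
    {"cited_doi": "10.1234/dataset.2", "source": "mobile app"},
]

reuse_counts = Counter(c["cited_doi"] for c in citations)
print(reuse_counts.most_common())  # most re-used datasets first
```

Even this crude tally would be evidence a data publisher could use to justify continuing to collect and maintain a well-used dataset.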

Another area where we can learn from each other is metadata, the data about the data. This is bread and butter in the academic library world, but quality is variable in the open data world. data.gov.uk is a great start at a catalogue, but the metadata in it is of variable quality. This is not the case with all open data repositories: the London Datastore provides a good example of an open data repository containing high-quality metadata. Maintaining good metadata and exposing it openly to other organisations will make open data more visible to researchers. For example, it would be great if we could take a feed of data.gov.uk and make it available in the British Library catalogue, so researchers could discover a primary open data source alongside books and journals.
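As a rough sketch of what harvesting such a feed might look like: data.gov.uk runs on CKAN, whose API returns dataset metadata as JSON. The record below is a made-up example in the general shape of a CKAN `package_show` response, mapped onto simple catalogue-style fields; the field mapping is my own illustration, not a real catalogue ingest.

```python
# Sketch: mapping CKAN-style dataset metadata onto catalogue fields.
# The JSON record is invented, in the shape of a CKAN package record.
import json

ckan_response = json.loads("""
{
  "success": true,
  "result": {
    "name": "example-spending-data",
    "title": "Example Departmental Spending",
    "notes": "Monthly spending over 25,000 GBP (illustrative).",
    "license_id": "uk-ogl",
    "resources": [{"format": "CSV", "url": "http://example.org/spend.csv"}]
  }
}
""")

def to_catalogue_record(response):
    """Map CKAN package fields onto simple catalogue-style fields."""
    pkg = response["result"]
    return {
        "title": pkg["title"],
        "description": pkg["notes"],
        "licence": pkg["license_id"],
        "formats": sorted({r["format"] for r in pkg["resources"]}),
    }

print(to_catalogue_record(ckan_response))
```

The point is that once the metadata is exposed in a consistent machine-readable form, turning it into catalogue records is straightforward; the hard part is the metadata quality itself.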

In academia, more data that was previously locked away from the public in data centres and repositories is being made open. Last week it was great to see the UK Data Service co-sign the Denton Declaration on open data, but the main thing academia can learn from the open data movement is to make data properly open! Don't make people fill in forms and register to use your data; if you do, you will lose a huge number of potential users. Good web analytics and citation metrics can hopefully give you enough feedback on your audience. In his talk at the BL, Nigel Shadbolt mentioned a 60% attrition rate for services that make people jump through hoops to get to their resources, although, as Wikipedia would say, citation needed.

 

The British Library is UK Registration Agent for DataCite

The Library is also project coordinator for The ORCID and DataCite Interoperability Network (ODIN) 

Read more about The British Library's Datasets Program

 

These views are from John Kaye – Lead Curator Digital Social Sciences @johnkayebl and don’t necessarily reflect the views of the BL as an institution.

Comments

Hello John,

This is a really interesting post. There is definitely a need for better mechanisms to capture information on the re-use of public data from sites like Data.gov.uk. In the past I've had to resort to brute-force searching for links back to data.gov.uk datasets to try and track re-use, and that only picks up a handful of possible re-uses.

How feasible would it be to create an extension to the CKAN software that powers data.gov.uk that could easily assign DataCite DOIs to datasets? It would then be important to give users suggested citations (link snippets as well as citations), recognising the need to track both online and offline re-use of data.

With the discussions currently ongoing about data catalogue federation standards (see http://www.dataprotocols.org/ for example), getting a feed out of sites like Data.gov.uk should be fairly straightforward.

When it comes to the criterion that open data should not require registration, I think it's important not to jump straight to the idea that there should be no option at all for optional registration. The latest datahub.io / CKAN platform has recognised that sometimes data users want to get updates about a dataset, and to indicate their interest in it, or use of it - and now registered users of the platform can 'follow' a dataset. In other contexts, giving people an option to give an e-mail address to get updates when a dataset is updated etc. could complement webometric and bibliometric approaches to ascertaining dataset re-use.

Tim

Hi Tim

Thanks for the useful comment. I may have to get back to you on some points.

I'd have to ask our developers about the CKAN extension, but it shouldn't be beyond the realms of possibility. The important thing is getting the metadata that DataCite requires to 'mint' the DOI for each dataset (see http://schema.datacite.org/). The process can be completely automated if there is good metadata.
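For a sense of the minimum involved: the DataCite schema has five mandatory properties (identifier, creator, title, publisher, publication year). The sketch below builds a toy record with just those fields; the values and the helper function are illustrative only, and the authoritative schema is the one at http://schema.datacite.org/.

```python
# Sketch: the five mandatory DataCite properties needed before a DOI
# can be minted for a dataset. All values here are hypothetical.
import xml.etree.ElementTree as ET

def build_datacite_record(doi, creator, title, publisher, year):
    resource = ET.Element("resource")
    ET.SubElement(resource, "identifier", identifierType="DOI").text = doi
    creators = ET.SubElement(resource, "creators")
    ET.SubElement(ET.SubElement(creators, "creator"), "creatorName").text = creator
    titles = ET.SubElement(resource, "titles")
    ET.SubElement(titles, "title").text = title
    ET.SubElement(resource, "publisher").text = publisher
    ET.SubElement(resource, "publicationYear").text = str(year)
    return ET.tostring(resource, encoding="unicode")

xml_record = build_datacite_record(
    "10.1234/example.1", "Example Creator",
    "Example Open Dataset", "Example Agency", 2012)
print(xml_record)
```

If a catalogue can reliably supply those five fields for each dataset, a CKAN extension could in principle generate this record and mint the DOI automatically; the bottleneck is the metadata quality, not the plumbing.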

I'm not sure the metadata quality on data.gov.uk is good enough, but I think it might be on the London Datastore. Or do you have any other suggestions for repositories to approach?

I take your point on registration, but I feel that registering for those benefits should be optional rather than mandatory; mandatory registration is the approach being taken by some academic repositories, mainly because of licensing agreements.

Again, thanks for the useful comment. We've only just started thinking about public data, and I may get back to you for some advice/ideas if that's OK.
