What Can Public Open Data and Academia Learn From Each Other?
This blog post looks at the links between the tools and methods used in academia and those used in public open data, and at how the two worlds can learn from each other to promote greater use and re-use of data, track impact and make information more widely available.
Professor Nigel Shadbolt recently visited the Library to talk to staff about the benefits of releasing public data into the wild. He didn't need to convince me: as a public sector researcher prior to joining the Library, I fought many licensing and cost battles to get my hands on the data needed for my research projects. This blog post isn't about making the case for opening up public data. That case has been made many times, and open data has already yielded numerous important benefits. Having worked in creating, using and disseminating both public and academic data, I think there are tools and methodologies that each area can learn from the other.
Due to policies like the Research Excellence Framework, there is a big focus in academia on developing methods to measure the impact of funded research and to promote the re-use of the data it creates. Tools like ImpactStory, which allow researchers to gauge the impact of their research outputs, are already being built on open infrastructure supported by the British Library and our partners in science and publishing.
These tools work by assigning a digital object identifier (DOI) to a research output (a dataset, a paper etc.), which can then be tracked through to citations. ORCID is a new system that allows researchers, academic and non-academic, to register a unique identifier for themselves and attach their research outputs to it via DOIs and other identifiers. This helps to link researchers to their outputs, and membership is available to any researcher: academic, public sector, open data hacker etc.
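To make the identifier idea a little more concrete: an ORCID iD is a 16-character identifier whose final character is a check digit computed with the ISO 7064 MOD 11-2 algorithm, so a mistyped iD can usually be caught before it is attached to an output. The following is only a minimal sketch of that check. The function names are my own, not part of any ORCID library, and it does not confirm that the iD is actually registered.

```python
def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    # A result of 10 is written as the letter X.
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Return True if the iD has 16 characters and a matching check character.

    Hypothetical helper: it checks the form of the iD only, not whether
    anyone has registered it.
    """
    compact = orcid.replace("-", "").upper()
    if len(compact) != 16 or not compact[:15].isdigit():
        return False
    return orcid_check_digit(compact[:15]) == compact[15]


if __name__ == "__main__":
    # Sample iD used in ORCID's own documentation.
    print(is_valid_orcid("0000-0002-1825-0097"))
```

A check like this could sit in front of any form where a researcher, academic or otherwise, types in their iD.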
Currently it seems to me that the impact of open data is judged by the number of visible applications that have been developed and the headline findings published in the media. Imagine instead that a dataset posted on data.gov.uk had one of these trackable DOIs attached to it, and that open data users and researchers cited their use of the data using the DOI. A public sector data creator, or an organisation as a whole, could then track this and see what their data was being used for and what type of impact it was having. This could have huge benefits for encouraging the sharing and re-use of public data, and could help to provide evidence to support the collection and maintenance of certain types of well-used data.
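As a rough illustration of what that tracking could look like, here is a minimal sketch that scans free-text citations for DOIs and counts how many citing documents mention each of a publisher's datasets. Everything specific in it is an assumption made for illustration: the DOIs are made up, the regular expression is a loose pattern rather than an official DOI grammar, and a real service would draw on citation indexes rather than raw text.

```python
import re
from collections import Counter

# Loose pattern for DOIs (an assumption, not an official grammar).
DOI_PATTERN = re.compile(r'\b10\.\d{4,9}/[^\s"<>]+')


def extract_dois(text):
    """Return DOIs found in free text, lower-cased, trailing punctuation removed."""
    return [match.rstrip(".,;)").lower() for match in DOI_PATTERN.findall(text)]


def tally_dataset_use(citing_texts, dataset_dois):
    """Count how many citing texts mention each of the given dataset DOIs."""
    wanted = {doi.lower() for doi in dataset_dois}
    counts = Counter()
    for text in citing_texts:
        # A set, so that one document citing a DOI twice is counted once.
        for doi in set(extract_dois(text)):
            if doi in wanted:
                counts[doi] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical citations and DOIs, for illustration only.
    citations = [
        "We used road data (doi:10.1234/roads-2012).",
        "See https://doi.org/10.1234/ROADS-2012 and 10.1234/schools.",
        "No data cited here.",
    ]
    print(tally_dataset_use(citations, ["10.1234/roads-2012", "10.1234/schools"]))
```

DOIs are compared in lower case because DOI names are case-insensitive, so the same dataset cited with different capitalisation still counts as one.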
Another area where we can learn from each other is metadata, the data about the data. This is bread and butter in the academic library world, but quality is variable in the open data world. data.gov.uk is a great start at a catalogue, although the metadata in it is of variable quality. This is not the case with all open data repositories: the London Datastore is a good example of an open data repository containing high-quality metadata. Maintaining good metadata and exposing it openly to other organisations will make open data more visible to researchers. For example, it would be great if we could take a feed from data.gov.uk and make it available in the British Library catalogue, so that researchers could discover a primary open data source alongside books and journals.
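One simple way to put a number on "variable quality" is to check how many of an agreed set of fields each catalogue record actually fills in. The sketch below does only that. The field names are hypothetical and not taken from data.gov.uk or any particular metadata schema, and completeness is of course only one aspect of metadata quality.

```python
# Hypothetical set of fields a catalogue might expect every record to have.
REQUIRED_FIELDS = ("title", "description", "publisher", "licence", "date_updated")


def missing_fields(record, required=REQUIRED_FIELDS):
    """Return the required fields that are absent or blank in a record."""
    return [name for name in required if not str(record.get(name) or "").strip()]


def completeness(record, required=REQUIRED_FIELDS):
    """Return the fraction of required fields that are filled in (0.0 to 1.0)."""
    return 1 - len(missing_fields(record, required)) / len(required)


if __name__ == "__main__":
    # Made-up record, for illustration only.
    record = {
        "title": "Road traffic counts",
        "description": "",
        "publisher": "Example Council",
        "licence": "Open Government Licence",
    }
    print(missing_fields(record))
    print(round(completeness(record), 2))
```

Run across a whole catalogue feed, a score like this would show quickly which records need attention before they are worth exposing to other organisations.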
In academia, more data that was previously locked away from the public in data centres and repositories is being made open. Last week it was great to see that the UK Data Service co-signed the Denton Declaration on open data, but the main thing academia can learn from the open data movement is to make the data properly open! Don't make people fill in forms and register to use your data: if you do, you will lose a huge number of potential users. Good web analytics and citation metrics can hopefully give you enough feedback on your audience. In his talk at the BL, Nigel Shadbolt mentioned a 60% attrition rate from services that make people jump through hoops to get to the resources; however, as Wikipedia would say, citation needed.
The British Library is UK Registration Agent for DataCite
The Library is also project coordinator for The ORCID and DataCite Interoperability Network (ODIN)
Read more about The British Library's Datasets Program
These views are from John Kaye – Lead Curator Digital Social Sciences @johnkayebl and don’t necessarily reflect the views of the BL as an institution.