Science blog

Exploring science at the British Library

8 posts categorized "DataCite"

06 February 2015

DataCite Case Study: ForestPlots.net at the University of Leeds

In June last year, we held a DataCite workshop hosted by the University of Glasgow. We've now turned our speaker's use of Digital Object Identifiers (DOIs) for rainforest data into a video and printed case study.

You can still find a short summary of that event here. Our thanks go to Gabriela Lopez-Gonzalez for taking the time to come and film with us.

 

We hope that this case study will help institutions promote the idea of data citation and use of DOIs for data to their researchers, and that this in turn will encourage more submission of data to institutional repositories.

 

A DataCite DOI is not just for data

During January we had also been trying to spread the word that DOIs from DataCite aren't necessarily just for data. We've been working with the British Library's EThOS service to look at how UK institutions might give DOIs to their electronic theses and dissertations.

There was an initial workshop to divine the issues in November 2014, and on 16th January we held a bigger workshop, bringing more institutions together to look at how we might start to establish a common way of identifying e-theses in the UK.

The technical step of assigning a DOI to a thesis is relatively straightforward. Once an institution is working with DataCite (or CrossRef) they can use their established systems to assign a DOI to a thesis. But the policies surrounding the issue and management of this process are more complex. We're hoping that these workshops have helped everyone to pull in the same direction and collaborate on answers to common questions.

This work has given rise to a proposal to look at how to improve the connection between a thesis and the data it is built on. By triggering the consideration of sharing the data supporting a thesis, maybe we can "get 'em young" and introduce good data sharing practice as early in the research career as possible. Connecting the thesis and its data also increases the visibility of both, helping early career researchers to reap the benefits of their hard work sooner.

Watch this space to see what happens next!

 

12 December 2014

Wishing you a Merry Crystal-mas from DataCite UK

As 2014 draws to a close, it has been another busy year for us here at the Library running DataCite UK. Over the past 12 months the number of organisations that are now using DataCite DOIs in the UK has gone up to 26.

One highlight from earlier in the year was the minting of the 3 millionth DOI, which you can find here: http://doi.org/10.5517/CCPHZ37. This was minted as part of the work by the Cambridge Crystallographic Data Centre to assign DOIs to their crystallographic datasets. This has been a particularly nice milestone to have, as 2014 has been the International Year of Crystallography.

In this year of crystallography, CCDC are by no means the only crystallography database getting DOIs for their data. Both eCrystals (http://ecrystals.chem.soton.ac.uk/) based at Southampton and the SPECTRa project at Imperial (https://spectradspace.lib.imperial.ac.uk:8443/handle/10042/13) are doing the same thing.

This work now means that there are DOIs available for the crystal structure of caffeine (http://doi.org/10.5517/CCNH4QZ), paracetamol (http://doi.org/10.5517/CC4C64T) and theobromine (http://doi.org/10.5517/CC4D14P), all things that you might want to (or might need to) partake of this Christmas.

Theobromine is a key flavour compound in milk and dark chocolate, and the reason you can't feed it to your pets: theobromine is particularly toxic to animals. Image from Flickr, CC-BY-NC-SA. https://www.flickr.com/photos/jhard/11399049754

 

 

24 June 2014

UK DataCite on the road

Our data citation workshops have gone on the road. This blog post summarises the recent event at the University of Glasgow.

On Friday 13 June, we held an Introduction to DataCite workshop at the University of Glasgow. As well as introducing what DataCite is and what it does, we demonstrated the various ways you can look at what you put a DOI on (see this previous blog post), considering issues such as the versioning of data and the ‘granularity’ – whether you apply a DOI to a collection of data, individual data files or some other slice of the data. Slides from the day are available on our website.

We had two really enlightening talks from users of the service, Gabriela Lopez-Gonzalez and Graham Blythe, both from the University of Leeds.

Gabriela is a researcher at Leeds, and runs the site forestplots.net. The site is part of international work to share longitudinal inventory data from permanent forest plots. Gabriela spoke passionately about how important data citation is for her and her community, and how having a persistent identifier such as a DOI for that data will help to acknowledge not just the researchers who collected the data, but also the research assistants, data managers and curators. These people play a vital role in the quality of the data, and in making sure the data are available for further research, but they do not traditionally get recognition in subsequent research papers. And most of them spend just as much time camping out in the rainforest, enduring mosquitos, snakes and spiders, as those who are recognised on research papers!

 


Making good quality research data available for reuse involves many people. Image credit: Gabriela Lopez-Gonzalez

 

Graham is part of the research data management team at Leeds. Gabriela provided them with a great test case early on in their planning. He talked about the process Leeds has been through in deciding how to use DOIs for their data. He was wonderfully honest in talking about where, like many institutions we’re talking to, not all the possibilities have been decided on – or even uncovered yet.

Some of the issues around using DOIs seem difficult at first, for instance what data should get a DOI and when. It can be hard to make those decisions when you’re aware of how diverse an institution’s research and data is – no one wants to set policies that will exclude important data. But while it’s good to have general rules on assigning your DOIs, it is important to be flexible as best practice evolves.

We hope to run further data citation workshops around the country, not just to provide details on working with DataCite, but also to bring institutions dealing with these issues together – keep an eye on our webpages and Twitter feed for details.

08 November 2013

Why not cite data?

Rachael Kotarski, our Content Expert for scientific datasets, explains why citing data as well as the article is the way forward.

In a previous post, Lee-Ann Coleman looked at citations in science, asking what should be cited, and what a citation means. The answers to these questions are not necessarily simple, but one response we have been hearing (and that we support), is that data needs to be cited.

Citing data not only gives credit to those who created or gathered it, but can also give some kudos to the repository that looks after it. Despite the fact that data is also key to verifying and validating research, it is not yet standard practice to cite it when writing a paper. And even if it is cited, it is rarely done in a way that allows you to identify and access that data.

Citation should connect the literature to its data foundations. Image source: Shutterstock.

As part of the Opportunities for Data Exchange (ODE) project, we investigated data citation and the ways in which data centres, publishers, libraries and researchers can encourage better data citation.

What does ‘better data citation’ look like and how do we encourage it to happen? We examined three aspects of current practice in order to answer this question:

  • How is data cited?
  • What data is cited?
  • Where is data cited within the article?

How to cite
A data citation needs to contain enough information to find and verify the data that was used, as well as give credit to those who spent considerable time/money/effort generating or collecting the data. The DataCite recommended data citation is just one example of how to include details that support these aims (and it’s pretty simple!):

Creator (publication year): Title. Publisher. Identifier.
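
For anyone assembling citations programmatically, the pattern above can be sketched as a small helper. This is only an illustration: the function name and the example values (author, title, repository and DOI) are made up for the sketch and do not describe a real dataset. The trailing full stop after the identifier is left off here so that it cannot be mistaken for part of the DOI.

```python
def format_data_citation(creators, year, title, publisher, identifier):
    """Build a citation following the pattern
    'Creator (publication year): Title. Publisher. Identifier'.

    `creators` is a list of names; they are joined with '; '.
    """
    creator_part = "; ".join(creators)
    return f"{creator_part} ({year}): {title}. {publisher}. {identifier}"


# Hypothetical values, for illustration only.
citation = format_data_citation(
    creators=["Smith, Jane"],
    year=2014,
    title="Example rainforest plot data",
    publisher="Example Data Repository",
    identifier="doi:10.1234/example",
)
print(citation)
```

With these made-up values the helper prints a single line in the recommended order: creator, year, title, publisher, identifier.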

What to cite
Data are not necessarily fixed, stable or homogeneous objects, so citing them can be considerably more complicated than for articles. It is important for testing reproducibility that regardless of subsequent changes to the data or subsets of it, they are cited as used. Aspects such as the version used or date downloaded should also be encapsulated in the citation, where necessary. Linking users via an identifier (such as a DOI as used by DataCite) to the location of that exact version or subset of the data is also important. An example of citing a specific wave of data from GESIS demonstrates this:

Förster, Peter; Brähler, Elmar; Stöbel-Richter, Yve; Berth, Hendrik (2012): Saxonian longitudinal study – wave 24, 2010. GESIS Data Archive, Cologne. ZA6242 Data file version 1.0.0, doi: 10.4232/1.11322
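
To follow the identifier in a citation like this one to the data, the DOI can be turned into a link via the doi.org resolver. The sketch below is a minimal illustration, not an official tool: the function name is made up, and it assumes a plain DOI with an optional "doi:" label and no characters that would need URL-encoding.

```python
def doi_to_url(doi):
    """Turn a DOI as written in a citation (for example 'doi: 10.4232/1.11322')
    into a resolvable link via https://doi.org/.

    Assumption for this sketch: no URL-encoding of special characters is done.
    """
    cleaned = doi.strip()
    # Drop an optional 'doi:' label in front of the identifier.
    if cleaned.lower().startswith("doi:"):
        cleaned = cleaned[4:].strip()
    # All DOIs start with the directory indicator '10.'.
    if not cleaned.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    return "https://doi.org/" + cleaned


print(doi_to_url("doi: 10.4232/1.11322"))
```

Applied to the GESIS citation above, this gives https://doi.org/10.4232/1.11322.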

Where to cite in the article
Where you cite data in the article may depend on the form of the data being cited. For example, data obtained via colleagues but not widely available may be best mentioned in acknowledgements, and data identified by accession numbers could be cited inline in the body of the article. But the interviewees who participated in the ODE study largely advocated citation of datasets in the full reference list, to promote tracking and credit. In order to do this, data needs a full, stable citation, which also depends on reliable, long-term storage and management of the data. Of course publisher requirements play an important role. But that’s a post for another day!

These are the three ‘simple’ steps to better citation of data, but there are still cultural and behavioural barriers to sharing data. In the ODE report we concluded that the whole community - researchers, publishers, libraries and data centres - all have a role in promoting and encouraging data citation.


The recent Out of Cite, Out of Mind report has since updated and greatly extended the ODE work, with an excellent set of first principles for data citation:

CODATA-ICSTI Task Group on Data Citation Standards and Practices (2013) Out of Cite, Out of Mind: The Current State of Practice, Policy, and Technology for the Citation of Data. Data Science Journal vol. 12 p. CIDCR1-CIDCR75 doi: 10.2481/dsj.OSOM13-043

I recommend it – and encourage anyone thinking about citing their data (or anyone else’s) to stop thinking and start doing it.

 

04 October 2013

Collecting new data

This week Elizabeth Newbold and Lee-Ann Coleman reflect on a week of data-related meetings in Washington DC.

You can’t go to Washington DC and not go to the Air and Space Museum – at least not if you’re interested in science and have a free day. We saw a great exhibition about the Wright brothers and their experiments into flight, which highlighted the value of collecting new data. The brothers relied upon existing ‘tables of coefficients’ to factor into their equations for calculating lift and drag upon different wing shapes. But their experiments showed that the coefficient for the density of air – in use since the 18th Century – did not appear to be right. They determined a new average which, using modern techniques, was shown to be very close to the correct value. What a great demonstration of the value of re-use of data and being open to evaluating it in the light of new evidence.

Most of the week of 16-20 September was not spent at great museums but at the Research Data Alliance second plenary and the DataCite Summer Meeting. Both were held (mostly) in the beautiful National Academy of Sciences building, where a statue of Einstein looks benignly over the gardens.


This was our first time at an RDA meeting – not surprising, since it is a relatively new venture brought about by the US National Science Foundation, the Australian Government and the European Commission – so we weren’t too sure what to expect. On the first day, an array of impressive speakers highlighted the value of access to research data. The second day was reserved for meetings of the working groups and on the third day, representatives of these reported back to the whole assembly. These meetings are not typical academic conferences but are intended to be working meetings and will be held twice a year, meaning that being involved requires significant commitment.

The speakers, including Tom Kalil (Deputy Director for Technology and Innovation, White House Office of Science and Technology Policy) and John Wilbanks (Chief Commons Officer, Sage Bionetworks), highlighted the need to speed up scientific discovery and its applications. For this to happen, the implementation of frameworks, legal as well as technical, is required. President Obama signed the open data executive order in May this year - but implementation is the next step.

The RDA aims to create a community of practice and a pipeline of impact, and it is doing this through both working and interest groups. Interest groups cover broad topics that are on-going, with loosely defined goals, but as clarity emerges about a problem or issue to address they may then develop into working groups. Working groups produce case statements of what they will do, meet virtually every 4 to 6 weeks and are expected to last around 12 – 18 months. Some of these groups are very focussed, addressing a particular, often technical, issue but others seem less well defined. Given that the RDA is just becoming established, great progress has been made, but with a ‘ground-up’ approach it is difficult to know if all the issues are being addressed. They are currently seeking an executive director – who will hopefully provide a clearer sense of direction and be able to provide an aerial view of the landscape and a better articulated strategy for achieving the aims.

And then it was on to particle physics – for the start of the DataCite summer meeting. Salvatore Mele, Head of Open Access at CERN, told us that the dataset providing evidence of the Higgs boson had been cited in a paper with a DataCite DOI.

Some other highlights from the meeting included the presentation from Michael Witt of Purdue University about the Purdue University Research Repository – aka PURR. It has a lot of nice features to encourage researchers to upload, store and produce data management plans. It issues DOIs and emails users monthly with metrics and offers secure and reliable preservation for 10 years.

There were presentations from a range of organisations with interests in citation and identification. Thomson Reuters discussed their data citation index, launched last year, with 3m records. While it is not aiming to be the most comprehensive, it is aiming to link to important, relevant scientific data. CrossRef highlighted the new service they are offering called FundRef, which aims to enable tracking of funding sources to publications and other outputs. A pilot is underway involving several US funding agencies and the Wellcome Trust, and a registry of over 4000 funding body names has been created. Ultimately, funder IDs, grant numbers and DOIs could be linked. The presentation from ORCID – an identifier service for individuals – demonstrated that it’s not just for the John Smiths! Over 280,000 identifiers have been issued since October 2012. Grant submission systems are starting to ask for ORCID IDs during the submission process and HEIs are also getting on board. Some publishers are also requesting it.

So a lot of interesting data for thought – but considering the Wright brothers again provides a reminder that the reason for all of this activity is to enable research and support those people generating the data in the first place, to make better use of it and as a result enhance science.

02 August 2013

Show me more data

Expanding on last week’s post on open data, today we look at our role in DataCite and how we are supporting the UK research data community.

The British Library is one of the founding members of DataCite, an international organisation bringing together the research data community to work collaboratively on the challenges of making research data visible, accessible and citable. DataCite is a registration agency for Digital Object Identifiers (DOIs), and the British Library is an allocating agent on behalf of DataCite. We provide an infrastructure that supports simple and effective methods of discovery and access. We work with data centres and other organisations to enable them to assign DOIs to data.


Since 2011, the Library’s Science team has been developing DataCite services in the UK. In practical terms, this has involved working with a range of organisations that create, manage or archive data, setting them up on the system so that they can assign DOIs (a process known as minting - we even have mints, pictured, to prove it!), working on the DataCite metadata schema and ensuring our community’s needs are represented within the global DataCite membership. To support this work, we have organised a series of workshops, exploring the various aspects of data citation, as well as the requirements for working with DataCite and DOIs.


We’ve covered a lot of topics in the last year. From the basics - such as what does minting a DOI actually mean and how do I do it? (you can find out how in our YouTube video) and what should I put a DOI on - to more complex subjects such as how do I deal with sensitive data or different versions? We’ve had lively discussions at all of the workshops, supported by excellent presentations from colleagues who are working with research data. You can see the full list of topics covered and presentations from the workshops on our webpages www.bl.uk/datasets

 

In addition to running workshops, we’ve been out and about talking to colleagues in universities - discussing how they can use the service as well as hearing about the challenges they face in managing research data. These meetings and workshops have provided opportunities to explore how we can work together – across a range of institutions and disciplines. What is certain and, I think reassuring for everyone, is that no one has all the answers – processes and practices are evolving but it is encouraging that we can work on solutions together. If you’d like to talk to us or arrange a workshop for your organisation, then do get in touch ([email protected])


We’ll be coming back to issues in research data management and data citation in future posts but for now we’re looking forward to a week of discussion and debate at the Research Data Alliance meeting and DataCite Summer meeting in September.


Elizabeth Newbold

26 July 2013

Show me the data

Libraries just worry about books, right? Wrong! We also worry about data. If you want to provide a useful service to the research community (and that community includes anyone who wants to do research), you need to think about all the information, including research data sets, that people may need. But we recognise that isn’t always easy to do.

The Royal Society’s 2012 report on science as an open enterprise focused on the value of research data and, at a recent meeting, Professor Geoffrey Boulton who led the study noted that ‘open science’ approaches are not new. Henry Oldenburg, the 17th-century German natural philosopher and first Secretary of the Royal Society, ensured all his scientific correspondence was written in vernacular (and not Latin, as was the norm), and that all his observations were supported by supplementary evidence (and not just assertions).

Thus Boulton reflected that while the value of supporting reproducibility and providing an evidence base had been recognised very early on, many journals no longer published the results in tandem with the underlying data. Fortunately the technology is now allowing many publishers and others to provide better access to the data.

In some areas of science there has been a culture of data sharing. If researchers are sequencing DNA from any species they are asked to submit it to GenBank: a database established to ensure that scientists have access to the most up-to-date and comprehensive DNA sequence information. Most publishers require the researchers to provide evidence that they have added their data to GenBank before publication. So, if you work on sequencing DNA, getting access to other people’s data is relatively easy – but that is not necessarily the case for many other areas of science.


The reasons are complex. In many areas of research, there are no established or permanent stores for the many types of data that are produced. For researchers, the data they collect or generate is the primary output of the research and therefore comprises their intellectual capital. Many researchers are concerned about receiving appropriate credit for their efforts and that may not happen if they share their data with all and sundry. But that objection could be tackled if researchers could cite data – and thereby be recognised for their contribution.


The British Library is a founding member of an organisation called DataCite which, as the name suggests, was established to enable data to be cited. We have been working with a range of organisations responsible for managing, storing and preserving data from a variety of areas – everything from archaeology to atmospheric science – to enable them to attach a ‘digital tag’ to data that allows it to be referenced. This tag is ‘persistent’, so that even if the data is no longer available, it will be possible to find out what has happened to that resource. We hope when someone says – ‘show me the data’ – we will have played a role in making that possible.

Lee-Ann Coleman and Allan Sudlow

02 July 2013

Introducing our new Science blog...

Welcome to our brand new Science blog brought to you by the Science Team at the British Library. We hope to inform, inspire and surprise you as we highlight the work that we do and the things that interest us. We’re also keen to hear what interests you, so please let us know and we’ll try to cover it.

Before the science bit, comes the history bit… The British Library is distinctive in many ways and one of its unexpected aspects is that, unlike many other national libraries, we cater for science, as well as the humanities. In fact, this remit was written into the British Library Act (1972) when a number of separate institutions, including the National Reference Library of Science and Invention and the National Lending Library for Science and Technology, were brought together to create 'a national centre for reference, study, and bibliographic and other information services, in relation both to scientific and technological matters and to the humanities'. They finally merged physically when the St Pancras building opened in 1998.

Although the public may be less aware of the role that the British Library plays in science, many people needing access to scientific information make extensive use of our two Science Reading Rooms in London. We also offer access to scientific articles through our document supply service. But we do much more than that.

The Science Team is working on providing scientific information to more people, wherever they are. We have done quite a bit of research ourselves to understand how contemporary researchers discover and use information, not only to enhance our existing provision but to develop new services. We have been involved in Europe PubMed Central since 2006 – providing access to millions of biomedical research articles for free. We are also developing a resource – called Envia - for environmental scientists interested in flooding, which provides free access to relevant resources. Scientific data is generated in increasingly large volumes and discovering and accessing it requires new methods of gathering that information and pointing people in the right direction. The Science Team has been cataloguing datasets to make them more discoverable and is also delivering the UK DataCite service which enables datasets to be cited. While providing access to information is our core business, the British Library also has a fantastic space where scientists, researchers and the public can meet, debate issues and be challenged by new ideas. Our TalkScience events have a loyal following and we celebrate science with an annual public events programme called Inspiring Science. Next year will see a science-themed exhibition at the Library – called Beautiful Science – exploring scientific data visualisation from past to present. We’ll be keeping you updated about plans and progress on that.

Science permeates every aspect of our lives but you don’t have to wear a lab coat to be a scientist. To be curious and ask questions about the world, ourselves, where we’ve come from and what the future might hold is to think scientifically. Of course, having access to trusted information helps us to understand what is already known, where the boundaries lie and what remains to be discovered. We hope that you will discover some new information in our blog posts, ask questions, make requests and use the resources that our experts highlight to explore new horizons.

You can expect to hear from us weekly so look out for our next post and follow us on Twitter for more frequent news, information and resources – @ScienceBL

Lee-Ann Coleman