Science blog

Exploring science at the British Library


20 January 2014

Beautiful Science Preview

Johanna Kieniewicz spills a few beans on the upcoming British Library exhibition

We are now just a month out from the British Library’s first science exhibition: Beautiful Science: Picturing Data, Inspiring Insight. Life in our team right now is a whirlwind of writing captions, finalising commissions, testing interactives and liaising with our press office. But all for a good reason. Opening February 20th, Beautiful Science will highlight the very best in graphical communication in science, linking classic diagrams from the Library’s collections to the work of contemporary scientists. The exhibition will cover public health, weather and climate, and the tree of life, telling stories of advances in science as well as looking at the way in which we communicate and visualise scientific data.

 

Picturing Data

Data is coming out of our ears. From data collected by our mobile phones and our movements about the city to the data acquired by scientists sequencing genomes or smashing subatomic particles together, the quantities are vast. While a simple table of numbers is a form of data visualisation in itself, our human ability to scan, analyse and identify patterns and trends is limited.


William Farr, 1852, Report on the Mortality of Cholera in England 1848-1849

Whilst today we see a proliferation of data visualisation, it is hardly a new phenomenon, and might even be considered a rediscovery of the ‘Golden Age’ of statistical graphics of the late 19th century. Like today, the Victorian period featured a confluence of new techniques for data collection, developments in statistics and advances in technology, which created an environment in which data graphics flourished. In Beautiful Science, we highlight a number of graphics from this period - some of which are well known, others of which may prove to be more of a surprise, such as this piece on cholera mortality by the epidemiologist and statistician William Farr.

 

Inspiring Insight

The very best visualisations of scientific data do not merely present it, but also inspire insight and reveal meaning. Data visualisation is both a tool through which we can analyse and interpret data and a method by which we communicate its meaning. It is most powerful when it does both.

Circles of Life, Martin Krzywinski, 2013

In curating Beautiful Science, we were keen to highlight the ways in which the visualisation of data is integral to the scientific process, as well as the way cutting edge science is communicated. The Circos diagrams used to display genomic data do this very well. In Beautiful Science, you can examine a comparison of the human genome with both closely and distantly related animals. Here, you see that we are quite closely related to the chimpanzee (though we presume you knew that already). But what about a chicken or a platypus? You’ll have to come to the exhibition and see for yourself.

 

 

Beautiful Science

Should we impose an aesthetic upon the presentation of scientific information? Or is beauty indeed in the eye of the beholder? We take an agnostic position in this debate, and instead seek to inspire the exhibition visitor with both intriguing images and inspiring ideas. What is clear, however, is that scientists should take care and be thoughtful when producing their graphics. In a world where research impact matters ever more, producing images that compellingly communicate discoveries is of increasing importance.

NASA/Goddard Space Flight Center Scientific Visualization Studio

Compelling imagery is something at which the NASA Scientific Visualization Studio excels. Something like a model of ocean currents, originally developed for a scientific purpose, could easily be dry and dull: wouldn’t colour-coded vectors of increasing and decreasing size do the job? With a leap of insight, they instead developed a visualisation that is both informative and inspiring. We hope you will watch it with awe at the entrance to the exhibition, tracking the Gulf Stream as it moves water northwards towards the British Isles, bringing us our temperate climate.

 

Even More Beautiful Science

A fantastic programme of events will also accompany the exhibition. From serious debate to science comedy shows, competitions, workshops and family activities, we’ve developed a programme that’s designed to make you think. Please join us!

 

Beautiful Science runs from 20 February to 26 May, 2014, is sponsored by Winton Capital Management, and is free to the public.

10 January 2014

Gathering dust? Opening up access to PhD research

In our first blog post of 2014, Katie Howe explores another of the services that we offer to contemporary researchers - the British Library’s e-thesis collection, EThOS.

I finished my PhD in 2012. Four years of blood, sweat and tears were summed up in one 200 page document neatly bound in blue cotton. But who has actually read my thesis? Well, my supervisor read it very closely, suggesting many alterations and improvements. My viva examiners read it. I like to think that contemporary members of my old lab might refer to it when working on some of the methods I developed. But what about its wider impact? Is my thesis destined to gather dust or simply be used as a bookend?

PhD theses ready for submission (Photo: Katie Howe)

My thesis is deposited in the UCL Discovery repository and the full text will soon be available via the British Library’s e-thesis service, EThOS, meaning that people will be able to access the information even though much of the data hasn’t been published in an academic journal. EThOS works by harvesting information from university and institutional repositories, thereby creating a single point of access for doctoral theses from across the UK. The EThOS website has records for over 300,000 UK theses and for 100,000 of these, it is possible to access the full text instantly - either by downloading directly from EThOS or via a link to the relevant institutional website. If you haven’t used EThOS before then you can give it a try here. The great thing about EThOS is that it can be accessed remotely, from all over the world. You can use it to search and read theses on your topic of interest, or to research the work of individuals in your field. EThOS can also be useful in finding out how to structure a thesis and some people even use it for leisure purposes in researching their own interests.
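
For the technically curious, here is a minimal sketch of the kind of harvesting an aggregator like EThOS performs. It assumes the source repository exposes an OAI-PMH endpoint serving Dublin Core records (the post does not name the protocol, and the endpoint URL in the example is purely illustrative); it fetches one page of records and prints each creator and title.

```python
# A minimal harvesting sketch. The endpoint URL is purely illustrative, and a
# real harvester would also follow OAI-PMH "resumption tokens" to page through
# the full record set.
import requests
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"

def harvest_records(base_url):
    """Fetch one page of Dublin Core records from a repository's OAI-PMH endpoint."""
    response = requests.get(base_url,
                            params={"verb": "ListRecords", "metadataPrefix": "oai_dc"},
                            timeout=60)
    response.raise_for_status()
    root = ET.fromstring(response.content)
    for record in root.iter(OAI + "record"):
        creator = record.findtext(".//" + DC + "creator")
        title = record.findtext(".//" + DC + "title")
        yield creator, title

# Hypothetical institutional repository endpoint:
# for creator, title in harvest_records("https://repository.example.ac.uk/oai2"):
#     print(creator, "-", title)
```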

As one user noted, “a wealth of primary data is buried in theses, which can shed light on very interesting areas that may have been missed for decades”. PhD theses are increasingly recognised as important sources of information and although the UK’s open access initiatives relate mainly to journal articles, open access to PhD theses has been embraced by UK universities and research councils. Nowadays, many institutions require PhD graduates to deposit their thesis in a local repository and services such as EThOS facilitate access to this material.

EThOS contains records for over 300,000 UK theses (Image: Shutterstock)

Although my thesis might not quite be worthy of a Nobel Prize, having it available on EThOS will undoubtedly increase the visibility of the unpublished information within it. With about 35,000 theses viewed per month on EThOS, hopefully someone will find my thesis useful rather than having it languishing on my bookshelf gathering dust.

Katie Howe

29 November 2013

From base pairs to bedside...

Katie Howe and Allan Sudlow report on their experiences from the EMBL Genomics, Medicine and Society conference.

The first draft of a complete human genome was published in 2001. It took 13 years to complete and cost a massive $2.7bn. Since then the cost of genome sequencing has plummeted. 12 years on it is now possible to sequence a human genome for less than $1000.

In light of these rapid advances, the EMBL Genomics, Medicine and Society conference brought together a diverse audience to explore how new genomic technologies may benefit public health and to discuss some of the challenges for the future. The conference was part of EMBL’s Science and Society conference series, which aims to consider how advances in biology impact on society. Our TalkScience events have explored social and ethical consequences of genetic technologies in the past, whether it be genetic testing kits sold to the public, or pre-implantation genetic testing in fertility clinics.

During the conference we heard about the wide range of projects that aim to enhance our understanding of genome function and help us pinpoint the genetic mutations that lead to particular diseases. Examples include the 1000 Genomes Project and the International Cancer Genome Consortium. Recently, David Cameron committed £100m for sequencing the genomes of 100,000 people in the UK. These projects generate huge quantities of valuable genomic information but this presents serious problems for data storage and management. Professor Eric Green, Director of the National Human Genome Research Institute, noted that, “We are no longer data limited. We are analysis limited”.


The Human Genome (Shutterstock)

In the British Library’s Science team we are interested in the generation, storage and re-use of scientific information and data, so we were particularly keen to hear Dr Paul Flicek’s presentation on “genomics as an information science”. Paul observed that if the price of a Ferrari had fallen at the same rate as genome sequencing, it would now cost less than a dollar! He noted that nowadays a major cost associated with genome sequencing is the storage and management of genomic information, and this is set to become the dominant cost as the price of the sequencing itself drops further. Paul reminded us that, in order to fulfil the promise of genomic research to treat disease, ambitious plans to sequence the genomes of huge numbers of individuals must be accompanied by major investment in the infrastructure to support data management and advanced data analysis.
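
As a rough sanity check on that comparison (the car’s list price below is our own assumption, not a figure from the talk), the arithmetic works out like this:

```python
# Rough arithmetic behind the Ferrari quip; the car's price is an assumption.
draft_genome_cost = 2.7e9     # dollars, first draft human genome (2001)
genome_cost_2013 = 1_000      # dollars, figure quoted above
fold_drop = draft_genome_cost / genome_cost_2013    # ~2.7 million-fold

assumed_ferrari_price = 250_000                     # dollars
print(assumed_ferrari_price / fold_drop)            # ~0.09 - indeed less than a dollar
```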

The shrinking cost of genome sequencing has also led to a thriving industry in direct-to-consumer (DTC) genetic testing. For $100, personal genome companies such as 23andme and Navigenics offer members of the public the opportunity to have their DNA tested and uncover their predisposition to certain genetic conditions, which may then inform their healthcare options. Just this week, the United States Food and Drug Administration (FDA) issued a warning to 23andme to cease the marketing of their personal DNA spit kit due to concerns over the “public health consequences of inaccurate results” from their service. But others have argued that consumers have the right to their own genetic information, and the emphasis should be on educating doctors and patients about how to interpret the results rather than banning these tests. This story illustrates the controversy surrounding genomic medicine.

While personalised genomic medicine holds enormous potential for public health, conference speaker Professor Tim Caulfield warned that the benefits of DTC genetic testing are often amplified in promotional material. Tim’s opinion was that getting enough exercise, eating healthily and not smoking will have greater health benefits than many of the unproven personalised genomic approaches that are being marketed.

Whilst in Heidelberg, we also found some time to explore some of the local sights. We visited the ancient Heidelberg castle and the “student prison”. But the highlight for us was the German Apothecary Museum - a veritable treasure trove of historical scientific equipment. We spotted a 19th century Bunsen burner - very topical, since the iconic burner was developed by Robert Bunsen and his colleague Peter Desaga at Heidelberg University in the 1850s.


German Apothecary Museum (Photo: Allan Sudlow)

We left Heidelberg thinking that although genomic technologies are undoubtedly a source of great promise, they also present many ethical, social and legal issues. There remains a huge challenge in translating recent advances in genomics into tangible healthcare solutions.

Katie Howe and Allan Sudlow

08 November 2013

Why not cite data?

Rachael Kotarski, our Content Expert for scientific datasets, explains why citing data as well as the article is the way forward.

In a previous post, Lee-Ann Coleman looked at citations in science, asking what should be cited, and what a citation means. The answers to these questions are not necessarily simple, but one response we have been hearing (and that we support), is that data needs to be cited.

Citing data not only gives credit to those who created or gathered it, but can also give some kudos to the repository that looks after it. Despite the fact that data is also key to verifying and validating research, it is not yet standard practice to cite it when writing a paper. And even if it is cited, it is rarely done in a way that allows you to identify and access that data.

Citation should connect the literature to its data foundations. Image source: Shutterstock.

As part of the Opportunities for Data Exchange (ODE) project, we investigated data citation and the ways in which data centres, publishers, libraries and researchers can encourage better data citation.

What does ‘better data citation’ look like and how do we encourage it to happen? We examined three aspects of current practice in order to answer this question:

  • How data is cited
  • What data is cited
  • Where data is cited within the article

How to cite
A data citation needs to contain enough information to find and verify the data that was used, as well as give credit to those who spent considerable time/money/effort generating or collecting the data. The DataCite recommended data citation is just one example of how to include details that support these aims (and it’s pretty simple!):

Creator (publication year): Title. Publisher. Identifier.
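
As a tiny illustration of how simple that form is, here is a sketch that assembles such a citation from its parts; the function and field names are our own, not part of any DataCite tooling.

```python
def format_data_citation(creator, year, title, publisher, identifier):
    """Assemble a citation in the DataCite-recommended form:
    Creator (PublicationYear): Title. Publisher. Identifier."""
    return f"{creator} ({year}): {title}. {publisher}. {identifier}"

# Using the GESIS dataset cited below as the worked example:
print(format_data_citation(
    "Förster, Peter; Brähler, Elmar; Stöbel-Richter, Yve; Berth, Hendrik",
    2012,
    "Saxonian longitudinal study – wave 24, 2010",
    "GESIS Data Archive, Cologne",
    "doi:10.4232/1.11322",
))
```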

What to cite
Data are not necessarily fixed, stable or homogeneous objects, so citing them can be considerably more complicated than citing articles. For reproducibility, it is important that data are cited as they were used, regardless of subsequent changes to the dataset or the subset taken from it. Aspects such as the version used or the date downloaded should also be encapsulated in the citation, where necessary. Linking users via an identifier (such as a DOI, as used by DataCite) to the location of that exact version or subset of the data is also important. An example of citing a specific wave of data from GESIS demonstrates this:

Förster, Peter; Brähler, Elmar; Stöbel-Richter, Yve; Berth, Hendrik (2012): Saxonian longitudinal study – wave 24, 2010. GESIS Data Archive, Cologne. ZA6242 Data file version 1.0.0, doi: 10.4232/1.11322

Where to cite in the article
Where you cite data in the article may depend on the form of the data being cited. For example, data obtained via colleagues but not widely available may be best mentioned in acknowledgements, and data identified by accession numbers could be cited inline in the body of the article. But the interviewees who participated in the ODE study largely advocated citation of datasets in the full reference list, to promote tracking and credit. In order to do this, data needs a full, stable citation, which also depends on reliable, long-term storage and management of the data. Of course publisher requirements play an important role. But that’s a post for another day!

These are the three ‘simple’ steps to better citation of data, but there are still cultural and behavioural barriers to sharing data. In the ODE report we concluded that the whole community - researchers, publishers, libraries and data centres - all have a role in promoting and encouraging data citation.


The recent Out of Cite, Out of Mind report has since updated and greatly extended the ODE work, with an excellent set of first principles for data citation:

CODATA-ICSTI Task Group on Data Citation Standards and Practices (2013) Out of Cite, Out of Mind: The Current State of Practice, Policy, and Technology for the Citation of Data. Data Science Journal vol. 12 p. CIDCR1-CIDCR75 doi: 10.2481/dsj.OSOM13-043

I recommend it – and encourage anyone thinking about citing their data (or anyone else’s) to stop thinking and start doing it.

 

04 October 2013

Collecting new data

This week Elizabeth Newbold and Lee-Ann Coleman reflect on a week of data-related meetings in Washington DC.

You can’t go to Washington DC and not go to the Air and Space Museum – at least not if you’re interested in science and have a free day. We saw a great exhibition about the Wright brothers and their experiments into flight, which highlighted the value of collecting new data. The brothers relied upon existing ‘tables of coefficients’ to factor into their equations for calculating lift and drag upon different wing shapes. But their experiments showed that the coefficient for the density of air – in use since the 18th Century – did not appear to be right. They determined a new average which, using modern techniques, was shown to be very close to the correct value. What a great demonstration of the value of re-use of data and being open to evaluating it in the light of new evidence.
Most of the week of 16-20 September was not spent at great museums but at the Research Data Alliance second plenary  and the DataCite Summer Meeting. Both were held (mostly) in the beautiful National Academy of Sciences building, where a statue of Einstein looks benignly over the gardens.


This was our first time at an RDA meeting – not surprising, since it is a relatively new venture brought about by the US National Science Foundation, the Australian Government and the European Commission – so we weren’t too sure what to expect. On the first day, an array of impressive speakers highlighted the value of access to research data. The second day was reserved for meetings of the working groups and on the third day, representatives of these reported back to the whole assembly. These meetings are not typical academic conferences but are intended to be working meetings and will be held twice a year, meaning that being involved requires significant commitment.

The speakers, including Tom Kalil (Deputy Director for Technology and Innovation, White House Office of Science and Technology Policy) and John Wilbanks (Chief Commons Officer, Sage Bionetworks), highlighted the need to speed up scientific discovery and its applications. For this to happen, frameworks - legal as well as technical - need to be put in place. President Obama signed the open data executive order in May this year - but implementation is the next step.
The RDA aims to create a community of practice and a pipeline of impact, and it is doing this through both working and interest groups. Interest groups cover broad, ongoing topics with loosely defined goals; as clarity emerges about a problem or issue to address, they may develop into working groups. Working groups produce case statements of what they will do, meet virtually every 4 to 6 weeks and are expected to last around 12-18 months. Some of these groups are very focussed, addressing a particular, often technical, issue, but others seem less well defined. Given that the RDA is just becoming established, great progress has been made, but with a ‘ground-up’ approach it is difficult to know whether all the issues are being addressed. The RDA is currently seeking an executive director, who will hopefully provide a clearer sense of direction, an aerial view of the landscape and a better-articulated strategy for achieving its aims.
And then it was on to particle physics, for the start of the DataCite Summer Meeting. Salvatore Mele, Head of Open Access at CERN, told us that the dataset providing evidence of the Higgs boson had been cited in a paper with a DataCite DOI.

Some other highlights from the meeting included the presentation from Michael Witt of Purdue University about the Purdue University Research Repository - aka PURR. It has a lot of nice features to encourage researchers to upload and store data and to produce data management plans. It issues DOIs, emails users monthly with metrics and offers secure and reliable preservation for 10 years.

There were presentations from a range of organisations with interests in citation and identification. Thomson Reuters discussed their data citation index, launched last year, which now holds 3m records. While it is not aiming to be the most comprehensive index, it does aim to link to important, relevant scientific data. CrossRef highlighted FundRef, a new service which aims to enable the tracking of funding sources through to publications and other outputs. A pilot is underway involving several US funding agencies and the Wellcome Trust, and a registry of over 4,000 funding body names has been created. Ultimately, funder IDs, grant numbers and DOIs could all be linked. The presentation from ORCID - an identifier service for individuals - demonstrated that it’s not just for the John Smiths! Over 280,000 identifiers have been issued since October 2012. Grant submission systems are starting to ask for ORCID IDs during the submission process, HEIs are also getting on board, and some publishers are requesting them too.

So, a lot of interesting data for thought - but considering the Wright brothers again provides a reminder that the reason for all of this activity is to enable research, to support the people generating the data in the first place, to make better use of it and, as a result, to enhance science.

13 September 2013

Measure for Measure

Those who fund UK research, including the public, should expect to know about the outputs of that work. However, as Allan Sudlow discovered, this is a complex and expensive activity that needs better co-ordination.

The UK does not currently have a national reporting infrastructure that brings together all the information on publicly funded and charity-funded research. At a high level, such a system would allow those who had access to evaluate inputs (e.g. money, people, time) against outputs (e.g. publications, patents, data). No such unified system exists, which makes it impossible to look across different sources of funding in any detailed way to assess the impacts of the research at a national or international level. And this is before considering the more difficult task of evaluating the longer-term benefits of such research, for example identifying how investing research money, time and effort in biomedical research has led to improvements in human health.

That’s not to say people aren’t trying. In fact, a large number of people employed by organisations that receive public funding for research, e.g. universities, are working to bring together all the information on their institution’s research spend and outputs for reporting and evaluation. Similarly, the UK Government, UK Research Councils, research universities, institutes and a huge number of different charities, foundations and trusts have invested in IT systems and people to do just the same. A big driver for much of this activity across UK universities is the forthcoming Research Excellence Framework (REF), which in 2014 will evaluate the research outputs and impacts arising from all government-funded higher education institutions across the UK.

Until fairly recently, however, this investment in IT systems and people has not been co-ordinated, so research organisations across the UK are at different levels of maturity in managing research information. Some larger organisations have invested in commercial systems such as ResearchFish, while others have developed in-house systems to facilitate the gathering of information. Smaller organisations with limited resources often still rely on storing data in spreadsheets and preparing information by hand. This has inevitably resulted in duplication and increased costs due to inefficiency across the sector as a whole.


What's the measure? Copyright Photos.com.

So why isn’t it all coalescing into a single dedicated system for research evaluation? Well, aside from the many different motivations for developing research information systems, there are the complexities of all the different stakeholder views layered on top. For example, beyond a simple agreement that “we need to gather information on X”, there needs to be agreement on what exactly can and should be measured, how often, for how long, in what format and structure, and so on.

Having said all that, there are a range of projects and developments that are attempting to bring some coherence to the world of research reporting. Some of this is happening by default, as organisations begin to use the same IT systems, and some of it is being led top-down by UK Government projects such as Gateway to Research which attempt to provide some level of visibility and access to research information to people outside of the academic research community.

In a bottom-up approach, I am involved in a JISC-funded feasibility study called UK Research Information Shared Service: UKRISS. This project has examined the motivations and needs of those involved in research reporting alongside an analysis of the current landscape of research information systems and standards. Our aim is to define an approach (based on a common research information format called CERIF) to allow better research information sharing and benchmarking across different organisations which are already using different systems. A small attempt to tackle what remains a big challenge.

Allan Sudlow

16 August 2013

Divining the Deluge

Data visualisation isn’t just about making pretty graphics. It’s also helping scientists make new discoveries. Johanna Kieniewicz explores how data is displayed and provides a teaser for the Science Team’s upcoming exhibition.

We are awash in data. Whether it’s the vast amounts of genomic data being sequenced every day by bioscientists, data generated by human activity and transactions, or the 15 petabytes of data produced per year by the Large Hadron Collider, we are up to our necks in data. But hopefully swimming, not drowning. Mechanisms are being set up to harness the power of this data and make sure it is suitable and available for future use. DataCite is busy enabling researchers to get credit for their data, research funders are encouraging their scientists to think about where their data goes, and the open data movement and the principles of open data have been embraced by the UK Government with the development of the Open Data Institute and data.gov.uk. But when it comes to the analysis of all this data, how do we make sense of endless strings of 1’s and 0’s? C’s, G’s, T’s, A’s? One cross-cutting tool that unites fields as seemingly unrelated as genetics, climate science and finance is data visualisation.


Visualisation is key to our ability to identify trends, patterns and correlations within scientific data, thereby deriving meaning and making discoveries. While a glance at the Guardian Data Store or a website like Visual Complexity might lead you to believe that the visualisation of data is a fad that has washed in with this most recent tidal wave of data, it has actually been with us much longer. At the extreme end, we can trace ideas around data visualisation to cuneiform markings on clay tablets and early maps of our universe. However, our modern graphic representation of data owes a great deal to the Scottish engineer and political economist William Playfair and his ground-breaking statistical graphics of social, political and economic data. Many of these techniques were adopted by 19th century epidemiologists grappling with the cholera outbreaks ravaging London; others by those studying the weather, or attempting to organise and rationalise all life on our planet onto a single piece of paper.
Where data visualisation and statistical graphics, as we know them, started.  From William Playfair’s Commercial and Political Atlas, 1786.

Fast forward to the 21st century. Thanks to John Snow, we now know what causes cholera. But we are still mapping it, now using genomic visualisations to identify the source of the 2010 Haiti cholera outbreak. We have moved beyond the three-dimensional ‘heat map’ to bring in the component of time, producing detailed visualisations of how systems evolve over seconds or decades. And data need not be static; it can be dynamic and interactive. But despite these changes to the data itself and to our means of visualising it, scientists are still making the same sorts of choices about visualisation as they did in the 19th century: what should we display? Using what method? How do we create an image that tells a thousand words? And so on…


Data visualisation today. The interactive tool, Gapminder World, uses data visualisation for global good, looking at trends in the health and wealth of nations.

These are issues that we in the Science Team at the British Library are thinking about in particular depth at the moment. Opening in February 2014, our exhibition, Beautiful Science, will look at the past and present of data visualisation in science, bringing together classic visualisations from the Library’s collection with cutting-edge visualisations from today’s scientists and designers. We will tell stories of advances in how we think about data, along with the scientific stories that the data tells. We aren’t quite ready to let the cat out of the bag in terms of what we’ll be showing… but stay tuned for a mixture of the interesting, intriguing and unexpected. We’re excited, and we hope you’ll follow us on this journey!

Johanna Kieniewicz

02 August 2013

Show me more data

Expanding on last week’s post on open data, today we look at our role in DataCite and how we are supporting the UK research data community.

The British Library is one of the founding members of DataCite, an international organisation bringing together the research data community to work collaboratively on the challenges of making research data visible, accessible and citable. DataCite is a registration agency for Digital Object Identifiers (DOIs), and the British Library is an allocating agent on behalf of DataCite. We provide an infrastructure that supports simple and effective methods of discovery and access, and we work with data centres and other organisations to enable them to assign DOIs to data.


Since 2011, the Library’s Science team has been developing DataCite services in the UK. In practical terms, this has involved working with a range of organisations that create, manage or archive data, setting them up on the system so that they can assign DOIs (a process known as minting - we even have mints to prove it!), working on the DataCite metadata schema and ensuring our community’s needs are represented within the global DataCite membership. To support this work, we have organised a series of workshops exploring the various aspects of data citation, as well as the requirements for working with DataCite and DOIs.
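
For readers who like to see what minting looks like in practice, below is a rough sketch of the two calls typically involved when registering a DOI through the DataCite Metadata Store (MDS) API: one to deposit a metadata record and one to point the DOI at a landing page. The credentials, test prefix, URLs and stripped-down metadata record are all placeholders, and the exact workflow depends on how your datacentre account has been set up with an allocating agent such as the British Library.

```python
# A rough sketch of "minting" a DOI via the DataCite Metadata Store (MDS) API.
# Everything below - credentials, the 10.5072 test prefix, the landing page and
# the stripped-down metadata record - is a placeholder; a real datacentre account
# is required, and the full metadata schema has more fields than shown here.
import requests

MDS = "https://mds.datacite.org"
AUTH = ("MY.DATACENTRE", "my-password")          # placeholder credentials

doi = "10.5072/example-dataset-001"              # 10.5072 is DataCite's test prefix
landing_page = "https://repository.example.org/datasets/example-dataset-001"

metadata_xml = f"""<?xml version="1.0" encoding="UTF-8"?>
<resource xmlns="http://datacite.org/schema/kernel-3">
  <identifier identifierType="DOI">{doi}</identifier>
  <creators><creator><creatorName>Example, Researcher</creatorName></creator></creators>
  <titles><title>An example dataset</title></titles>
  <publisher>Example Data Centre</publisher>
  <publicationYear>2013</publicationYear>
</resource>"""

# Step 1: deposit a metadata record for the DOI.
r = requests.post(f"{MDS}/metadata", data=metadata_xml.encode("utf-8"), auth=AUTH,
                  headers={"Content-Type": "application/xml;charset=UTF-8"})
r.raise_for_status()

# Step 2: mint the DOI by pointing it at the dataset's landing page.
r = requests.put(f"{MDS}/doi/{doi}", data=f"doi={doi}\nurl={landing_page}", auth=AUTH,
                 headers={"Content-Type": "text/plain;charset=UTF-8"})
r.raise_for_status()
```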


We’ve covered a lot of topics in the last year. From the basics - such as what minting a DOI actually means and how to do it (you can find out how in our YouTube video), and what to put a DOI on - to more complex subjects such as how to deal with sensitive data or different versions. We’ve had lively discussions at all of the workshops, supported by excellent presentations from colleagues who are working with research data. You can see the full list of topics covered, and presentations from the workshops, on our webpages: www.bl.uk/datasets

 

In addition to running workshops, we’ve been out and about talking to colleagues in universities - discussing how they can use the service as well as hearing about the challenges they face in managing research data. These meetings and workshops have provided opportunities to explore how we can work together across a range of institutions and disciplines. What is certain, and I think reassuring for everyone, is that no one has all the answers - processes and practices are evolving, but it is encouraging that we can work on solutions together. If you’d like to talk to us or arrange a workshop for your organisation, then do get in touch ([email protected]).


We’ll be coming back to issues in research data management and data citation in future posts but for now we’re looking forward to a week of discussion and debate at the Research Data Alliance meeting and DataCite Summer meeting in September.


Elizabeth Newbold

26 July 2013

Show me the data

Libraries just worry about books, right? Wrong! We also worry about data. If you want to provide a useful service to the research community (and that community includes anyone who wants to do research), you need to think about all the information, including research data sets, that people may need. But we recognise that isn’t always easy to do.

The Royal Society’s 2012 report on science as an open enterprise focused on the value of research data and, at a recent meeting, Professor Geoffrey Boulton, who led the study, noted that ‘open science’ approaches are not new. Henry Oldenburg, the 17th-century German-born natural philosopher and first Secretary of the Royal Society, ensured that all his scientific correspondence was written in the vernacular (and not Latin, as was the norm), and that all his observations were supported by supplementary evidence (not just assertions).

Thus Boulton reflected that while the value of supporting reproducibility and providing an evidence base had been recognised very early on, many journals no longer published the results in tandem with the underlying data. Fortunately the technology is now allowing many publishers and others to provide better access to the data.

In some areas of science there has been a culture of data sharing. If researchers are sequencing DNA from any species they are asked to submit it to GenBank: a database established to ensure that scientists have access to the most up-to-date and comprehensive DNA sequence information. Most publishers require the researchers to provide evidence that they have added their data to GenBank before publication. So, if you work on sequencing DNA, getting access to other people’s data is relatively easy – but that is not necessarily the case for many other areas of science.

DNA sequence (Image: Shutterstock)

The reasons are complex. In many areas of research, there are no established or permanent stores for the many types of data that are produced. For researchers, the data they collect or generate is the primary output of the research and therefore comprises their intellectual capital. Many researchers are concerned about receiving appropriate credit for their efforts and that may not happen if they share their data with all and sundry. But that objection could be tackled if researchers could cite data – and thereby be recognised for their contribution.


The British Library is a founding member of an organisation called DataCite which, as the name suggests, was established to enable data to be cited. We have been working with a range of organisations responsible for managing, storing and preserving data from a variety of areas – everything from archaeology to atmospheric science – to enable them to attach a ‘digital tag’ to data that allows it to be referenced. This tag is ‘persistent’, so that even if the data is no longer available, it will be possible to find out what has happened to that resource. We hope when someone says – ‘show me the data’ – we will have played a role in making that possible.
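
As a small illustration of what that persistence means in practice, the sketch below follows a dataset DOI through the global resolver at doi.org and reports where it currently points; the DOI used is the GESIS dataset cited elsewhere on this blog, and any registered DOI would behave the same way.

```python
# A minimal sketch: following a persistent identifier to wherever the data
# currently lives. The DOI is the GESIS dataset cited elsewhere on this blog.
import requests

doi = "10.4232/1.11322"
response = requests.get(f"https://doi.org/{doi}", allow_redirects=True, timeout=30)

print(response.url)           # the landing page the DOI currently resolves to
print(response.status_code)   # 200 if the record (or a tombstone page) is reachable
```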

Lee-Ann Coleman and Allan Sudlow
