Science blog: Open data

18 June 2020

Citizen Science and COVID-19

Your experience of the COVID-19 pandemic could be an important contribution to science. Researchers from diverse disciplinary backgrounds are keen to learn about your stories, insights, routines, thoughts and feelings. While some projects would be eager to receive diaries in the narrative style of Samuel Pepys or John Evelyn, others want more specific information in survey format.

Hand-drawn and painted cartoon illustrating various ways people have entertained themselves during lockdown

Illustration: Graham Newby, The British Library: Lockdown Rooms (3rd June 2020)

Citizen science engages self-selected members of the public in academic research that generates new knowledge and provides all participants with benefits. The engagement can vary from data gathering or participatory interpretation to shared research design. Different forms of citizen science can be referred to as public science, public participation in scientific research, community science, crowd-sourced science, distributed engagement with research and knowledge production, or trans-disciplinary research that integrates local, indigenous and academic knowledge.

Contributing to citizen science projects sustains a sense of control, a sense of belonging (empowering feelings in and after isolation) and a sense of being useful, all of which are particularly important in uncertain times. According to the UK Environmental Observation Framework, people place more trust in evidence they have measured themselves, and in organisations that draw on data generated through citizen science. Trust is linked to transparency. Better understanding of how scientific knowledge is produced, and having a role and responsibility in shaping the knowledge production process, are likely to enable citizen scientists to re-frame the often-uneasy relationship between society and science.

Scale is a distinctive feature of citizen science. The more people are engaged, the more comprehensive an understanding can be reached about the researched topic. The featured COVID-19 Symptom Study has become the largest public science project in the world in a matter of weeks: 3,881,488 citizen scientists are involved as of 18th June 2020. Big data allowed medics to develop an artificial intelligence diagnostic that can predict the likelihood of having COVID-19 based on the symptoms only: a vital tool indeed when testing is limited.

The citizen science initiatives highlighted here, COVID-19 Symptom Study, COVID-19 and You, and COVID Chronicles, may inspire you to contribute to them or find other projects where you can take an active role in developing better understanding of current and future epidemics.

COVID-19 Symptom Study
https://COVID.joinzoe.com/data
Epidemiology
Institutions: King's College London, ZOE
Launched: 25th March 2020
Your contribution helps you and researchers understand COVID-19 and the dynamics of the pandemic (UK, USA).
How: Submit your physical health status regularly.

COVID-19 and You
https://nquire.org.uk/mission/COVID-19-and-you/contribute
Social sciences
Institutions: The Open University, The Young Foundation
Launched: 7th April 2020
Your contribution helps you and researchers understand how COVID-19 is affecting households and communities across the world.
How: Fill in an online survey with choices and narratives.

In addition to supporting current research, your contribution could add to future inquiries as well. Collecting and archiving short personal stories ensures authentic data will be available when future researchers look back on this period with their own research questions. Reliable data should be collected now, while we are still living in unprecedented times. It is especially important to record the experiences of people from less privileged backgrounds, in contrast to earlier pandemics where the voices of all but the upper and middle classes, and the political, legal and scholarly elite, have often been lost to history. COVID Chronicles, an archival initiative, is doing just that. COVID Chronicles is a joint project: BBC Radio 4's PM programme collects and features some of the stories, and The British Library archives them all for future academic inquiries.

COVID Chronicles
https://www.bbc.co.uk/news/entertainment-arts-52487414
History, social sciences
Institutions: BBC Radio 4, The British Library
Launched: 30th April 2020
Your contribution helps you and future researchers understand how people experience the COVID-19 pandemic in their daily life, at a personal level.
How: Submit a mini-essay (about 400 words) to BBC Radio 4 PM via e-mail: pm at bbc dot co dot uk. Your essay will be archived by The British Library and made available for future research.

The gradually easing lockdown and the anticipated long journey of national and global recovery generate a growing appetite to record, reflect on and analyse the COVID-19 epidemic's influence on our lives. Not all "citizen science" projects observe high standards of privacy and ethical responsibility, however. Before joining in any research with public participation, consider the principles of citizen science suggested by the European Citizen Science Association and the questions below:

Five questions before joining a citizen science initiative

Can you contact the researchers and the institution(s) they belong to with your questions and concerns?
Is the research approach clear to you? In other words, is it clear to you what happens to your contribution, how it shapes the investigation and what new knowledge is expected?
Is your privacy protected? In other words, is the privacy policy clear to you, including how you can opt out at any time and be sure that your data are deleted?
Are you contacted regularly about the progress of the research you are contributing to?
Are you gaining new transferable skills, new knowledge, insights and other benefits by participating in the research?

Further reading:

Bicker, A., Sillitoe, P., Pottier, J. (eds) 2004. Investigating Local Knowledge: New Directions, New Approaches. Aldershot : Ashgate.
BL Shelfmark YC.2009.a.7651, Document Supply m04/38392

Citizen Science Resources related to COVID-19 pandemic (annotated list) https://www.citizenscience.org/COVID-19/
[Accessed 18th June 2020]

Curtis, V. 2018. Online citizen science and the widening of academia: distributed engagement with research and knowledge production. Basingstoke, Hampshire: Palgrave Macmillan.
Available as an ebook in British Library reading rooms.

Open University. 2019. Citizen Science and Global Biodiversity (free online course) https://www.open.edu/openlearn/science-maths-technology/citizen-science-and-global-biodiversity/content-section-overview?active-tab=description-tab
[Accessed 18th June 2020]

Sillitoe, P. (ed). 2007. Local science vs global science: approaches to indigenous knowledge in international development. New York : Berghahn Books.
BL Shelfmark YC.2011.a.631, also available as an ebook in British Library reading rooms.

Written by Andrea Deri, Science Reference Team

Contributions from Polly Russell, Curator, COVID Chronicles, and Phil Hatfield, Head of the Eccles Centre for American Studies, are much appreciated.

Posted by The Science Team at 2:59 PM in BBC, Bioscience, Contemporary Britain, Curiosity, Data, Digital scholarship, Engagement, Open data, Research, Research collaboration, Science, Science communication, Social sciences

07 May 2020

The Future of Research Outputs

By Susan Guthrie, Maja Maricevic and Catriona Manville

Earlier this year, the British Library and RAND Europe hosted a roundtable discussion on how research outputs – the different ways research can be disseminated – are changing. It brought together representatives from research funders, publishers, research institutes, government and universities to explore the issue and its implications.

Workshop participants discussed RAND Europe’s recent study for Research England that showed that researchers currently produce a diversity of output forms, the range of which is likely to increase. Although researchers expect to continue to produce journal articles and conference contributions, they also want and plan to diversify the outputs they produce, with a particular focus on those aimed at a wider, non-academic audience.

The British Library also presented its current work and experience in collecting, preserving and making accessible a range of research outputs such as research data, web and social media, as well as new and evolving output formats.

The discussion addressed the following five questions:

How do we define and identify a research output?

There are many different types of outputs from research, from traditional journal articles and books to more diverse examples such as computer code, artworks, blogs, datasets and peer review contributions. One of the challenges is to identify which are actually outputs for dissemination, and which represent a stage in the development of research on the pathway to producing those outputs. An example of the latter is a GitHub repository for managing and storing revisions of projects, which may be fluid and changing on an ongoing basis. Other products – for example social media exchanges – are a fixed point but may not represent a researcher’s final perspective on a topic, rather the emergence and discussion of views and ideas. This fluid and dynamic mix of different media emerging over time makes it challenging to understand what is a ‘research output’ as traditionally defined.

Where does responsibility lie?

Research is increasingly global and research outputs may span national borders – hence, drawing lines between what is and what is not ‘UK research’ is not straightforward. There is a limit on the extent to which a full record of all research endeavour can be provided. Different stakeholders – libraries, funders, institutions, publishers – can either look to shape and drive desirable changes in behaviour or respond to changes as they emerge from the ‘bottom up’. Funders in particular have the potential to drive researcher actions through the use of incentives.

How do we manage quality control?

As the range and nature of outputs broaden, questions emerge around how to assess the quality of the outputs and decide what is part of the scientific record. Peer review, the current approach, has its weaknesses. A key test of the quality and rigour of research is the extent of uptake and use by the academic community over time. In that sense, the change in types of outputs makes little difference to the ultimate assessment of their quality. However, as the volume of research products increases, alongside increasing concerns over reproducibility, fake news and the reliability of evidence, being able to point to legitimate and reliable sources may be of increasing value.

Do we have the support infrastructure for now and the future?

The growing diversity of research outputs creates new challenges in relation to the complex infrastructure needed to support their review, dissemination and storage across different players in the field, for example funders, publishers and libraries. Identifying where an intervention could make systems more efficient and future-proof could help, but these areas need to be better understood. Securing digital platforms for sharing and collaborating on research could be part of these interventions, as could increasing digital archiving for discovery and access.

What are some possible solutions?

Permanent digital links to research outputs, which act as unique IDs for outputs to enable their consistent identification and referencing, may be a key part of the solution. Ensuring their consistent use, however, is a potential challenge and an important route forward to help make this problem more tractable. Participants discussed the successful example of DataCite in establishing an international solution. AI may also be part of the solution, in terms of discoverability of outputs. However, there are potential risks associated with this, such as biases, and a lack of knowledge around the way information is curated and presented by algorithms (for example, when using Google Scholar). Linked to these technological solutions is the need for data literacy, within and beyond the research community, as well as creating a culture of openness and transparency across all stages of the research cycle.
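To make the idea of consistent identification a little more concrete, here is a minimal Python sketch of how differently written references to the same DOI could be reduced to one form and turned into a permanent link. It is an illustration only, not a tool discussed at the roundtable: the function names are hypothetical, the pattern is a deliberately simplified check rather than a full validation of DOI syntax, and the DOI in the usage line is made up.

```python
import re

DOI_RESOLVER = "https://doi.org/"
# Simplified check (an assumption, not the full DOI syntax rules):
# "10.", a numeric registrant code, "/", then a non-empty suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def normalise_doi(raw: str) -> str:
    """Strip common prefixes and return the bare DOI, preserving its case."""
    doi = raw.strip()
    for prefix in ("https://doi.org/", "http://dx.doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a DOI: {raw!r}")
    return doi


def doi_url(raw: str) -> str:
    """Return the resolver link for a DOI written in any of the forms above."""
    return DOI_RESOLVER + normalise_doi(raw)


# Made-up DOI, for illustration only.
print(doi_url("doi:10.5072/example-thesis-001"))
# prints https://doi.org/10.5072/example-thesis-001
```

Whatever form a citation takes in a reference list, reducing it to the bare identifier first means that two records pointing at the same output can be recognised as such.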

The changing nature of research outputs has the potential to affect a wide range of organisations and people in the sector. Joined-up thinking and action could help. As the diversity of research outputs increases, we have to make choices. We can either be reactive, responding to needs and challenges as they emerge, or proactive, to help shape and guide the nature and effective preservation of research outputs. A more proactive stance could help drive research towards better practice in information storage, sharing and communication, but requires early action and shared goals at a sector level. Continued dialogue and sharing of views on this topic could be important to make sure these issues are appropriately and adequately addressed.

Dr Susan Guthrie and Dr Catriona Manville are research leaders in science and innovation policy at RAND Europe. Maja Maricevic is head of higher education and science at the British Library.

Posted by The Science Team at 4:00 PM in Digital scholarship, Open access, Open data, Research, Science

29 August 2017

I4OC: The British Library and open data

In August the British Library joined the Initiative for Open Citations as a stakeholder. The I4OC’s aim of promoting the availability of structured, separable, open citation data fits perfectly with the Library's established strategy for open metadata which has just marked its seventh anniversary.

In August 2010, responding to UK Government calls for increased access to public data to promote transparency, economic growth and research, the British Library launched the strategy by offering over 16m CC0 licensed records from its catalogue and national bibliography datasets. This initiative aimed to remove constraints created by restrictive licensing and library specific standards to enable wider community re-use. In doing so the Library aimed to unlock the value of the data while improving access to information and culture in line with its wider strategic objectives.

The initial release was followed in 2011 by the launch of the Library’s first Linked Open Data (LOD) bibliographic service. The Library believed Linked Open Data to be a logical evolutionary step for the established principle of freedom of access to information, offering trusted knowledge organisations a central role in the new information landscape. The development proved influential among the library community in moving the Linked Data debate from theory to practice.

Over 1,700 organisations in 123 countries now use the Library’s open metadata services with many more taking single files. The value of the Library’s open data work was recognised by the British National Bibliography linked dataset receiving a 5 star rating on the UK Government Data.gov.uk site and certification from the Open Data Institute (ODI). In 2016 the Library launched the http://data.bl.uk/ platform to make copies of a range of its datasets available for research and creative purposes. In addition, the BL Labs initiative continues to explore new opportunities for public use of the Library’s digital collections and data in exciting and innovative ways. The British Library therefore remains committed to an open approach to enable the widest possible re-use of its rich metadata and generate the best return on the investment in its creation.

I4OC users by country

As the example of the British Library’s open data work shows, opening up metadata facilitates access to information, creates efficiencies and allows others to enhance existing and develop new services. This is particularly important for researchers and others who do not work for organisations with subscriptions to commercial citation databases. The British Library believes that opening up metadata on research facilitates both improved research information management and original research, and therefore benefits all.

The I4OC’s recent call to arms for its stakeholders is therefore very much in tune with the British Library’s open data work in promoting the many benefits of freely accessible citation data for scholars, publishers and wider communities. Such benefits proved compelling enough to enable the I4OC to secure publisher agreement for nearly half of indexed scholarly data to be made openly accessible. This data is now being used in a range of new projects and services including OpenCitations and Wikidata. It's encouraging to see I4OC spreading the open data ideal so successfully and it is to be hoped that it will also succeed in ensuring open citations become the default in future.

Correction: Image shows users of BL open data services by country, not I4OC

Posted by The Science Team at 2:17 PM in Data, Digital scholarship, Open access, Open data, Science, Science communication

Tags: I4OC, open access, open citations, open metadata

05 September 2016

Social Media Data: What’s the use?

Team ScienceBL is pleased to bring you #TheDataDebates - an exciting new partnership with the AHRC, the ESRC and the Alan Turing Institute. In our first event on 21st September we’re discussing social media. Join us!

Every day people around the world post a staggering 400 million tweets, upload 350 million photos to Facebook and view 4 billion videos on YouTube. Analysing this mass of data can help us understand how people think and act but there are also many potential problems. Ahead of the event, we looked into a few interesting applications of social media data.

Politically correct?

During the 2015 General Election, experts used a technique called sentiment analysis to examine Twitter users’ reactions to the televised leadership debates¹. But is this type of analysis actually useful? Some think that tweets are spontaneous and might not represent the more calculated political decision of voters.
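For readers curious about what sentiment analysis involves, here is a minimal Python sketch of the simplest version of the idea: counting positive and negative words in a message. It is not the method used in the election study mentioned above. The word lists, function names and example messages are all invented for illustration, and real tools rely on much larger lexicons or trained models.

```python
import re

# Tiny illustrative word lists (assumptions); real lexicons are far larger.
POSITIVE = {"good", "great", "strong", "honest", "win"}
NEGATIVE = {"bad", "weak", "awful", "dishonest", "lose"}


def sentiment_score(text: str) -> int:
    """Return (number of positive words) minus (number of negative words)."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def label(text: str) -> str:
    """Map a score to a coarse label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Invented example messages.
tweets = [
    "Great answer, strong and honest",
    "That was an awful, weak reply",
    "Watching the debate now",
]
labels = [label(t) for t in tweets]
print(labels)  # prints ['positive', 'negative', 'neutral']
```

Even this toy version hints at the limits raised above: a word count cannot detect sarcasm, and a quick reaction to a debate is not the same thing as a voting intention.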

On the other side of the pond, Obama’s election strategy in 2012 made use of social media data on an unprecedented scale². A huge data analytics team looked at social media data for patterns in past voter characteristics and used this information to inform their marketing strategy - e.g. broadcasting TV adverts in specific slots targeted at swing voters and virtually scouring the social media networks of Obama supporters on the hunt for friends who could be persuaded to join the campaign as well.


In this year's US election, both Hillary Clinton and Donald Trump are making the most of social media's huge reach to rally support. The Trump campaign has recently released the America First app which collects personal data and awards points for recruiting friends³. Meanwhile Democratic nominee Clinton is building on the work of Barack Obama's social media team and exploring platforms such as Pinterest and YouTube⁴. Only time will tell who the eventual winner will be.

Playing the market

You know how Amazon suggests items you might like based on the items you’ve browsed on their site? This is a common marketing technique that allows companies to re-advertise products to users who have shown some interest in the brand but might not have bought anything. Linking browsing history to social media comments has the potential to make this targeted marketing even more sophisticated⁴.

Credit where credit’s due?

Many ‘new generation’ loan companies don’t use traditional credit checks but instead gather other information on an individual – including social media data – and then decide whether to grant the loan⁵. Opinion is divided as to whether this new model is a good thing. On the one hand it allows people who might have been rejected by traditional checks to get credit. But critics say that people are being judged on data that they assume is private. And could this be a slippery slope to allowing other industries (e.g. insurance) to gather information in this way? Could this lead to discrimination?


What's the problem?

Despite all these applications there’s lots of discussion about the best way to analyse social media data. How can we control for biases and how do we make sure our samples are representative? There are also concerns about privacy and consent. Some social media data (like Twitter) is public and can be seen and used by anyone (subject to terms and conditions). But most Facebook data is only visible to people specified by the user. The problem is: do users always know what they are signing up for?


Lots of big data companies are using anonymised data (where obvious identifiers like name and date of birth are removed) which can be distributed without the users' consent. But there may still be the potential for individuals to be re-identified - especially if multiple datasets are combined - and this is a major problem for many concerned with privacy.
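To see how combining datasets can undo anonymisation, consider this small Python sketch. Everything in it is invented for illustration: the records, the field names and the choice of postcode area, birth year and sex as the linking fields are assumptions, not a description of any real dataset or company practice.

```python
# "Anonymised" records: names removed, but other details remain.
anonymised = [
    {"postcode": "NW1", "birth_year": 1980, "sex": "F", "topic": "health"},
    {"postcode": "NW1", "birth_year": 1975, "sex": "M", "topic": "finance"},
    {"postcode": "SE5", "birth_year": 1980, "sex": "F", "topic": "politics"},
]

# A second, separately available dataset that does contain names.
public = [
    {"name": "Person A", "postcode": "NW1", "birth_year": 1980, "sex": "F"},
    {"name": "Person B", "postcode": "SE5", "birth_year": 1980, "sex": "F"},
    {"name": "Person C", "postcode": "SE5", "birth_year": 1980, "sex": "F"},
]

# Fields shared by both datasets (hypothetical choice for this sketch).
LINK_FIELDS = ("postcode", "birth_year", "sex")


def key(record):
    """Build a comparison key from the shared fields."""
    return tuple(record[field] for field in LINK_FIELDS)


def reidentify(anonymised_rows, public_rows):
    """Return {name: topic} where the shared fields match exactly one named person."""
    names_by_key = {}
    for row in public_rows:
        names_by_key.setdefault(key(row), []).append(row["name"])
    matches = {}
    for row in anonymised_rows:
        candidates = names_by_key.get(key(row), [])
        if len(candidates) == 1:  # a unique match re-identifies the record
            matches[candidates[0]] = row["topic"]
    return matches


matches = reidentify(anonymised, public)
print(matches)  # prints {'Person A': 'health'}
```

In this toy example one of the three "anonymous" records can be tied to a name. The third cannot, because two named people share the same details, which is why rarer combinations of attributes carry the greatest risk.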

If you are an avid social media user, a big data specialist, a privacy advocate or are simply interested in finding out more, join us on 21st September to discuss further. Tickets are available here.

Katie Howe

Posted by The Science Team at 10:40 AM in Curiosity, Data, Digital scholarship, Engagement, Open data, Research, Science, Science communication, Science policy

05 October 2015

New opportunities for collaborative PhD research exploring the British Library’s science collections

Applications for collaborative PhD research around the British Library’s science collections are now open to UK universities and other HEIs.

The British Library is looking for university partners to co-supervise collaborative PhD research projects that will open up unexplored aspects of its science collections. Funding is available from the Arts & Humanities Research Council (AHRC) Collaborative Doctoral Partnerships programme, through which the Library works with UK universities or other eligible Higher Education Institutes around strategic research themes.

Our current CDP opportunities include a project to examine the culture and evolution of scientific research, drawing on scientists’ personal archives, and another project to develop digital tools for the investigation of scientific knowledge in the 17th and 18th centuries:

The Working Life of Scientists: Exploring the Culture of Scientific Research through Personal Archives

This project will involve a detailed mapping of the key personal relationships of twentieth century British scientists to shed light on the nature, communication and reception of scientific research. It will draw on the Library’s Contemporary Archives and Manuscripts collections, which include personal archives and correspondence from the fields of computer science and programming, cybernetics and artificial intelligence, as well as evolutionary, developmental and molecular biology. As well as being situated within social and cultural history, particularly the history of science and the history of ideas, this cross-disciplinary project is applicable to research in areas such as social anthropology, sociology and social network analysis. It will open up a nuanced understanding of the BL’s collection of the personal archives of twentieth century British scientists, and will enable us to make better use of these valuable collections for research audiences across a number of disciplines.

Hans Sloane’s Books: Evaluating an Enlightenment Library

This Digital Humanities project will evaluate the library of Hans Sloane (1660-1753): physician, collector and posthumous ‘founding father’ of the British Museum. For over sixty years, Hans Sloane was a dominant figure on London’s intellectual and social landscape. At the heart of his vast collections stood a library of 45,000 books, which – alongside his voluminous correspondence and thousands of prints, drawings, specimens and artefacts – bears witness to his central position in a globalised network of scientific discovery. The CDP project will apply digital techniques to exploit the raw data on over 32,000 items in the Sloane Printed Books Catalogue, and will break new ground by developing digital tools to cross reference, contextualise and analyse the data. This will forge fresh insights into how medical and scientific knowledge was gathered and disseminated in the pre-Linnaean period, with relevance to the history of science, medicine and collecting.

Moving beyond our science collections, there is also a third CDP opportunity for a project on ‘Digital Publishing and the Reader’. This will investigate the changing nature of publishing in digital environments to consider how new communication technologies should be recorded or collected as part of a national collection of British written culture.

Applications are invited from academics to develop any of these research themes with a view to co-supervising a PhD project with the British Library from October 2016. Our HEI partners receive and administer the funds for a full PhD studentship from the AHRC and, in collaboration with the Library, oversee the research and training of the student. We provide the student with staff-level access to our collections, expertise and facilities, as well as financial support for research-related costs of up to £1,000 a year.

View further details and application guidelines.

To apply, send the application form to [email protected] by 27 November 2015.

Posted by The Science Team at 1:02 PM in Engagement, Humanities, Literature, Manuscripts, Modern history, Open data, Rare books, Research, Science, Science communication, Sound and vision

06 February 2015

DataCite Case Study: ForestPlots.net at the University of Leeds

In June last year, we held a DataCite workshop hosted by the University of Glasgow. We've now turned our speaker's use of Digital Object Identifiers (DOIs) for rainforest data into a video and printed case study.

You can still find a short summary of that event here. Our thanks go to Gabriela Lopez-Gonzalez for taking the time to come and film with us.

We hope that this case study will help institutions promote the idea of data citation and use of DOIs for data to their researchers, and that this in turn will encourage more submission of data to institutional repositories.

A DataCite DOI is not just for data

During January we have also been trying to spread the word that DOIs from DataCite aren't necessarily just for data. We've been working with the British Library's EThOS service to look at how UK institutions might give DOIs to their electronic theses and dissertations.

There was an initial workshop to divine the issues in November 2014, and on 16th January we held a bigger workshop, bringing more institutions together to look at how we might start to establish a common way of identifying e-theses in the UK.

The technical step of assigning a DOI to a thesis is relatively straightforward. Once an institution is working with DataCite (or CrossRef) they can use their established systems to assign a DOI to a thesis. But the policies surrounding the issue and management of this process are more complex. We're hoping that these workshops have helped everyone to pull in the same direction and collaborate on answers to common questions.

This work has given rise to a proposal to look at how to improve the connection between a thesis and the data it is built on. By triggering the consideration of sharing the data supporting a thesis, maybe we can "get 'em young" and introduce good data sharing practice as early in the research career as possible. Connecting the thesis and its data also increases the visibility of both, helping early career researchers to reap the benefits of their hard work sooner.

Watch this space to see what happens next!

Posted by The Science Team at 3:24 PM in Data, DataCite, Open data, Research

12 February 2014

Is Necessity The Mother of Invention?

Scientific discovery and invention. What drives them? What connects them? Allan Sudlow and Katie Howe delve into the Library’s collections to uncover some answers.

Scientists have long used patents to protect their inventions and allow them opportunities to commercialise their work. Recent controversies in cancer and stem cell research have highlighted the social and ethical, as well as the economic implications of biomedical patents. We will be exploring these issues in our forthcoming TalkScience event on 4 March: Patently Obvious?

In the meantime, we have been taking a look back at what distinguishes a scientific discovery from an invention – and asking – is necessity really the mother of invention?

The Oxford English Dictionary attributes the first printed usage of the proverb ‘Necessity is the mother of invention’ to Richard Franck in his tome Northern Memoirs, first published in 1694:

“Art imitates nature, and necessity is the mother of invention; science also invites to study and practicks, but theory gives the prospect, and operation finishes the project.”

Frontispiece from Northern Memoirs, Calculated for the Meridian of Scotland, Richard Franck. (1694)

At the turn of the last century, the mathematician and philosopher Alfred North Whitehead took a different view on the origins of invention, and its relationship to scientific discovery, noting in The Aims of Education:

“…inventive genius requires pleasurable mental activity as a condition for its vigorous exercise. ‘Necessity is the mother of invention’ is a silly proverb. ‘Necessity is the mother of futile dodges’ is much nearer the truth. The basis of the growth of modern invention is science, and science is almost wholly the outgrowth of pleasurable intellectual curiosity.”

This insight from the past provides a rallying call to those that support the idea of ‘blue skies’ research and feel that scientific discovery and invention should be driven by curiosity rather than a strategy or a set of pre-defined rules. In contrast, O.T. Mason describes, very precisely, what he believes underpins the nature of invention in an article The Evolution of Invention from 1895, published in the first volume of the journal Science:

Of the thing or process, commonly called inventions.
Of the apparatus and methods used.
Of the rewards to the inventor.
Of the intellectual activities involved.
Of society

Fast-forward to the present, and the European Patent Convention defines – or rather doesn’t define - invention in terms of:

“…a non-exhaustive list of things which are not regarded as inventions. It will be noted that the items on this list are all either abstract (e.g. discoveries or scientific theories) and/or non-technical (e.g. aesthetic creations or presentations of information). In contrast to this, an "invention" … must be of both a concrete and a technical character”

So we see some distinction between discovery and invention: the abstract vs the concrete. But what – I hear you cry – about necessity?

The Human Genome Project (HGP), the world’s largest biological project to date, is a great example of necessity being a spur for collaborative discovery. The HGP’s aim was to determine the sequence of the three billion chemical building blocks that make up human DNA – the entire human genetic code. Many of the scientists involved saw the HGP as a race between public and commercial research interests, in particular Craig Venter, an American genomic researcher and entrepreneur, and John Sulston, an English Nobel Prize winning scientist and campaigner against the patenting of human genetic information.

Sir John Sulston, who oversaw the UK's contribution to the Human Genome Project.
© Wellcome Images, made available under CC BY-NC-ND 2.0

In his book The Common Thread, Sulston describes the moment when he realised that the parallel effort by Venter’s company, Celera Genomics, to sequence the human genome faster than the academic teams “…had made everyone realise the absolute necessity of the publicly funded teams working together”. Thus, necessity drove greater international effort, and on 26 June 2000 the HGP consortium announced that it had assembled a working draft of the sequence of the human genome.

Competing public and commercial interests persist in scientific discovery and invention, especially in relation to genetic information. Recent attempts to patent human gene sequences have raised questions over whether a sequence of DNA is an invention or a discovery and have highlighted some of the challenges in assessing the patentability of biomedical developments. Witness the recent legal battle involving diagnostics company Myriad Genetics in the US over predictive genetic testing for susceptibility to breast cancer. The US Supreme Court judged that human DNA was a ‘product of nature’, a basic tool of scientific and technological work, thereby placing it beyond the domain of patent protection. Amongst other caveats, this judgment declared that certain forms of DNA (cDNA) were patentable.

Will there always be a necessity to patent in this area of bioscience? Undoubtedly, but a balance needs to be struck. Necessity may drive invention but when it comes to Mother Nature, who decides? Come to TalkScience on 4 March to voice your opinion.

Posted by The Science Team at 2:00 PM in Bioscience, Curiosity, Open data, Research, Science, Science policy, TalkScience

20 January 2014

Beautiful Science Preview

Johanna Kieniewicz spills a few beans on the upcoming British Library exhibition

We are now just a month out from the British Library’s first science exhibition: Beautiful Science: Picturing Data, Inspiring Insight. Life in our team right now is a whirlwind of writing captions, finalising commissions, testing interactives and liaising with our press office. But all for a good reason. Opening on 20 February, Beautiful Science will highlight the very best in graphical communication in science, linking classic diagrams from the Library’s collections to the work of contemporary scientists. The exhibition will cover the subject areas of public health, weather and climate, and the tree of life, telling stories of advances in science as well as looking at the way in which we communicate and visualise scientific data.

Picturing Data

Data is coming out of our ears. From data collected by our mobile phones and movements about the city to the data acquired by scientists when sequencing genomes or smashing subatomic particles together, the quantities are vast. While a simple table of numbers is a form of data visualisation in itself, our human ability to scan, analyse and identify patterns and trends is limited.

William Farr, 1852, Report on the Mortality of Cholera in England 1848-1849

Whilst today we see a proliferation of data visualisation, it is hardly a new phenomenon, and might even be considered a rediscovery of the ‘Golden Age’ of statistical graphics of the late 19th century. Like today, the Victorian period featured a confluence of new techniques for data collection, developments in statistics and advances in technology that created an environment in which data graphics flourished. In Beautiful Science, we highlight a number of graphics from this period. Some are well known; others may prove to be more of a surprise, such as this piece on cholera mortality by epidemiologist and statistician William Farr.

Inspiring Insight

The very best visualisations of scientific data do not merely present it, but also inspire insight and reveal meaning. Data visualisation is both a tool through which we can analyse and interpret data and a method by which we communicate its meaning. It is most powerful when it does both.

Circles of Life, Martin Krzywinski, 2013

In curating Beautiful Science, we were keen to highlight the ways in which the visualisation of data is integral to the scientific process, as well as the way cutting edge science is communicated. The Circos diagrams used to display genomic data do this very well. In Beautiful Science, you can examine a comparison of the human genome with both closely and distantly related animals. Here, you see that we are quite closely related to the chimpanzee (though we presume you knew that already). But what about a chicken or a platypus? You’ll have to come to the exhibition and see for yourself.

Beautiful Science

Should we impose an aesthetic upon the presentation of scientific information? Or is beauty indeed in the eye of the beholder? We take a rather agnostic position in this debate, and instead seek to inspire the exhibition visitor with both intriguing images and inspiring ideas. What is clear, however, is that scientists should take care and be thoughtful when producing their graphics. In a world where research impact matters ever more, producing images that compellingly communicate discoveries is of increasing importance.

NASA/Goddard Space Flight Center Scientific Visualization Studio

Compelling imagery is something at which the NASA Scientific Visualization Studio excels. Something like a model of ocean currents might potentially be quite dry and dull. Since it was originally developed for a scientific purpose, would colour-coded vectors of increasing and decreasing size not do the job? With a leap of insight, they developed a visualisation that is both informative and inspiring. We hope you will watch it with awe in the entry to the exhibition, tracking the Gulf Stream as it moves water northwards towards the British Isles, bringing us our temperate climate.

Even More Beautiful Science

A fantastic programme of events will also accompany the exhibition. From serious debate to science comedy shows, competitions, workshops and family activities, we’ve developed a programme that’s designed to make you think. Please join us!

Beautiful Science runs from 20 February to 26 May, 2014, is sponsored by Winton Capital Management, and is free to the public.

Posted by The Science Team at 11:47 AM in Data, Data visualisation, Open data, Science communication

06 December 2013

Visualising Research

This week we are excited to announce the launch of a data visualisation competition (and workshop), sponsored by the AHRC and BBSRC

We talk quite a lot about data on the Science Blog and have previously highlighted the role we are playing in helping researchers to discover, access or cite scientific data. But working at the British Library means we have the fantastic opportunity to bring our collections and contemporary research to the wider public through our exhibitions. Earlier in the year we gave you a taster of Beautiful Science, an exhibition launching in February 2014 that will explore scientific data visualisation from past to present. Some famous historical names, such as Florence Nightingale, knew the power of displaying data – her iconic diagram (pictured) not only enabled any viewer to quickly grasp the meaning but also led to changes in the way those injured in war were treated.

As part of our celebration of all things data and our exhibition, we have been working with the Arts & Humanities Research Council and the Biotechnology & Biological Sciences Research Council on a competition that challenges entrants to bring UK Research Council data to life. An added bonus, we hope, is that the competition aims to encourage people from different disciplines to work together, since presenting complex data requires not only mathematical, computing or scientific skills but also strong expertise in art and design. A key criterion for the judges will be whether the entries convey the meaning to a wide audience, so they will be looking for that combination of valid data and a compelling story.

Around £3 billion of Government funding is apportioned annually between the seven UK Research Councils, which are responsible for different discipline areas. The Research Councils then distribute that funding to their various communities on the basis of applications made by researchers, which are subject to independent, expert peer review. Applications are judged by considering a combination of factors, including their scientific excellence, timeliness and promise, strategic relevance, economic and social impacts, industrial and stakeholder relevance, value for money and staff training potential. Until recently it wasn’t easy to combine funding data from different Research Councils or to explore how it was distributed across the country. And the finer grained detail, while it may have been available from an individual Council, was difficult to tease out or integrate. Behind the scenes, Research Councils worked together to make details of the research they fund available from one place. The culmination of that commitment is Gateway to Research - a database that anyone can use. The data is available programmatically and under an open government licence which means that anyone is free to interrogate it – you can extract it all, download it to your own systems, apply your own analysis tools and generally think of things to do with it that no one else has done before.
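As a rough illustration of what "applying your own analysis tools" might look like once the data has been downloaded, here is a minimal Python sketch that totals award values by funder or by region. It does not call Gateway to Research itself: the field names (funder, region, award_gbp) and the three sample records are made-up placeholders, not the database's real schema or real figures.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Mapping


def total_by(records: Iterable[Mapping[str, Any]], key: str) -> Dict[str, float]:
    """Sum the hypothetical 'award_gbp' field of each record, grouped by `key`."""
    totals: Dict[str, float] = defaultdict(float)
    for record in records:
        totals[str(record[key])] += float(record["award_gbp"])
    return dict(totals)


# Made-up placeholder records for illustration only; NOT real funding data.
sample = [
    {"funder": "AHRC", "region": "London", "award_gbp": 100_000},
    {"funder": "BBSRC", "region": "Scotland", "award_gbp": 250_000},
    {"funder": "BBSRC", "region": "London", "award_gbp": 50_000},
]

by_funder = total_by(sample, "funder")   # totals per Research Council
by_region = total_by(sample, "region")   # totals per part of the country
print(by_funder)
print(by_region)
```

Grouping on a field chosen at call time means the same few lines can answer both "which Council funded what?" and "where did the money go?", which is the kind of cross-Council question the paragraph above describes as previously hard to ask.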

The challenge of the competition is to use the Gateway to Research data to tell a compelling story that anyone will be able to understand. While designers, graphic artists, software developers and programmers may have a particular interest, anyone and everyone is invited to produce a visualisation (on a website) that will show how this public funding contributes to research in the UK. Details of the competition are here. Entry forms will be available from 27 January 2014 and the closing date is 21 March 2014. Our judges include Jackie Hunter, Chief Executive, BBSRC; Katy Borner, Victor H. Yngve Professor of Information Science, Indiana University; and Guardian Digital Agency.

On 24 January 2014, we are holding a workshop at the British Library for anyone who wants to find out more. Please register if you want some inspiration and information about the Gateway to Research database, and to meet potential collaborators. Representatives from the AHRC and BBSRC will be there on the day, as well as data visualisation evangelists (Guardian Digital Agency) and developers (Cottage Labs) who have worked with the data. We will also have Andrew Steele from Scienceogram, who is using public data to make the case for science in the UK.

Lee-Ann Coleman

Posted by The Science Team at 2:00 PM in BBSRC, Data visualisation, Open data

08 November 2013

Why not cite data?

Rachael Kotarski, our Content Expert for scientific datasets, explains why citing data as well as the article is the way forward.

In a previous post, Lee-Ann Coleman looked at citations in science, asking what should be cited, and what a citation means. The answers to these questions are not necessarily simple, but one response we have been hearing (and that we support) is that data needs to be cited.

Citing data not only gives credit to those who created or gathered it, but can also give some kudos to the repository that looks after it. Despite the fact that data is also key to verifying and validating research, it is not yet standard practice to cite it when writing a paper. And even if it is cited, it is rarely done in a way that allows you to identify and access that data.

Citation should connect the literature to its data foundations. Image source: Shutterstock.

As part of the Opportunities for Data Exchange (ODE) project, we investigated data citation and the ways in which data centres, publishers, libraries and researchers can encourage better data citation.

What does ‘better data citation’ look like and how do we encourage it to happen? We examined three aspects of current practice in order to answer this question:

How is data cited?
What data is cited?
Where is data cited within the article?

How to cite
A data citation needs to contain enough information to find and verify the data that was used, as well as give credit to those who spent considerable time/money/effort generating or collecting the data. The DataCite recommended data citation is just one example of how to include details that support these aims (and it’s pretty simple!):

Creator (publication year): Title. Publisher. Identifier.

What to cite
Data are not necessarily fixed, stable or homogeneous objects, so citing them can be considerably more complicated than citing articles. To test reproducibility, it is important that data are cited as they were used, regardless of subsequent changes to the data or to subsets of it. Aspects such as the version used or date downloaded should also be encapsulated in the citation, where necessary. Linking users via an identifier (such as a DOI as used by DataCite) to the location of that exact version or subset of the data is also important. An example of citing a specific wave of data from GESIS demonstrates this:

Förster, Peter; Brähler, Elmar; Stöbel-Richter, Yve; Berth, Hendrik (2012): Saxonian longitudinal study – wave 24, 2010. GESIS Data Archive, Cologne. ZA6242 Data file version 1.0.0, doi: 10.4232/1.11322
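For anyone generating citations from repository metadata, the DataCite pattern quoted earlier (Creator (publication year): Title. Publisher. Identifier.) can be sketched as a small formatting routine. This is only an illustration: the class and field names are hypothetical, and it assumes creators are joined with semicolons and that the identifier is a free-text field carrying accession number, version and DOI, as in the GESIS example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataCitation:
    """Hypothetical container for the five DataCite-style citation elements."""
    creators: List[str]   # each entry as "Surname, Forename"
    year: int             # publication year of the dataset
    title: str            # include the wave or subset actually used
    publisher: str        # the repository or data centre holding the data
    identifier: str       # free text: accession number, version, DOI

    def render(self) -> str:
        # Creator (publication year): Title. Publisher. Identifier
        names = "; ".join(self.creators)
        return (
            f"{names} ({self.year}): {self.title}. "
            f"{self.publisher}. {self.identifier}"
        )


gesis = DataCitation(
    creators=[
        "Förster, Peter",
        "Brähler, Elmar",
        "Stöbel-Richter, Yve",
        "Berth, Hendrik",
    ],
    year=2012,
    title="Saxonian longitudinal study – wave 24, 2010",
    publisher="GESIS Data Archive, Cologne",
    identifier="ZA6242 Data file version 1.0.0, doi: 10.4232/1.11322",
)
print(gesis.render())
```

Keeping the version and DOI inside the identifier element means a later release of the same dataset produces a visibly different citation, which is the point of citing data as used.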

Where to cite in the article
Where you cite data in the article may depend on the form of the data being cited. For example, data obtained via colleagues but not widely available may be best mentioned in acknowledgements, and data identified by accession numbers could be cited inline in the body of the article. But the interviewees who participated in the ODE study largely advocated citation of datasets in the full reference list, to promote tracking and credit. In order to do this, data needs a full, stable citation, which also depends on reliable, long-term storage and management of the data. Of course publisher requirements play an important role. But that’s a post for another day!

These are the three ‘simple’ steps to better citation of data, but there are still cultural and behavioural barriers to sharing data. In the ODE report we concluded that the whole community (researchers, publishers, libraries and data centres) has a role in promoting and encouraging data citation.

The recent Out of Cite, Out of Mind report has since updated and greatly extended the ODE work, with an excellent set of first principles for data citation:

CODATA-ICSTI Task Group on Data Citation Standards and Practices (2013) Out of Cite, Out of Mind: The Current State of Practice, Policy, and Technology for the Citation of Data. Data Science Journal vol. 12 p. CIDCR1-CIDCR75 doi: 10.2481/dsj.OSOM13-043

I recommend it – and encourage anyone thinking about citing their data (or anyone else’s) to stop thinking and start doing it.

Posted by The Science Team at 2:00 PM in Data, DataCite, Open data, Research
