Science blog: Digital scholarship

23 January 2019

Lab notebooks - handwriting at the core of science

Page from Anne McLaren's notebook (shelfmark Add MS 83844) covering embryo transfer experiments in mice, 1950s. (Copyright estate of Anne McLaren)

Today is World Handwriting Day, and we thought we’d pay our respects to the most important role handwriting plays in science, one which you might not have heard of if you aren’t a practising scientist. This is the “lab notebook”, a scientist’s daily diary of all their experiments, thoughts, and other scientific activities. Until relatively recently, these were always handwritten, as they were meant to record in detail what someone was doing as they did it. Waiting until the work was finished to write them up created too great a risk of forgetting or distorting something.

Lab notebooks grew out of the personal diaries and notebooks of individual researchers. Some notebooks by well-known scientists have become Library treasures in their own right. One of the most famous works in our Treasures of the British Library exhibition is the Codex Arundel, a collection of notes written by Leonardo da Vinci in the sixteenth century (although probably not in the order in which they were bound). At the other extreme of history, the Treasures Gallery currently displays the biologist Anne McLaren's lab book on embryo transfer in mice. Outside the BL, most of the lifelong field and theoretical notebook collections of Charles Darwin are digitised and available online, as are some of Albert Einstein's most significant theoretical notebooks. At the other end of accessibility, some of the lab notebooks of Marie and Pierre Curie, held by the National Library of France, are reported to still be so radioactive that they are not safe to handle without protective clothing.

Laboratory notebooks later became an even more important record of exactly what was done, as lone researchers were replaced by academic and private-sector research groups, science and technology became ever more important to society, and scientists were expected to describe their methods in detail so that they could be replicated and turned into innovative technologies, materials and treatments. Additionally, until quite recently, American patent law worked on a “first to invent” basis whereby the person who could prove that they had the idea for an invention first, or their employer, had the right to a patent. Laboratory notebooks were the main source of evidence for this. In recent years, scientific misconduct has become a higher-profile issue, as scientists worry about a “replicability crisis” where too many uncertain or exaggerated results have been published. Lab books help prove that the work was done as the researchers claim, and the detail expected in them makes discrepancies easier to recognise. And the notebooks of eminent scientists are a rich source for scientific historians.

By the latter part of the twentieth century, some organisations had very detailed instructions for how laboratory notebooks should be completed and stored. Lab books had to be written exactly as the work was carried out, or as soon as possible – no jotting notes on scraps of paper and writing them up at the end of the day. Notebooks were considered the property of the employer or the university, and could not be removed from the lab. And they had to be clearly paginated with no chance of pages being removed or replaced.

Many laboratories still use paper notebooks, due to the ease of simply writing notes down as you go. In many types of science, electronic devices are at risk of being exposed to spillages or damaging electromagnetic conditions, or are simply unwieldy. Some researchers also like to keep their detailed records to themselves instead of sharing them with a group. Some research groups and organisations are now moving to electronic recording, but the lifetime of electronic data can be questionable due to failure to back up and the limited lifespan of media. Specifically designed electronic laboratory data systems are more secure. They are more common in industry than academia, as academics are more independent and less likely to respond to top-down orders, and academic institutions can be less able to afford the necessary software and hardware. The advantages of electronic research notes systems are that you can save large amounts of original data directly into the system without retyping or printing it, clone records from earlier experiments to save time, search your records more easily, share data within the group easily, and track the history of records. Data is now often recorded electronically in the first place, so it can be copied directly into a laboratory system without a transcription stage. It is possible to use general project and collaboration software packages such as Evernote, SharePoint, or Google Drive, but specifically designed software is now available.

In 2011, Gregory Lang and David Botstein published a scanned copy of the entire lab notebook covering the research leading to a paper on yeast genetics, as an attachment to their e-journal article.

Modern lab books rarely find their way into the British Library collection, but our most famous example is the collection of Alexander Fleming, the discoverer of penicillin (also including records of earlier experiments by his mentor Sir Almroth Wright). As well as the material by Anne McLaren mentioned earlier, we also have some material from the photography pioneer Henry Fox Talbot, electrical inventor David Edward Hughes, and biologist Marilyn Monk.

Sources and further reading:
Barker, K, At the bench: a laboratory navigator, Cold Spring Harbor: Cold Spring Harbor Press, 2005. pp. 89-99. Shelfmark YK.2005.b.1888
Baykoucheva, S. Managing scientific information and research data, Oxford: Chandos Publishing, 2015. Available electronically in British Library reading rooms.
Bird, CL, Willoughby, C and Frey JG, "Laboratory notebooks in the digital era: the role of ELNs in record keeping for chemistry and other sciences", Chemical Society reviews, 2013, 42(20), pp. 8157-8175. Shelfmark (P) JB 00-E(105) or 3151.550000.
Elliott, CA, "Experimental data as a source for the history of science", The American archivist, 1974, 37(1), pp. 27-35. Shelfmark Ac. 1668 or 0810.390000, also available electronically in British Library reading rooms.
Holmes, FL, "Laboratory notebooks: can the daily record illuminate the broader picture", Proceedings of the American Philosophical Society, 1990, 134(4), pp. 349-366. Shelfmark Ac. 1830 or 6630.500000, also available electronically in British Library reading rooms.
Stanley, JT and Lewandowski, HJ, "Lab notebooks as scientific communication: investigating development from undergraduate courses to graduate research", Physical review: physics education research, 2016, 12, 020129, freely available online at https://journals.aps.org/prper/pdf/10.1103/PhysRevPhysEducRes.12.020129.
Williams, M, Bozyczko-Coyne, D, Dorsey, B and Larsen, S, "Appendix 2: Laboratory notebooks and data storage", in Gallagher, SR and Wiley, EA, Eds. Current protocols essential laboratory techniques, Hoboken: John Wiley & Sons, 2008. Shelfmark YK.2008.b.6299 or m09/.30081

Posted by The Science Team at 11:27 AM in Bioscience, Curiosity, Digital scholarship, Environmental science, Open access, Research, Science

18 December 2018

Arabic science manuscripts from the British Library

The beginning of Kitāb al-sīrah al-falsafīyah, an autobiographical treatise by the physician and philosopher Abū Bakr Muḥammad ibn Zakarīyā al-Rāzī (Add MS 7473, f. 1v)

Today is World Arabic Language Day, so here's a reminder of the scientific content in our Qatar Digital Library digitisation project. Our friends on the Asian and African Studies blog created two lists of major scientific works digitised in the collection, including Arabic versions of classical scientific texts, some of which were lost from Western European culture until the Renaissance, and original works by great early scientists of the Arabic-speaking world, such as Quṭb al-Dīn al-Shīrāzī, Ibn Sīnā (Avicenna), Ibn Haytham (Alhazen), and Abū Bakr Muḥammad ibn Zakarīyā al-Rāzī (Rhazes).

Posted by The Science Team at 11:23 AM in Bioscience, Curiosity, Digital scholarship, Manuscripts, Medieval history, Middle East, Rare books, Science, Science communication

12 November 2018

New psychology and nature databases on trial at the BL

Starting today, users in the British Library Reading Rooms can use two new databases from Alexander Street, which are on trial until mid-January 2019. The usage figures in the next two months will determine whether we take the databases permanently.

Psychological Experiments Online has information on some of the most famous (or notorious, given the dark conclusions of some of them) experiments in psychology since 1900, with articles, archive material, sound or video interviews with researchers and participants, and even recordings of the experiments themselves when available.

The BBC Landmark Video Collection has complete episodes of some of the BBC's most significant nature documentary series from the last fifteen years. All of them have full subtitles and searchable transcripts.

Note that to use these databases you will have to use our desk PCs within the Reading Rooms. For the full effect of sound and video material, you will need to use a PC with headphones, although most of those in the Science reading rooms are now fitted with them.

Please can you give any feedback to the enquiry desk staff, or to [email protected]

Posted by Philip Eagle, Subject Librarian - STM

Posted by The Science Team at 10:37 AM in BBC, Bioscience, Digital scholarship, Environmental science, Research, Science, Sound and vision

03 April 2018

Augmented reality - it isn't just for catching mons.

The most recent GREATforImagination post covered an augmented reality app created by Nexus Studios for the US Presidential administration in 2016. In augmented reality, CGI elements are added to a real-time image of the user's surroundings, using either a mobile device screen or virtual reality goggles, which makes it a halfway point towards the more famous virtual reality. The best-known applications at the moment are for entertainment, such as the famous game Pokémon Go, or our own use of it in our Harry Potter exhibition.

However, there are some more practical uses for augmented reality in the worlds of science and engineering.

The construction industry still largely uses 2-D documents to indicate what should be built. However, why not create augmented reality images of objects in situ for people to copy? Or why not help utilities workers "see" underground pipes before they start digging holes?

An obvious application is in the world of chemistry, where physical 3-D models of large molecules have been familiar for decades, but can take a long time to build. Digital models can be created much more quickly, and AR equipment allows scientists to interact with them with increasing realism. There's a freeware program to try it yourself, if you have some chemistry and computing knowledge.

AR can also be used in surgery, either for training purposes or to allow surgeons to "see" what they are doing during minimally invasive surgery.

(All the articles linked are open access, so you don't have to come to the Library to read them)

Posted by The Science Team at 5:19 PM in Bioscience, Curiosity, Data visualisation, Digital scholarship, Research, Science

30 November 2017

Digital preservation and the Anne McLaren Papers

Today on International Digital Preservation Day we present a guest-post by Claire Mosier, Museum Librarian and Historian at American Museum of Western Art: The Anschutz Collection, concerning the digital files in the Anne McLaren Supplementary Papers (Add MS 89202) which have just been made available to researchers. As an MA student Claire worked as an intern at the British Library in 2015 helping to process digital material.

Dame Anne McLaren. Copyright James Brabazon

The developmental biologist Dame Anne McLaren was a great proponent of scientists sharing their work with the general public, and gave many presentations to both scientific and lay audiences. Some of the notes, drafts, and finished products of these presentations are on paper, and others are in digital formats. The digital files of the Anne McLaren Supplementary Papers consist mostly of PowerPoint presentations and images. Digital records are more of a challenge both to access and to make accessible to readers, as they are not always readily readable in their native format. This leads to unique challenges in determining their content and making it available.

‘HongKong2003Ethics.ppt’ Page from the presentation ‘Ethical, Legal and Social Considerations of Stem Cell Research’, 2003, (Add MS 89202/12/16). Copyright the estate of Anne McLaren.

Throughout her career, McLaren gave presentations not only to educate others about her own work, but also on the social and ethical issues of scientific research. Many of her PowerPoint files are from presentations between 2002 and 2006 and cover the ethical, legal, moral, and social implications of stem cell therapy. These topics are addressed in the 2003 presentation ‘Ethical, Legal, and Social Considerations of Stem Cell Research’ (Add MS 89202/12/16), which briefly covers historic and current stem cell research and the legislation affecting it in different countries. A presentation from 2006, ‘Ethics and Science of Stem Cell Research’ (Add MS 89202/12/160), goes into more detail, breaking ethical concerns into categories of personal, research, and social ethics. As seen in these presentations and others, Anne McLaren tried to present material in a way that would make sense to her audience: some of the presentations are introductions to a concept for the more general public, and others are very detailed treatments of a narrower subject for those in scientific professions.

‘Pugwash 2006’ Page from the presentation ‘When is an Embryo not an Embryo’, 2006, (Add MS 89202/12/163). Copyright the estate of Anne McLaren.

From looking at her PowerPoint documents it seems McLaren’s goals were to educate her audience on scientific ideas and encourage them to think critically, whether they were scientists themselves or not. However, this is hard to confirm, as the PowerPoints are only partial artefacts of her presentations, and what she said during those presentations is not captured in the collection. While she did sometimes present her own views in the slides, she presented other viewpoints as well. This is seen in the presentation for the 2006 Pugwash Conference (Add MS 89202/12/163) titled ‘When is an Embryo not an Embryo’ which presents semantic, legislative, and scientific definitions of the term embryo before a slide reveals McLaren’s own views, then goes back to legislative definitions before the slideshow ends. The Pugwash Conferences on Science and World Affairs were created to ensure the peaceful application of scientific advances, and McLaren was a council member for many years.

***

Both the newly released Anne McLaren Supplementary Papers (Add MS 89202) and the first tranche of McLaren’s papers (Add MS 83830-83981) are available to researchers via the British Library Explore Archives and Manuscripts Catalogue. Additionally, one of Anne McLaren’s notebooks containing material from 1965 to 1968 (Add MS 83845) is on long-term display in the British Library’s Treasures Gallery.

Posted by The Science Team at 10:01 AM in Bioscience, Curiosity, Digital scholarship, Engagement, Manuscripts, Research, Research collaboration, Science, Science communication, Science policy

Tags: Anne McLaren, digital manuscripts, history of science, manuscripts, reproductive biology, scientific ethics

29 August 2017

I4OC: The British Library and open data

In August the British Library joined the Initiative for Open Citations as a stakeholder. The I4OC’s aim of promoting the availability of structured, separable, open citation data fits perfectly with the Library's established strategy for open metadata which has just marked its seventh anniversary.

In August 2010, responding to UK Government calls for increased access to public data to promote transparency, economic growth and research, the British Library launched the strategy by offering over 16m CC0 licensed records from its catalogue and national bibliography datasets. This initiative aimed to remove constraints created by restrictive licensing and library-specific standards to enable wider community re-use. In doing so the Library aimed to unlock the value of the data while improving access to information and culture in line with its wider strategic objectives.

The initial release was followed in 2011 by the launch of the Library’s first Linked Open Data (LOD) bibliographic service. The Library believed Linked Open Data to be a logical evolutionary step for the established principle of freedom of access to information, offering trusted knowledge organisations a central role in the new information landscape. The development proved influential among the library community in moving the Linked Data debate from theory to practice.

Over 1,700 organisations in 123 countries now use the Library’s open metadata services, with many more taking single files. The value of the Library’s open data work was recognised by the British National Bibliography linked dataset receiving a 5 star rating on the UK Government Data.gov.uk site and certification from the Open Data Institute (ODI). In 2016 the Library launched the http://data.bl.uk/ platform in order to make copies of a range of its datasets available for research and creative purposes. In addition, the BL Labs initiative continues to explore new opportunities for public use of the Library’s digital collections and data in exciting and innovative ways. The British Library therefore remains committed to an open approach to enable the widest possible re-use of its rich metadata and generate the best return on the investment in its creation.

I4OC users by country

As the example of the British Library’s open data work shows, opening up metadata facilitates access to information, creates efficiencies and allows others to enhance existing and develop new services. This is particularly important for researchers and others who do not work for organisations with subscriptions to commercial citation databases. The British Library believes that opening up metadata on research facilitates both improved research information management and original research, and therefore benefits all.

The I4OC’s recent call to arms for its stakeholders is therefore very much in tune with the British Library’s open data work in promoting the many benefits of freely accessible citation data for scholars, publishers and wider communities. Such benefits proved compelling enough to enable the I4OC to secure publisher agreement for nearly half of indexed scholarly data to be made openly accessible. This data is now being used in a range of new projects and services including OpenCitations and Wikidata. It's encouraging to see I4OC spreading the open data ideal so successfully and it is to be hoped that it will also succeed in ensuring open citations become the default in future.

Correction: Image shows users of BL open data services by country, not I4OC

Posted by The Science Team at 2:17 PM in Data, Digital scholarship, Open access, Open data, Science, Science communication

Tags: I4OC, open access, open citations, open metadata

08 June 2017

Untangling academic publishing

Untangling Academic Publishing logo. Creator uncredited, published under CC-BY

On the 25th of May we attended the launch of the report Untangling Academic Publishing by Aileen Fyfe and others (https://zenodo.org/record/546100). The report describes the history of scholarly publishing from the nineteenth century to the modern era of open access, “crises” in affordability of journals and books, and controversy over commercial publishers’ profits and competing business models.

The report discusses the post-WWII evolution of scholarly publishing from an original model where learned societies saw dissemination of research results as simply a part of their essential activity, with no expectations of profit and many copies of journals distributed free to public, academic and scholarly subscription libraries. After WWII an alliance was formed with profit-seeking scholarly publishers, under the pressure of the increasing quantity of publicly funded academic research, increasingly large numbers of universities and professional researchers in the developed world, and a growing proliferation of subdisciplines. Commercial publishers turned scholarly publication into a profitable business by setting up journals for subdisciplines without their own journals or learned societies, selling to institutions, and internationalising the market.

It was during this time that the current system of peer review was developed, and publication metrics became increasingly used to assess the prestige of individual academics and reward them with career progression and funding.

However, since the 1980s this period of close association between the interests of scholars and commercial publishers has ended, due to further expansion of the research base, reduced library budgets due to inflation and cuts in funding, and in the UK specifically issues related to exchange rates. University libraries have struggled to afford journal subscriptions and monograph purchases, leading to a vicious circle of declining sales and increasing costs. Increasingly scholars at all but the wealthiest institutions have found themselves unable to legally obtain material that they need to read, and resentment of the profit margins made by the “big four” commercial scholarly publishers in particular has developed.

Hopes that digital publication would allow cost-cutting have failed to materialise. Publishers argue that the actual costs of printing and distributing hard copy publications are relatively small compared to editorial costs, and that providing online access mechanisms with the robustness and additional features that users want is not as cheap as some initial enthusiasts assumed. Open access, which covers a variety of business models not based on charging for access at the point of use, has been promoted for almost twenty years, but has failed to replace subscription publishing or, to a great extent, to challenge the market dominance of major commercial publishers. Much open access publishing is based on the “gold” business model funded by article processing charges paid by authors or research funders, and is often offered by commercial publishers as an alternative. Hence universities often find themselves faced with paying both subscriptions and article processing charges instead of just subscriptions, and mechanisms offered by publishers to offset one against the other have been criticised as lacking transparency.

At the event, there were presentations by Dr. Fyfe, her co-author Stephen Curry (whose views can be found here), and David Sweeney, Executive Chair Designate of Research England. Mr. Sweeney welcomed the report for describing the situation without demonising any parties, and pointed out that publishers are adding value and innovating. He suggested that a major current issue is that academics who choose how to publish their work have no real connection to the way that it is paid for – either by their institutional libraries paying subscriptions or by funders paying APCs – and hence are often not aware of this as an issue. It was pointed out in discussion after the event that the conversation about publishing models is still almost completely among librarians and publishers, with few authors involved unless they are very interested in the subject – the report is aimed partly at raising awareness of the issues among authors.

The general argument of the report is that it is time to look again at whether learned societies should be taking more of a role in research dissemination and maybe financially supporting it, with particular criticism of those learned societies who contract out production of their publications to commercial publishers and do not pay attention to those publishers’ policies and behaviour. Although there is no direct allusion, it is interesting that soon after the report’s launch, this post was published on Scholarly Kitchen, discussing the concept of society-funded publication and putting forward the name of “diamond open access” for it.

Posted by The Science Team at 2:32 PM in Digital scholarship, Open access, Science, Science communication, Science policy

Tags: academic publishing, scholarly communication

03 February 2017

HPC & Big Data

Matt and Philip attended the HPC & Big Data conference on Wednesday 1st February. This is an annual one-day conference on the uses of high-performance computing and especially on big data. “Big data” is used widely to mean very large collections of data in science, social science, and business.

There were some very interesting presentations over the day. Anthony Lee from our friends the Turing Institute discussed the Institute’s plans for the future and the potential of big data in general. The increasing amounts of data being created in “big science” scientific experiments and the world at large mean that the problems of research have shifted from data collection being the hard part to processing capabilities being overwhelmed by the sheer volume of data.

A presentation from the Earlham Institute and Verne Global revealed that Iceland could become a centre for high-performance computing in the future, thanks to its combination of cheap, green electricity from hydroelectric and geothermal power, high-bandwidth data links to other continents, and a cool climate which reduces the need for active cooling of equipment. HPC worldwide now consumes more energy than the entire airline industry, and more than whole countries of the size and development level of Italy or Spain.

Dave Underwood of the Met Office described the Met Office’s acquisition of the largest HPC computer in Europe. He also pointed out the extreme male-biased demographic of the event, something that both Matt and Philip had noticed (although we admit, one of our female team members could have gone instead of Philip).

Luciano Floridi of Oxford University discussed the ethical issues of Big Data and pointed out that as intangibles become a greater portion of companies’ value, so scandal becomes more damaging to them. Current controversies involving behaviour on the internet suggest that moral principles of security, privacy, and freedom of speech may be increasingly conflicting with one another, leading to difficult questions of how to balance them.

JISC gave a presentation on their actual and planned shared HPC data centres, and invited representatives from our friends and neighbours at the Crick Institute and the Wellcome Trust’s Sanger Institute to speak on their IT plans. Alison Davis from Crick pointed out that an under-rated problem for academic IT departments is individual researchers’ desire to carry huge quantities of digital data with them when they move institutions, causing extra demand on storage and raising difficult issues of ownership.

Finally, Richard Self of the University of Derby gave an illuminating presentation on the potential pitfalls of “big data” in social science and business, such as the fact that the size of a sample does not guarantee that it is representative of the whole population, the probability of finding apparent correlations in a large sample that are created by chance and not causation, and the lack of guaranteed veracity. (For example, in one investigation 14% of geographical locations from mobile phone data were 65km or more out of place.)
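The point about chance correlations can be illustrated with a short simulation. This is a minimal sketch of our own, not something shown at the conference: the function names, the number of variables and observations, and the 0.36 threshold are all illustrative assumptions. It generates columns of random numbers that have nothing to do with one another and counts how many pairs nevertheless appear correlated.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def count_chance_correlations(n_variables, n_observations, threshold, seed=0):
    """Count pairs of unrelated random variables with |r| >= threshold.

    All parameters are hypothetical values chosen for illustration.
    """
    rng = random.Random(seed)
    # Every column is independent noise, so any correlation is pure chance.
    columns = [
        [rng.gauss(0, 1) for _ in range(n_observations)]
        for _ in range(n_variables)
    ]
    hits = 0
    pairs = 0
    for i in range(n_variables):
        for j in range(i + 1, n_variables):
            pairs += 1
            if abs(pearson(columns[i], columns[j])) >= threshold:
                hits += 1
    return hits, pairs


hits, pairs = count_chance_correlations(40, 30, 0.36)
print(f"{hits} of {pairs} unrelated pairs have |r| >= 0.36")
```

With 40 variables there are 780 pairs to compare, so even a threshold that any single pair would rarely reach by accident is likely to be crossed many times. The more variables a dataset contains, the more such apparent relationships turn up.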

Philip Eagle, Content Expert - STM

Posted by The Science Team at 2:14 PM in Bioscience, Data, Digital scholarship, Science

05 September 2016

Social Media Data: What’s the use?

Team ScienceBL is pleased to bring you #TheDataDebates - an exciting new partnership with the AHRC, the ESRC and the Alan Turing Institute. In our first event on 21st September we’re discussing social media. Join us!

Every day people around the world post a staggering 400 million tweets, upload 350 million photos to Facebook and view 4 billion videos on YouTube. Analysing this mass of data can help us understand how people think and act but there are also many potential problems. Ahead of the event, we looked into a few interesting applications of social media data.

Politically correct?

During the 2015 General Election, experts used a technique called sentiment analysis to examine Twitter users’ reactions to the televised leadership debates¹. But is this type of analysis actually useful? Some think that tweets are spontaneous and might not represent the more calculated political decision of voters.

On the other side of the pond, Obama’s election strategy in 2012 made use of social media data on an unprecedented scale². A huge data analytics team looked at social media data for patterns in past voter characteristics and used this information to inform their marketing strategy - e.g. broadcasting TV adverts in specific slots targeted at swing voters and virtually scouring the social media networks of Obama supporters on the hunt for friends who could be persuaded to join the campaign as well.

Image from Flickr

In this year's US election, both Hillary Clinton and Donald Trump are making the most of social media's huge reach to rally support. The Trump campaign has recently released the America First app, which collects personal data and awards points for recruiting friends³. Meanwhile Democratic nominee Clinton is building on the work of Barack Obama's social media team and exploring platforms such as Pinterest and YouTube⁴. Only time will tell who the eventual winner will be.

Playing the market

You know how Amazon suggests items you might like based on the items you’ve browsed on their site? This is a common marketing technique that allows companies to re-advertise products to users who have shown some interest in the brand but might not have bought anything. Linking browsing history to social media comments has the potential to make this targeted marketing even more sophisticated⁴.

Credit where credit’s due?

Many ‘new generation’ loan companies don’t use traditional credit checks but instead gather other information on an individual - including social media data – and then decide whether to grant the loan⁵. Opinion is divided as to whether this new model is a good thing. On the one hand it allows people who might have been rejected by traditional checks to get credit. But critics say that people are being judged on data that they assume is private. And could this be a slippery slope to allowing other industries (e.g. insurance) to gather information in this way? Could this lead to discrimination?

Image from Flickr

What's the problem?

Despite all these applications there’s lots of discussion about the best way to analyse social media data. How can we control for biases and how do we make sure our samples are representative? There are also concerns about privacy and consent. Some social media data (like Twitter) is public and can be seen and used by anyone (subject to terms and conditions). But most Facebook data is only visible to people specified by the user. The problem is: do users always know what they are signing up for?

Image from Pixabay

Lots of big data companies are using anonymised data (where obvious identifiers like name and date of birth are removed) which can be distributed without the users’ consent. But there may still be the potential for individuals to be re-identified - especially if multiple datasets are combined - and this is a major problem for many concerned with privacy.

If you are an avid social media user, a big data specialist, a privacy advocate or are simply interested in finding out more, join us on 21st September to discuss further. Tickets are available here.

Katie Howe

Posted by The Science Team at 10:40 AM in Curiosity, Data, Digital scholarship, Engagement, Open data, Research, Science, Science communication, Science policy

15 March 2016

Tunny and Colossus: Donald Michie and Bletchley Park

In honour of British Science Week, Jonathan Pledge explores the work of Donald Michie, a code-breaker at Bletchley Park from 1942 to 1945. The Donald Michie papers are held at the British Library.

Donald Michie (1923-2007) was a scientist who made key contributions in the fields of cryptography, mammalian genetics and artificial intelligence (AI).

Copy of a photograph of Donald Michie taken while he was at Bletchley Park (Add MS 89072/1/5). Copyright the estate of Donald Michie/Crown Copyright.

In 1942, Michie began working at Bletchley Park in Buckinghamshire as a code-breaker under Max H. A. Newman. His role was to decrypt the German Lorenz teleprinter cypher - codenamed ‘Tunny’.

The Tunny machine was attached to a teleprinter and encoded messages using two sets of five rotating wheels, named ‘psi’ and ‘chi’ by the code-breakers. The starting position of the wheels, known as a wheel pattern, was decided by a predetermined code before the operator entered the message. Encryption worked by adding the letters generated by the psi and chi wheels to each letter of the unencrypted message entered by the operator, producing a new, enciphered letter. The addition followed a simple rule, represented here as dots and crosses:

• + • = •

x + x = •

• + x = x

x + • = x

Therefore, using these rules, M in the teleprinter alphabet, represented as • • x x x, added to N, represented as • • x x •, gives • • • • x, the letter T.
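For readers who like to see the arithmetic run, the dot-and-cross rule can be sketched in a few lines of Python. This is only an illustration, not a model of the Lorenz machine: the function names are our own, and the patterns are written without spaces. The rule itself is what programmers call exclusive-or (XOR): like units give a dot, unlike units give a cross.

```python
# Illustrative sketch only: function names are our own invention.
DOT, CROSS = "•", "x"


def add_units(a: str, b: str) -> str:
    """Add two units: like units give a dot, unlike units give a cross."""
    return DOT if a == b else CROSS


def add_patterns(p: str, q: str) -> str:
    """Add two five-unit teleprinter patterns unit by unit."""
    if len(p) != len(q):
        raise ValueError("patterns must be the same length")
    return "".join(add_units(a, b) for a, b in zip(p, q))


M = "••xxx"  # the letter M, as given in the text
N = "••xx•"  # the letter N, as given in the text

T = add_patterns(M, N)
print(T)  # ••••x, the letter T

# Adding N a second time undoes the first addition and returns M.
print(add_patterns(T, N) == M)  # True
```

The last line shows why the same rule serves for both enciphering and deciphering: adding the same letter twice cancels out.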

Detail of the Lorenz machine showing the encoding wheels. Creative Commons Licence.

In order for messages to be decrypted, it was necessary to know the position of the encoding wheels before the message was sent. These positions were initially established by the use of ‘depths’. A depth occurred when the Tunny operator mistakenly repeated the same message with subtle textual differences without first resetting the encoding wheels.
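A short Python sketch suggests why a depth was so valuable. It follows from the addition rule that when two messages are enciphered with the same wheel-generated letters, adding the two enciphered messages together cancels those letters out, leaving only the two original messages added to each other. Everything below is hypothetical: the character codes are made up, and a random stream stands in for the output of the wheels.

```python
import random


def add_streams(a, b):
    """Add two equal-length streams of 5-bit character codes (XOR)."""
    return [x ^ y for x, y in zip(a, b)]


rng = random.Random(0)

# Hypothetical 5-bit character codes for two nearly identical messages.
plain_1 = [7, 6, 1, 16, 3]
plain_2 = [7, 6, 1, 18, 5]  # retyped with small differences

# Stand-in for the letters generated by the wheels; reused for both
# messages because the operator did not reset the wheels.
key = [rng.randrange(32) for _ in plain_1]

cipher_1 = add_streams(plain_1, key)
cipher_2 = add_streams(plain_2, key)

# Adding the two intercepts removes the key entirely.
combined = add_streams(cipher_1, cipher_2)
print(combined == add_streams(plain_1, plain_2))  # True
```

Where the two messages agree, the combined stream is all dots (zero); where they differ, it exposes the difference between them. That gives a code-breaker a foothold without knowing anything about the wheels.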

A depth was first intercepted on 30 August 1941 and the encoding text was deciphered by John Tiltman. From this the working details of Tunny were established by the mathematician William Tutte without his ever having seen the machine itself, an astonishing feat. Using Tutte’s deduction, the mathematician Alan Turing came up with a system for devising the wheel patterns, known as ‘Turingery’.

Turing, known today for his role in breaking the German navy’s ‘Enigma’ code, was at the time best known for his 1936 paper ‘On Computable Numbers’ in which he had theorised about a ‘Universal Turing Machine’ which today we would recognise as a computer. Turing’s ideas on ‘intelligent machines’, along with his friendship, were to have a lasting effect on Michie and his future career in AI and robotics.

Between July and October 1942, all German Tunny messages were decrypted by hand. However, changes to the way the cypher was generated meant that finding the wheel settings by hand was no longer feasible. It was again William Tutte who came up with a statistical method for finding the wheel settings, and it was the mathematician Max Newman who suggested using a machine for processing the data.

Colossus computer [c 1944]. By the end of the War there were ten such machines at Bletchley. Crown Copyright.

Initially, an electronic counter dubbed ‘Heath Robinson’ was used for data processing. However, it was not until the engineer Thomas Flowers designed and built Colossus, the world’s first large-scale electronic computer, that wheel patterns and therefore the messages could be decrypted at speed. Michie too, along with Jack Good, played a part, discovering a way of using Colossus to dramatically reduce the processing time for ciphered texts.

The decrypting of Tunny messages was critical in providing the Allies with information on high-level German military planning, in particular for the Battle of Kursk in 1943 and in the preparations surrounding the D-Day invasion of 1944.

One of the great ironies is that much of this pioneering and critical work remained a state secret until 1996. It was only through Donald Michie’s tireless campaigning that the General Report on Tunny, written in 1945 by Michie, Jack Good and Geoffrey Timmins, was finally declassified by the British Government, providing proof of the code-breakers’ collective achievements during the War.

Pages from Donald Michie’s copy of the General Report on Tunny. (Add MS 89072/1/6). Crown Copyright.

Donald Michie at the British Library

The Donald Michie Papers at the British Library comprise three separate tranches of material gifted to the Library in 2004 and 2007. They consist of correspondence, notes, notebooks, offprints and photographs and are available to researchers through the British Library’s Explore Archives and Manuscripts catalogue at Add MS 88958, Add MS 88975 and Add MS 89072.

Jonathan Pledge: Curator of Contemporary Archives and Manuscripts, Public and Political Life

Read more about ciphers in the British Library's collections on Untold Lives

Posted by The Science Team at 2:42 PM in Curiosity, Data, Digital scholarship, Engagement, Research, Science, Science communication
