Digital scholarship blog

Enabling innovative research with British Library digital collections

Introduction

Tracking exciting developments at the intersection of libraries, scholarship and technology.

12 October 2020

Fiction Readers Wanted for PhD Research Study

This is a guest post by British Library collaborative doctoral student Carol Butler. You can follow her on Twitter as @fantomascarol.

Update: Due to a phenomenal response, Carol has recruited enough interviewees for the study, so the link to the application form has been removed (13/10/2020).

In 2016 I started a PhD project in partnership with the British Library and the Centre for Human-Computer Interaction Design (CHCID) at City, University of London. My research has focused on the phenomenon of fiction authors interacting with readers through online media, such as websites, forums and social media, to promote and discuss their work. My aim is to identify potential avenues for redesigning or introducing new technology to better support authors and readers. I am now in my fourth and final year, aiming to complete my research this winter.

The internet has impacted how society interacts with almost everything, and literature has been no exception. It’s often thought that if a person or a business is not online, they are effectively invisible, and over the last ten years or so it has become increasingly common – expected, even – for authors to have an online presence allowing readers, globally, to connect with them.

Opportunities for authors and readers to interact existed long before the internet, through events such as readings, signings, and festivals. The internet does not replace these – indeed, festivals have grown in popularity in recent years, and many have embraced technology to broaden their engagement beyond the event itself. However, unlike at organised events, readers and authors can potentially interact online far more directly, without formal mediation. The perceived benefits of this disintermediation are commonly hailed: that it can break down access barriers for readers (e.g. geography and time, so they can more easily learn about the books they enjoy and the person behind the story), and help authors to better understand their market and the reception of their books. However, because this is a relatively new phenomenon, we don’t yet know much about how interacting with each other online may differ from doing so at a festival or event, and what complications the new environment may introduce to the experience, or even exacerbate. It is this research gap that my work has been addressing.

Early in my research, I conducted interviews with fiction authors and readers who use different online technologies (e.g. social media such as Twitter and Facebook, forums such as Reddit, or literary-specific sites such as Goodreads) to interact with other readers and authors. All participants generously shared honest, open accounts of what they do, where and why, and where they encounter problems. It became clear that, although the benefits of being online are widely accepted and everyone had good experiences to report, in reality people’s reasons for being online were riddled with contradictions, and, in some cases, it was debatable whether the positives outweighed the negatives, or whether the practice served a meaningful purpose at all. Ultimately, it’s complex, and not everything we thought we knew is as clear-cut as it’s often perceived.

This led me to make a U-turn in my research. Before working out how to improve technology to better support interactions as they currently stand, I needed to find out more about people’s motivations to be online, and to question whether we were focused on the right problem in the first place. From this I’ve been working to reframe how we, in the research field of Human-Computer Interaction, may understand the dynamics between authors and readers, by building a broader picture of context and influences in the literary field.

I’m going to write another blog post in the coming months to talk about what I’ve found, and what I think we need to focus on in the near future. In particular, I think it is important to improve support for authors, as many find themselves in a tricky position because of the expectation that they are available and public-facing, effectively 24/7. However, before I expand on that, I am about to embark on one final study to address some outstanding questions I have about the needs of their market – fiction readers. 

Over the next few weeks, I will be recruiting people who read fiction – whether they interact online about reading or not – to join me for what I am informally referring to as ‘an interview with props’. This study is happening a few months later than I’d originally intended, as restrictions in relation to Covid-19 required me to change my original plans (e.g. meeting people face-to-face). My study has ‘gone digital’, changing how I can facilitate the sessions, and what I can realistically expect from them.

I will be asking people to join me to chat online, using Zoom, to reflect on a series of sketched interface design ideas I have created, and to discuss their current thoughts about authors being available online. The design sketches represent deviations from the technology currently in common use – some significant, and some subtle. The designs are not being tested on behalf of any affiliated company, nor do I necessarily expect any of them to be developed into working technology in the future. Ultimately, they are probes to get us talking about broader issues surrounding author and reader interactions, and I’m hoping that by getting people’s perspectives on them, I’ll learn more about why the designs *don’t* work, as well as why they do, to help inform future research and design work.

I’ve been ‘umming and ahhing’ about how best to share these designs with participants through a digital platform. Sitting together in the same room, as I’d originally planned, we could all move them around, pick them up, take a red pen to them, make notes on post-its, and sketch alternative ideas on paper. There are fantastic online technologies available these days, which have proved invaluable during this pandemic. But they can’t provide the same experience that being physically present together can (a predicament which, perhaps ironically, fits the research problem itself!).

A screen image of the Miro platform, showing a drawing of a person wearing glasses, with a text box underneath saying Favourite Author
A sneaky peek at a sketch in the making, on Miro

I have decided to use a website called Miro.com to facilitate the study – an interactive whiteboard tool that allows participants to add digital post-it notes, doodles, and more. I’ve never used it before now, and to my knowledge there is no published research out there (yet) by others in my research field who have used it with participants, for me to learn from their experience. I think I must prepare myself for a few technical glitches! But I am hopeful that participants will enjoy the experience, which will be informal, encouraging, and in no way a judgement of their abilities with the technology. I am confident that their contribution will greatly help my work – and future work which will help authors and readers in the real world.

If anyone who is reading this is interested in participating, please do get in touch. Information about the study and how to contact me can be found here or please email [email protected].

Update: Due to a phenomenal response, Carol has recruited enough interviewees for the study, so the link to the application form has been removed (13/10/2020). Thanks to everyone who has applied.

05 October 2020

2020 New Media Writing Prize is Open

The New Media Writing Prize (NMWP) is an annual international award, which encourages and promotes the best in new media writing, showcasing innovative digital fiction, poetry and journalism. These are the types of interactive writing that we have been examining, researching and tentatively collecting in our emerging formats work at the Library.

Last year we celebrated ten years of the prize, looking back over previous winning entries, with a Digital Conversation event at the British Library. Now we are looking forward to seeing what types of work will be entered into this year's prize.

NMWP logo, with a game controller on the N, a microphone on the M, headphones on the W and a pen pot on the P

If you are a writer of interactive works, then you may be interested to know that the 2020 New Media Writing Prize is currently open for entries. You can nominate works via the online entry form at https://newmediawritingprize.co.uk/enter/. This year, there is only one category, the if:book UK New Media Writing Prize. However, you can enter fiction, poetry, journalism, games, anything as long as it is interactive and makes use of digital media. The deadline is Friday 27th November 2020, 12 noon GMT; student entries must be submitted by Friday 18th December 2020, 12 noon GMT. The organisers are especially encouraging entries from students and will give special consideration to entries from students at undergraduate or postgraduate level.

There is one award of £1000 for the winner, and there will be commendations for shortlisted works, which the judges feel are deserving of a special mention. All the rules are here, and please do read the FAQs section of the NMWP website, which has more details about what the judges are looking for in entries. If you have a question that is not covered by the FAQ, then you can email the organisers at [email protected]. You may also want to check out the winners and shortlisted entries from the 2019 prize, which I blogged about here, for inspiration. If you do enter, then good luck!

A laptop and an old fashioned typewriter facing each other

This post is by Digital Curator Stella Wisdom (@miss_wisdom). 

25 September 2020

Making Data Into Sound

This is a guest post by Anne Courtney, Gulf History Cataloguer with the Qatar Digital Library, https://www.qdl.qa/en 

Sonification

Over the summer, I’ve been investigating the sonification of data. On the Qatar Project (QDL), we generate a large amount of data, and I wanted to experiment with different methods of representing it. Sonification was a new technique for me, which I learnt about through this article: https://programminghistorian.org/en/lessons/sonification.

 

What is sonification?

Sonification is the method of representing data in an aural format, rather than a visual format such as a graph. It is particularly useful for showing changes in data over time. Different trends are highlighted depending on the choices made during the process, in the same way as they would be when drawing a graph.

 

How does it work?

First, all the data must be put in the right format:

An example of data in Excel showing a list of longitude points
Figure 1: Excel data of longitude points where the Palsgrave anchored

Then, the data is used to generate a MIDI file. The Programming Historian provides an example Python script for this, and by changing parts of it, it is possible to change the tempo, note length, scale, and other features.

Figure 2: Python script ready to output a midi file of occurrences of Anjouan over time
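To make this middle step a little more concrete, here is a minimal sketch in plain Python. It is not the Programming Historian's script (the lesson relies on a third-party MIDI library); it uses only the standard library, and assumes a simple linear mapping from data values onto a pitch range, with one note per data point. The pitch range (MIDI notes 48 to 84), the resolution of 480 ticks per beat and all function names are illustrative choices of ours, not taken from the QDL project.

```python
import struct

def scale_to_pitch(values, low=48, high=84):
    """Linearly map numeric values onto MIDI note numbers between low and high."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid dividing by zero when all values are equal
    return [round(low + (v - lo) / span * (high - low)) for v in values]

def vlq(n):
    """Encode a non-negative integer as a MIDI variable-length quantity."""
    out = [n & 0x7F]
    n >>= 7
    while n:
        out.append((n & 0x7F) | 0x80)  # continuation bit on every byte but the last
        n >>= 7
    return bytes(reversed(out))

def midi_bytes(notes, ticks_per_beat=480, beats_per_note=1, velocity=100):
    """Build a format-0 (single-track) MIDI file playing `notes` one after another on channel 0."""
    duration = int(ticks_per_beat * beats_per_note)
    track = bytearray()
    for note in notes:
        track += vlq(0) + bytes([0x90, note, velocity])   # note on, immediately
        track += vlq(duration) + bytes([0x80, note, 0])   # note off after `duration` ticks
    track += vlq(0) + b"\xff\x2f\x00"                      # end-of-track meta event
    header = b"MThd" + struct.pack(">IHHH", 6, 0, 1, ticks_per_beat)
    return header + b"MTrk" + struct.pack(">I", len(track)) + bytes(track)

if __name__ == "__main__":
    longitudes = [0.37, 18.4, 44.4, 72.8]  # hypothetical sample values
    with open("sonification_sketch.mid", "wb") as f:
        f.write(midi_bytes(scale_to_pitch(longitudes)))
```

Changing `low`, `high` or `beats_per_note` plays the same role as editing the tempo, note length and scale in the lesson's script: each choice brings out a different trend, just as axis choices do on a graph.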

Finally, to overlay the different MIDI files, edit them, and change the instruments, I used MuseScore, freely downloadable music notation software. Other alternatives include LMMS and GarageBand:

A music score with name labels of where the Discovery, Palsgrave, and Mary anchored on their journeys, showing different pitches and musical notations.
Figure 3: The score of the voyages of the Discovery, Palsgrave, and Mary, labelled to show the different places where they anchored.

 

The sound of authorities

Each item which the Qatar project catalogues has authority terms linked to it, which list the main subjects and places connected to the item. As each item is dated, it is possible to trace trends in subjects and places over time by assigning the dates of the items to the authority terms. Each authority term ends up with a list of dates when it was mentioned. By assigning different instruments to the different authorities, it is possible to hear how they are connected to each other.

This sound file contains the sounds of places connected with the trade in enslaved people, and how they intersect with the authority term ‘slave trade’. The file begins in 1700 and finishes in 1900. One of the advantages of sonification is that the silence is as eloquent as the data. The authority terms are mentioned more at the end of the time period than the start, and so the piece becomes noisier as the British increasingly concern themselves with these areas. The pitch of the instruments is determined, in this instance, by the months of the records in which they are mentioned.
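The inversion described above, from items carrying authority terms to authority terms carrying dates, is a small data-wrangling step. A sketch in Python, assuming each catalogue record has already been reduced to a year, a month and a list of authority terms. The sample records are invented for illustration, and mapping the month to a number of semitones above middle C is only one possible reading of "pitch determined by the month".

```python
from collections import defaultdict

# Hypothetical sample records: (year, month, [authority terms])
items = [
    (1785, 3, ["Madagascar", "Slave Trade"]),
    (1821, 11, ["Zanzibar"]),
    (1842, 6, ["Zanzibar", "Slave Trade", "Mauritius"]),
    (1889, 1, ["Anjouan"]),
]

def dates_by_term(records):
    """Invert item -> terms into term -> sorted list of (year, month) dates."""
    index = defaultdict(list)
    for year, month, terms in records:
        for term in terms:
            index[term].append((year, month))
    for dates in index.values():
        dates.sort()
    return dict(index)

def to_notes(dates, start_year=1700, beats_per_year=1.0, base_pitch=60):
    """Turn dates into (beat, pitch) pairs: the year sets the position in time,
    the month (1-12) sets the pitch as semitones above `base_pitch` (assumed mapping)."""
    return [((year - start_year) * beats_per_year, base_pitch + month - 1)
            for year, month in dates]

if __name__ == "__main__":
    for term, dates in dates_by_term(items).items():
        print(term, to_notes(dates))
```

Each term's note list can then be written to its own MIDI track and given its own instrument, as in the list below.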

Authorities

The authority terms are represented by these instruments:

Anjouan: piccolo

Madagascar: cello

Zanzibar: horn

Mauritius: piano

Slave Trade: tubular bell

 

Listening for ships

Ships

This piece follows the journeys of three ships from March 1633 to January 1637. In this example, the pitch is important because it represents longitude; the further east the ships travel, the higher the pitch. The Discovery and the Palsgrave mostly travelled together from Gravesend to India, and they both made frequent trips between the Gulf and India. The Mary set out from England in April 1636 to begin her own journey to India. The notes represent the time the ships spent in harbour, and the silence is the time spent at sea. The Discovery is represented by the flute, the Palsgrave by the violin, and the Mary by the horn.
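The same approach can be sketched for the voyages: each stay in harbour becomes a note whose length is the time at anchor, the gaps between stays become rests, and longitude sets the pitch. A minimal Python sketch, assuming anchorage records of the form (ship, place, arrival, departure, longitude). The dates below are invented for illustration and the longitudes are approximate.

```python
from datetime import date

# Hypothetical anchorages: (ship, place, arrived, departed, longitude in degrees east)
stays = [
    ("Discovery", "Gravesend", date(1633, 3, 1), date(1633, 3, 20), 0.37),
    ("Discovery", "Surat", date(1633, 9, 10), date(1633, 12, 1), 72.8),
    ("Discovery", "Bandar Abbas", date(1634, 1, 15), date(1634, 2, 20), 56.3),
]

def stays_to_events(stays, start, days_per_beat=7, low=48, high=84,
                    min_lon=-10.0, max_lon=80.0):
    """One note per stay in harbour; time at sea between stays is left silent.

    Returns (ship, start_beat, length_in_beats, pitch) tuples; pitch rises
    with longitude, so the further east, the higher the note.
    """
    events = []
    for ship, _place, arrived, departed, lon in stays:
        start_beat = (arrived - start).days / days_per_beat
        length = max((departed - arrived).days, 1) / days_per_beat
        fraction = (lon - min_lon) / (max_lon - min_lon)
        pitch = round(low + fraction * (high - low))
        events.append((ship, start_beat, length, pitch))
    return events

if __name__ == "__main__":
    for event in stays_to_events(stays, start=date(1633, 3, 1)):
        print(event)
```

Using a fixed longitude range (here -10° to 80° east, an assumption) rather than each ship's own minimum and maximum keeps all three ships on one shared scale, so the same pitch means the same longitude whichever instrument plays it.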

23 September 2020

Mapping Space, Mapping Time, Mapping Texts

For many people, our personal understanding of time has been challenged during the Covid-19 pandemic, with minutes, hours and days of the week seeming to merge together into "blursday", without our pre-Covid-19 routines to help us mark points in time.

Talking of time, the AHRC-funded Chronotopic Cartographies research project has spent the last few years investigating how we might use digital tools to analyse, map, and visualise the spaces, places and time within literary texts. It draws on the literary theorist Mikhail Bakhtin's concept of the 'chronotope': a way of describing how time and place are linked and represented in different literary genres.

To showcase research from this project, next Tuesday (29th September 2020) we are co-hosting with them an online interdisciplinary conference: "Mapping Space, Mapping Time, Mapping Texts". 

Many blue dots connected with purple lines, behind text saying Mapping Space, Mapping Time, Mapping Texts

The "Mapping Space, Mapping Time, Mapping Texts" registration page is here. Once you have signed up, you will receive an email with links to recorded keynotes and webinar sessions. On the morning of the conference, you will also receive an email with links to the Flickr wall of virtual research posters and hangout spaces.

The conference will go live from 09.00 BST. All webinars and live Q&A sessions will be held in Microsoft Teams; if you don't have Teams installed, you can install it before the event here. We appreciate that many participants will be joining from different time zones and that attendees may want to dip in and out of sessions, so please join at whatever pace suits you.

Our keynote speakers, James Kneale, Anders Engberg-Pedersen and Robert T. Tally Jr, have provided recordings of their presentations and will be joining the event for live Q&A sessions over the course of the day. You can watch the keynote recordings at any time, but if you want the full conference experience, log in to the webinars at the times below so you can participate "live" across the day. A Q&A session will follow each keynote.

Schedule:

9.00 BST: Conference goes live, keynotes and posters available online, URLs sent via email.

9.30: Short introduction and welcome from Sally Bushell

10.00-11.00: First Keynote: James Kneale

11.00-11.30: Live Q&A (chaired by Rebecca Hutcheon)

2.00-3.00: Second Keynote: Anders Engberg-Pedersen

3.00-3.30: Live Q&A (chaired by Duncan Hay)

5.00-6.00: Third Keynote: Robert T. Tally Jr

6.00-6.30: Live Q&A (chaired by Sally Bushell)

In the breaks between sessions, please do browse the online Flickr wall of research posters and hang out in the conference's virtual chat room.

We very much look forward to seeing you on-screen, on the day (remember it is Tuesday, not Blursday!).

This post is by Digital Curator Stella Wisdom (@miss_wisdom).

18 September 2020

Hiring a new Wikimedian in Residence

Are you passionate about helping people and organisations build and preserve open knowledge to share and use freely? Have you got experience organising online events, workshops and training sessions? Then you may be interested in applying to be our new Wikimedian in Residence.

In collaboration with Wikimedia UK, the British Library is working on contributing and improving content, data, and metadata, across the Wikimedia family of platforms.

I recently ran a “World of Wikimedia” series of remote guest lectures to inspire my colleagues across the Library. To further assist with this work, the Library is hiring a Wikimedian in Residence to join the Digital Scholarship team, on a part-time basis (18 hours per week) for 12 months.

8 people standing outside the entrance of the British Library
A Wikipedians in Residence group photo, taken at GLAMcamp London, 15-16 September 2012 (photo by Rock drum, Wikimedia Commons / CC-BY-SA-3.0)

Since we hosted a successful Wikipedian in Residence in 2012 (Andrew Gray, who is standing second from the right in the photo above; you can read about his residency here), many staff across the British Library have engaged with Wikimedia projects, holding edit-a-thons and adding digital collections to Wikimedia Commons.

Now, with generous funding from the Eccles Centre for American Studies, we are looking for a proactive and self-motivated individual who can coordinate and support these activities. Furthermore, we are hoping for someone who can really help the Library to engage actively with the Wikidata, Wikibase and Wikisource platforms and communities, increasing the visibility and enrichment of the data, collections and research materials that the Library holds about underrepresented populations.

If this sounds like something you can do, then please do apply. The vacancy reference is 03423, the closing date is 8th October 2020 and the interview date is 23rd October 2020. The post is part-time, 2.5 days per week, for 12 months, and work will initially be done remotely in light of the current COVID-19 situation. Longer term, however, it is likely that there will be a mix of remote and on-site working.

During my time working in the Library, we have hosted a number of wonderful residents, including Christopher Green, Rob Sherman and Sarah Cole, who each brought fresh skills, knowledge and enthusiasm into the Library. So I very much hope that this new residency will do the same.

This post is by Digital Curator Stella Wisdom (@miss_wisdom).

14 September 2020

Digital geographical narratives with Knight Lab’s StoryMap

Visualising the journey of a manuscript’s creation

Working for the Qatar Digital Library (QDL), I recently catalogued British Library oriental manuscript 2361, a musical compendium copied in Mughal India during the reign of Aurangzeb (1618-1707; ruled from 1658). The QDL is a British Library-Qatar Foundation collaborative project to digitise and share Gulf-related archival records, maps and audio recordings as well as Arabic scientific manuscripts.

Portrait of Aurangzeb on a horse
Figure 1: Equestrian portrait of Aurangzeb. Mughal, c. 1660-70. British Library, Johnson Album, 3.4. Public domain.

The colophons to the fourteen texts in Or. 2361 contain an unusually large – but jumbled-up – quantity of information about where and when it was copied and checked, revealing that it was largely created during a journey taken by the imperial court in 1663.

Example of handwritten bibliographic information: Colophon to the copy of Kitāb al-madkhal fī al-mūsīqī by al-Fārābī
Figure 2: Colophon to the copy of Kitāb al-madkhal fī al-mūsīqī by al-Fārābī, transcribed in Delhi, 3 Jumādá I, 1073 hijrī/14 December 1662 CE, and checked in Lahore, 22 Rajab 1073/2 March 1663. Or. 2361, f. 240r.

Seeking to make sense of the mass of bibliographic information and unpick the narrative of the manuscript’s creation, I recorded all this data in a spreadsheet. This helped to clarify some patterns, but wasn’t fun to look at! To accompany an Asian and African Studies blog post, I wanted to find an interactive digital tool to develop the visual and spatial aspects of the story and convey the landscapes and distances experienced by the manuscript’s scribes and patron during its mobile production.

Screen shot of a spreadsheet of copy data for Or. 2361 showing information such as dates, locations, scribes etc.
Figure 3: Dull but useful spreadsheet of copy data for Or. 2361.

Many fascinating digital tools can present large datasets, including map co-ordinates. However, I needed to retell a linear, progressive narrative with fewer data points. Inspired by a QNF-BL colleague’s work on Geoffrey Prior’s trip to Muscat, I settled on StoryMap, one of an expanding suite of open-source reporting, data management, research, and storytelling tools developed by Knight Lab at Northwestern University, USA.

 

StoryMap: Easy but fiddly

This free, easy-to-use tool requires no coding ability, and its back-end resembles PowerPoint. The user creates a series of slides to which text, images, captions and copyright information can be added. Links to further online media, such as the millions of images published on the QDL, can easily be added.

Screen shot of someone editing in StoryMap
Figure 4: Back-end view of StoryMap's authoring tool.

The basic incarnation of StoryMap is accessed via an author interface which is intuitive and clear, but has its quirks. Slide layouts can’t be varied, and image manipulation must be completed pre-upload, which can get fiddly. Text was faint unless entirely in bold, especially against a backdrop image. A bug randomly rendered bits of uploaded text as hyperlinks, whereas intentional hyperlinks are not obvious.

 

The mapping function

StoryMap’s most interesting feature is an interactive map that uses OpenStreetMap data. Locations are entered as co-ordinates, or manually by searching for a place-name or dropping a pin. This geographical data links together to produce an overview map summarised on the opening slide, with subsequent views zooming to successive locations in the journey.

Screen shot showing a preview of StoryMap with location points dropped on a world map
Figure 5: StoryMap summary preview showing all location points plotted.

I had to add location data manually as the co-ordinates input function didn’t work. Only one of the various map styles suited the historical subject-matter; however, its modern street layout felt contradictory. The ‘ideal’ map – structured with global co-ordinates but correct for a specific historical moment – probably doesn’t exist (one for the next project?).

Screen shot of a point dropped on a local map, showing modern street layout
Figure 6: StoryMap's modern street layout implies New Delhi existed in 1663...

With clearly signposted advanced guidance, a support forum, and a link to a GitHub repository, more technically minded users could take StoryMap to the next level, not least by importing custom maps via Mapbox. Alternative platforms such as Esri’s Classic Story Maps can of course also be explored.
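For readers comfortable with a little code, one possible route around manual data entry is to generate the story from the spreadsheet itself. As we understand Knight Lab's advanced documentation, StoryMapJS (the open-source library behind StoryMap) can load a story from JSON data. The sketch below turns spreadsheet rows into that kind of structure; the field names (`storymap`, `slides`, `type`, `location`, `lat`, `lon`, `text`, `headline`) reflect our reading of that format and should be checked against the documentation before use. The two sample rows come from the colophon in Figure 2, with approximate modern co-ordinates for Delhi and Lahore.

```python
import csv
import io
import json

# Sample rows based on the colophon in Figure 2; co-ordinates are approximate.
SHEET = """place,lat,lon,event,date
Delhi,28.65,77.23,Kitāb al-madkhal fī al-mūsīqī transcribed,14 December 1662
Lahore,31.55,74.34,Kitāb al-madkhal fī al-mūsīqī checked,2 March 1663
"""

def rows_to_storymap(rows, title):
    """Build a StoryMapJS-style structure (assumed field names): an overview slide, then one slide per row."""
    slides = [{"type": "overview", "text": {"headline": title, "text": ""}}]
    for row in rows:
        slides.append({
            "location": {"name": row["place"], "lat": float(row["lat"]), "lon": float(row["lon"])},
            "text": {"headline": row["place"], "text": f"{row['event']}, {row['date']}"},
        })
    return {"storymap": {"slides": slides}}

if __name__ == "__main__":
    rows = list(csv.DictReader(io.StringIO(SHEET)))
    print(json.dumps(rows_to_storymap(rows, "The journey of Or. 2361"), indent=2, ensure_ascii=False))
```

Because the co-ordinates come straight from the spreadsheet, the same rows that clarified the patterns in the copy data would also drive the map.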

However, for many users, Knight Lab StoryMap’s appeal will lie in its ease of use and accessibility; it produces polished, engaging outputs quickly with a bare minimum of technical input and is easy to embed in web-text or social media. Thanks to Knight Lab for producing this free tool!

See the finished StoryMap, A Mughal musical miscellany: The journey of Or. 2361.

 

This is a guest post by Jenny Norton-Wright, Arabic Scientific Manuscripts Curator from the British Library Qatar Foundation Partnership. You can follow the British Library Qatar Foundation Partnership on Twitter at @BLQatar.

11 September 2020

BL Labs Public Awards 2020: enter before NOON GMT Monday 30 November 2020! REMINDER

The sixth BL Labs Public Awards 2020 formally recognises outstanding and innovative work that has been carried out using the British Library’s data and / or digital collections by researchers, artists, entrepreneurs, educators, students and the general public.

The closing date for entering the Public Awards is NOON GMT on Monday 30 November 2020 and you can submit your entry any time up to then.

Please help us spread the word! We want to encourage anyone interested to submit over the next few months. Who knows, you could even win fame and glory, priceless! We really hope to have another year of fantastic projects, inspired by our digital collections and data, to showcase at our annual online awards symposium on 15 December 2020 (which is open for registration too)!

This year, BL Labs is commending work in four key areas that have used or been inspired by our digital collections and data:

  • Research - A project or activity that shows the development of new knowledge, research methods, or tools.
  • Artistic - An artistic or creative endeavour that inspires, stimulates, amazes and provokes.
  • Educational - Quality learning experiences created for learners of any age and ability that use the Library's digital content.
  • Community - Work that has been created by an individual or group in a community.

What kind of projects are we looking for this year?

Whilst we are really happy for you to submit your work on any subject that uses our digital collections, in this significant year we are particularly interested in entries that focus on anti-racist work or on the lockdown / global pandemic. We are also keen to receive submissions that have used Jupyter Notebooks to carry out computational work on our digital collections and data.

After the submission deadline has passed, entries will be shortlisted and selected entrants will be notified via email by midnight on Friday 4th December 2020. 

A prize of £150 in British Library online vouchers will be awarded to the winner, and £50 in the same format to the runner-up, in each Awards category at the Symposium. Of course, entering at least gives you a chance to showcase your work to a wide audience, and in the past this has often resulted in major collaborations.

The talent of the BL Labs Awards winners and runners-up over the last five years has led to the production of a remarkable and varied collection of innovative projects, described in our 'Digital Projects Archive'. In 2019, the Awards commended work in four main categories – Research, Artistic, Community and Educational:

BL Labs Award Winners for 2019
(Top-Left) Full-Text search of Early Music Prints Online (F-TEMPO) - Research, (Top-Right) Emerging Formats: Discovering and Collecting Contemporary British Interactive Fiction - Artistic
(Bottom-Left) John Faucit Saville and the theatres of the East Midlands Circuit - Community commendation
(Bottom-Right) The Other Voice (Learning and Teaching)

For further detailed information, please visit BL Labs Public Awards 2020, or contact us at [email protected] if you have a specific query.

Posted by Mahendra Mahey, Manager of British Library Labs.

07 September 2020

When is a persistent identifier not persistent? Or an identifier?

This guest post is by Jez Cope, Data Services Lead with Research Services at the British Library. He is on Twitter @jezcope.

Ever wondered what that bar code on the back of every book is? It’s an ISBN: an International Standard Book Number. Every modern book published has an ISBN, which uniquely identifies that book, and anyone publishing a book can get an ISBN for it, whether they are an individual or a huge publishing house. It’s a little more complex than that in practice, but generally speaking it’s 1 book, 1 ISBN. Right? Right.

Except…

If you search an online catalogue, such as WorldCat or The British Library for the ISBN 9780393073775 (or the 10-digit equivalent, 0393073777) you’ll find results appear for two completely different books:

  1. Waal FD. The Bonobo and the Atheist: In Search of Humanism Among the Primates. New York: W. W. Norton & Co.; 2013. 304 p. http://www.worldcat.org/oclc/1167414372
  2. Lodge HC. The Storm Has Many Eyes; a Personal Narrative. 1st edition. New York: New York Norton; 1973. http://www.worldcat.org/oclc/989188234
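It is worth being clear about what the ISBN's own check digit can and cannot tell us here. Below is a sketch in Python of the standard check-digit rules (ISBN-10: weights 10 down to 1, with the total divisible by 11; ISBN-13: alternating weights 1 and 3, with the total divisible by 10), plus the usual ISBN-10 to ISBN-13 conversion. Both forms of the number above validate and convert into each other, so nothing in the number itself flags that it has been used for two different books. The function names are our own.

```python
def isbn10_valid(isbn):
    """ISBN-10: sum of digits weighted 10..1 must be divisible by 11 ('X' = 10, last position only)."""
    s = isbn.replace("-", "").upper()
    if len(s) != 10 or not s[:9].isdigit() or not (s[9].isdigit() or s[9] == "X"):
        return False
    digits = [int(c) for c in s[:9]] + [10 if s[9] == "X" else int(s[9])]
    return sum(d * w for d, w in zip(digits, range(10, 0, -1))) % 11 == 0

def isbn13_check_digit(first12):
    """ISBN-13 check digit: weights alternate 1, 3 across the first twelve digits."""
    total = sum(int(c) * (3 if i % 2 else 1) for i, c in enumerate(first12))
    return str((10 - total % 10) % 10)

def isbn10_to_isbn13(isbn10):
    """Prefix '978', drop the ISBN-10 check digit and compute the ISBN-13 one."""
    s = isbn10.replace("-", "")
    if not isbn10_valid(s):
        raise ValueError(f"not a valid ISBN-10: {isbn10}")
    core = "978" + s[:9]
    return core + isbn13_check_digit(core)

if __name__ == "__main__":
    print(isbn10_valid("0393073777"), isbn10_to_isbn13("0393073777"))
```

A check digit is there to catch keying errors; a reused ISBN is a perfectly well-formed number, so it passes untouched.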

A screen grab of the main catalogue showing a search for ISBN 0393073777 with the above two results

In fact, things are so confused that the cover of one book gets pulled in for the other as well. Investigate further and you’ll see that it’s not a glitch: both books have been assigned the same ISBN. Others have found the same:

“However, if the books do not match, it’s usually one of two issues. First, if it is the same book but with a different cover, then it is likely the ISBN was reused for a later/earlier reprinting. … In the other case of duplicate ISBNs, it may be that an ISBN was reused on a completely different book. This shouldn’t happen because ISBNs are supposed to be unique, but exceptions have been found.” — GoodReads Librarian Manual: ISBN-10, ISBN-13 and ASINS

While most publishers stick to the rules about never reusing an ISBN, it’s apparently common knowledge in the book trade that ISBNs from old books get reused for newer books, sometimes accidentally (due to a typo), sometimes intentionally (to save money), and that has some tricky consequences.

I recently attended a webinar entitled “Identifiers in Heritage Collections - how embedded are they?” from the Persistent Identifiers as IRO Infrastructure (“HeritagePIDs”) project, part of AHRC’s Towards a National Collection programme. As quite often happens, the question was raised: what Persistent Identifier (PID) should we use for books and why can’t we just use ISBNs? Rod Page, who gave the demo that prompted this discussion, also wrote a short follow-up blog post about what makes PIDs work (or not) which is worth a look before you read the rest of this.

These are valid questions and worth considering in more detail, and to do that we need to understand what makes a PID special. We call them persistent, and indeed we expect some sort of guarantee that a PID remains valid for the long term, so that we can use it as a link or placeholder for the referent without worrying that the link will get broken. But we also expect PIDs to be actionable: it should be possible to turn one into a valid URL by following some rules, so that we can directly obtain the object referenced, or at least some information about it.

Actionability implies two further properties: an actionable identifier must be

  1. Unique: guaranteed to have only one identifier for a given object (of a given type); and
  2. Unambiguous: guaranteed that a single identifier refers to only one object
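Both properties can be checked mechanically, given a list of identifier-to-record pairs. A hedged sketch in Python: the first two pairs reproduce the ISBN clash above, keyed to the two WorldCat records, while the last two are hypothetical hardback and paperback identifiers attached to a single hypothetical work, to show the other kind of failure.

```python
from collections import defaultdict

def audit(pairs):
    """Report violations of the two properties, given (identifier, object) pairs:
    - ambiguous: one identifier pointing at more than one object
    - not unique: one object carrying more than one identifier
    """
    objects_for = defaultdict(set)
    ids_for = defaultdict(set)
    for ident, obj in pairs:
        objects_for[ident].add(obj)
        ids_for[obj].add(ident)
    ambiguous = {i: sorted(o) for i, o in objects_for.items() if len(o) > 1}
    not_unique = {o: sorted(i) for o, i in ids_for.items() if len(i) > 1}
    return ambiguous, not_unique

pairs = [
    ("9780393073775", "oclc:1167414372"),  # The Bonobo and the Atheist (2013)
    ("9780393073775", "oclc:989188234"),   # The Storm Has Many Eyes (1973)
    ("isbn:hypothetical-hardback", "work:hypothetical-novel"),
    ("isbn:hypothetical-paperback", "work:hypothetical-novel"),
]

if __name__ == "__main__":
    ambiguous, not_unique = audit(pairs)
    print("ambiguous:", ambiguous)
    print("not unique:", not_unique)
```

Whether the second result counts as a violation depends on the type of object being identified: hardback and paperback are one work but two products, which is exactly what "a given object (of a given type)" is hedging against.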

Where does this leave us with ISBNs?

Well, first of all, they’re not actionable: given an ISBN, there’s no canonical way to obtain information about the book referenced, although in practice there are a number of databases that can help. There is, in fact, an actionable ISBN standard: ISBN-A permits converting an ISBN into a DOI with all the benefits of the underlying DOI and Handle infrastructure. Sadly, creation of an ISBN-A isn’t automatic, and publishers have to explicitly create the ISBN-A DOI in addition to the already-created ISBN; most don’t.

More than that, though, it’s hard to make them actionable because ISBNs fail on both uniqueness and unambiguity. Firstly, as seen in the example I gave above, ISBNs do get recycled. They’re not supposed to be:

“Once assigned to a monographic publication, an ISBN can never be reused to identify another monographic publication, even if the original ISBN is found to have been assigned in error.” — International ISBN Agency. ISBN Users’ Manual [Internet]. Seventh Edition. London, UK: International ISBN Agency; 2017 [cited 2020 Jul 23]. Available from: https://www.isbn-international.org/content/isbn-users-manual

Yet they are, so we can’t rely on an ISBN to identify only one book.[1]

Secondly, and perhaps more problematic in day-to-day use, a given book may have multiple ISBNs. To an extent this is reasonable: different editions of the same book may have different content, or at the very least different page numbering, so a PID should be able to distinguish these for accurate citation. Unfortunately, the same edition of the same book will frequently have multiple ISBNs; in particular, each different format (hardback, paperback, large print, ePub, MOBI, PDF, …) is expected to have a distinct ISBN. Even if all that changes is the publisher, a new ISBN is still created:

“We recently encountered a case where a publisher had licensed a book to another publisher for a different geographical market. Both books used the same ISBN. If the publisher of the book changes (even if nothing else about the book has changed), the ISBN must also change.” — Everything you wanted to know about the ISBN but were too afraid to ask

Again, this is reasonable since the ISBN is primarily intended for stock-keeping by booksellers[2], and for them the difference between a hardback and paperback is important because they differ in price if nothing else. This has bitten more than one librarian when trying to merge data from two different sources (such as usage and pricing) using the ISBN as the “obvious” merge key. It makes bibliometrics harder too, since you can’t easily pull out a list of all citations of a given edition in the literature from a single ISBN.
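To see how the “obvious” merge key goes wrong, here is a minimal Python sketch using hypothetical placeholder keys rather than real ISBNs: usage data keyed by the e-book ISBN and pricing data keyed by the hardback ISBN of the same edition. The `EDITION_OF` lookup table is an assumption; in practice such a mapping would have to be built or sourced separately, and a recycled ISBN would corrupt it just as easily.

```python
# Hypothetical data: symbolic placeholders, not real ISBNs.
usage = {"ISBN_EPUB": 120}          # e-book loans, keyed by the ePub ISBN
pricing = {"ISBN_HARDBACK": 25.00}  # list price, keyed by the hardback ISBN


def naive_merge(left, right):
    """Join two ISBN-keyed tables on identical ISBNs only."""
    return {isbn: (left[isbn], right[isbn]) for isbn in left.keys() & right.keys()}


# Hypothetical lookup table grouping format-specific ISBNs under one edition key.
EDITION_OF = {
    "ISBN_HARDBACK": "edition-1",
    "ISBN_PAPERBACK": "edition-1",
    "ISBN_EPUB": "edition-1",
}


def merge_by_edition(left, right, edition_of):
    """Regroup both tables by edition key, then join on that key."""
    def regroup(table):
        grouped = {}
        for isbn, value in table.items():
            # Unmapped ISBNs fall back to themselves as the key.
            grouped.setdefault(edition_of.get(isbn, isbn), []).append(value)
        return grouped

    l, r = regroup(left), regroup(right)
    return {key: (l[key], r[key]) for key in l.keys() & r.keys()}


print(naive_merge(usage, pricing))                   # {}
print(merge_by_edition(usage, pricing, EDITION_OF))  # {'edition-1': ([120], [25.0])}
```

The naive join silently returns nothing, which is the trap: no error is raised, the data simply fails to line up.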

So where does this leave us?

I’m not really sure yet. ISBNs as they are currently specified and used by the book industry aren’t really fit for purpose as a PID. But they’re there, and they sort of work, and establishing a more robust PID for books would need commitment and co-operation from authors, publishers and libraries. That’s not impossible: a lot of work has been done recently to make the ISSN (International Standard Serial Number, for journals) more actionable.

But perhaps there are other options. Where publishers, booksellers and libraries are primarily interested in IDs for stock management, authors, researchers and scholarly communications librarians are more interested in the scholarly record as a whole and tracking the flow of ideas (and credit for those) which is where PIDs come into their own. Is there an argument for a coalition of these groups to establish a parallel identifier system for citation & credit that’s truly persistent? It wouldn’t be the first time: ISNIs (International Standard Name Identifiers) and ORCIDs (Open Researcher and Contributor IDs) both identify people, but for different purposes in different roles and with robust metadata linking the two where possible.

I’m not sure where I’m going with this train of thought, so I’ll leave it there for now, but I’m sure I’ll be back. The more I dig into this the more there is to find, including the mysterious, long-forgotten and no-longer accessible Book Item & Component Identifier proposal. In the meantime, if you want a persistent identifier and aren’t sure which one you need, these Guides to Choosing a Persistent Identifier from Project FREYA should get you started.


  1. Actually, as my colleague pointed out, even DOIs potentially have this problem, although I feel they can mitigate it better with metadata that allows rich expression of relationships between DOIs.

  2. In fact, the newer ISBN-13 standard is simply an ISBN-10 encoded as an “International Article Number”, the standard barcode format for almost all retail products, by sticking the “Bookland” country code of 978 on the front and recalculating the check digit.
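The conversion in footnote 2 is easy to sketch. A minimal Python version, with hypothetical function names: validate the ISBN-10, drop its check digit, prepend 978, and recompute the check digit with the ISBN-13 weights of alternately 1 and 3. It covers only the 978 case; ISBN-13s beginning 979 have no ISBN-10 equivalent.

```python
def isbn10_is_valid(isbn10: str) -> bool:
    """ISBN-10 check: weights 10 down to 1, sum divisible by 11 ('X' = 10, last position only)."""
    chars = isbn10.replace("-", "").replace(" ", "")
    if len(chars) != 10:
        return False
    total = 0
    for i, ch in enumerate(chars):
        if ch in "0123456789":
            value = int(ch)
        elif i == 9 and ch in "Xx":
            value = 10
        else:
            return False
        total += (10 - i) * value
    return total % 11 == 0


def isbn10_to_isbn13(isbn10: str) -> str:
    """Prefix 978, drop the old check digit, recompute with ISBN-13 (EAN-13) weights."""
    if not isbn10_is_valid(isbn10):
        raise ValueError(f"not a valid ISBN-10: {isbn10!r}")
    body = "978" + isbn10.replace("-", "").replace(" ", "")[:9]
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(body))
    return body + str((10 - total % 10) % 10)


print(isbn10_to_isbn13("0-306-40615-2"))  # 9780306406157
```

The input here is a value commonly used as a worked example; the old check digit (2) and the new one (7) differ because the two schemes use different weights and moduli.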