Digital scholarship blog

Enabling innovative research with British Library digital collections

Introduction

Tracking exciting developments at the intersection of libraries, scholarship and technology.

10 February 2022

In conversation: Meet Silvija Aurylaitė, the new British Library Labs Manager

The newly appointed manager of the British Library Labs (BL Labs), Silvija Aurylaitė, is excited to start leading the BL Labs transformation with a new focus on computational creative thinking. The BL Labs is a welcoming space for everyone curious about computational research and using the British Library’s digital collections. We welcome all researchers - data scientists, digital humanists, artists, creative practitioners, and everyone curious about digital research.

Image of BL Labs Manager Silvija Aurylaite
Introducing Silvija Aurylaitė, new manager of BL Labs

Find out more from Silvija, in conversation with Maja Maricevic, BL Head of Higher Education and Science.

 

Maja: The Labs have a proud history of experimenting and innovating with the British Library’s digital collections. Can you tell us more about your own background?

Silvija: Ever since I discovered the BL Labs in London 8 years ago, I have been immersed in the world of experimentation with digital collections. I started researching collections from open GLAMs (galleries, libraries, archives and museums) around the world and the implications of copyright and licensing for creative reuse. In a large ecosystem of open digital collections, my special interest has been identifying content that people can use to bring their creative ideas to life, such as new design works.

Inspired by the Labs, I started developing my own curatorial web project, which won the Europeana Creative Design Challenge in 2015. The award gave me the chance to work with a team of international experts to learn new skills in areas such as IT, copyright and social entrepreneurship. This experience later evolved into ‘Revivo Images’, a pilot website that gives guidance on open image collections around the world, which are carefully selected for quality and for reliability of copyright and licence information, with explanations of how to use the databases. It was the result of collaboration with a great interdisciplinary team including an IT lead, programmers, curators, designers and a copywriter.

All this gave me invaluable experience in overseeing a digital collections web project from vision to implementation. I learned about curating content from across collections, building an image database and mapping metadata using various standards. We also used AI and human input to create keywords and thematic catalogs and designed a simple minimalist user interface.

What I most enjoyed about this journey, actually, was meeting a great range of creative people in many creative fields, from professional animators to students looking for a theme for their BA final thesis - and learning what excited them most, and what barriers they faced in using open collections. I met many of them at various art festivals, universities, design schools and events where I delivered talks and creative workshops in my free time to spread the word about open digital collections for creativity. For two years I was also responsible for the ‘Bridgeman Education’ online database, one of the largest digital image collections with over 1,300,000 images from the GLAM sector, designed for the use of art images in higher education curricula. I had the opportunity to talk to many librarians, lecturers and students from around the world about what they find most useful in this new digital turn.

As a result of this, I am particularly excited about introducing the Labs to university students: from students in computer science departments with coding skills to researchers in social sciences and humanities, to creativity champions in fashion, graphic design or jewelry, who might be attracted to aesthetic qualities of our collections or those looking to pick up creative coding skills.

The landscape has changed a lot in the last 8 years since I learned about the Labs, and I gradually started my own journey of learning code and algorithmic thinking. Already in my previous role in the British Library, as the Rights Officer for the Heritage Made Digital project, we approached digital collections as data. Now we are all embracing computational data science methods to gain new insights into digital collections, and that is what the future British Library Labs is going to celebrate.

 

Maja: You have a strong connection to the BL Labs since you were a Labs volunteer 8 years ago. What most inspired you when you first heard of the Labs?

Silvija: Personally, the Labs were my first professional experience abroad after my MA studies in intellectual history at the American university in Budapest, and happened to be one of the main incentives to stay in London.

This city has attracted me for its serendipity - you can have a great range of urban experiences, from attending the oldest special interest societies and visiting antiquarian bookshops to meeting founders of the latest startups at their regular gatherings and getting up to speed with the mindset of perpetual innovation.

When I first heard about the Labs at one of its public events, this sentence struck me: “experiment with the BL digital collections to create something new”, with the “new” being undefined and open. I had this idea of perpetuity - the possibility of endlessly combining the knowledge and aesthetics of the past, safeguarded by one of the biggest libraries of the world, with the creative visions, skills and technology of today and tomorrow.

Such endless new experiences of digital collections can be accelerated by creating a dedicated space for experimentation - a collider or a matchmaker - that contributes to the diverse serendipitous urban experience of London itself. This is how I see the Labs.

Looking from a user point of view, I am particularly excited about the ‘semiotic democracy’, or ‘the ability of users to produce and disseminate new creations and to take part in public cultural discourse’[1] (Stark, 2006). I believe this new playful approach to digitised out-of-copyright cultural materials will fundamentally change the way we see GLAMs. We’ll look at them less and less as spaces where we only learn about the past as it used to be, as recipients, and more and more as spaces where we are co-creators, able to enter into a meaningful dialogue and reshape meanings, narratives and experiences.

 

Maja: Prior to your Labs appointment, you also gained significant rights management experience. What have you learned that will be useful for the Labs?

Silvija: It was a delight to work with Matthew Lambert, the Head of Copyright, Policy & Assurance, for the Heritage Made Digital project, led by Sandra Tuppen, in setting up the British Library’s copyright workflow for both current and historical digitisation projects. This project now allows users to explore the BL’s digital images in the Universal Viewer with attributed rights statements and usage terms.

These last 3.5 years were a great exercise in dealing with very large, often very messy, data to create complex systems, policies and procedures which allow oversight of all important aspects of the digital data, including copyright and licensing, data protection and sensitivities. Of course, such work in the Library is of massive importance because it affects the level of freedom we later have to experiment, reuse and do further research based on this data.

Personally, the Heritage Made Digital project is also very precious to me because of its collaborative nature. The team uses the MS SharePoint tool to facilitate data contributions from across many departments in the BL. And they are just fantastic at promoting and celebrating digitisation as a common effort to make content publicly accessible. I will definitely use this experience to suggest solutions on how to register and document both the BL’s datasets and related reuse projects as a similar collaborative project within the Library.

 

Maja: There is so much that is changing in digital research all the time. Are there particular current developments that you find exciting and why?

Silvija: Yes! First, I find the moment of change itself exciting - there is no book about the tools we use today that won’t be running out of date tomorrow. This is a good neuroplasticity exercise that trains the mind not to sleep and be constantly attentive to new developments and opportunities.

Second, I absolutely love to see how many people, from creators to researchers and library staff, are gradually and naturally embracing code languages. With this comes associated critical thinking, such as the ability to surpass often outdated old database interfaces to reveal exciting data insights simply by having a liberating package of new digital skills.

And, third, I am super excited about the possibility of upscaling and creating a bigger impact with existing breakthrough projects and brilliant ideas relating to the British Library’s data. I believe this could be done by finding consensus on how we want to register and document data science initiatives - finalised, ongoing and most wanted, both internally and externally - and then by promoting this knowledge further.

This would allow us to enter a new stage of the BL Labs. The new ecosystem of re-use would promote sustainability, reproducibility, adaptation and crowdsourced improvement of existing projects, giving us new super powers!

[1] Stark, Elisabeth (2006). Free culture and the internet: a new semiotic democracy. opendemocracy.net (June 20). URL: https://www.opendemocracy.net/en/semiotic_3662jsp

09 February 2022

A Manuscript Reunited – and a IIIF Viewer Issue

This blog post is by Dr Adi Keinan-Schoonbaert, Digital Curator for Asian and African Collections, British Library. She's on Twitter as @BL_AdiKS.

 

Last year we saw meticulous detective work by Arabic Scientific Manuscripts Curator Bink Hallum (British Library Qatar Foundation Partnership) to digitally reunite a thousand-year-old manuscript that ended up in two different libraries. This is a Christian Arabic miscellany with texts on herbal remedies, medicine, astrology and other topics. Some of this manuscript’s folios are located at the British Library (on Qatar Digital Library: Or. 8857), others at the Biblioteca Ambrosiana in Milan (X 201 sup) - and some have been lost from the original manuscript.

Luckily, not only have both libraries digitised their respective folios of this manuscript, they have also made the digitised images available in IIIF. Being well aware of the benefits of IIIF, Bink has demonstrated in his blog post how to use the Mirador viewer to view both manuscript shelfmarks side by side. This virtual reunification of manuscript folios held in different collections has the potential to encourage and facilitate further research into Abbasid scientific traditions amongst Christian monastic communities.

Screenshot from Mirador, showing the two manuscript shelfmarks side by side (taken from Bink’s blog post)

 

Clearly, I found this to be a very useful example of using IIIF to bring together different instances of the same digital object into one environment. However, I wanted to take this work even further, and create a IIIF manifest which would include all existing folios – and in the right order. Attending Glen Robson’s 5-day IIIF Online Workshop a couple of weeks ago, I could finally give this a go.

 

The most straightforward way for me to do that was to grab the QDL manifest and Ambrosiana’s Biblioteca Digitale manifest, and open them in Atom text editor. Editing the JSON files directly, I copied the canvases of one manifest and pasted them into the other, and then removed canvases with images of the book binding and empty folios. The result looked promising – I could now view the united manuscript on the Universal Viewer.
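The copy-and-paste merge described above can also be scripted. The following is only a minimal sketch, assuming both manifests follow the IIIF Presentation API 2.x layout, where canvases sit in `sequences[0]["canvases"]`; the post does not say which API version the two libraries serve. The function name `merge_manifests`, the `skip_labels` parameter and the tiny stand-in manifests are hypothetical, not the real QDL or Ambrosiana data.

```python
import copy
import json


def merge_manifests(base, other, skip_labels=()):
    """Return a copy of `base` whose first sequence also holds the canvases of `other`.

    Assumes IIIF Presentation API 2.x manifests, where canvases live in
    manifest["sequences"][0]["canvases"]. Canvases whose label appears in
    `skip_labels` (e.g. bindings or empty folios) are dropped.
    """
    skip = tuple(skip_labels)  # tuple membership also works for non-hashable labels
    merged = copy.deepcopy(base)
    canvases = merged["sequences"][0]["canvases"] + copy.deepcopy(
        other["sequences"][0]["canvases"]
    )
    merged["sequences"][0]["canvases"] = [
        canvas for canvas in canvases if canvas.get("label") not in skip
    ]
    return merged


# Hypothetical stand-in manifests with made-up ids and labels.
qdl = {"@id": "https://example.org/qdl/manifest", "sequences": [{"canvases": [
    {"@id": "https://example.org/qdl/canvas/1", "label": "front binding"},
    {"@id": "https://example.org/qdl/canvas/2", "label": "f. 1r"},
]}]}
ambrosiana = {"@id": "https://example.org/ambrosiana/manifest", "sequences": [{"canvases": [
    {"@id": "https://example.org/ambrosiana/canvas/1", "label": "f. 1r (Ambrosiana)"},
]}]}

united = merge_manifests(qdl, ambrosiana, skip_labels=["front binding"])
print(json.dumps([canvas["label"] for canvas in united["sequences"][0]["canvases"]]))
```

Working on deep copies leaves the two source manifests untouched, so the merge can be repeated with a different list of canvases to skip.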

This looks perfect when viewing a single page at a time (‘Single page view’). However, the unified manuscript looks quite strange when selecting the Universal Viewer’s ‘Two page view’, looking at the transition between the last folio of QDL, and the first folio of Ambrosiana.

The united manifest, as viewed on the Universal Viewer, using ‘Two page view’

 

The reason for this visual mismatch is that the images served from Qatar Digital Library are much larger than Ambrosiana’s, and the viewer does not scale them in relation to one another. Trying to tinker with canvas sizes, absolute or relative, within the manifest itself did not help. Apparently, this has been an issue with other use cases – see for example this discussion on GitHub, or Glen’s investigations of this issue when viewing items of different sizes on Mirador and the UV. Hopefully this scaling issue could be addressed, so we can all enjoy the consolidation of IIIF collection items in their full glory.

It should also be pointed out that although we have digitally reunited the two identified manuscript fragments, some folios are still missing: 36 folios from the beginning of the BL manuscript, 2 folios between folios 13 and 14 of the BL manuscript, 32 folios between the end of the BL manuscript and the beginning of the Ambrosiana manuscript, and a further 14 folios from within the Ambrosiana manuscript. We hope that these missing folios will be discovered one day, and the original manuscript shall be complete once again!

 

07 February 2022

New PhD Placements on Enhanced Curation: Hybrid Archives and Emerging Formats

The British Library is accepting applications for the new round of 2022 PhD Placement opportunities: there are 15 projects available across Library departments, all starting from June 2022 onwards and ending before March 2023. Two of the projects within the Contemporary British Collections department focus on Enhanced Curation as an approach to add to the research value of an archival object or digital publication.

“Developing an enhanced curation framework for contemporary hybrid archives (2022-CB-HAC)” will outline a framework for Enhanced Curation in relation to contemporary hybrid archives. These archival collections are the record of the creative and professional lives of prominent individuals in UK society, containing both paper and digital material. So far we have defined Enhanced Curation as the means by which the research value of these records can be enhanced through the creation, collection, and interrogation of the contextual information which surrounds them.

Luckily, we’re in a privileged position – most of our archive donors are living individuals who can illuminate their creative practice for us in real-time. Similarly, with forensic techniques, we’re capturing more data than ever before when we acquire an archive. The truly live questions are then – how can we use this position to best effect? What can we do with what we’re already collecting? What else should we be collecting? And how can we represent this data in engaging and enlightening new ways for the benefit of everyone, including our researchers and exhibition audiences?

Enhanced Curation, as we see it, is about bringing these dynamic collections to life for as many people as possible.  In approaching these questions, the chosen student will engage in a mixture of theoretical and practical work – first outlining the relevant debates and techniques in and around curation, archival science, museology and digital humanities, and then recommending a course of action for one particular hybrid personal archive. This is a collaborative exercise, though, and they will be provided with hands-on training for working with (and getting the most out of) this growing collection area by specialist curatorial staff at the Library.

Photograph of a floppy disk and its case
Floppy disk from the Will Self archive.

“Collecting complex digital publications: Testing an enhanced curation method (2022-CB-EF)” focuses on the Library’s collection of emerging formats. Emerging formats are defined as born-digital publications whose structure, technical dependencies and highly interactive nature challenge our traditional collection methods. These publications include apps, such as the interactive adventure 80 Days, as well as digital interactive narratives, such as the examples collected in the UK Web Archive Interactive Narratives and New Media Writing Prize collections. Collection and preservation of these digital formats in their entirety might not always be possible: there are many challenges and implications in terms of technical capabilities, software and hardware dependencies, copyright restrictions and long-term solutions that are effective against technical obsolescence.

The collection and creation of contextual information is one approach to fill in the gaps and enhance curation for these digital publications. The placement student will help us test a collection matrix for contextual information relating to emerging formats, which includes – but is not limited to – webpages, interviews, reviews, blog posts and screenshots/screencasts of usage of a work. These might be collected using a variety of methods (e.g. web archiving, direct transfer from the author, etc.) as well as created by the student themselves (e.g. interviews with the author, video recordings of usage, etc.). Through this placement, the student will have the opportunity to participate in a network of cultural heritage institutions concerned with the preservation of digital publications while helping develop one of the Library’s contemporary collections.

Photograph of a man looking at an iPad screen and reading an app
Interacting with the American Interior app on iPad.

Both PhD Placements are offered for 3 months full time, or part-time equivalent. They can be undertaken as hybrid placements (i.e. remotely, with some visits to the British Library building in London, St. Pancras), with the option of a fully remote placement for “Collecting complex digital publications: Testing an enhanced curation method”.

Applications for all 2022/23 PhD Placements close on Friday 25 February 2022, 5pm GMT. The application form and guidelines are available online here. Please address any queries to [email protected]

This post is by Giulia Carla Rossi, Curator of Digital Publications (on Twitter as @giugimonogatari), and Callum McKean, Digital Lead Curator, Contemporary Archives and Manuscripts.

26 January 2022

Which Came First: The Author or the Text? Wikidata and the New Media Writing Prize

Congratulations to the 2021 New Media Writing Prize (NMWP) winners, which were announced at a Bournemouth University online event recently: Joannes Truyens and collaborators (Main Prize), Melody MOU (Student Award) and Daria Donina (FIPP Journalism Award 2021). The main prize winner ‘Neurocracy’ is an experimental dystopian narrative that takes place over 10 episodes, through Omnipedia, an imagined future version of Wikipedia in 2049. So this seemed like a very apt jumping off point for today’s blog post, which discusses a recent project where we added NMWP data to Wikidata.

Screen image of Omnipedia, an imagined futuristic version of Wikipedia from Neurocracy by Joannes Truyens
Omnipedia, an imagined futuristic version of Wikipedia from Neurocracy by Joannes Truyens

Note: If you wish to read ‘Neurocracy’ and are prompted for a username and password, use the username NewMediaWritingPrize1 and the password N3wMediaWritingPrize!. You can learn more about the work in this article and listen to an interview with the author in this podcast episode.

Working With Wikidata

Dr Martin Poulter describes learning how to work with Wikidata as being like learning a language. When I first heard this description, I didn’t understand: how could something so reliant on raw data be anything like the intricacies of language learning?

It turns out, Martin was completely correct.

Imagine a stack of data as slips of paper. Each slip has an individual piece of data on it: an author’s name, a publication date, a format, a title. How do you start to string this data together so that it makes sense?

One of the beautiful things about Wikidata is that it is both machine and human readable. In order for it to work this way, and for us to upload it effectively, thinking about the relationships between these slips of paper is essential.

In 2021, I had an opportunity to see what Martin was talking about when he spoke about language, as I was asked to work with a set of data about NMWP shortlisted and winning works, which the British Library has collected in the UK Web Archive. You can read more about this special collection here and here.

Image of blank post-it notes and a hand with a marker pen preparing to write on one.

About the New Media Writing Prize

The New Media Writing Prize was founded in 2010 to showcase exciting and inventive stories and poetry that integrate a variety of digital formats, platforms, and media. One of the driving forces in setting up and establishing the prize was Chris Meade, director of if:book uk, a ‘think and do tank’ for exploring digital and collaborative possibilities for writers and readers. He was the lead sponsor of the if:book UK New Media Writing Prize, and the Dot Award, which he created in honour of his mother, Dorothy, and he chaired every NMWP awards evening since 2010. Very sadly Chris passed away on 13th January 2022 and the recent 2021 awards event was dedicated to Chris and his family.

Recognising the significance of the NMWP, in recent years the British Library created the New Media Writing Prize Special Collection as part of its emerging formats work. With 11 years of metadata about a born digital collection, this was an ideal data set for me to work with in order to establish a methodology for working with Wikidata uploads in the Library.

Last year I was fortunate to collaborate with Tegan Pyke, a PhD placement student in the Contemporary British Publications Collections team, supervised by Giulia Carla Rossi, Curator for Digital Publications. Tegan's project examined the digital preservation challenges of complex digital objects, developing and testing a quality assurance process for examining works in the NMWP collection. If you want to read more about this project, a report is available here. For the Wikidata work Tegan and Giulia provided two spreadsheets of data (or slips of paper!), and my aim was to upload linked data that covered the authors, their works, and the award itself - who had been shortlisted, who had won, and when.

Simple, right?

Getting Started

I thought so - until I began to structure my uploads. There were some key questions that needed to be answered about how these relationships would be built, and I needed to start somewhere. Should I upload the authors or the texts first? Should I go through the prize year by year, or be led by other information? And what about texts with multiple authors?

Suddenly it all felt a bit more intimidating!

I was fortunate to attend some Wikidata training run by Wikimedia UK late last year. Martin was our trainer, and one piece of advice he gave us was indispensable: if you’re not sure where to start, literally write it out with pencil and paper. What is the relationship you’re trying to show, in its simplest form? This is where language framing comes in especially useful: thinking about the basic sentence structures I’d learned in high school German became vital.

Image shows four simple sentences: Christine Wilks won NMWP in 2010. Christine Wilks wrote Underbelly. Underbelly won NMWP in 2010. NMWP was won by Christine Wilks in 2010. Christine is circled in green, NMWP in purple, and Underbelly in yellow. QIDs are listed: Q108810306 (highlighted in green), Q108459688 (highlighted in purple), Q109237591 (highlighted in yellow). Properties are listed: P166 (highlighted in blue), P800 (highlighted in turquoise), P585 (highlighted in orange).
Image by the author, notes own.

The Numbers Bit

You can see from this image how the framework develops: specific items, like nouns, are given identification numbers when they become a Wikidata item. This is their QID. The relationships between QIDs, sort of like the adjectives and verbs, are defined as properties and have P numbers. So Christine Wilks is now Q108810306, and her relationship to her work, Underbelly, or Q109237591, is defined with P800 which means ‘notable work’.

Q108810306 - P800 - Q109237591

You can upload this relationship using the visual editor on Wikidata, by clicking fields and entering data. If you have a large amount of information (remember those slips of paper!) tools like QuickStatements become very useful. Dominic Kane blogged about his experience of this system during his British Library student placement project in 2021.
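To make the batch route concrete, here is a minimal sketch of how rows like the one above could be turned into QuickStatements input. It assumes the tab-separated "V1" command format, in which each line is item, property, value. The QIDs and P800 come from this post; the helper name `quickstatement` and the `rows` list are hypothetical.

```python
def quickstatement(subject, prop, value):
    """Format one item-to-item statement as a tab-separated QuickStatements V1 line."""
    return "\t".join([subject, prop, value])


# Identifiers quoted in the post.
CHRISTINE_WILKS = "Q108810306"
UNDERBELLY = "Q109237591"
NOTABLE_WORK = "P800"

# One tuple per slip of paper; a real batch would be read from a spreadsheet.
rows = [(CHRISTINE_WILKS, NOTABLE_WORK, UNDERBELLY)]

batch = "\n".join(quickstatement(*row) for row in rows)
print(batch)
```

The printed text can be pasted into the QuickStatements batch window. Generating it from a spreadsheet keeps the upload repeatable and easy to check before anything is written to Wikidata.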

The intricacies of language are also very important on Wikidata. The nuance and inference we can draw from specific terms is important. The concept of ‘winning’ an award became a subject of semantic debate: the taxonomy of Wikidata advises that we use ‘award received’ in the case of a literary prize, as it’s less of an active sporting competition than something like a marathon or an athletic event.
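As an illustration of the ‘award received’ framing, the sketch below formats the statement "Christine Wilks received NMWP in 2010" as a QuickStatements V1 line. It assumes the year is recorded as a P585 (‘point in time’) qualifier on the P166 (‘award received’) statement; the post lists both properties but does not spell out how they were combined, so treat that pairing as an assumption. The function name `award_statement` is hypothetical.

```python
def award_statement(recipient, award, year):
    """Build an 'award received' line with a year-precision date qualifier.

    In QuickStatements V1, property/value pairs after the main triple are
    read as qualifiers, and the trailing /9 marks a date as precise to the year.
    """
    point_in_time = f"+{year:04d}-01-01T00:00:00Z/9"
    return "\t".join([recipient, "P166", award, "P585", point_in_time])


# QIDs quoted in the post: Christine Wilks and the New Media Writing Prize.
line = award_statement("Q108810306", "Q108459688", 2010)
print(line)
```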

Asking Questions of the Data

Ultimately we upload information to Wikidata so that it can be queried. Querying uses SPARQL, a language which allows users to draw information and patterns from vast swathes of data. Querying can be complex: to go back to the language analogy, you have to phrase the query in precisely the right way to get the information you want.

One of the lessons I learned during the NMWP uploads was the importance of a unifying property. Users will likely query this data with a view to surveying results and finding patterns. Each author and work, therefore, needed to be linked to the prize and the collection itself (pictured above). By adding this QID to the property P6379 (‘has works in the collection’), we create a web of data that links every shortlisted author over the 11 year time period.
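A query of this kind might look like the sketch below. It is not one of the prepared queries linked from this post. It asks for every item with an ‘award received’ (P166) statement pointing at the NMWP item (Q108459688), which, going by the example sentences earlier, can return winning works as well as authors. The collection QID used with P6379 is not given in the text, so it is left out. The function name `award_recipients_query` is hypothetical; the code only builds the query and a request URL for the public Wikidata Query Service and sends nothing.

```python
from urllib.parse import urlencode

ENDPOINT = "https://query.wikidata.org/sparql"  # public Wikidata Query Service
NMWP = "Q108459688"  # New Media Writing Prize QID quoted in the post


def award_recipients_query(award_qid):
    """Return a SPARQL query listing items that received the given award."""
    return f"""
SELECT ?recipient ?recipientLabel WHERE {{
  ?recipient wdt:P166 wd:{award_qid} .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" . }}
}}
""".strip()


query = award_recipients_query(NMWP)
url = ENDPOINT + "?" + urlencode({"query": query, "format": "json"})
print(query)
```

Pasting the printed query into the Wikidata Query Service editor is the quickest way to try it. The `url` value shows how the same query could be requested programmatically.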

Getting Started

To have a look at some of the NMWP data, here are some queries I prepared earlier. Please note that data from the 2021 competition has not yet been uploaded!

Authors who won NMWP

Works that won NMWP

Authors nominated for NMWP

Works nominated for NMWP

If you fancy trying some queries but don’t know where to start, I recommend these tutorials:

Tutorials

Resources About SPARQL

This post is by Wikimedian in Residence Dr Lucy Hinnie (@BL_Wikimedian).

23 December 2021

Three crowdsourcing opportunities with the British Library

Digital Curator Dr Mia Ridge writes: In case you need a break from whatever combination of weather, people and news is around you, here are some ways you can entertain yourself (or the kids!) while helping make collections of the British Library more findable, or help researchers understand our past. You might even learn something or make new discoveries along the way!

Your help needed: Living with Machines

Mia Ridge writes: Living with Machines is a collaboration between the British Library and the Alan Turing Institute with partner universities. Help us understand the 'machine age' through the eyes of ordinary people who lived through it. Our refreshed task builds on our previous work, and includes fresh newspaper titles, such as the Cotton Factory Times.

What did the Victorians think a 'machine' was - and did it matter where you lived, or if you were a worker or a factory owner? Help us find out: https://www.zooniverse.org/projects/bldigital/living-with-machines

Your contributions will not only help researchers - they'll also go on display in our exhibition.

Image of a Cotton Factory Times masthead
You can read articles from Manchester's Cotton Factory Times in our crowdsourced task

 

Your help needed: Agents of Enslavement? Colonial newspapers in the Caribbean and hidden genealogies of the enslaved

Launched in July this year, Agents of Enslavement? is a research project which explores the ways in which colonial newspapers in the Caribbean facilitated and challenged the practice of slavery. One goal is to create a database of enslaved people identified within these newspapers. This benefits people researching their family history as well as those who simply want to understand more about the lives of enslaved people and their acts of resistance.

Project Investigator Graham Jevon has posted some insights into how he processes the results on the project forum, which is full of fascinating discussion. Join in as you take part: https://www.zooniverse.org/projects/gjevon/agents-of-enslavement

Your help needed: Georeferencer

Dr. Gethin Rees writes: The community have now georeferenced 93% of 1277 maps that were added from our War Office Archive back in July (as mentioned in our previous newsletter).  

Some of the remaining maps are quite tricky to georeference, so if there is a perplexing map that you would like some guidance with, do get in contact with me and our curator for modern mapping by emailing [email protected] and we will try to help. Please do look forward to some exciting new maps being released on the platform in 2022!

21 December 2021

Intro to AI for GLAM

Earlier this year Daniel van Strien and I teamed up with colleagues Mike Trizna from the Smithsonian and Mark Bell at the National Archives, UK in a Carpentries Lesson Development Study Group with an eye to developing an Introduction to AI for GLAM (Galleries, Libraries, Archives and Museums) lesson for eventual inclusion in Library Carpentry. The commitment was a ten-week program running between 8 February and 23 April 2021 with weekly 1hr Study Group discussion calls and "homework" tasks requiring at least 3-4 hours each week.

The result is the framework and foundations for what we hope will be a useful, ever evolving and continuously collaboratively written workshop that can provide a gentle and practical introduction for GLAM to the world of machine learning and its implications for the sector. Developed with the GLAM practitioner in mind, this beta course aims to offer an entry point for staff in cultural heritage institutions to begin to support, participate in, and undertake in their own right, machine learning-based research and projects with their collections.

Screenshot of Intro to GLAM course page

View the beta lessons at https://carpentries-incubator.github.io/machine-learning-librarians-archivists/index.html

We had the honour of running a 3-hour bitesize online version of the workshop as part of the AI4LAM Les Futurs Fantastiques Conference (#FF21) early in December. In a bit of an experiment, we delivered it using Mentimeter, hoping to bring some fresh interactivity into what could feel like a long virtual workshop. I'm happy to report it was good fun and the mode was very well received in the feedback from instructors and participants alike.

The full video recording of the presentation is available to view at FF21 workshop: Carpentries Incubator Introduction to AI for GLAM - Zoom, as are our slides (PDF).

00:08:07 Intro to AI & Machine Learning: A brief overview (Mark Bell, The National Archives)

00:46:09 What is ML good at? (Mike Trizna / @miketrizna, Smithsonian)

01:26:35 Managing bias (Nora McGregor / @ndalyrose, British Library)

02:01:02 Machine learning projects (Daniel van Strien / @vanstriendaniel, British Library)

Have a look at these wonderful live sketch notes taken during the session by the talented Mélanie Leroy-Terquem (@mleroyterquem)!

Notebook page spread showing illustrations of key points in workshop

If you would like to contribute to the further development of these lessons, all the content and materials can be found over on the lesson GitHub, and we'd love to hear from you!

This blog post is by Nora McGregor, Digital Curator, British Library. She's on Twitter as @ndalyrose.

10 December 2021

Legacies of Catalogue Descriptions: prioritising agendas and actions

The Legacies of Catalogue Descriptions and Curatorial Voice project: Opportunities for Digital Scholarship is enabling transformational impacts in digital scholarship within cultural institutions by opening up new and important directions for computational, critical and curatorial analysis of collection catalogues. Over the past year and a half the project has actively engaged with colleagues across the cultural heritage sector to discuss the project approach and develop training materials for the computational analysis of legacy catalogue data.

As the project draws to a close, we invite members of the community to join the final project workshops in February 2022 to set shared agendas and agree next steps. The UK-based event will be hosted by the Digital Humanities Hub, University of Southampton (Covid-19 situation permitting) and the US-based event will be held online. Both workshops will work towards a single co-produced output: an infographic explaining the problem area, our shared priorities and next steps for action.

In anticipation of these events we thought we would share a summary of our July workshop which was attended by over 40 participants from our target beneficiary communities in the UK and US. At the event members of the project team spoke briefly on aspects of their research, before leading participatory breakout sessions that explored the themes in greater detail.

James Baker (Southampton) argued that historical research into legacy cataloguing can usefully form the basis for reparative re-description and social justice work in cultural institutions. Rossitza Atanassova (British Library) reported on the utility of the project methodology and tools for accelerating institutional responses to contemporary challenges and how the capacity building work aligns with the Library’s Anti-Racism Project action plan.

Cynthia Roman (Lewis Walpole Library) discussed her investigations into the history of cataloguing at the Library in relation to the transmission of curatorial voice from the British Museum to the Lewis Walpole Library records for Georgian satirical prints. Andrew Salway (Sussex) described the computational methods and processes used to detect the spatial and temporal transmission of the satirical prints data between catalogues.

Peter Leonard (Yale University Library DH Laboratory) introduced experimental computational work that uses machine learning techniques to produce new texts and images based on historic catalogue data and prints, thus opening up further possibilities for studying features in the real data. In the breakout sessions there was a demonstration of some of the tools developed by the project and an exploration of how to present legacy descriptions in collection catalogues and flag up any issues with users. These tools and other resources are included in the workshop report, which aims to encourage and enable further critical reflections on catalogues’ legacies.

We hope that some of you will be interested in joining the final project events. To book your place please use the contact details on the events page.

Rossitza Atanassova, James Baker, Cynthia Roman

07 December 2021

Digital transformations and the pandemic: the Digital Scholarship view

Many things have happened in the past couple of years. Following the closure of physical spaces with the first lockdown, and with reduced access to library systems, British Library staff had to swiftly transition to new ways of working. In this blog post I will look at the ways in which this transformation has been experienced from the perspective of the Library’s Digital Research team. How did we change the way we work, what changes have we encouraged and witnessed, and what practices should we keep for the future?

Let’s look at our Digital Scholarship Training Programme (DSTP), one of our flagship activities. Created in 2012, this programme aims to develop the skills and knowledge of Library staff to support emerging areas of scholarship. With the swift change to working from home, we transitioned the delivery of our training events from onsite to a fully online delivery. We also started recording training sessions, so that members of staff could watch later if they could not attend. This transition took place very early in the pandemic, with the first online training event on offer on 17 March 2020 (“Library Carpentry Workshop: Tidy Data”).

Screenshot from our “Introduction to Emerging Formats” course

We’ve seen steady growth in attendance for our training programme. It became easier to provide more training opportunities online, and the number of attendees almost doubled between 2018-2019 and 2020-2021, from 819 to 1,552. At the same time, delivery has become less costly for us, saving on travel and subsistence expenses. Our training programme has been well subscribed, and more people could attend our events, for several reasons: we transitioned quickly to online delivery at a time when some members of staff could not do their regular tasks and needed some positive distractions; the Library has been encouraging learning and personal development; and we no longer have to cap the number of attendees, as there are no room capacity limitations.

The shift to online delivery has many benefits: we can deliver more events more easily; we can now invite speakers from abroad, which gives us better international exposure; thanks to the recordings, our training offer remains available to staff after the training takes place; more staff members learn and develop themselves in different areas of digital scholarship; and, as a result, members of staff make our digital collections more accessible online, which is great for our users.

Screenshot from a 21st Century Curatorship staff talk entitled “Identify yourself! (Almost) Everything you wanted to know about persistent identifiers but were afraid to ask”

Admittedly, online training does have its shortcomings. For example, the instructor may not be able to immediately identify and resolve problems that course attendees run into. In addition, many of us prefer face-to-face interaction. However, the feedback we received from attendees is overwhelmingly positive: staff would like to have the option of onsite or online delivery, and enjoy the availability of recordings. It is therefore unlikely that we’ll go back to fully onsite delivery. A mixed delivery of our training programme looks like a good way forward. For example, if the training event involves more listening than doing, then online delivery is probably better.

Moving on to digital collections. At the beginning of the pandemic, Library users and staff members lost access to the physical collection. With that lack of access, attention naturally turned to our digital collections. These became the Library’s only source of material for creating engaging online content, promoting materials to researchers, and providing access to its collections. We have seen colleagues using and communicating digital collections during this time, using digital tools and platforms, resulting in an increase of guest blog posts on the Digital Scholarship blog (about two-thirds of our posts in 2020-2021 were guest posts).

Collages created by Hannah Nagle using the British Library's Flickr image collection

There are so many examples to choose from. See for instance the story map created by Jenny Norton-Wright, curator for Arabic scientific manuscripts from the British Library Qatar Foundation Partnership project, visualising the journey during which a musical compendium was written in the 17th century. In fact, this was just one of many digital initiatives coming from the Qatar project – many of which are a result of their Imaging Hack Days – days set aside for the team to use their creative and technical skills to ‘hack’ the material in the digitised collection. From Hannah Nagle’s brilliant guide on how to make collages using images from the British Library Flickr collection, through the Watermarks project unveiling hidden watermarks in manuscripts, to making data into sound and investigating Arabic verb forms, there is something for everyone. You can read more about these and other projects in Laura Parsons’ blog posts here and here.

This past year has also seen more of us harnessing the power of IIIF to tell stories to online audiences. Earlier this year one of our Hack & Yack workshops was based around the topic of ‘Making interactive online exhibits and teaching resources with IIIF Manifests’ by exploring a tool called Exhibit. This tool was created by Mnemoscene for the University of St Andrews, and it allows people to create online exhibits. Several curators and other staff members used this IIIF-powered tool to showcase collection items. These included, for example, an exhibit created by Sara Hale from the Heritage Made Digital programme dedicated to Japanese Design Books; or another exhibit prepared by Jana Igunma, curator for Thai, Lao and Cambodian collections, focusing on an illustrated Thai cat treatise.

Screenshot from Sara Hale’s Japanese Design Books exhibit

We have also seen a noticeable increase in engagement with our crowdsourcing projects during the pandemic. These became more popular than ever, especially early on when some people could not perform their usual duties or go out, and needed something positive to do. Colleagues have witnessed a very high demand for crowdsourcing tasks, and have received many positive comments and feedback about how participating in and contributing to projects has helped raise morale during these difficult times. These projects include, for example, In the Spotlight, Living with Machines tasks, Agents of Enslavement, Canadian Wildlife, and the Georeferencer. We now have a landing page for British Library crowdsourcing projects. Check it out.

Other crowdsourcing work was done internally by the Collection Metadata team – the ‘crowd’ being British Library staff! Members of staff helped enhance the metadata of legacy records. For example, colleagues with specific language skills were able to assist with checking machine-assigned language codes, identifying languages and adding keywords to records. Library staff were also adding information such as place and date of publication, genres, and editions, to books digitised as part of a partnership with Microsoft.

Screenshot from the Agents of Enslavement project on Zooniverse

Working with the Wikimedia family platforms, such as Wikidata, Wikibase and Wikisource, has also been on the rise during the pandemic. Earlier this year, the team was joined by Lucy Hinnie, our Wikimedian-in-Residence. Lucy noted repeated references to the way the pandemic has shifted people's attention towards Wikimedia, with greater prioritisation of Wikimedia-related projects. One such British Library use case was inspired by a Wikisource project taking place at the National Library of Scotland, correcting OCRed text of 3,000 Scottish chapbooks. A staff talk delivered by Gavin Willshaw, then at the NLS, inspired digital curator Tom Derrick’s Bengali Wikisource project, which included two proofreading competitions for digitised and OCRed Bengali books, as part of the Two Centuries of Indian Print project.

Research Libraries UK (RLUK) called this the “Digital Shift” – “an umbrella term for the analogue-digital transition of many library services, operations, collections, and audience interactions.” The “Digital Shift” includes a lot of different things, but from our perspective, it is plain to see that the Covid-19 pandemic has accelerated this digital transformation – and, as long as this is of benefit to our users, we will keep transforming, adjusting, and exploring new directions.

This blog post is by Adi Keinan-Schoonbaert, Digital Curator for Asian and African Collections, British Library. She's on Twitter as @BL_AdiKS.