Digital scholarship blog

Enabling innovative research with British Library digital collections

Introduction

Tracking exciting developments at the intersection of libraries, scholarship and technology.

29 May 2020

IIIF Week 2020

As a founding member of the International Image Interoperability Framework Consortium (IIIF), the British Library is looking forward to the upcoming IIIF Week, a programme of free online events taking place from 1 to 5 June.

IIIF Week sessions will discuss digital strategy for cultural heritage, introduce IIIF’s capabilities and community through introductory presentations and demonstrations of use cases, and explore the future of IIIF and digital research needs more broadly.

IIIF logo with text saying International Image Interoperability Framework

Converting the IIIF annual conference into a virtual event held on Zoom provides an opportunity to bring together a wider group of the IIIF community. It enables many people to attend, including myself, who otherwise would not have been able to join the in-person event in Boston due to budget, travel restrictions and other obligations.

Both IIIF newbies and experienced implementers will find events scheduled at convenient times, to allow attendees to form regional community connections in their parts of the world. Attendees can sign up for all events during the week, or just the ones that interest them. Proceedings will be in English unless otherwise indicated, and all sessions will be recorded, then made available following the conference on the IIIF YouTube channel.

To those who know me, it will come as no surprise that I’m especially looking forward to the Fun with IIIF session on Friday 5 June, 4-5pm BST, facilitated by Tristan Roddis from Cogapp. Most uses of IIIF have focused on scholarly and research applications. This session, however, will look at the opposite extreme: the state of the art in creating playful and fun applications of the IIIF APIs, from tile puzzles to arcade games, via terapixel fractals, virtual galleries, 3D environments, and the Getty's really cool Nintendo Animal Crossing integration.

In addition to the IIIF Week programme, for anyone wanting more in-depth, practical hands-on teaching there is a free workshop on getting started with IIIF in the week following the online conference. This pilot course will run over five days, from 8 to 12 June. Participation is limited to 25 places, available on a first come, first served basis. It will cover:

  • Getting started with the Image API
  • Creating IIIF Manifests with the Bodleian manifest editor
  • Annotating IIIF resources and setting up an annotation server
  • Introduction to various IIIF tools and techniques for scholarship
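The Image API covered in the first session works by encoding each image request in a URL. As a minimal sketch, assuming the URL template from version 3.0 of the IIIF Image API specification ({identifier}/{region}/{size}/{rotation}/{quality}.{format}) and a hypothetical image service at example.org, the Python below builds request URLs and cuts an image into a grid of regions, the kind of request a tile-puzzle game might make. The function names are illustrative, not part of any IIIF library.

```python
def iiif_image_url(base, region="full", size="max", rotation="0",
                   quality="default", fmt="jpg"):
    """Build an IIIF Image API 3.0-style request URL.

    base is the image service URL up to and including the identifier,
    e.g. "https://example.org/iiif/3/page-1" (a hypothetical server).
    """
    return f"{base}/{region}/{size}/{rotation}/{quality}.{fmt}"


def tile_regions(width, height, cols, rows):
    """Split a width x height image into cols x rows regions ("x,y,w,h").

    The last column and row absorb any leftover pixels, so the tiles
    always cover the whole image.
    """
    tile_w, tile_h = width // cols, height // rows
    regions = []
    for r in range(rows):
        for c in range(cols):
            x, y = c * tile_w, r * tile_h
            w = width - x if c == cols - 1 else tile_w
            h = height - y if r == rows - 1 else tile_h
            regions.append(f"{x},{y},{w},{h}")
    return regions


if __name__ == "__main__":
    base = "https://example.org/iiif/3/page-1"  # hypothetical image service
    for region in tile_regions(1000, 600, 2, 2):
        print(iiif_image_url(base, region=region))
```

In a real application the pixel width and height would come from the image's info.json document rather than being hard-coded.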

Tutors will help participants create a IIIF project and demonstrate it on a Zoom call at the end of the week.

You can view and sign up for IIIF Week events at https://iiif.io/event/2020/iiifweek/. All attendees are expected to adhere to the IIIF Code of Conduct and are encouraged to join the IIIF-Week Slack channel for ongoing questions, comments, and discussion (you’ll need to join the IIIF Slack first, which is open to anyone).

To follow and take part in the wider discussion on Twitter, use the hashtags #IIIF and #IIIFWeek. If you have any specific questions about the event, please get in touch with the IIIF staff at [email protected].

See you there :-)

This post is by Digital Curator Stella Wisdom (@miss_wisdom).

21 May 2020

The British Library Simulator

The British Library Simulator is a mini game built using the Bitsy game engine, where you can wander around a pixelated (and much smaller) version of the British Library building in St Pancras. Bitsy is known for its compact format and limited colour palette: you can often recognise your avatar and the items you can interact with by the fact that they use a different colour from the background.

The British Library building depicted in Bitsy
The British Library Simulator Bitsy game

Use the arrow keys on your keyboard (or the WASD keys) to move around the rooms and interact with the characters and objects you meet on the way: you might discover something new about the building and the digital projects the Library is working on!

Bitsy works best in the Chrome browser. If you’re playing on a smartphone, swipe to move your avatar and tap the text box to progress through the dialogue.

Most importantly: have fun!

The British Library, together with the other five UK Legal Deposit Libraries, has been collecting examples of complex digital publications, including works made with Bitsy, as part of the Emerging Formats Project. This collection area is continuously expanding, as we include new examples of digital media and interactive storytelling. The formats and tools used to create these publications are varied, and allow for innovative and often immersive solutions that could only be delivered via a digital medium. You can read more about freely-available tools to write interactive fiction here.

This post is by Giulia Carla Rossi, Curator of Digital Publications (@giugimonogatari).

20 May 2020

Bringing Metadata & Full-text Together

This is a guest post by enthusiastic data and metadata nerd Andy Jackson (@anjacks0n), Technical Lead for the UK Web Archive.

In Searching eTheses for the openVirus project we put together a basic system for searching theses. This only used the information from the PDFs themselves, which meant the results looked like this:

openVirus EThOS search results screen

The basics are working fine, but the document titles are largely meaningless, the last-modified dates are clearly suspect (26 theses in the year 1600?!), and the facets aren’t terribly useful.

The EThOS metadata has much richer information that the EThOS team has collected and verified over the years. This includes:

  • Title
  • Author
  • DOI, ISNI, ORCID
  • Institution
  • Date
  • Supervisor(s)
  • Funder(s)
  • Dewey Decimal Classification
  • EThOS Service URL
  • Repository (‘Landing Page’) URL

So, the question is, how do we integrate these two sets of data into a single system?

Linking on URLs

The EThOS team supplied the PDF download URLs for each record, but we need a common identifier to merge these two datasets. Fortunately, both datasets contain the EThOS Service URL, which looks like this:

https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.755301

This (or just the uk.bl.ethos.755301 part) can be used as the ‘key’ for the merge, leaving us with one dataset that contains the download URLs alongside all the other fields. We can then process the text from each PDF, look up its URL in this metadata dataset, and merge the two together in the same way.
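The key-based merge described above might look something like the sketch below. It assumes each dataset is a list of records (dicts) with an EThOS Service URL in a field called ethos_url, and that the download URL lives in pdf_url; those field names and the helper functions are hypothetical, not the EThOS team's actual schema or code.

```python
from urllib.parse import urlparse, parse_qs


def ethos_key(service_url):
    """Extract the 'uin' value (e.g. uk.bl.ethos.755301) from an EThOS Service URL."""
    values = parse_qs(urlparse(service_url).query).get("uin")
    return values[0] if values else None


def merge_on_ethos_key(metadata_records, pdf_records):
    """Attach each PDF download URL to the metadata record with the same EThOS key.

    Field names 'ethos_url' and 'pdf_url' are illustrative assumptions.
    Records whose key cannot be found on both sides are dropped.
    """
    by_key = {}
    for rec in metadata_records:
        key = ethos_key(rec["ethos_url"])
        if key:
            by_key[key] = rec
    merged = []
    for rec in pdf_records:
        key = ethos_key(rec["ethos_url"])
        if key in by_key:
            merged.append({**by_key[key], "pdf_url": rec["pdf_url"]})
    return merged


if __name__ == "__main__":
    url = "https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.755301"
    print(ethos_key(url))  # uk.bl.ethos.755301
```

Keying on the uin rather than the full URL makes the join robust to cosmetic differences such as http versus https in the service URL.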

Except… it doesn’t work.

The web is a messy place: those PDF URLs may have been direct downloads in the past, but now many of them are no longer simple links, but chains of redirects. As an example, this original download URL:

http://repository.royalholloway.ac.uk/items/bf7a78df-c538-4bff-a28d-983a91cf0634/1/10090181.pdf

Now redirects (HTTP 301 Moved Permanently) to the HTTPS version:

https://repository.royalholloway.ac.uk/items/bf7a78df-c538-4bff-a28d-983a91cf0634/1/10090181.pdf

Which then redirects (HTTP 302 Found) to the actual PDF file:

https://repository.royalholloway.ac.uk/file/bf7a78df-c538-4bff-a28d-983a91cf0634/1/10090181.pdf

So, to bring this all together, we have to trace these links between the EThOS records and the actual PDF documents.

Re-tracing Our Steps

While the crawler we built to download these PDFs worked well enough, it isn’t quite as sophisticated as our main crawler, which is based on Heritrix 3. In particular, Heritrix offers detailed crawl logs that can be used to trace crawler activity. This functionality would be fairly easy to add to Scrapy, but that hasn’t been done yet. So, another approach is needed.

To trace the crawl, we need to be able to look up URLs and then analyse what happened. In particular, for every starting URL (a.k.a. seed) we want to check if it was a redirect and if so, follow that URL to see where it leads.

We already use content (CDX) indexes to allow us to look up URLs when accessing content. In particular, we use OutbackCDX as the index, and then the pywb playback system to retrieve and access the records and see what happened. So one option is to spin up a separate playback system and query that to work out where the links go.

However, as we only want to trace redirects, we can do something a little simpler. We can use the OutbackCDX service to look up what we got for each URL, and use the same warcio library that pywb uses to read the WARC record and find any redirects. The same process can then be repeated with the resulting URL, until all the chains of redirects have been followed.

This leaves us with a large list, linking every URL we crawled back to the original PDF URL. This can then be used to link each item to the corresponding EThOS record.
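The look-up loop described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the code behind the service: a plain dictionary stands in for the OutbackCDX query plus warcio record parsing, mapping each captured URL to its HTTP status code and Location header, and the function names are hypothetical. The example data reuses the Royal Holloway chain shown earlier.

```python
from urllib.parse import urljoin


def follow_redirects(seed, lookup, max_hops=10):
    """Follow the captured redirect chain that starts at seed.

    lookup maps a URL to a (status_code, location) pair; it is a stand-in
    for a CDX look-up plus reading the matching WARC record.
    Returns every URL visited, seed first and final destination last.
    """
    chain = [seed]
    url = seed
    for _ in range(max_hops):
        status, location = lookup.get(url, (None, None))
        if status is None or not (300 <= status < 400) or not location:
            break  # not captured, not a redirect, or no Location header
        next_url = urljoin(url, location)  # Location headers may be relative
        if next_url in chain:
            break  # redirect loop: stop rather than cycle forever
        chain.append(next_url)
        url = next_url
    return chain


def build_lookup_table(seeds, lookup):
    """Map every URL reached from each seed back to that original seed URL."""
    table = {}
    for seed in seeds:
        for url in follow_redirects(seed, lookup):
            table.setdefault(url, seed)
    return table


if __name__ == "__main__":
    path = "bf7a78df-c538-4bff-a28d-983a91cf0634/1/10090181.pdf"
    seed = "http://repository.royalholloway.ac.uk/items/" + path
    https_url = "https://repository.royalholloway.ac.uk/items/" + path
    pdf_url = "https://repository.royalholloway.ac.uk/file/" + path
    captures = {seed: (301, https_url), https_url: (302, pdf_url), pdf_url: (200, None)}
    print(build_lookup_table([seed], captures)[pdf_url] == seed)  # True
```

The max_hops limit and loop check are defensive choices: real-world redirect chains occasionally cycle, and a trace over hundreds of thousands of URLs should not hang on one bad host.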

This large look-up table allowed the full-text and metadata to be combined. It was then imported into a new Solr index that replaced the original service, augmenting the records with the new metadata.

Updating the Interface

The new fields are accessible via the same API as before – see this simple search as an example.

The next step was to update the UI to take advantage of these fields. This was relatively simple, as it mostly involved exchanging one field name for another (e.g. from last_modified_year to year_i), and adding a few links to take advantage of the fact we now have access to the URLs to the EThOS records and the landing pages.

The result can be seen at:

EThOS Faceted Search Prototype
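For anyone calling the API directly, a faceted query over one of the new fields can be expressed with standard Solr request parameters (q, rows, wt, facet and a repeated facet.field). The sketch below is hedged: the select-handler URL and the content field in the example query are hypothetical, and only the year_i field name comes from this post.

```python
from urllib.parse import urlencode


def facet_query_url(solr_select_url, query, facet_fields, rows=10):
    """Build a Solr select URL returning matching documents plus facet counts.

    facet.field is repeated once for each field to facet on.
    """
    params = [("q", query), ("rows", rows), ("wt", "json"), ("facet", "true")]
    params += [("facet.field", field) for field in facet_fields]
    return f"{solr_select_url}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical endpoint; year_i is the new year field mentioned above.
    print(facet_query_url("https://solr.example.org/solr/ethos/select",
                          "content:virus", ["year_i"]))
```

Before the metadata merge, the same request would have had to facet on last_modified_year, which is where the implausible 1600 dates came from.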

The Results

This new service provides a much better interface to the collection, and really demonstrates the benefits of combining machine-generated and manually curated metadata.

New openVirus EThOS search results interface
New improved openVirus EThOS search results interface

There are still some issues with the source data that need to be resolved at some point. In particular, there are now only 88,082 records, compared with the 104,746 PDFs originally downloaded, which indicates that some gaps and mismatches emerged while merging the records.

But it’s good enough for now.

The next question is: how do we integrate this into the openVirus workflow? 

 

18 May 2020

Tree Collage Challenge

Today is the start of Mental Health Awareness Week (18-24 May 2020) and this year’s theme is kindness. In my opinion this starts with being kinder to yourself, and there are many ways to do this. As my colleague Hannah Nagle reminded me in her recent blog post, creative activities can help you to relax, lift your mood and enable you to express yourself. I also find that spending time in green spaces and appreciating nature is of great benefit to my mental wellbeing. UK mental health charity Mind promotes ecotherapy and has a helpful section on its website all about nature and mental health.

However, I appreciate that it is not always possible for people to get outside to enjoy nature, especially during the current coronavirus pandemic. Even so, there are ways to bring nature into our homes, such as listening to recordings of birdsong, looking at pictures, and watching videos of wildlife and landscapes. For more ideas on digital ways of connecting to nature, I suggest checking out “Nature and Wellbeing in the Digital Age” by Sue Thomas, who believes we don’t need to disconnect from the internet to reconnect with the earth, sea and sky.

Furthermore, why not participate in this year’s Urban Tree Festival (16-24 May 2020), which is completely online. There is a wide programme of talks and activities, including meditation, daily birdsong, virtual tours, radio and a book club. The festival also includes some brilliant art activities.

Urban Tree Festival logo with a photograph depicting a tree canopy
Urban Tree Festival 2020

Save Our Street Trees Northampton have invited people to create a virtual urban forest in their windows by building a tree out of paper, then adding leaves every day to slowly build up a tree canopy. People are then encouraged to share photos of their paper trees on social media, tagged #NewLeaf.

Another Urban Tree Festival art project is Branching out with Ruth Broadbent, where people are invited to co-create imaginary trees by observing and drawing selected branches and foliage from sections of different trees. These might be seen from gardens or windows, from photos or from memory.

Paintings and drawings of trees are also celebrated in Europeana’s Trees in Art online gallery, launched by the festival today. It showcases artworks depicting trees in urban and rural landscapes from the digitised collections of museums, galleries, libraries and archives across Europe, including tree book illustrations from the British Library.

Thumbnail pictures of paintings of trees from a website gallery
Europeana Trees in Art online gallery

Not wanting to be left out of the fun, here at the British Library we have set a Tree Collage Challenge, which invites you to make artistic collages featuring trees and nature, using book illustrations from the British Library’s Flickr account.

This collection of over a million Public Domain images can be used by anyone for free, without copyright restrictions. The images are illustrations taken from the pages of 17th, 18th and 19th century books. You can read more about them here.

As a starting point for finding images for your collages, you may find it useful to browse the themed albums. In particular, the Flora & Fauna albums are rich resources for finding trees, plants, animals and birds.

To learn how to make digital collages, my colleague Hannah Nagle has written a handy guide, to help get you started. You can download this here.

We hope you have fun and we can’t wait to see your collage creations! So please post your pictures to Twitter and Instagram using #GreatTree and #UrbanTreeFestival. British Library curators will be following the challenge with interest and showcasing their favourite tree collages in future blog posts, so watch this space!

This post is by Digital Curator Stella Wisdom (@miss_wisdom).

14 May 2020

Searching eTheses for the openVirus project

This is a guest post by Andy Jackson (@anjacks0n), Technical Lead for the UK Web Archive and enthusiastic data-miner.

Introduction

The COVID-19 outbreak is an unprecedented global crisis that has prompted an unprecedented global response. I’ve been particularly interested in how academic scholars and publishers have responded:

It’s impressive how much has been done in such a short time! But I also saw one comment that really stuck with me:

“Our digital libraries and archives may hold crucial clues and content about how to help with the #covid19 outbreak: particularly this is the case with scientific literature. Now is the time for institutional bravery around access!”
– @melissaterras

Clearly, academic scholars and publishers are already collaborating. What could digital libraries and archives do to help?

Scale, Audience & Scope

Almost all the efforts I’ve seen so far are focused on helping scientists working on the COVID-19 response to find information from publications that are directly related to coronavirus epidemics. The outbreak is much bigger than this. In terms of scope, it’s not just about understanding the coronavirus itself. The outbreak raises many broader questions, like:

  • What types of personal protective equipment are appropriate for different medical procedures?
  • How effective are the different kinds of masks when it comes to protecting others?
  • What coping strategies have proven useful for people in isolation?

(These are just the examples I’ve personally seen requests for. There will be more.)

Similarly, the audience is much wider than the scientists working directly on the COVID-19 response: it ranges from medical professionals wanting to know more about protective equipment to journalists looking for context and counter-arguments.

As a technologist working at the British Library, I felt there must be some way I could help: perhaps by helping a wider audience dig out any potentially relevant material we might hold.

The openVirus Project

While looking out for inspiration, I found Peter Murray-Rust’s openVirus project. Peter is a vocal supporter of open source and open data, and had launched an ambitious attempt to aggregate information relating to viruses and epidemics from scholarly publications.

In contrast to the other efforts I’d seen, Peter wanted to focus on novel data-mining methods, and on pulling in less well-known sources of information. This dual focus on text analysis and on opening up underutilised resources appealed to me. And I already had a particular resource in mind…

EThOS

Of course, the British Library has a very wide range of holdings, but as an ex-academic scientist I’ve always had a soft spot for EThOS, which provides electronic access to UK theses.

Through the web interface, users can search the metadata and abstracts of over half a million theses. Furthermore, to support data mining and analysis, the EThOS metadata has been published as a dataset. This dataset includes links to institutional repository pages for many of the theses.

Although doctoral theses are not generally considered to be as important as journal articles, they are a rich and underused source of information, capable of carrying much more context and commentary than a brief article[1].

The Idea

Having identified EThOS as a source of information, the idea was to see if I could use our existing UK Web Archive tools to collect and index the full text of these theses, build a simple faceted search interface, and perform some basic data-mining operations. If that worked, it would allow relevant theses to be discovered and passed to the openVirus tools for more sophisticated analysis.

Preparing the data sources

The links in the EThOS dataset point to the HTML landing page for each thesis, rather than to the full text itself. To get to the text, the best approach would be to write a crawler to find the PDFs. However, it would take a while to create something that could cope with the variety of ways the landing pages tend to be formatted. For machines, it’s not always easy to find the link to the actual thesis!

However, many of the universities involved have given the EThOS team permission to download a copy of their theses for safe-keeping. The URLs of the full-text files are only used once (to collect each thesis shortly after publication), but have nevertheless been kept in the EThOS system since then. These URLs are considered transient (i.e. likely to ‘rot’ over time) and come with no guarantees of longer-term availability (unlike the landing pages), so are not included in the main EThOS dataset. Nevertheless, the EThOS team were able to give me the list of PDF URLs, making it easier to get started quickly.

This is far from ideal: we will miss theses that have been moved to new URLs, and from universities that do not take part (which, notably, includes Oxford and Cambridge). This skew would be avoided if we were to use the landing-page URLs provided for all UK digital theses to crawl the PDFs. But we need to move quickly.

So, while keeping these caveats in mind, the first task was to crawl the URLs and see if the PDFs were still there…

Collecting the PDFs

A simple Scrapy crawler was created, one that could read the PDF URLs and download them without overloading the host repositories. The crawler itself does nothing with them, but by running behind warcprox the web requests and responses (including the PDFs) can be captured in the standardised Web ARChive (WARC) format.
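For readers who want to try something similar, the politeness side of such a crawler is mostly configuration. The fragment below is a sketch of a Scrapy settings.py: the setting names are standard Scrapy settings, but the values and the user-agent string are illustrative assumptions, not the configuration actually used for this crawl.

```python
# settings.py: politeness-focused Scrapy configuration (illustrative values)

# Identify the crawler honestly; this string is a hypothetical placeholder.
USER_AGENT = "example-thesis-crawler (+https://example.org/contact)"

# Keep the load on each repository low: few parallel requests per host,
# with a pause between them.
CONCURRENT_REQUESTS_PER_DOMAIN = 2
DOWNLOAD_DELAY = 1.0

# Let Scrapy back off automatically when a server starts responding slowly.
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 1.0
AUTOTHROTTLE_MAX_DELAY = 60.0
```

To capture the traffic as WARCs, one option is to export http_proxy and https_proxy pointing at a running warcprox instance (it listens on port 8000 by default, but check your own setup) before running scrapy crawl, since Scrapy's built-in HttpProxyMiddleware picks up those environment variables.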

For 35 hours, the crawler attempted to download the 130,330 PDF URLs. Quite a lot of URLs had already changed, but 111,793 documents were successfully downloaded. Of these, 104,746 were PDFs.

All the requests and responses generated by the crawler were captured in 1,433 WARCs each around 1GB in size, totalling around 1.5TB of data.

Processing the WARCs

We already have tools for handling WARCs, so the task was to re-use them and see what we get. As this collection is mostly PDFs, Apache Tika and PDFBox are doing most of the work, but the webarchive-discovery wrapper helps run them at scale and add in additional metadata.

The WARCs were transferred to our internal Hadoop cluster, and in just over an hour the text and associated metadata were available as about 5GB of compressed JSON Lines.

A Legal Aside

Before proceeding, there’s a legal problem that we need to address. Despite being freely available over the open web, the rights and licences under which these documents are being made available can be extremely varied and complex.

There’s no problem gathering the content and using it for data mining. The problem is that there are limitations on what we can redistribute without permission: we can’t redistribute the original PDFs, or any close approximation.

However, collections of facts about the PDFs are fine.

But for the other openVirus tools to do their work, we need to be able to find out what each thesis is about. So how can we make this work?

One answer is to generate statistical summaries of the contents of the documents. For example, we can break the text of each document up into individual words, and count how often each word occurs. These word frequencies are no substitute for the real text, but they are redistributable and suitable for answering simple queries.
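A minimal sketch of the word-frequency idea, assuming a simple tokeniser (lower-cased runs of letters, digits and apostrophes) that may differ from the one the real pipeline used. It also shows the kind of simple AND query that counts alone can answer.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Lower-case the text, split it into words and count each one.

    The token pattern is an assumption chosen for illustration.
    """
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def matches_all(freqs, terms):
    """An AND query over word counts: True if every term occurs at least once."""
    return all(freqs[term.lower()] > 0 for term in terms)


if __name__ == "__main__":
    freqs = word_frequencies("The mask, the face and the face mask.")
    print(freqs["the"], freqs["face"], freqs["mask"])  # 3 2 2
    print(matches_all(freqs, ["face", "mask"]))  # True
```

Because a Counter returns 0 for unseen words, the query needs no special handling for terms a document never mentions.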

These simple queries can be used to narrow down the overall dataset, picking out a relevant subset. Once the list of documents of interest is down to a manageable size, an individual researcher can download the original documents themselves, from the original hosts[2]. As the researcher now has local copies, they can run their own tools over them, including the openVirus tools.

Word Frequencies

A second, simpler Hadoop job was created to post-process the raw text, replacing it with the word frequency data. This produced 6GB of uncompressed JSON Lines data, which could then be loaded into an instance of the Apache Solr search tool [3].
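The post-processing step can be pictured as a per-record transformation over JSON Lines, written here in the style of a Hadoop Streaming mapper (one JSON object per input line, one per output line). This is a hedged sketch: the field names content and word_counts are hypothetical, and the actual job may have been implemented quite differently.

```python
import json
import re
import sys
from collections import Counter

TOKEN = re.compile(r"[a-z0-9']+")  # assumed tokeniser


def to_word_frequencies(record, text_field="content"):
    """Return a copy of one record with its raw text replaced by word counts."""
    out = dict(record)
    text = out.pop(text_field, "") or ""
    out["word_counts"] = dict(Counter(TOKEN.findall(text.lower())))
    return out


def main(lines, write=sys.stdout.write):
    """Read JSON Lines and write JSON Lines, skipping blank lines."""
    for line in lines:
        line = line.strip()
        if line:
            write(json.dumps(to_word_frequencies(json.loads(line))) + "\n")
```

As a streaming mapper, the script would end with a call to main(sys.stdin); taking the writer as a parameter keeps the function easy to test without Hadoop.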

While Solr provides a user interface, it’s not really suitable for general users, nor is it entirely safe to expose to the World Wide Web. To mitigate this, the index was built on a virtual server well away from any production systems, and wrapped with a web server configured in a way that should prevent problems.

The API this provides (see the Solr documentation for details) enables us to find which theses include which terms. Here are some example queries:

This is fine for programmatic access, but with a little extra wrapping we can make it more useful to more people.

APIs & Notebooks

For example, I was able to create live API documentation and a simple user interface using Google’s Colaboratory:

Using the openVirus EThOS API

Google Colaboratory is a proprietary platform, but those notebooks can be exported as more standard Jupyter Notebooks. See here for an example.

Faceted Search

Having carefully exposed the API to the open web, I was also able to take an existing browser-based faceted search interface and modify it to suit our use case:

EThOS Faceted Search Prototype

Best of all, this is running on the Glitch collaborative coding platform, so you can go look at the source code and remix it yourself, if you like:

EThOS Faceted Search Prototype – Glitch project

Limitations

The main limitation of using word-frequencies instead of full-text is that phrase search is broken. Searching for face AND mask will work as expected, but searching for “face mask” doesn’t.
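To make the limitation concrete, here is a small illustration using two invented sentences that have identical word counts, only one of which contains the phrase “face mask”. Word frequencies alone cannot tell them apart; the word order a phrase match needs only survives in the full text.

```python
from collections import Counter


def bag_of_words(text):
    """Word counts with the word order thrown away."""
    return Counter(text.lower().split())


def contains_phrase(text, phrase):
    """True if the words of phrase appear consecutively in text."""
    words, target = text.lower().split(), phrase.lower().split()
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))


doc_a = "a face mask and a cloth"
doc_b = "a cloth mask and a face"

# Identical word counts, so face AND mask matches both documents...
print(bag_of_words(doc_a) == bag_of_words(doc_b))  # True
# ...but only the full text can answer the phrase query "face mask".
print(contains_phrase(doc_a, "face mask"), contains_phrase(doc_b, "face mask"))  # True False
```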

Another problem is that the EThOS metadata has not been integrated with the raw text search. This would give us a much richer experience, like accurate publication years and more helpful facets[4].

In terms of user interface, the faceted search UI above is very basic, but for the openVirus project the API is likely to be of more use in the short term.

Next Steps

To make the search more usable, the next logical step is to attempt to integrate the full-text search with the EThOS metadata.

Then, if the results look good, we can start to work out how to feed the results into the workflow of the openVirus tool suite.

 


1. Even things like negative results, which are informative but can be difficult to publish in article form. ↩︎

2. This is a similar data-sharing pattern to the one used by Twitter researchers. See, for example, the DocNow Catalogue. ↩︎

3. We use Apache Solr a lot so this was the simplest choice for us. ↩︎

4. Note that since writing this post, this limitation has been rectified. ↩︎

 

07 May 2020

How to make art when we’re working apart

Like many at the British Library, I have seen my role as a Senior Imaging Support Officer change drastically over the last couple of months. While working from home, it has been impossible for me to carry on my normal digitisation work. Despite this, the time spent at home has reminded me why the work I do can be so important. Digitised collections have become essential during this time, allowing institutions to carry on engaging with the public while their doors are closed. The potential of accessing heritage collections from the comfort of your own home is being highlighted every day.

Research and academic use of online collections is well recognised. However, their vast creative potential is still not always identified by users and institutions. This is something I have been interested in for a long time, and I am always keen to produce work that changes perceptions of how online collections can be used. The Hack Days I have taken part in over the last couple of years at the British Library have given me a chance to explore this, producing creative responses to the Qatar Digital Library. Whether through zine making, printmaking or colourisation, I have been able to produce this work thanks to the digitised material made available for free.

The format and layout of websites designed to showcase digitised collections are normally geared more towards researchers and academics. When using online collections creatively, I want to work with a website that is image-focused rather than text-based. As a result, I have found institutions that upload to Flickr the easiest to use. The Library of Congress, The National Archives and The National Archives of Estonia are just some of the accounts I have used in the past. This meant that when I found myself working from home, one of the first things I wanted to do was work with the British Library’s Flickr account. With over a million images uploaded in 2013, there are some incredible images to use, and I have long recognised its potential for creative projects. All of these images have been uploaded without copyright restrictions so that they can be used by anyone for free. Yes, including commercially! The images are taken from the pages of 17th, 18th and 19th century books. The books were digitised by Microsoft, which gifted the scanned images to the British Library, allowing them to be released into the Public Domain. You can read more about this collection here.

It’s also important to note this certainly isn’t the first time the British Library Flickr images have been used creatively. Among others, artists like David Normal, Michael Takeo Magruder, Mario Klingemann and Jiayi Chong have all produced very different work using the book illustrations from this collection.

Animated collage created by Jiayi Chong, using Creature, a Cutting-edge 2D Animation Software

Getting Started

I knew I wanted to produce a series of collages but without access to a printer I was unable to print any of the images out. However, downloading the images allowed me to begin making digital collages. I used both Microsoft Word and Adobe Photoshop to create them. The collages I produced were in a similar vein to some of the images and artwork included in zines I have made in the past. These were normally made to illustrate a piece of text or data. However, this time I wanted to do something purely for fun, bringing random images together with lots of colour to create weird and wonderful images.

I began scrolling through the photo stream, collecting images into a folder on my computer. I enjoyed the serendipitous nature the Flickr interface brings and there were so many images I found just by chance rather than by using the search bar. There are also useful albums under different themes, including cats and cycling, which helped during my initial search. You can see some of the collages I created below.

4 collage images, a woman holding a large pair of scissors, a classical sculpture of a man holding a snake, an angel on a map and a profile of a man's face in front of a mountain range
Collages created by Hannah Nagle using the British Library's Flickr image collection

As I was creating them, I realised that not only were they enjoyable to make, the process was giving me something to put my mind to. It took me away from the stress and the uncertainty of everything going on around me and once again proved the benefits creativity has on mental health. Studies have shown that getting creative can have a similar effect to meditating. It helps you to relax, lifts your mood and allows you to express yourself in ways you wouldn’t normally. Additionally, there’s no wrong or right when making art. It can be a bit of freedom without any pressure or stress: something so many of us need right now.

The Guide

Having realised and experienced the benefits of this myself, I began putting together a guide on how to make collages using images from the British Library Flickr page. ‘How to make art when we’re working apart’ contains simple instructions on how to produce a collage, both digital and physical. If you don’t have access to a printer, the guide focuses on using Microsoft Word to produce a digital collage. I am also currently producing a second guide for anyone interested in gaining a basic understanding of how to use Adobe Photoshop. Included is a list of artists who use collage in their work, along with a simple ‘formula’ for making a collage for those who are feeling a bit stuck. The guide is for anyone interested in creating a bit of art while stuck indoors. Even if you’re not creatively minded, it’s easy to use and ideal for anyone looking to give themselves a small distraction from the situation we all find ourselves in.

Cover image for "How to make art when we’re working apart" guide featuring a drawing of a row of houses
Cover image for "How to make art when we’re working apart" guide

Click here to download the "How to make art when we're working apart" guide.

This is a guest post by Hannah Nagle (@hannagle), who works in the Imaging Team for the British Library Qatar National Library Partnership. You can follow the British Library Qatar National Library Partnership on Twitter at @BLQatar.

06 May 2020

What did you call me?!

This guest blog post is by Michael St John-McAlister, Western Manuscripts Cataloguing Manager at the British Library.

The coronavirus lockdown is a good opportunity to carry out some of those house-keeping tasks that would never normally get done (and I do not mean re-grouting the bathroom). Anticipating that we would be sent home and knowing I would be limited in the work I could do at home, I asked IT to download all the name authorities in our archives and manuscripts cataloguing system (all 324,106 of them) into a spreadsheet that I would be able to work on at home.

Working through the names, looking for duplicate records, badly-formed names, and typos, my eye was caught by the variety of epithets that have been used over 267 years of manuscripts cataloguing.

For the uninitiated, an epithet is part of a name authority or index term, in the form of a short descriptive label, used to help distinguish people of the same name. Imagine you are writing a biography of a John Smith. You search the Explore Archives and Manuscripts catalogue for any relevant primary sources, only to find three entries for Smith, John, 1810-1880. How would you know which John Smith’s letters and diaries to call up for your research? (Humour me: let us assume our three Smiths all have the same vital dates, unlikely I know, and that the papers are not fully catalogued, so the catalogue descriptions of the papers themselves cannot help you decide as they normally would.)

Now imagine your catalogue search for John Smith turned up the following entries instead:

Smith, John, 1810-1880, baker

Smith, John, 1810-1880, butcher

Smith, John, 1810-1880, candlestick maker

Instantly, you can see which of the three John Smiths is relevant to your ground-breaking research into the history of candlestick making in the West Riding in the early Victorian era.

The epithet is one element of a well-formed index term and it tends to be a position in life (King of Jordan; Queen of Great Britain and Ireland), a former or alternative name (née Booth; pseudonym ‘Jane Duncan’), a career or occupation (soldier; writer), or a relationship to another person (husband of Rebecca West; son of Henry VII).
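For anyone working with exported authority data, as in the spreadsheet mentioned above, the role of the epithet can be made concrete with a small Python sketch. It is only an illustration under stated assumptions: it assumes headings follow the simple ‘Surname, Forenames, dates, epithet’ pattern of the John Smith examples, with dates written as plain years such as 1810-1880. The names `parse_index_term` and `IndexTerm` are hypothetical and are not part of the Explore Archives and Manuscripts catalogue or any British Library tool.

```python
import re
from typing import NamedTuple, Optional

# Assumption: dates are plain years such as "1810-1880" or "1810-"; real
# headings also use forms like "fl. 1650" or "d. 1700", which this ignores.
DATES = re.compile(r"^\d{3,4}-?\d{0,4}$")


class IndexTerm(NamedTuple):
    surname: str
    forenames: str
    dates: Optional[str]
    epithet: Optional[str]


def parse_index_term(heading: str) -> IndexTerm:
    """Split a 'Surname, Forenames, dates, epithet' heading (hypothetical helper)."""
    parts = [p.strip() for p in heading.split(",")]
    surname = parts[0]
    forenames = parts[1] if len(parts) > 1 else ""
    rest = parts[2:]
    # Treat the third element as dates only if it looks like a year range.
    dates = rest.pop(0) if rest and DATES.match(rest[0]) else None
    # Whatever is left is the epithet, re-joined so that epithets which
    # themselves contain commas ("inn-keeper, gunner, and ...") survive intact.
    epithet = ", ".join(rest) or None
    return IndexTerm(surname, forenames, dates, epithet)


if __name__ == "__main__":
    headings = [
        "Smith, John, 1810-1880, baker",
        "Smith, John, 1810-1880, butcher",
        "Smith, John, 1810-1880, candlestick maker",
    ]
    terms = [parse_index_term(h) for h in headings]
    # Name and dates alone cannot tell the three men apart...
    print(len({(t.surname, t.forenames, t.dates) for t in terms}))  # 1
    # ...but including the epithet gives three distinct headings.
    print(len(set(terms)))  # 3
```

Re-joining the trailing parts is a deliberate choice, since multi-part epithets contain commas of their own; headings carrying other qualifiers, such as a ‘not …’ statement, would need separate handling.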

Scrolling through the spreadsheet, in amongst the soldiers, writers, composers, politicians, Earls of this, and Princesses of that, I stumbled across a fascinating array of epithets, some obvious, some less so.

There are plenty of examples of the perhaps slightly everyday, but important all the same: bricklayer; plumber; glazier; carpenter. As well as the trades common to us today, some of the trades used as epithets seem very much of their times: button-maker; coach and harness maker; dealer in fancy goods; butterman; copperplate printer; hackney coachman.

Those from the edges of law-abiding society loom large, with people described as burglar and prisoner (presumably the former led to his becoming the latter), convict, assassin, murderer, pickpocket, forger, felon, regicide, and rioter. There are even 50 pirates’ wives in the catalogue (but only seven pirates!). The victims of conflict and persecution also crop up, including prisoner of war, martyr, and galley slave, as well as, occasionally, their tormentors (inquisitor, head jailer, arms dealer).

Some of the epithets have a distinct air of mystery about them (codebreaker; conspirator; spy; alchemist; child prodigy; fugitive; renegade priest; hermit; recluse; mystic; secret agent; intercept operator; dream interpreter) whilst others exude a certain exoticism or loucheness: casino owner; dance band leader; acrobat; mesmerist; jazz poet; pearl fisher; showman; diamond tycoon; charioteer.

Many of the epithets relate to services provided to others. Where would the great and the good be without people to drive them around, manage their affairs, assist in their work, take their letters, make their tea, cook their food, and treat them when they fall ill? So Marcel Proust’s chauffeur, Charlie Chaplin’s business manager, Gustav Holst’s many amanuenses, Laurence Olivier’s secretary, Virginia Woolf’s charwoman, as well as her cook, and HG Wells’s physician all make appearances in the catalogue.

Then there are the epithets which are less than useful and do not really enlighten us about their subjects: appraiser (of what?); connoisseur (ditto); purple dyer (why only purple?); political adventurer; official. The less said about the usefulness, or otherwise, of epithets such as Mrs, widow, Mr Secretary, and Libyan the better. Some fall into the ‘What is it?’ category: coastwaiter (and landwaiter, for that matter); pancratiast; paroemiographer; trouvère.*

Another interesting category contains epithets of people with more than one string to their bow. One’s mind boggles at the career path of the ‘music scribe and spy’, or the ‘inn-keeper, gunner, and writer on mathematics’; is awed by the variety of skills of the ‘composer and physician’; marvels at the multi-talented ‘army officer, footballer, and Conservative politician’; and wonders what occurred in someone’s life to earn them the epithet ‘coach-painter and would-be assassin’.

As we have discovered, an epithet can help identify individuals, thus making the reader’s life easier, but if all else fails, and it is not possible to say who someone is, you can always say who they are not. Hence one of our manuscripts cataloguing forebears leaving us with Barry, Garrett; not Garrett Barry of Lisgriffin, county Cork as an index term.

* Respectively: a type of Customs officer; ditto; a participant in a boxing or wrestling contest, esp. in ancient Greece; a writer or collector of proverbs; a medieval epic poet.



04 May 2020

VisibleWikiWomen 2020 Campaign

May the 4th be with you!

When I think of Star Wars, one of the first characters that comes to mind is brave, quick-witted and feisty Princess Leia, General of the Resistance, played by the unforgettable Carrie Fisher. Leia is a role model for nerdy girls throughout the galaxy! Sadly I don’t have any photos of the time I went to a friend’s fancy dress party as Leia, wearing a long, floaty, white, high-necked gown and sporting the cinnamon bun hairstyle (this was when I had much longer hair), but I remember having an absolute blast pretending to be one of my heroes for an evening :-)

However, we don’t have to look as far as the fictional planet of Alderaan to find female heroes and role models. #VisibleWikiWomen is an annual campaign to make all women, especially black, brown, indigenous and trans women, visible on Wikipedia and the broader internet. This global campaign brings together Wikimedians, feminist and women’s organisations, and cultural institutions in a worldwide effort to reduce the gender gap and the lack of images of women in the biggest online free encyclopedia.

#VisibleWikiWomen campaign logo image; silhouette of a woman taking a photograph with a camera
#VisibleWikiWomen campaign logo image

Due to COVID-19, the world is going through a collective experience of deep anxiety and uncertainty. It is a deeply important time for collective solidarity and support. The work of female artists, actresses, writers and musicians is entertaining us and lifting our spirits during the long days of lockdown. However, we often miss “seeing” and appreciating the women who are part of the critical infrastructure of care that keeps us going in times like this: health workers, carers, cashiers, cleaners, cooks, activists, scientists, policy-makers and so many more. 

Next weekend, 9-12 May 2020, is the #VisibleWikiWomen Edit-a-thon: Women in critical infrastructures of care, which aims to acknowledge, affirm, support and raise awareness of these incredible women. At a time when we are isolating ourselves physically, #VisibleWikiWomen is an opportunity to come together virtually, to introduce and celebrate online the faces, work, and wisdom of women who have often been missing from the world’s shared knowledge and histories.

The goal of this online event is to gather good quality images of women that are in the public domain or under a free licence, upload them to Wikimedia Commons (the image file repository for Wikipedia) under the VisibleWikiWomen category, and have fun! These images could be photographs or drawings of women, as well as images of their work, with proper consent. If you are not sure where to start, there will be online training sessions on how to upload images to Commons, as well as group conversations where participants can ask questions and share their experiences of participating in the campaign.

The Edit-a-thon is being organised by:

Schedule for the online event is:

  • May 9 (Saturday) - online training at 12pm UTC (English session) and 3pm UTC (Spanish session). Each session will be a 90-minute video call
  • From May 9 to May 12 - uploading images to Wikimedia Commons at each participant's preferred time
  • May 11 (Monday) - Q&A online session for troubleshooting and discussing issues, at 2pm UTC (English session) and 5pm UTC (Spanish session)

Many other organisations have joined as institutional partners, including Wikimedia UK and the International Image Interoperability Framework (IIIF) Consortium, who have asked their member institutions, including the British Library, to identify and encourage reuse of openly licensed digitised images that fit the criteria for this campaign. For more information, check out the “Guide for Cultural and Memory Institutions to make women visible on Wikipedia” created by Whose Knowledge?. If you use any digitised British Library images, please let us know (by emailing digitalresearch(at)bl(dot)uk), as we always love to hear how people have used our collections.

Logo images of the VisibleWikiWomen partner organisations
Logos of some of the VisibleWikiWomen partner organisations

At the British Library we have some experience of running Wikipedia edit-a-thons to help address the gender imbalance; we have held a number of successful Wiki-Food and (mostly) Women edit-a-thons, led by Polly Russell. Also, for International Women’s Day in 2019, the British Library & Qatar National Library Partnership organised an Imaging Hack Day, which produced interactive photographs, story maps and a zine.

People editing Wikipedia pages
Photograph of the second British Library Wiki-Food and (mostly) Women edit-a-thon on 6th July 2015

Our landmark exhibition, Unfinished Business: The Fight for Women’s Rights, was due to open in the Library last month. Unfortunately, due to the COVID-19 lockdown, the on-site exhibition has been postponed. In the meantime, however, we are exploring women’s rights via our online channels, alongside writers, artists and activists. Our first offering is a tribute to writer Mary Wollstonecraft, a podcast featuring historian Dan Snow, Lady Hale, campaigner Bee Rowlatt, scholar Professor Emma Clery, actor Saffron Burrows and musician Jade Ellins, paying homage to the foremother of feminism.

Good luck to all those taking part in the #VisibleWikiWomen 2020 campaign, May the FORCE be with you!

This post is by Jedi Librarian Jocasta Nu. Sorry, I just wanted to link to Wookieepedia! It is actually written by Digital Curator (which is just as cool a job title as Jedi Librarian) Stella Wisdom (@miss_wisdom).