Digital scholarship blog

Introduction

Tracking exciting developments at the intersection of libraries, scholarship and technology.

21 April 2020

Clean. Migrate. Validate. Enhance. Processing Archival Metadata with Open Refine

This blogpost is by Graham Jevon, Cataloguer, Endangered Archives Programme

Creating detailed and consistent metadata is a challenge common to most archives. Many rely on an army of volunteers with varying degrees of cataloguing experience. And no matter how diligent any team of cataloguers is, human error and individual idiosyncrasies are inevitable.

This challenge is particularly pertinent to the Endangered Archives Programme (EAP), which has hitherto funded in excess of 400 projects in more than 90 countries. Each project is unique and employs its own team of one or more cataloguers based in the particular country where the archival content is digitised. But all this disparately created metadata must be uniform when ingested into the British Library’s cataloguing system and uploaded to eap.bl.uk.

Finding an efficient, low-cost method to process large volumes of metadata generated by hundreds of unique teams is a challenge, one that EAP sought to alleviate in 2019 using the freely available open source software Open Refine – a power tool for processing data.

This blog highlights some of the ways that we are using Open Refine. It is not an instructional how-to guide (though we are happy to follow-up with more detailed blogs if there is interest), but an introductory overview of some of the Open Refine methods we use to process large volumes of metadata.

Initial metadata capture

Our metadata is initially created by project teams using an Excel spreadsheet template provided by EAP. In the past year we have completely redesigned this template in order to make it as user friendly and controlled as possible.

But while Excel is perfect for metadata creation, it is not best suited for checking and editing large volumes of data. This is where Open Refine excels (pardon the pun!), so when the final completed spreadsheet is delivered to EAP, we use Open Refine to clean, validate, migrate, and enhance this data.

Replicating repetitive tasks

Open Refine came to the forefront of our attention after a one-day introductory training session led by Owen Stephens where the key takeaway for EAP was that a sequence of functions performed in Open Refine can be copied and re-used on subsequent datasets.

This encouraged us to design and create a sequence of processes that can be re-applied every time we receive a new batch of metadata, thus automating large parts of our workflow.

No computer programming skills required

Building this sequence required no computer programming experience (though this can help); just logical thinking, a generous online community willing to share their knowledge and experience, and a willingness to learn Open Refine’s GREL language and generic regular expressions. Some functions can be performed simply by using Open Refine’s built-in menu options. But Open Refine’s capabilities are almost limitless; the more you explore and experiment, the further you can push the boundaries.

Initially, it was hoped that our whole Open Refine sequence could be repeated in one single large batch of operations. The complexity of the data and the need for archivist intervention meant that it was more appropriate to divide the process into several steps. Our workflow is divided into 7 stages:

Migration
Dates
Languages and Scripts
Related subjects
Related places and other authorities
Uniform Titles
Digital content validation

Each of these stages performs one or more of four tasks: clean, migrate, validate, and enhance.

Task 1: Clean

The first part of our workflow provides basic data cleaning. Across all columns it trims any white space at the beginning or end of a cell, removes any double spaces, and capitalises the first letter of every cell. In just a few seconds, this tidies the entire dataset.

Task 1 Example: Trimming white space (menu option)

Trimming whitespace on an individual column is easy, as Open Refine has a built-in “Common transform” that performs this function.

Although this is a simple function to perform, we no longer need to repeatedly select this menu option for each column of each dataset we process because this task is now part of the workflow that we simply copy and paste.

Task 1 Example: Capitalising the first letter (using GREL)

Capitalising the first letter of each cell is less straightforward for a new user, as Open Refine has no built-in function for this that can be selected from a menu. Instead it requires a custom “Transform” using Open Refine’s own expression language (GREL).

Having to write an expression like this should not put off any Open Refine novices. This is an example of Open Refine’s flexibility and many expressions can be found and copied from the Open Refine wiki pages or from blogs like this. The more you copy others, the more you learn, and the easier you will find it to adapt expressions to your own unique requirements.

Moreover, we do not have to repeat this expression again. Just like the trim whitespace transformation, this is also now part of our copy and paste workflow. One click performs both these tasks and more.
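
For readers who think more easily in code than in menus, the three cleaning steps described above can be sketched in Python. This is only an illustration of the logic, not the GREL expression or the Open Refine operations we actually use; the function names are made up for this example.

```python
import re


def clean_cell(value: str) -> str:
    """Apply the three cleaning steps to a single cell value."""
    # 1. Trim white space at the beginning and end of the cell
    text = value.strip()
    # 2. Collapse any run of two or more spaces into a single space
    text = re.sub(r" {2,}", " ", text)
    # 3. Capitalise the first letter only, leaving the rest untouched
    return text[:1].upper() + text[1:]


def clean_rows(rows):
    """Clean every string cell in a list of row dictionaries."""
    return [
        {column: clean_cell(cell) if isinstance(cell, str) else cell
         for column, cell in row.items()}
        for row in rows
    ]


print(clean_cell("  photographs of  the   town "))
```

Note that only the first character is changed in step 3, so acronyms and proper nouns later in the cell keep their original capitalisation.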

Task 2: Migrate

As previously mentioned, the listing template used by the project teams is not the same as the spreadsheet template required for ingest into the British Library’s cataloguing system. But Open Refine helps us convert the listing template to the ingest template. In just one click, it renames, reorders, and restructures the data from the human friendly listing template to the computer friendly ingest template.

Task 2 example: Variant Titles

The ingest spreadsheet has a “Title” column and a single “Additional Titles” column where all other title variations are compiled. It is not practical to expect temporary cataloguers to understand how to use the “Title” and “Additional Titles” columns on the ingest spreadsheet. It is much more effective to provide cataloguers with a listing template that has three prescriptive title columns. This helps them clearly understand what type of titles are required and where they should be put.

The EAP team then uses Open Refine to move these titles into the appropriate columns (illustrated above). It places one in the main “Title” field and concatenates the other two titles (if they exist) into the “Additional Titles” field. It also creates two new title type columns, which the ingest process requires so that it knows which title is which.

This is just one part of the migration stage of the workflow, which performs several renaming, re-ordering, and concatenation tasks like this to prepare the data for ingest into the British Library’s cataloguing system.
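
The title migration can be sketched in Python as follows. The listing column names, the title type labels, and the "|" separator are all hypothetical stand-ins chosen for this example (only “Title” and “Additional Titles” come from the ingest template described above), and the real workflow does this with Open Refine operations rather than Python.

```python
# Hypothetical listing-template columns, each paired with a title type label
MAIN_COLUMN = ("Title in English", "Translated")
OTHER_COLUMNS = [
    ("Title in Original Script", "Original"),
    ("Title Transliterated", "Transliterated"),
]


def migrate_titles(listing_row, separator="|"):
    """Map three listing title columns onto the ingest title columns."""
    main_column, main_type = MAIN_COLUMN
    # Keep only the additional titles that actually exist in this row
    extras = [
        ((listing_row.get(column) or "").strip(), title_type)
        for column, title_type in OTHER_COLUMNS
    ]
    extras = [(title, title_type) for title, title_type in extras if title]
    return {
        "Title": (listing_row.get(main_column) or "").strip(),
        "Title Type": main_type,
        # Concatenate whatever additional titles are present
        "Additional Titles": separator.join(title for title, _ in extras),
        "Additional Title Types": separator.join(t for _, t in extras),
    }


row = {
    "Title in English": "Parish register",
    "Title in Original Script": "",
    "Title Transliterated": "Registro parroquial",
}
print(migrate_titles(row))
```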

Task 3: Validate

While cleaning and preparing the data for migration is important, it is also vital that we check that the data is accurate and reliable. But who has the time, inclination, or eye stamina to read thousands of rows of data in an Excel spreadsheet? What we require is a computational method to validate data. Perhaps the best way of doing this is to write a bespoke computer program. This indeed is something that I am now working on while learning to write computer code using the Python language (look out for a further blog on this later).

In the meantime, though, Open Refine has helped us to validate large volumes of metadata with no programming experience required.

Task 3 Example: Validating metadata-content connections

When we receive the final output from a digitisation project, one of our most important tasks is to ensure that all of the digital content (images, audio and video recordings) correlates with the metadata on the spreadsheet and vice versa.

We begin by running a command line report on the folders containing the digital content. This provides us with a CSV file which we can read in Excel. However, the data is not presented in a neat format for comparison purposes.

Restructuring data ready for validation comparisons

For this particular task what we want is a simple list of all the digital folder names (not the full directory) and the number of TIFF images each folder contains. Open Refine enables just that, as the next image illustrates.

Constructing the sequence that restructures this data required careful planning and good familiarity with Open Refine and the GREL expression language. But after the data had been successfully restructured once, we never have to think about how to do this again. As with other parts of the workflow, we now just have to copy and paste the sequence to repeat this transformation on new datasets in the same format.
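
As a rough Python illustration of the same restructuring, the sketch below assumes the report boils down to one full file path per row; the real report layout, and the folder names in the example, are assumptions made for this sketch rather than our actual data.

```python
from collections import Counter
from pathlib import PurePosixPath


def count_tiffs_per_folder(file_paths):
    """Return {folder name: number of TIFF files} from full file paths."""
    counts = Counter()
    for raw_path in file_paths:
        # Accept Windows-style separators as well as forward slashes
        path = PurePosixPath(raw_path.replace("\\", "/"))
        if path.suffix.lower() in {".tif", ".tiff"}:
            # Keep only the immediate folder name, not the full directory
            counts[path.parent.name] += 1
    return dict(counts)


# Hypothetical file listing
paths = [
    "D:/EAP001/EAP001_1_1/EAP001_1_1_001.tif",
    "D:/EAP001/EAP001_1_1/EAP001_1_1_002.TIF",
    "D:\\EAP001\\EAP001_1_2\\EAP001_1_2_001.tiff",
    "D:/EAP001/EAP001_1_2/notes.txt",
]
print(count_tiffs_per_folder(paths))
```

One limitation of this simple version is that a folder containing no TIFF images at all does not appear in the result.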

Cross referencing data for validation

With the data in this neat format, we can now do a number of simple cross referencing checks. We can check that:

Each digital folder has a corresponding row of metadata – if not, this indicates that the metadata is incomplete
Each row of metadata has a corresponding digital folder – if not, this indicates that some digital folders containing images are missing
The actual number of TIFF images in each folder exactly matches the number of images recorded by the cataloguer – if not this may indicate that some images are missing.

For each of these checks we use Open Refine’s cell.cross expression to cross reference the digital folder report with the metadata listing.

In the screenshot below we can see the results of the first validation check. Each digital folder name should match the reference number of a record in the metadata listing. If we find a match it returns that reference number in the “CrossRef” column. If no match is found, that column is left blank. By filtering that column by blanks, we can very quickly identify all of the digital folders that do not have a corresponding row of metadata. In this example, before applying the filter, we can already see that at least one digital folder is missing metadata. An archivist can then investigate why that is and fix the problem.
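
The three cross-referencing checks listed above can also be expressed as a short Python sketch. It uses plain set comparisons rather than Open Refine's cell.cross, and the "Reference" and "Image Count" column names are hypothetical.

```python
def cross_reference(folder_counts, metadata_rows):
    """Compare a {folder: TIFF count} report with the metadata listing."""
    # Number of images recorded by the cataloguer, keyed by reference number
    recorded = {
        row["Reference"]: int(row["Image Count"]) for row in metadata_rows
    }
    folders = set(folder_counts)
    references = set(recorded)
    return {
        # Check 1: folders with no row of metadata (metadata incomplete)
        "folders_without_metadata": sorted(folders - references),
        # Check 2: metadata rows with no folder (digital content missing)
        "metadata_without_folder": sorted(references - folders),
        # Check 3: (actual, recorded) image counts that disagree
        "count_mismatches": {
            ref: (folder_counts[ref], recorded[ref])
            for ref in sorted(folders & references)
            if folder_counts[ref] != recorded[ref]
        },
    }


report = {"EAP001_1_1": 2, "EAP001_1_2": 1, "EAP001_1_3": 5}
listing = [
    {"Reference": "EAP001_1_1", "Image Count": "2"},
    {"Reference": "EAP001_1_2", "Image Count": "3"},
    {"Reference": "EAP001_1_4", "Image Count": "7"},
]
print(cross_reference(report, listing))
```

Each of the three result lists corresponds to something an archivist would then investigate by hand.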

Task 4: Enhance

We enhance our metadata in a number of ways. For example, we import authority codes for languages and scripts, and we assign subject headings and authority records based on keywords and phrases found in the titles and description columns.

Named Entity Extraction

One of Open Refine’s most dynamic features is its ability to connect to other online databases. Thanks to the generous support of Dandelion API, we are able to use its service to identify entities such as people, places, organisations, and titles of work.

In just a few simple steps, Dandelion API reads our metadata and returns new linked data, which we can filter by category. For example, we can list all of the entities it has extracted and categorised as a place or all the entities categorised as people.

Not every named entity it finds will be accurate. In the above example “Baptism” is clearly not a place. But it is much easier for an archivist to manually validate a list of 29 phrases identified as places than to read 10,000 scope and content descriptions looking for named entities.

Clustering inconsistencies

If there is inconsistency in the metadata, the returned entities might contain multiple variants. This can be overcome using Open Refine’s clustering feature. This identifies and collates similar phrases and offers the opportunity to merge them into one consistent spelling.
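
To give a feel for how this kind of clustering can work, here is a simplified Python approximation of key-collision ("fingerprint") clustering, one of the methods Open Refine offers. It is a sketch of the general idea only, not Open Refine's own implementation, and the place names are invented examples.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Reduce a phrase to a normalised key so that variants collide."""
    text = value.strip().lower()
    # Fold accented characters to their closest ASCII equivalents
    text = unicodedata.normalize("NFKD", text)
    text = text.encode("ascii", "ignore").decode("ascii")
    # Drop punctuation, then sort and de-duplicate the remaining words
    text = re.sub(r"[^\w\s]", "", text)
    return " ".join(sorted(set(text.split())))


def cluster(values):
    """Group values that share a fingerprint; return groups of 2 or more."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return [sorted(group) for group in groups.values() if len(group) > 1]


print(cluster(["São Paulo", "Sao Paulo", "sao paulo.", "Paulo, Sao", "Lima"]))
```

An archivist would still review each suggested group before merging, since variants that share a key are not always the same entity.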

Linked data reconciliation

Having identified and validated a list of entities, we then use other linked data services to help create authority records. For this particular task, we use the Wikidata reconciliation service. Wikidata is a structured data sister project to Wikipedia. And the Open Refine reconciliation service enables us to link an entity in our dataset to its corresponding item in Wikidata, which in turn allows us to pull in additional information from Wikidata relating to that item.

For a South American photograph project we recently catalogued, Dandelion API helped identify 335 people (including actors and performers). By subsequently reconciling these people with their corresponding records in Wikidata, we were able to pull in their job title, date of birth, date of death, unique persistent identifiers, and other details required to create a full authority record for that person.

Creating individual authority records for 335 people would otherwise take days of work. It is a task that previously we might have deemed infeasible. But Open Refine and Wikidata drastically reduce the human effort required.

Summary

In many ways, that is the key benefit. By placing Open Refine at the heart of our workflow for processing metadata, it now takes us less time to do more. Our workflow is not perfect. We are constantly finding new ways to improve it. But we now have a semi-automated method for processing large volumes of metadata.

This blog puts just some of those methods in the spotlight. In the interest of brevity, we refrained from providing step-by-step detail. But if there is interest, we will be happy to write further blogs to help others use this as a starting point for their own metadata processing workflows.

Posted by Digital Research Team at 12:58 PM

Tags

Data, Digital scholarship, Experiments, Humanities, LIS research, Projects, Tools

20 April 2020

BL Labs Research Award Winner 2019 - Tim Crawford - F-Tempo

Posted on behalf of Tim Crawford, Professorial Research Fellow in Computational Musicology at Goldsmiths, University of London and BL Labs Research Award winner for 2019 by Mahendra Mahey, Manager of BL Labs.

Introducing F-TEMPO

Early music printing

Music printing, introduced in the later 15th century, enabled the dissemination of the greatest music of the age, which until that time was the exclusive preserve of royal and aristocratic courts or the Church. A vast repertory of all kinds of music is preserved in these prints, and they became the main conduit for the spread of the reputation and influence of the great composers of the Renaissance and early Baroque periods, such as Josquin, Lassus, Palestrina, Marenzio and Monteverdi. As this music became accessible to the increasingly well-heeled merchant classes, entirely new cultural networks of taste and transmission became established and can be traced in the patterns of survival of these printed sources.

Music historians have tended to neglect the analysis of these patterns in favour of a focus on a canon of ‘great works’ by ‘great composers’, with the consequence that there is a large sub-repertory of music that has not been seriously investigated or published in modern editions. By including this ‘hidden’ musical corpus, we could explore for the first time, for example, the networks of influence, distribution and fashion, and the effects on these of political, religious and social change over time.

Online resources of music and how to read them

Vast amounts of music, mostly audio tracks, are now available using services such as Spotify, iTunes or YouTube. Music is also available online in great quantity in the form of PDF files rendering page-images of either original musical documents or modern, computer-generated music notation. These are a surrogate for paper-based books used in traditional musicology, but offer few advantages beyond convenience. What they don’t allow is full-text search, unlike the text-based online materials which are increasingly the subject of ‘distant reading’ in the digital humanities.

With good score images, Optical Music Recognition (OMR) programs can sometimes produce useful scores from printed music of simple texture; however, in general, OMR output contains errors due to misrecognised symbols. The results often amount to musical gibberish, severely limiting the usefulness of OMR for creating large digital score collections. Our OMR program is Aruspix, which is highly reliable on good images, even when they have been digitised from microfilm.

Here is a screen-shot from Aruspix, showing part of the original page-image at the top, and the program’s best effort at recognising the 16th-century music notation below. It is not hard to see that, although the program does a pretty good job on the whole, there are not a few recognition errors. The program includes a graphical interface for correcting these, but we don’t make use of that for F-TEMPO for reasons of time – even a few seconds of correction per image would slow the whole process catastrophically.

The Aruspix user-interface

Finding what we want – error-tolerant encoding

Although OMR is far from perfect, online users are generally happy to use computer methods on large collections containing noise; this is the principle behind the searches in Google Books, which are based on Optical Character Recognition (OCR).

For F-TEMPO, from the output of the Aruspix OMR program, for each page of music, we extract a ‘string’ representing the pitch-name and octave for the sequence of notes. Since certain errors (especially wrong or missing clefs or accidentals) affect all subsequent notes, we encode the intervals between notes rather than the notes themselves, so that we can match transposed versions of the sequences or parts of them. We then use a simple alphabetic code to represent the intervals in the computer.
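
The idea of encoding intervals rather than notes can be illustrated with a small Python sketch. The letter code used here (upper case for rising steps, lower case for falling steps, "-" for a repeated note) is a hypothetical scheme invented for this example and should not be read as F-TEMPO's actual alphabet; notes are assumed to be written as a pitch-name letter plus an octave number, with accidentals ignored.

```python
PITCH_NAMES = "CDEFGAB"


def diatonic_position(note: str) -> int:
    """Convert a note such as 'G4' into a count of diatonic steps."""
    letter, octave = note[0].upper(), int(note[1:])
    return octave * 7 + PITCH_NAMES.index(letter)


def encode_intervals(notes) -> str:
    """Encode the steps between successive notes as a string of letters."""
    positions = [diatonic_position(note) for note in notes]
    symbols = []
    for previous, current in zip(positions, positions[1:]):
        step = current - previous
        if step == 0:
            symbols.append("-")  # repeated note
        elif abs(step) > 26:
            raise ValueError(f"interval too large to encode: {step}")
        elif step > 0:
            symbols.append(chr(ord("A") + step - 1))  # rising interval
        else:
            symbols.append(chr(ord("a") - step - 1))  # falling interval
    return "".join(symbols)


# The same melodic shape at two different pitch levels gives the same string
print(encode_intervals(["C4", "D4", "E4", "C4"]))
print(encode_intervals(["G4", "A4", "B4", "G4"]))
```

Because only the steps between notes are stored, an error that shifts every subsequent note by the same amount, such as a misread clef, leaves the encoded string after that point unchanged.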

Here is an example of a few notes from a popular French chanson, showing our encoding method.

A few notes from a Crequillon chanson, and our encoding of the intervals

F-TEMPO in action

F-TEMPO uses state-of-the-art, scalable retrieval methods, providing rapid searches of almost 60,000 page-images for those similar to a query-page in less than a second. It successfully recovers matches when the query page is not complete, e.g. when page-breaks are different. Also, close non-identical matches, as between voice-parts of a polyphonic work in imitative style, are highly ranked in results; similarly, different works based on the same musical content are usually well-matched.

Here is a screen-shot from the demo interface to F-TEMPO. The ‘query’ image is on the left, and searches are done by hitting the ‘Enter’ or ‘Return’ key in the normal way. The list of results appears in the middle column, with the best match (usually the query page itself) highlighted and displayed on the right. As other results are selected, their images are displayed on the right. Users can upload their own images of 16th-century music that might be in the collection to serve as queries; we have found that even photos taken with a mobile phone work well. However, don’t expect coherent results if you upload other kinds of image!

F-Tempo-User Interface

The F-TEMPO web-site can be found at: http://f-tempo.org

Click on the ‘Demo’ button to try out the program for yourself.

What more can we do with F-TEMPO?

Using the full-text search methods enabled by F-TEMPO’s API we might begin to ask intriguing questions, such as:

‘How did certain pieces of music spread and become established favourites throughout Europe during the 16th century?’
‘How well is the relative popularity of such early-modern favourites reflected in modern recordings since the 1950s?’
‘How many unrecognised arrangements are there in the 16th-century repertory?’

In early testing we identified an instrumental ricercar as a wordless transcription of a Latin motet, hitherto unknown to musicology. As the collection grows, we are finding more such unexpected concordances, and can sometimes identify the composers of works labelled in some printed sources as by ‘Incertus’ (Uncertain). We have also uncovered some conflicting attributions which could provoke interesting scholarly discussion.

Early Music Online and F-TEMPO

From the outset, this project has been based on the Early Music Online (EMO) collection, the result of a 2011 JISC-funded Rapid Digitisation project between the British Library and Royal Holloway, University of London. This digitised about 300 books of early printed music at the BL from archival microfilms, producing black-and-white images which have served as an excellent proof of concept for the development of F-TEMPO. The c.200 books judged suitable for our early methods in EMO contain about 32,000 pages of music, and form the basis for our resource.

The current version of F-TEMPO includes just under 30,000 more pages of early printed music from the Polish National Library, Warsaw, as well as a few thousand from the Bibliothèque nationale, Paris. We will soon be incorporating no fewer than a further half-a-million pages from the Bavarian State Library collection in Munich, as soon as we have run them through our automatic indexing system.

(This work was funded for the past year by the JISC / British Academy Digital Humanities Research in the Humanities scheme. Thanks are due to David Lewis, Golnaz Badkobeh and Ryaan Ahmed for technical help and their many suggestions.)

Posted by Digital Research Team at 11:01 AM

Tags

BL Labs, Data, Digital scholarship, Experiments, Humanities, LIS research, Manuscripts, Music, Printed books, Research collaboration, Sound and vision, Tools

16 April 2020

BL Labs Community Commendation Award 2019 - Lesley Phillips - Theatre History

EXPLORING THEATRE HISTORY WITH BRITISH LIBRARY PLAYBILLS AND NEWSPAPERS

Posted on behalf of Lesley Phillips, a former Derbyshire local studies librarian in the UK and BL Labs Community Commendation Award winner for 2019 by Mahendra Mahey, Manager of BL Labs.

Lesley explains how the British Library's digital collections of playbills and digitised newspapers enabled her to compile a detailed account of the career of the actor-manager John Faucit Saville in the East Midlands 1843-1855.

John Faucit Saville was born in Norwich in 1807, the son of two actors then performing with the Norwich Company as Mr and Mrs Faucit. His parents separated when he was 14 years old and just entering on his stage career. His mother, then a leading actress at Drury Lane, moved in with the celebrated actor William Farren, and continued to perform as Mrs Faucit, while his father became a manager and changed his surname to Saville (his real name).

Oxberry's Dramatic Biography (1825) records his father's grief:

On the evening that the fatal news [of his wife's desertion] reached him [Mr John Faucit] left the theatre and walked over the beach. His lips trembled and he was severely agitated. Many persons addressed him, but he broke from them and went to the house of a particular friend. The facts were then known only to himself. Though a man of temperate habits, he drank upwards of two bottles of wine without being visibly affected. He paced the room and seemed unconscious of the presence of anyone. To his friend's inquiries he made no reply. He once said “My heart is almost broke, but you will soon know why”.

(C.E. Oxberry (ed.) Oxberry's Dramatic Biography and Histrionic Anecdotes. Vol. III (1825) pp. 33-34, Memoir of William Farren)

Despite the rift between his parents, John Faucit Saville had all the advantages that famous friends and relatives could bring in the theatrical world, but during his time as an aspiring actor it soon became clear that he would never be a great star. In 1841 he began to put his energies into becoming a manager, like his father before him. He took a lease of Brighton Theatre in his wife's home town, but struggled to make a success of it.

Like the other managers of his day he was faced with a decline in the fashion for rational amusements and the rise of 'beer and circuses'. This did not deter him from making a further attempt at establishing a theatrical circuit. For this he came to the East Midlands and South Yorkshire, where the decline of the old circuit and the retirement of Thomas Manly had laid the field bare for a new man. Saville must surely have had great confidence in his own ability to be successful here, given that the old, experienced manager had begun to struggle.

Saville took on the ailing circuit, and soon discovered that he was forced to make compromises. He was careful to please the local authorities as to the respectability of his productions, and yet managed to provide more lowbrow entertainments to bring in the audiences. Even so, after a few years he was forced to rein in his ambitions and eventually reduce his circuit, and he even went back on tour as an itinerant actor from time to time to supplement his income. Saville's career had significant implications for the survival of some of the theatres of the East Midlands, as he lived through the final disintegration of the circuit.

Over the years, John Faucit Saville's acting career had taken him to Paris, Edinburgh, and Dublin, as well as many parts of England. Without the use of digital online resources it would be almost impossible to trace a career such as his, to explore his background, and bring together the details of his life and work.

Newspaper article from 29 January 1829 detailing the benefit performance for Mr Faucit entitled 'Clandestine Marriage' at the Theatre Royal Brighton

The digitised newspapers of the British Newspaper Archive https://www.britishnewspaperarchive.co.uk enabled me to uncover the Saville family origins in Bedford, and to follow John Faucit Saville's career from the heights of the London stage, to management at Brighton and then to the Midlands.

Newspaper article detailing benefit performance for Mr JF Saville at Theatre Royal Derby on Friday May 23, 1845, play entitled 'Don Caesar de Bazan' or 'Martina the Gypsy'

The dataset of playbills available to download from the British Library web site https://data.bl.uk/playbills/pb1.html enabled me to build up a detailed picture of Saville's work, the performers and plays he used, and the way he used them. It was still necessary to visit some libraries and archives for additional information, but I could never have put together such a rich collection of information without these digital resources.

My research has been put into a self-published book, filled with newspaper reviews of Saville's productions, and stories about his company. This is not just a narrow look at regional theatre; there are also some references to figures of national importance in theatre history. John Faucit Saville's sister, Helen Faucit, was a great star of her day, and his half-brother Henry Farren made his stage debut in Derbyshire with Saville's company. John Faucit Saville's wife Marianne performed with Macready on his farewell tour and also played at Windsor for Queen Victoria. The main interest for me, however, was the way theatre history reveals how national and local events impacted on society and public behaviour, and how the theatre connected with the life of the ordinary working man and woman.

Front cover of my self-published book about John Faucit Saville

If you are interested in playbills generally, you might want to help the British Library provide more information about individual ones through a crowdsourcing project, entitled 'In the Spotlight'.

Posted by Digital Research Team at 3:48 PM

Tags

BL Labs, Data, Digital scholarship, Experiments, Humanities, Literature, Manuscripts, Music, Printed books, Projects

15 April 2020

Rapidly pivoting to online delivery of a Library Carpentry course

This blogpost is by Jez Cope, Data Services Lead in the British Library’s Research Infrastructure Services team with contributions from Nora McGregor, Digital Curator, British Library Digital Research Team.

Nora wrote a piece the other day about Learning in Lockdown, suggesting a number of places you can find online resources to learn from while working from home. She also mentioned that we were running our own experiments on this, having been forced by circumstance to pivot our current Library Carpentry course to online delivery for colleagues stuck at home under lockdown. This post is an attempt to summarise some of the things we’ve learned so far about that.

From in-person to online

A series of Library Carpentry workshops were planned last month as part of our regular staff Digital Scholarship Training Programme. It was a collaboration between Sarah Stewart and me from Research Infrastructure Services, and Nora McGregor, Daniel van Strien and Deirdre Sullivan from Digital Scholarship, two teams in the Collections division of the British Library.

The original plan was to run three 2-hour workshops, slightly personalised for the British Library context, at weekly intervals, in person at our flagship site at St Pancras, London, for roughly 15 staff members:

We also planned to do an optional fourth session covering Python & Jupyter Notebooks. All four sessions were based on material from the Library Carpentry community, which includes a significant percentage of what we call “live coding”: the instructor demonstrates use of a tool or programming language live with a running explanation, and participants follow along, duplicating what the instructor does on their own workstation/laptop and asking questions as they arise.

The team agreed (the Friday before, eek!) to try running Session 1: Tidy Data fully online via Zoom instead of face-to-face. At that point, though the Library was still open, many of the staff attending were either already working remotely or expecting to shortly, so we thought we’d get a jump on trying to run the sessions online rather than force staff into a small enclosed training room!

So we ran that first session online, and then asked the participants what they thought: would they like us to postpone the rest of the course until we could run it face-to-face, or at least until we’ve all got more used to remote ways of working? The overwhelming response was that everyone would like to continue the rest of the workshops as planned, so we did! Below we've put together just some of our first reflections and things we've learned from pivoting to online delivery of a Library Carpentry style workshop.

Our experiences & tips

It's a good time to reflect on your teaching practice and learn a bit more about how people learn. If you only read one book on this subject, make it “How Learning Works: Seven Research-Based Principles for Smart Teaching” (Ambrose et al, 2010), which does a great job of busting some common learning myths and presents research-backed principles with guidance on how to implement them practically.

In-person workshops, particularly of a technical nature, will not directly translate into an equivalent online session, so don’t even try! The latter should be much shorter than what you would expect to deliver in person. The key is to minimise cognitive load: brains work best when they can concentrate on one thing at a time in relatively short bursts. Right now, everyone is already a bit more overtaxed than normal just trying to adapt to the new state of affairs, so be prepared to cover a lot less material, perhaps over shorter more frequent sessions if necessary, than you might otherwise expect.

With that in mind, we found it useful to use our live online session time primarily as a way to get people set up and familiar with the technology and coursework, and to give them enough background information to instill confidence in them to continue the learning in more depth in their own time. We feel the Library Carpentry lessons are very well suited for this kind of live + asynchronous approach.

Before your session

Manage expectations from the outset. Be clear with participants about what they can expect from the new online session, particularly if it is a modification of a course typically given in person. Especially right now, many people are having to start using online tools that they’re unfamiliar with, so make sure everyone understands that’s ok, and that time (and resource) will be built into the course to help everyone navigate any issues. Stress that patience (and forgiveness!) with themselves, each other, the instructor, and the process is essential!

Decide what tools you’re going to use and test them out to become familiar with them. If possible, give your participants an opportunity to try things out beforehand too, so they’re not learning the tools at the same time as learning your content.

If your training is of a technical nature, it can be helpful to survey participants ahead of time about what sort of computing environment they have at home. We found it useful to get a sense of which operating systems folks would be using, so that we could be prepared for the inevitable Mac vs. Windows questions, and of whether or not they were familiar with videoconferencing tools and such.
Share course materials with participants (especially pre-course setup instructions and anticipated schedule) well ahead of time. It can be much harder to follow along remotely, and easier to get lost if you get distracted by a call of nature or family member. Providing structure, eliminating surprises and giving everyone time to acclimate to material ahead of time will help the session run smoothly.

During your session

Turn on your video; people like to be able to see who’s teaching them, IDK, I guess it’s a human thing. Evidence on whether this actually improves learning is patchy, but there is good evidence that learners prefer it. On the flipside, you might encourage those participants who can to turn on their video too, as this can help the presenter connect with the class.
Take some time at the start to make sure everyone is aware of and familiar with the features of the conferencing tool you're using. At a minimum make sure everyone is aware of the mechanisms available to them for participating and communicating during the session. We used Zoom to deliver this course and found it helpful to point out that the "Gallery" view setting is better than the "Speaker" view, which will flit around too much if there is any background noise; that everyone should mute their microphones when not speaking; where the chat box can be accessed for asking questions; and how to use the "raise hand" feature when answering a question from the instructor. The latter is useful in getting a quick read of the whole class on whether or not participants need help at certain stages.
Assign one or two people specifically to monitor any backchannels, such as chat boxes or Slack, if you’re using them, as it’s really hard to do this while also leading the session. These people can also summarise key points from the main session in the chat.
If using a shared online notes document (like Google Docs or HackMD), break the ice by asking everyone to do a simple task with it, like adding their name to a list of attendees. Keep the use of supplemental resources simple, though: try not to send attendees off in too many directions too often, as many folks with small laptop screens will find it difficult to navigate between lots of different windows and links.
Don’t forget to make time for breaks! Concentrating on your screen is hard work at the best of times, so it’s really important for both learners and teachers to have regular breaks during the session.

After your session

Send round links to any materials that learners didn’t receive before the session, especially things that came up in discussion that aren’t recorded in your slides or notes. Another good reason for having someone dedicated to monitoring the chat is that they can also be on hand to ensure any good advice, examples or links from the chat session are collected before it closes and disappears (our current policy is to not collect an automatic transcription with Zoom sessions).
Give people a channel to stay in touch, ask further questions and generally feel a bit less alone in their learning after the session; this could be a Slack team, a mailing list, a wiki or whatever works for you and your learners.
Make sure you have a mechanism in place to gather honest feedback from attendees and make adjustments for the next time around. Practice makes perfect!

Conclusions

This is a learning process for all of us, even those who are experienced teachers, so don’t be afraid to try things out and make mistakes (you will anyway!). We’d love to hear more about your experiences. Drop us a line in the comments or email [email protected]!

Posted by Digital Research Team at 8:18 AM

Tags

Collaborations, Data, Digital scholarship, Events, Tools

14 April 2020

BL Labs Artistic Award Winner 2019 - The Memory Archivist - Lynda Clark

Posted on behalf of Lynda Clark, BL Labs Artistic Award Winner 2019 by Mahendra Mahey, Manager of BL Labs.

My research, writing and broader critical practice are inextricably linked. For example, the short story “Ghillie’s Mum”, recently nominated for the BBC Short Story Award, was an exploration of fraught parent / child relationships, which fed into my interactive novella Writers Are Not Strangers, which was in turn the culmination of research into the way readers and players respond to writers and creators both directly and indirectly.

“The Memory Archivist”, BL Labs Artistic Award winner 2019, offers a similar blending of creative work, research and reflection. The basis for the project was the creation of a collection of works of interactive fiction for the UK Web Archive (UKWA) as part of an investigation into whether it was possible to capture interactive works with existing web archiving tools. The project used WebRecorder and Web ACT to add almost 200 items to the UKWA. An analysis of these items was then undertaken, which indicated various recurring themes, tools and techniques used across the works. These were then incorporated into “The Memory Archivist” in various ways.

Opening screen for the Memory Archivist

The interactive fiction tool Twine was the most widely used by UK creators across the creative works, and was therefore used to create “The Memory Archivist”. Key themes such as pets, public transport and ghosts were used as the basis for the memories the player character may record. Elements of the experience of, and challenges relating to, capturing interactive works (and archival objects more generally) were also incorporated into the narrative and interactivity. When the player-character attempts to replay some of the memories they have recorded, they will find them captured only partially, or with changes to their appearance.

There were other, more direct, ways in which the Library’s digital content was included too, in the form of repurposing code. ‘Link select’ functionality was adapted from Jonathan Laury’s Ostrich, and CSS style sheets from Brevity Quest by Chris Longhurst were edited to give certain sections their distinctive look. An image from the Library’s Flickr collection was used as the central motif for the piece not only because it comes from an online digital archive, but because it is itself a motif from an archive – a French 19th Century genealogical record. Sepia tones were used for the colour palette to reflect the nostalgic nature of the piece.

Example screen shots from the Memory Archivist

Together, these elements aim to emphasise the fact that archives are a way to connect memories, people and experiences across time and space and in spite of technological challenges, while also acknowledging that they can only ever be partial and decontextualised.

The research into web archiving was presented at the International Internet Preservation Consortium in Zagreb and the Digital Preservation Coalition’s Web Archiving & Preservation Working Group event in Edinburgh.

Other blog posts from Lynda's related work are available here:

Posted by Digital Research Team at 3:00 PM

Tags

BL Labs, Collaborations, Contemporary Britain, Data, Digital scholarship, Experiments, Legal deposit, Modern history, Writing

08 April 2020

Legacies of Catalogue Descriptions and Curatorial Voice: a new AHRC project

This guest post is by James Baker, Senior Lecturer in Digital History and Archives at the School of History, Art History and Philosophy, University of Sussex. James has a background in the history of the printed image, archival theory, art history, and computational analysis. He is author of The Business of Satirical Prints in Late-Georgian England (2017), the first monograph on the infrastructure of the satirical print trade circa 1770-1830, and a member of the Programming Historian team.

I love a good catalogue. Whether describing historic books, personal papers, scientific objects, or works of art, catalogue entries are the stuff of historical research, brief insights into many possible avenues of discovery. As a historian, I am trained to think critically about catalogues and the entries they contain, to remember that they are always crafted by people, institutions, and temporally specific ways of working, and to consider what that reality might do to my understanding of the past those catalogues and entries represent. Recently, I've started to make these catalogues my objects of historical study, to research what they contain, the labour that produced them, and the socio-cultural forces that shaped that labour, with a particular focus on the anglophone printed catalogue circa 1930-1990. One motivation for this is purely historical, to elucidate what I see as an important historical phenomenon. But another is about now, about how those catalogues are used and reused in the digital age. Browse the shelves of a university library and you'll quickly see that circumstances of production are encoded into the architecture of the printed catalogue: title pages, prefaces, fonts, spines, and the quality of paper are all signals of their historical nature. But when their entries - as many have been over the last 30 years - are moved into a database and online, these cues become detached, and their replacement – a bibliographic citation – is insufficient to evoke their historical specificity and does little to alert the user to the myriad of texts they are navigating each time they search an online catalogue.

It is these interests and concerns that underpin "Legacies of Catalogue Descriptions and Curatorial Voice: Opportunities for Digital Scholarship", a collaboration between the Sussex Humanities Lab, the British Library, and Yale University Library. This 12-month project funded by the Arts and Humanities Research Council aims to open up new and important directions for computational, critical, and curatorial analysis of collection catalogues. Our pilot research will investigate the temporal and spatial legacy of a catalogue I know well - the landmark ‘Catalogue of Political and Personal Satires Preserved in the Department of Prints and Drawings in the British Museum’, produced by Mary Dorothy George between 1930 and 1954, 1.1 million words of text to which all scholars of the long eighteenth-century printed image are indebted, and which forms the basis of many catalogue entries at other institutions, not least those of our partners at the Lewis Walpole Library. To trace these legacies, we plan to repurpose corpus linguistic methods developed in our "Curatorial Voice" project (generously funded by the British Academy) to examine the enduring influence of Dorothy George's "voice" beyond her printed volumes.

Participants at the Curatorial Voices workshop, working in small groups and drawing images on paper.

Some things we got up to at our February 2019 Curatorial Voice workshop. What a difference a year makes!

But we also want to demonstrate the value of these methods to cultural institutions. Alongside their collections, catalogues are central to the identities and legacies of these institutions. And so we posit that being better able to examine their catalogue data can help cultural institutions get on with important catalogue related work: to target precious cataloguing and curatorial labour towards the records that need the most attention, to produce empirically-grounded guides to best practice, and to enable more critical user engagement with 'legacy' catalogue records (for more info, see our paper ‘Investigating Curatorial Voice with Corpus Linguistic Techniques: the case of Dorothy George and applications in museological practice’, Museum & Society, 2020).

A table with boxes of black and red lines which visualise the representation of spatial and non-spatial sentence parts in the descriptions of the satirical prints.

An analysis of our BM Satire Descriptions corpus (see doi.org/10.5281/zenodo.3245037 for how we made it and doi.org/10.5281/zenodo.3245017 for our methods). In this visualization - a snapshot of a bigger interactive - one box represents a single description, red lines are sentence parts marked ‘spatial’, and black lines are sentence parts marked as ‘non-spatial’. This output was based on iterative machine learning analysis with Method52. The data used is published by ResearchSpace under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license.

Over the course of the "Legacies" project, we had hoped to run two capability building workshops aimed at library, archives, and museum professionals. The first of these was due to take place at the British Library this May, and the aim of the workshop was to test our still very much work-in-progress training module on the computational analysis of catalogue data. Then Covid-19 hit and, like most things in life, the plan had to be dropped.

The new plan is still in development, but the project team know that we need input from the community to make the training module of greatest benefit to that community. The current plan is that in late summer we will run some ad hoc virtual training sessions on computational analysis of catalogue data. And so we are looking for library, archives, and museum professionals who produce or work with catalogue data to be our crash test dummies, to run through parts of the module, to tell us what works, what doesn't, and what is missing. If you'd be interested in taking part in one of these training sessions, please email James Baker and tell me why. We look forward to hearing from you.

"Legacies of Catalogue Descriptions and Curatorial Voice: Opportunities for Digital Scholarship" is funded under the Arts and Humanities Research Council (UK) “UK-US Collaboration for Digital Scholarship in Cultural Institutions: Partnership Development Grants” scheme. Project Reference AH/T013036/1.

Posted by Digital Research Team at 10:00 AM

Tags

Data, Digital scholarship, LIS research, Projects, Research collaboration, Tools

Technorati Tags: catalogue data reuse, computational analysis, curatorial voice, legacy catalogue descriptions

06 April 2020

Poetry Mobile Apps

This is a guest post by Pete Hebden, a PhD student at Newcastle University, currently undertaking a practice-led PhD researching and creating a poetry app. Pete has recently completed a three-month placement in Contemporary British Published Collections at the British Library, where he assisted curators working with the UK Web Archive, artists' books and emerging formats collections. You can follow him on Twitter as @Pete_Hebden.

As part of my PhD research, I have been investigating how writers and publishers have used smartphone and tablet devices to present poetry in new ways through mobile apps. In particular, I’m interested in how these new ways of presenting poetry compare to the more familiar format of the printed book. The mobile device allows poets and publishers to create new experiences for readers, incorporating location-based features, interactivity, and multimedia into the encounter with the poem.

Since smartphones and tablet computers became widespread in the late 2000s and early 2010s, a huge range of digital books, e-literature, and literary games have been developed to explore the possibilities of this technology for literature. Projects like Ambient Literature and the work of Editions at Play have explored how mobile technology can transform story-telling and narrative, and similarly my project looks at how this technology can create new experiences of poetic texts.

Below are a few examples of poetry apps released over the past decade. For accessibility reasons, this selection has been limited to apps that can be used anywhere and are free to download. Some of them present work written with the mobile device in mind, while others take existing print work and re-mediate it for the mobile touchscreen.

Puzzling Poetry (iOS and Android, 2016)

Dutch developers Studio Louter worked with multiple poets to create this gamified approach to reading poetry. Existing poems are turned into puzzles to be unlocked by the reader word-by-word as they use patterns and themes within each text to figure out where each word should go. The result is that often new meanings and possibilities are noticed that might have been missed in a traditional linear reading experience.

Screen capture image of the Puzzling Poetry app

This video explains and demonstrates how the Puzzling Poetry app works:

Translatory (iOS, 2016)

This app, created by Arc Publications, guides readers in creating their own English translations of contemporary foreign-language poems. Using the digital display to see multiple possible translations of each phrase, the reader gains a fresh understanding of the complex work that goes into literary translation, as well as the rich layers of meaning included within the poem. Readers are able to save their finished translations and share them through social media using the app.

Screen capture image of the Translatory app

Poetry: The Poetry Foundation app (iOS and Android, 2011)

At nearly a decade old, the Poetry Foundation’s Poetry app was one of the first mobile apps dedicated to poetry, and has been steadily updated by the editors of Poetry magazine ever since. It contains a huge array of both public-domain work and poems published in the magazine over the past century. To help users find their way through this, Poetry’s developers created an entertaining and useful interface for finding poems with unique combinations of themes through a roulette-wheel-style ‘spinner’. The app also responds to users shaking their phone by selecting a poem at random.

Screen capture image of The Poetry Foundation app

ABRA: A Living Text (iOS, 2014)

A collaboration between the poets Amaranth Borsuk and Kate Durbin, and developer Ian Hatcher, the ABRA app presents readers with a range of digital tools to use (or spells to cast) on the text, which transform the text and create a unique experience for each reader. A fun and unusual way to encounter a collection of poems, giving the reader the opportunity to contribute to an ever-shifting, crowd-edited digital poem.

Screen capture image of the ABRA app

The artistic video below demonstrates how the ABRA app works. Painting your finger and thumb gold is not required!

I hope you feel inspired to check out these poetry apps, or maybe even to create your own.

Posted by Digital Research Team at 10:47 AM

Tags

Contemporary Britain, Experiments, Literature, Projects, Research collaboration, Writing

30 March 2020

Just stand-up and Kanban!

This is a guest post by Laura Parsons, Digitisation Workflow Administrator for the British Library's Qatar Foundation Partnership, on Twitter as @laurakpar

It takes unexpected and extreme world events, such as a pandemic and forced lockdown, to make you realise the value of things and routines you previously took for granted. In the Workflow Administration team of the British Library / Qatar Foundation Partnership Project, one of our everyday, normal, taken-for-granted activities is our daily stand-up meeting at our Kanban board, complete with post-it notes, magnets and coloured pens. We thought we would explain our stand-up and Kanban process, how it helps us and how it has changed, and what we are doing now.

Time lapse video of our Kanban board showing it changing over 2 months from October 2019 to January 2020

Who are we?

The Workflow team is responsible for helping manage items through all the stages of the digitisation project workflow. It is a diverse role where we use problem solving, innovation and cross-team communication. Tasks range from administering our Microsoft SharePoint database that tracks the items we are digitising, to assisting the various teams throughout the workflow with technical questions and issues, and working to create the end product that is uploaded to the Qatar Digital Library. To help us complete these tasks and to ensure we juggle the variety of work, we manage our individual and team work using post-it notes on our Kanban board and by participating in a stand-up meeting.

Stand-up

At 9.45am every day, on a normal pre-COVID-19 day, the Workflow team gathers around our Kanban board. This time is ingrained into our morning routine and without it the day does not seem to begin properly. By having this brief but regular catch-up with our team we get our brains thinking, focus on priorities, seek help, and share both achievements and frustrations.

Directed by the Board Leader, the responsibility for which rotates through the team each week, we take it in turns to report on three things: what we did yesterday, what we’re going to do today, and any issues we are having that are blocking our work. This often leads to a discussion about how the team could help, suggestions for who to ask or ideas for what we could try.

The whole stand-up process has rules and expectations, all carefully documented, and we are quick to tell someone (good-naturedly) if they are not following the rules! Our rules govern things like colour coding of post-it notes and magnets, the maximum number of tasks in your column (which is not always adhered to), and the order of priority for tasks.

By the very nature of a stand-up meeting, it is kept short, sometimes less than five minutes for all seven of us to have our turn. This also helps any of us who do not like talking in front of a group; it’s fast, relaxed and supportive. If further help or discussion is needed, we can ask for some “Ticket Talk” later, where we talk with a colleague about our tickets.

Kanban innovation

We are very proud of our Kanban board. It is the product of many hours of team-work, creativity and striving to work more effectively, efficiently and collaboratively. It has a column for each person, holding the tasks allocated to them. When we need more work, we pick up a task from the “New” column and it stays in our column until we have completed it. When it is finished, it is moved to the “Complete” column so we can celebrate how productive we have been! Whilst we record and complete our work on an online system, we find that this tactile process helps us manage our workload and the workflow, as well as simply giving us visual feedback and a valuable sense of achievement.
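For readers who think in code, the flow of a task across a board like this can be sketched in a few lines of Python. This is only an illustrative toy model, not a description of any system the team actually uses: the class name `KanbanBoard`, the method names, the team member names, the example task and the limit of three tasks per column are all assumptions made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class KanbanBoard:
    """Toy board: a "New" column, one column per person, and a "Complete" column."""

    members: list[str]
    # Hypothetical limit; the real maximum number of tasks per column is not stated.
    max_tasks_per_person: int = 3
    new: list[str] = field(default_factory=list)
    complete: list[str] = field(default_factory=list)
    columns: dict[str, list[str]] = field(init=False)

    def __post_init__(self) -> None:
        # Give every team member their own empty column.
        self.columns = {name: [] for name in self.members}

    def add_task(self, task: str) -> None:
        """Put a fresh task in the "New" column."""
        self.new.append(task)

    def pick_up(self, person: str, task: str) -> bool:
        """Move a task from "New" to a person's column.

        Returns False, and moves nothing, if that person is already at the limit.
        """
        if len(self.columns[person]) >= self.max_tasks_per_person:
            return False
        self.new.remove(task)
        self.columns[person].append(task)
        return True

    def finish(self, person: str, task: str) -> None:
        """Move a finished task from a person's column to "Complete"."""
        self.columns[person].remove(task)
        self.complete.append(task)


# Hypothetical usage with made-up names and a made-up task.
board = KanbanBoard(members=["Ada", "Ben"])
board.add_task("Check image metadata")
board.pick_up("Ada", "Check image metadata")
board.finish("Ada", "Check image metadata")
print(board.complete)
```

The sketch refuses a new task once a column is full, which is stricter than real life: a physical board can always take one more post-it.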

Our board has developed over time with monthly “Retrospective” meetings used to brainstorm ideas for how we could improve our stand-up practice and our Kanban Board. In these meetings we each put forward suggestions for what we think we should start, stop and continue. This has been useful to raise new ideas and ensure that we all have a say in how we work. By regularly examining how we work, and suggesting and trying new things, we are always aiming to work more efficiently and effectively. In recent months we have: implemented the weekly rotating role of “Board Leader”, personalised name headers, invited visitors from other teams, included our Imaging Team as a regular stand-up participant, introduced magnets for regular tasks, started a weekly “What I learnt this week” section, and updated rules such as writing the days you are away this week under your name.

Kanban board from May 2018...

...and current version from February 2020

Without stand-up and Kanban

As we have begun working from home, we now have to become used to a new routine, or the lack of our previous one. We no longer have our physical Kanban board but we can still communicate daily with each other, and our new team Slack channel has allowed regular chat. To help with this uncertain and isolated period, we are trialling our daily “stand-up” using emojis, where we communicate our thoughts and feelings for the day using three emojis (with a sentence of explanation, only if you want to). While we learn new ways of working, at least this will remind us of our useful stand-up meetings and our much-loved Kanban board.

Daily stand-up update using emojis.

Posted by Digital Research Team at 4:55 PM

Tags

Projects