THE BRITISH LIBRARY

Digital scholarship blog

Enabling innovative research with British Library digital collections

Introduction

Tracking exciting developments at the intersection of libraries, scholarship and technology.

11 September 2018

Building Library Labs around the world - the event and complete our survey!

Posted by Mahendra Mahey, BL Labs Manager.

Building Library Labs

Around the world, leading national, state, university and public libraries are creating 'digital lab type environments' so that their digitised and born digital collections / data can be opened up and re-used for creative, innovative and inspiring projects by everyone, including digital researchers, artists, entrepreneurs and educators.

BL Labs, which has now been running for five years, is organising what we believe will be the first ever event of its kind in the world! We are bringing together national, state and university libraries with existing or planned digital 'Labs-style' teams for an invite-only workshop this Thursday 13 September and Friday 14 September, 2018.

A few months ago, we sent out special invitations to these organisations. We were delighted by the excitement generated, and by the tremendous response we received. Over 40 institutions from North America, Europe, Asia and Africa will be attending the workshop at the British Library this week. We have planned plenty of opportunities for networking, sharing lessons learned, and telling each other about innovative projects and services that are using digital collections / data in new and interesting ways. We aim to work together in the spirit of collaboration so that we can continue to build even better Library Labs for our users in the future.

Our packed programme includes:

  • 6 presentations covering topics such as those in our international Library Labs Survey;
  • 4 stories of how national Library Labs are developing in the UK, Austria, Denmark and the Netherlands;
  • 12 lightning talks with topics ranging from 3D-Imaging to Crowdsourcing;
  • 12 parallel discussion groups focusing on subjects such as funding, technical infrastructure and user engagement;
  • 3 plenary debates looking at the value to national Libraries of Labs environments and digital research, and how we will move forward as a group after this event.

We will collate and edit the outputs of this workshop in a report detailing the current landscape of digital Labs in national, state, university and public Libraries around the world.

If you represent one of these institutions, it's still not too late to participate, and you can do so in a few ways:

  • Our 'Building Library Labs' survey is still open, and if you work in or represent a digital Library Lab in one of our sectors, your input will be particularly valuable;
  • You may be able to participate remotely in this week's event in real time through Skype;
  • You can contribute to a collaborative document which delegates are adding to during the event.

If you are interested in one of these options, contact: mahendra.mahey@bl.uk.

Please note that the event is being videoed and we will be putting up clips on our YouTube channel soon after the workshop.

We will also return to this blog and let you know how we got on, and how you can access some of the other outputs from the event. Watch this space!

06 September 2018

Visualising the Endangered Archives Programme project data on Africa, Part 3. Finishing up

Sarah FitzGerald is a linguistics PhD researcher at the University of Sussex investigating the origins and development of Cameroon Pidgin English. She is currently a research placement student in the British Library’s Digital Scholarship Team, using data from the Endangered Archives Programme to create data visualisations.

This summer I have taken a break by working hard, I’ve broadened my academic horizons by ignoring academia completely, and I’ve felt at home while travelling hundreds of miles a week. But above all else, I’ve just had a really nice time.

In my last two blogs I covered the early stages of my placement at the British Library, and discussed the data visualisation tools I’ve been exploring.

In this final blog I am going to outline the later stages of my project. I am also going to talk about my experience of undertaking a British Library placement, what I’ve learned and whether it was worth it (spoiler alert: it was).

What I’ve been doing

The final stages of my project have mostly consisted of two separate lines of investigation.

Firstly, I have been finding out as much as I can about the Endangered Archives Programme (EAP)’s projects in Africa, and working out the best ways to visualise that information. The aim is to create a bank of visualisations that the EAP team can use when they are talking about the work that they do. Visualisations, such as the one below showing the number of applications related to each region of Africa by year, can make tables of data much easier to understand.

Chart
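As a rough illustration of the step that sits behind a chart like this, the sketch below groups a table of applications by region and year and prints a simple text bar chart. It is not the tool used during the placement. The records, the region names and the function names `count_by_region_and_year` and `print_text_chart` are all invented for illustration and do not reflect real EAP figures.

```python
from collections import Counter

# Hypothetical records: (year, region) pairs invented for illustration only.
applications = [
    (2016, "West Africa"),
    (2016, "East Africa"),
    (2017, "West Africa"),
    (2017, "West Africa"),
    (2017, "Southern Africa"),
]

def count_by_region_and_year(records):
    """Return a nested dict: region -> year -> number of applications."""
    counts = Counter(records)
    table = {}
    for (year, region), n in counts.items():
        table.setdefault(region, {})[year] = n
    return table

def print_text_chart(table):
    """Print one bar of '#' characters per region and year."""
    for region in sorted(table):
        for year in sorted(table[region]):
            print(f"{region:<16} {year} {'#' * table[region][year]}")

table = count_by_region_and_year(applications)
print_text_chart(table)
```

The same nested table could be handed to any charting tool; the text bars are only there to keep the sketch free of dependencies.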

Secondly, I was curious about why some project applications get funded and some do not. I wanted to know if I could identify any patterns in the reasons why projects get rejected.

This gave me the opportunity to apply my skills as a linguist to the data, albeit on a small scale. I decided to examine the feedback given to unsuccessful applicants by the panel that awards the EAP grants to see if I could identify any patterns. To do this I created a corpus, or electronic database, of texts. This could then be run through corpus analysis software to look for patterns.

AntConc

This image shows a word list created for my corpus using AntConc software, which is a free and open source corpus analysis tool.
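For readers unfamiliar with corpus tools, a word list is simply a ranked count of every word in the corpus. The sketch below shows the general idea in Python. It is not AntConc's implementation, the tokenising rule is an assumption, and the two feedback snippets are invented rather than taken from real panel feedback.

```python
import re
from collections import Counter

def word_list(texts):
    """Count word frequencies across texts, most frequent first."""
    counts = Counter()
    for text in texts:
        # Lowercase, then keep runs of letters (apostrophes allowed inside words).
        counts.update(re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower()))
    return counts.most_common()

# Hypothetical feedback snippets, not real EAP panel feedback.
corpus = [
    "The budget lacks detail.",
    "The panel needs more detail about the archive.",
]
ranked = word_list(corpus)
for word, frequency in ranked[:5]:
    print(frequency, word)
```

Very common function words such as "the" tend to dominate a raw list like this, which is why analysts usually look further down the ranking, or compare against a reference corpus, to find the words that characterise a set of texts.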

My analysis allowed me to identify a number of issues common to many unsuccessful applications. In addition to applications that fall outside the scope of EAP, there are proposals which would make excellent projects but whose applications lack the information the panel needs to award a grant.

Based on my analysis I was able to make a number of recommendations about additional information EAP could provide for applicants which might help to prevent potentially valuable archives being lost due to poor applications.

What I’ve learned

As well as learning about visualisation software I’ve learned a lot this summer about the EAP archives.

I’ve found out where applications are coming from, and which African countries have the most associated applications. I’ve learned that there are many great data visualisation tools available for free online. I’ve learned that there are over 70 different languages represented in the EAP archived projects from Africa.

EAP656
James Ssali and an unknown woman, from the Ham Musaka archive, Uganda (EAP656)

One of the most interesting things I’ve learned is just how much archival material is available for research, and on an incredibly broad range of topics. The materials digitised and preserved in Africa over the last 13 years include:

This wealth of information provides so much opportunity for research and these are just the archives from Africa. The EAP funds projects all over the world.

EAP143
Shui manuscript from China (EAP143)

In addition to learning about the EAP archives I’ve learned a lot from working in the British Library more generally. The scale of the work that is carried out is immense, and until I had worked here for three months I don’t think I fully appreciated just how large the challenges the Library faces are.

In addition to preserving a copy of every book published in the UK, the BL is also working to create large digital archives to support the way that modern scholarship has developed. They are digitising books, audio and websites, as well as historical documents such as the records of the East India Company.

East India House
View of East India House by Thomas Hosmer Shepherd

Was it worth it?

A PhD is an intense thing to undertake and you have a time limit to complete it. At first glance, taking three months out to work on a placement with little direct relevance to my PhD might seem a bit foolish, particularly when it means a daily commute from Brighton to London.

Far from wasting my time, however, this placement has been an enriching experience. My PhD is on the origins and development of Cameroon Pidgin English. This placement has given me a break from my work while broadening my understanding of African culture and the context in which the language I study is spoken.

I’ve always had an interest in data visualisation and my placement has given me time to play with visualisation tools and gain a real understanding of the resources available. I feel refreshed and ready for the new term despite having worked full time all summer.

The break has also given me thinking space; it has allowed ideas to percolate and given me new skills which I can apply to my work. Taking a break from academia has given me more perspective on my work and more options for how to develop it.

BL
The British Library, St Pancras

Finally, the travel has been a lot but my supervisors have been very flexible, allowing me to work from home two days a week. The up-side of coming to London regularly has been getting to work with interesting people.

Working in a large institution could be an intimidating and isolating experience, but it has been anything but. The Digital Scholarship team have been welcoming and interested, and in particular I have had two very supportive supervisors. The British Library is really keen to support and develop placement students, and there is a lovely community of PhD students at the BL, some on placements and some doing their PhD here.

I have had a great time at the British Library this summer and can only recommend the scheme to anyone thinking of applying for a placement next year.

28 August 2018

Student project report: Scribal Handwriting: An automated manuscript analysis tool

In 2017-18, Dr Mia Ridge worked with three groups of second year students on UCL's Computer Science course to apply their skills to collections and digital scholarship-based projects. In this post, Francesco Benintende (francescobenintende@icloud.com), Kamil Zajac (kamil.zajac.16@ucl.ac.uk) and Andrei Maxim (andrei.maxim.16@ucl.ac.uk) explain how they worked with curator Alison Hudson on 'Scribal Handwriting: An automated manuscript analysis tool'. A video of their final presentation is available online and their project page contains more technical information.

The challenge

The team was challenged to create a tool for palaeographers, researchers who analyse handwriting to determine the date of a manuscript and sometimes even its scribe and place of production. To help with this task, we designed a tool to quickly find occurrences of similar handwritten characters across a collection of documents. This would be a lengthy and repetitive task if done manually by researchers. Typically, researchers compare characters’ features such as script, size and ink across different manuscripts to establish possible similarities between manuscripts and scribes.

Our mission was to create a fast and reliable tool that could be used by palaeographers. Our aim was to speed up their research process by automating the comparisons between characters.

Our approach

To create a solution for this challenge, we began with problem research and user needs analysis. During this phase we identified the main features the application needed. We wanted to improve on current methods and to understand the needs of future users. This phase involved interviews, questionnaires and surveys aimed at people with a similar technical level and background to the future users, which helped us tailor the user interface to researchers. We also tried to understand the current limitations of research in palaeography. Our initial research can be found at http://students.cs.ucl.ac.uk/2017/group33/initial_research.html.

After acquiring this initial information, creating the first prototype of the web application and testing users’ responses to its graphic components, we shifted our focus to building a system that would recognise characters written in similar scripts. This was the main phase of development, and it consisted mainly of testing and evaluating different methods of finding and comparing characters’ features.

In our final phase, we were concerned with testing and evaluating our web application overall.

The solution

Our solution is a web app that allows researchers to create an account and to upload and maintain a collection of manuscripts. With this, they can perform character searches in their personal collection. Because it runs in the browser, researchers can analyse these documents from anywhere.

 

Manuscript Collection page of Scribal Handwriting, where users can save documents.


To power our web app, we created our own algorithm to compare two characters or two ligatures. (A ligature is two characters written as one shape, as in the example of ‘NT’ below.) The algorithm finds candidate characters in the selected pages and compares each with the character being searched for. The comparison relies on converting the images of characters into ‘functions’ and then comparing those functions. This lets us identify similar patterns in the characters, such as recurrent shapes and angles, and use this information to decide whether two characters are similar.

 

 

Example of character search using a selection image (right image) to be found in a page (left image) using our algorithm.
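The post does not say which ‘functions’ the team's algorithm uses, so the following is only a minimal sketch of the general idea under stated assumptions, not the team's method. It assumes each character image has already been reduced to a binary grid (1 for ink, 0 for parchment). It turns each grid into a one-dimensional profile of ink counts per column and per row, then compares profiles with cosine similarity while sliding a window across the page. The names `profile`, `similarity` and `search`, and the 0.95 threshold, are hypothetical.

```python
import math

def profile(glyph):
    """Turn a binary image (list of rows of 0/1) into a 1-D function:
    the ink count per column followed by the ink count per row."""
    cols = [sum(row[x] for row in glyph) for x in range(len(glyph[0]))]
    rows = [sum(row) for row in glyph]
    return cols + rows

def similarity(a, b):
    """Cosine similarity between two equal-length profiles (0.0 to 1.0)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def search(page, glyph, threshold=0.95):
    """Slide a glyph-sized window over the page and return (score, row, col)
    for every window whose profile resembles the glyph's, best match first."""
    gh, gw = len(glyph), len(glyph[0])
    target = profile(glyph)
    hits = []
    for r in range(len(page) - gh + 1):
        for c in range(len(page[0]) - gw + 1):
            window = [row[c:c + gw] for row in page[r:r + gh]]
            score = similarity(target, profile(window))
            if score >= threshold:
                hits.append((score, r, c))
    return sorted(hits, reverse=True)

# Toy example: an 'L'-shaped glyph hidden in a small page.
glyph = [
    [1, 0, 0],
    [1, 0, 0],
    [1, 1, 1],
]
page = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 1, 1, 1, 0],
]
hits = search(page, glyph)
for score, row, col in hits:
    print(f"match at row {row}, col {col} (score {score:.2f})")
```

Projection profiles are cheap to compute but discard detail, so different shapes can share a profile. A real tool would need richer features and would have to cope with variation in size, stroke thickness and damage.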

 

Evaluation

Overall, our solution improves on current manual methods, as it enables researchers to work on important documents without having to consult them physically. It also offers useful data about the results by indicating which matches are most similar in shape and size to the original character. This can help scholars think about how a scribe’s work, or even a pen’s sharpness, might change over the course of many pages. It might also offer new ways of arguing which parts of a manuscript were copied out by different scribes. Such arguments are often based largely or entirely on more subjective appraisals. While these are still necessary, this app is a useful addition to palaeographers’ toolkits. The app also usefully places the results in context: their location is shown on the full page, and the excerpts include a few letters to either side. Our testing showed that performance is most limited on larger images (measured in pixels) and on damaged manuscripts.

Regardless of size and condition, our web app is able to consistently find occurrences of characters in different manuscripts.

 

Example of character search results for the “m” character.

Conclusion

This project allowed our team to experience real-world applications of image processing and gave us a unique insight into the world of palaeography and its research procedures. The team also had to consider how to develop a web application for an audience that might not have used this sort of program before, and how to make a website that could work on a wide range of devices, from smartphones to relatively old desktop computers. Finally, we outlined future improvements that would make the web app a consumer-grade tool for researchers: the use of machine learning technologies to improve performance, a mobile version to allow researchers to work from their smartphones too, a version of the app to analyse shapes and decoration as well as text, and an improved version of the algorithm to analyse damaged documents where there is less contrast between the colour of the ink and the colour of the parchment.