Digital scholarship blog

Enabling innovative research with British Library digital collections

Introduction

Tracking exciting developments at the intersection of libraries, scholarship and technology.

30 March 2020

Just stand-up and Kanban!

This is a guest post by Laura Parsons, Digitisation Workflow Administrator for the British Library's Qatar Foundation Partnership, on Twitter as @laurakpar

 

It takes unexpected and extreme world events, such as a pandemic and forced lockdown, to make you realise the value of things and routines you previously took for granted. In the Workflow Administration team of the British Library / Qatar Foundation Partnership Project, one of our everyday, normal, taken-for-granted activities is our daily stand-up meeting at our Kanban board, complete with post-it notes, magnets and coloured pens. We thought we would explain our stand-up and Kanban process, how it helps us and how it has changed, and what we are doing now.

Time lapse video of our Kanban board showing it changing over 2 months from October 2019 to January 2020

 

Who are we?

The Workflow team is responsible for helping manage items through all the stages of the digitisation project workflow. It is a diverse role where we use problem solving, innovation and cross-team communication. Tasks range from administering our Microsoft SharePoint database that tracks the items we are digitising, to assisting the various teams throughout the workflow with technical questions and issues, and working to create the end product that is uploaded to the Qatar Digital Library. To help us complete these tasks and to ensure we juggle the variety of work, we manage our individual and team work using post-it notes on our Kanban board and by participating in a stand-up meeting.

Stand-up

At 9.45am every day, in normal pre-COVID-19 times, the Workflow team gathers around our Kanban board. This time is ingrained into our morning routine and without it the day does not seem to begin properly. By having this brief but regular catch-up with our team we get our brains thinking, focus on priorities, seek help, and share both achievements and frustrations.

Directed by the Board Leader, the responsibility for which rotates through the team each week, we take it in turns to report on three things: what we did yesterday, what we’re going to do today, and any issues we are having that are blocking our work. This often leads to a discussion about how the team could help, suggestions for who to ask or ideas for what we could try.

The whole stand-up process has rules and expectations, all carefully documented, and we are quick to tell someone (good naturedly) if they are not following the rules! Our rules govern things like colour coding of post-it notes and magnets, maximum number of tasks in your column (which is not always adhered to), and order of priority for tasks.

By the very nature of a stand-up meeting, it is kept short, sometimes less than five minutes for all seven of us to have our turn. This also helps any of us who do not like talking in front of a group; it’s fast, relaxed and supportive. If further help or discussion is needed, we can ask for some “Ticket Talk” later, where we talk with a colleague about our tickets.

Kanban innovation

We are very proud of our Kanban board. It is the product of many hours of team-work, creativity and striving to work more effectively, efficiently and collaboratively. It has a column for each person with the tasks allocated to them. When we need more work, we pick up a task from the “New” column, and it stays in our column until we have completed it. When it is finished, it is moved to the “Complete” column so we can celebrate how productive we have been! Whilst we record and complete our work on an online system, we find that this tactile process helps us manage our workload and the workflow, as well as simply giving us visual feedback and a valuable sense of achievement.

Our board has developed over time with monthly “Retrospective” meetings used to brainstorm ideas for how we could improve our stand-up practice and our Kanban Board. In these meetings we each put forward suggestions for what we think we should start, stop and continue. This has been useful to raise new ideas and ensure that we all have a say in how we work. By regularly examining how we work, and suggesting and trying new things, we are always aiming to work more efficiently and effectively. In recent months we have: implemented the weekly rotating role of “Board Leader”, personalised name headers, invited visitors from other teams, included our Imaging Team as a regular stand-up participant, introduced magnets for regular tasks, started a weekly “What I learnt this week” section, and updated rules such as writing the days you are away this week under your name.

Kanban board from May 2018...
...and current version from February 2020

 

Without stand-up and Kanban

As we have begun working from home, we now have to become used to a new routine, or the lack of our previous one. We no longer have our physical Kanban board, but we can still communicate daily with each other, and our new team Slack channel has allowed regular chat. To help with this uncertain and isolated period, we are trialling our daily “stand-up” using emojis, where we communicate our thoughts and feelings for the day using three emojis (with a sentence of explanation, only if you want to). While we learn new ways of working, at least this will remind us of our useful stand-up meetings and our much-loved Kanban board.

Daily stand-up update using emojis.

 

 

24 March 2020

Learning in Lockdown: Digital Research Team online


This blog post is by Nora McGregor, Digital Curator, Digital Research Team/European and Americas Collections, British Library. She's on Twitter as @ndalyrose.

With British Library public spaces now closed, the Digital Research Team are focussing our energies on transforming our internal staff Digital Scholarship Training Programme into an online resource for colleagues working from home. Using a mixture of tools at our disposal (Zoom conferencing and our dedicated course Slack channels for text-based chat) we are experimenting with delivering some of our staff workshops online, such as the Library Carpentries and OpenRefine with Owen Stephens, as well as our reading group and staff lectures. Last week our colleague in Research Services, Jez Cope, trialled the delivery of a Library Carpentry workshop on Tidy Data at the last minute to a virtual room of 12 colleagues. For some it was the first time ever working from home or using remote conferencing tools, so the digital skills learning is happening on many levels, which for us is incredibly exciting! We’ll share more in-depth results of these experiments with you via this blog and in time, as we gain more experience in this area, we may well be able to offer some sessions to the public!

Homeschooling for the Digital Research Team

And just like parents around the world creating hopeful, colourful schedules for maintaining children’s daily learning (full disclosure: I’m one of ‘em!), so too are we planning to keep up with our schooling whilst stuck at home. Below are just a handful of the online training and resources we in the Digital Research Team are keeping up with over the coming months. We’ll add to this as we go along and would of course welcome in the comments any other suggestions from our librarian and digital scholarship networks!

  • Archivists at Home and Free Webinars and Trainings for Academic Library Workers (COVID-19). We’re keeping an eye on these two particularly useful resources for archivists and academic librarians looking for continuing education opportunities while working from home.
  • Digital Skills for the Workplace. These (free!) online courses were created by the Institute of Coding (who funded our Computing for Cultural Heritage course) to try to address the digital skills gap in a meaningful way and go much further than your classic “Beginner Excel” courses. Created through a partnership with different industries, they aim to reflect practical baseline skills that employers need.
  • Elements of AI is a (free!) course, provided by Finland as ‘a present for the European Union’, providing a gentle introduction to artificial intelligence. What a great present!
  • Gateway to Coding: Python Essentials. Another (free!) course developed by the Institute of Coding, this one is designed particularly for folks like us at the British Library who would like a gentle introduction to programming languages like Python, but can’t install anything on our work machines.
  • Library Juice Academy has some great courses starting up in April. The other great thing about these is that you can take them 'live', which means the instructor is around and available and you get a certificate at the end, or 'asynchronously' at your own pace (no certificate).
  • Programming Historian Tutorials. Tried and true, our team relies on these tutorials to understand the latest and greatest in using technology to manage and analyse data for humanities research.

Time for Play

Of course, if Stephen King’s The Shining has taught us anything, we’d all do well to ensure we make time for some play during these times of isolation!

We’ll be highlighting more opportunities for fun distractions in future posts, but these are just a few ideas to help keep your mind occupied at the moment:

Stay safe, healthy and sane out there guys!

Sincerely,

The Digital Research Team

16 March 2020

A Season of Place – Journal Article Published!

This blog post is by Dr Adi Keinan-Schoonbaert, Digital Curator for Asian and African Collections, British Library. She's on Twitter as @BL_AdiKS.

Last year the Library’s Digital Scholarship Training Programme (DSTP), delivering training to BL staff, featured several training sessions dedicated to digital mapping, covering topics such as cataloguing geospatial data, geoparsing, georeferencing, working with online mapping tools, digital research using online maps, and public engagement through interactive platforms and crowdsourcing. We called it the ‘Season of Place’.

A year later, Gethin and I published a paper about it in the Journal of Map & Geography Libraries: Advances in Geospatial Information, Collections & Archives, in a special issue dedicated to Information Literacy Instruction. Our shiny new article is entitled “A Season of Place: Teaching Digital Mapping at the British Library”, and is available through this DOI: https://doi.org/10.1080/15420353.2020.1719267. This is the abstract:

“One of the British Library Digital Scholarship team’s core purposes is to deliver training to Library staff. Running since 2012, the main aim of the Digital Scholarship Training Program (DSTP) is to create opportunities for staff to develop the necessary skills and knowledge to support emerging areas of scholarship. Recently, the Library has been experimenting with a new format to deliver its training that would allow flexibility and adaptability through modularity: a “season”. The Digital Scholarship team organized a series of training events billed as a “Season of Place”, which aimed to expose Library staff to the latest digital mapping concepts, methods and technologies, and provide them with the skills to apply cutting-edge research to their collection areas. The authors designed, coordinated and delivered this training season to fulfill broader Library objectives, choosing to mix and match the types of events and methods of delivery to fit the broad range of technologies that constitute digital mapping today. The paper also discusses the impact that these choices of methods and content has had on digital literacy and the uptake of digital mapping by presenting results of an initial evaluation obtained through observation and evaluation surveys.”

A Season of Place: Teaching Digital Mapping at the British Library- article screenshot

One of the things that we wrote about was the results of a feedback survey sent to course participants three months after their training. Participants were asked questions about their levels of confidence in applying their learning within their work, relevance of the training to their work, frequency of applying knowledge or skills gained from the training days, and uptake of digital mapping tools following the training days. Survey results were published in the article mentioned above. However, in the meantime we’ve sent out a 1-year-later feedback survey, to see what people’s position was a year after undertaking our digital mapping training.

We had six responses to this 1-year survey. Respondents indicated that for the most part digital mapping was not directly relevant to their areas of work. However, if and when they would like to apply learning from the courses, they have some confidence in doing so (50% some confidence, 33.3% fairly confident, 16.7% confident). It was noted that areas of learning from the course applied to one’s work relate more to data clean-up and analysis than directly to maps, but that it was useful to know which software is available for when the need does arise in the future.
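With only six responses, each percentage above stands for a very small number of people. As a rough sketch of the arithmetic, the snippet below assumes head counts of 3, 2 and 1 (these are inferred from the published percentages, not taken from the survey data itself) and shows that they reproduce the reported figures. The variable and function names are purely illustrative.

```python
# Head counts are an assumption inferred from the reported percentages
# (six responses in total); they are not quoted from the survey data.
inferred_counts = {
    "some confidence": 3,
    "fairly confident": 2,
    "confident": 1,
}


def percentages(counts: dict[str, int]) -> dict[str, float]:
    """Convert head counts into percentages rounded to one decimal place."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


if __name__ == "__main__":
    for label, share in percentages(inferred_counts).items():
        print(f"{label}: {share}%")
```

In other words, a single respondent changing their answer would move any of these figures by almost 17 percentage points, so the results are best read as indicative.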

When it comes to specific tool usage, Google My Maps was the most popular tool that we’ve taught, followed by Recogito – this matches the levels of popularity indicated in our 3-month survey. Lastly, course attendees haven’t yet created, visualised or analysed geospatial data with the tools taught in the course (or others) – but did say that they’d learned a great deal, and that when the opportunity arises to start a relevant project, they’ll know where to start!

So, all in all, we’re happy that people have found our courses useful. The Library is now recruiting a Curator for Geospatial Cultural Heritage, contributing to the ‘Locating a National Collection’ project, a Foundational Collaborative project in the ‘Towards a National Collection: Opening UK Heritage to the World’ programme, funded by the AHRC. Do join us!

Apply here: https://britishlibrary.recruitment.zellis.com/birl/pages/vacancy.jsf?latest=01002198 – closing date is 22 March 2020.

 

11 February 2020

Call for participants: April 2020 book sprint on the state of the art in crowdsourcing in cultural heritage

[Update, March 2020: like so much else, our plans for the 'Collective Wisdom' project have been thrown out by the COVID-19 pandemic. We have an extension from our funders and will look to confirm dates when the global situation (especially around international flights) becomes clearer. In the meantime, the JISCMail Crowdsourcing list has some discussion on starting and managing projects in the current context.]

One of the key outcomes of our AHRC UK-US Partnership Development Grant, 'From crowdsourcing to digitally-enabled participation: the state of the art in collaboration, access, and inclusion for cultural heritage institutions', is the publication of an open access book written through a collaborative 'book sprint'. We'll work with up to 12 other collaborators to write a high-quality book that provides a comprehensive, practical and authoritative guide to crowdsourcing and digitally-enabled participation projects in the cultural heritage sector. Could you be one of our collaborators? Read on!

The book sprint will be held at the Peale Center for Baltimore History and Architecture from 19 to 24 April 2020. We've added a half-day debriefing session to the usual five-day sprint, so that we can capture all the ideas that didn't make it into the book and start to shape the agenda for a follow-up workshop to be held at the British Library in October. Due to the pace of writing and facilitation, participants must be able to commit to five and a half days in order to attend.

We have some confirmed participants already - including representatives from FromThePage, King’s College London Department of Digital Humanities, the Virginia Tech Department of Computer Science, and the Colored Conventions Project, plus the project investigators Mia Ridge (British Library), Meghan Ferriter (Library of Congress) and Sam Blickhan (Zooniverse) - with additional places to be filled by this open call for participation. 

An open call enables us to include folk from a range of backgrounds and experiences. This matches the ethos of the book sprint model, which states that 'diversity in participants—perspectives, experience, job roles, ethnicity, gender—creates a better work dynamic and a better book'. Participants will have the opportunity to not only create this authoritative text, but to facilitate the formation of an online community of practice which will serve as a resource and support system for those engaging with crowdsourcing and digitally-enabled participation projects.

We're looking for participants who are enthusiastic, experienced and engaged, with expertise at any point in the life cycle of crowdsourcing and digital participation. Your expertise might have been gained through hands-on experience on projects or by conducting research in areas from co-creation with heritage organisations or community archives to HCI, human computation and CSCW. We have a generous definition of 'digitally-enabled participation', including not-entirely-digital volunteering projects around cultural heritage collections, and activities that go beyond typical collection-centric 'crowdsourcing' tasks like transcription, classification and description. Got questions? Please email [email protected]!

How to apply

  1. Read the Book Sprint FAQs to make sure you're aware of the process and commitment required
  2. Fill in this short Google Form by midnight GMT February 26th

What happens next?

We'll review applications and let people know by the end of February 2020.

We're planning to book travel and accommodation for participants as soon as dates and attendance are confirmed - this helps keep costs down and also means that individuals aren't out of pocket while waiting for reimbursement. The AHRC fund will pay for travel and accommodation for all book sprint participants. We will also host a follow-up workshop at the British Library in October and hope to provide travel and accommodation for book sprint participants.

We'll be holding a pre-sprint video call (on March 18, 19 or 20) to put faces to names and think about topics that people might want to research in advance and collect as an annotated bibliography for use during the sprint. 

If you can't make the book sprint but would still like to contribute, we've got you covered! We'll publish the first version of the book online for comment and feedback. Book sprints don't allow for remote participation, so this is our best way of including the vast amounts of expertise not in the room.

You can sign up to the British Library's crowdsourcing newsletters for updates, or join our Crowdsourcing group on Humanities Commons set up to share progress and engage in discussion with the wider community. 

New project! 'From crowdsourcing to digitally-enabled participation: the state of the art in collaboration, access, and inclusion for cultural heritage institutions'

[Update, March 2020: like so much else, our plans for the 'Collective Wisdom' project have been thrown out by the COVID-19 pandemic. We have an extension from our funders and will look to confirm dates when the global situation (especially around international flights) becomes clearer. In the meantime, the JISCMail Crowdsourcing list has some discussion on starting and managing projects in the current context.]

We - Mia Ridge (British Library), Meghan Ferriter (Library of Congress) and Sam Blickhan (Zooniverse) - are excited to announce that we've been awarded an AHRC UK-US Partnership Development Grant. Our overarching goals are:

  • To foster an international community of practice in crowdsourcing in cultural heritage
  • To capture and disseminate the state of the art and promote knowledge exchange in crowdsourcing and digitally-enabled participation
  • To set a research agenda and generate shared understandings of unsolved or tricky problems that could lead to future funding applications

How will we do that?

We're holding a five-day collaborative 'book sprint' (or writing workshop) at the Peale Center for Baltimore History and Architecture in April 2020. Working with up to 12 other collaborators, we'll write a high-quality book that provides a comprehensive, practical and authoritative guide to crowdsourcing and digitally-enabled participation projects in the cultural heritage sector. We want to provide an effective road map for cultural institutions hoping to use crowdsourcing for the first time and a resource for institutions already using crowdsourcing to benchmark their work.

In the spirit of digital participation, we'll publish a commentable version of the book online with an open call for feedback from the extended international community of crowdsourcing practitioners, academics and volunteers. We're excited about including the expertise of those unable to attend the book sprint in our final open access publication.

The book sprint will close with a short debrief session to capture suggestions about gaps in the field and sketch the agenda for the closing workshop. 

In October 2020 we're holding a workshop at the British Library for up to 25 participants to interrogate, refine and advance questions raised during the year and identify high priority gaps and emerging challenges in the field that could be addressed by future research collaborations. We'll work with a community manager to ensure that remote participants are integrated into the event as much as possible, which will lower our carbon footprint and let people contribute without getting on a plane.

We'll publish a white paper reporting on this workshop, outlining emerging, intractable and unsolved challenges that could be addressed by further funding for collaborative work. 

Finally, we want this project to help foster the wonderful community of crowdsourcing practitioners, participants and researchers by hosting events and online discussion. 

Why now?

For several years, crowdsourcing has provided a framework for online participation with, and around, cultural heritage collections. This popularity leads to increased participant expectations while also attracting criticism such as accusations of ‘free labour’. Now, the introduction of machine learning and AI methods, and co-creation and new models of ownership and authorship present significant challenges for institutions used to managing interactions with collections on their own terms. 

How can you get involved?

Our call for participants in our April Book Sprint is now open!

Our final workshop will be held in mid- or late-October. The easiest way to get updates such as calls for contributors and links to blog posts is to sign up for the British Library's crowdsourcing newsletters or join the Crowdsourcing group on Humanities Commons.

03 February 2020

2019 Winners of the New Media Writing Prize

On Wednesday 15 January 2020 it was the 10th Anniversary Awards Evening of the New Media Writing Prize (NMWP) at Bournemouth University. This international prize encourages and promotes the best in new media writing, showcasing innovative digital fiction, poetry and journalism: the types of interactive writing that we have been examining and researching in the emerging formats work at the Library.

New Media Writing Prize logo

Before the NMWP winners were announced there was a fun hands-on session in the afternoon, for guests to experience Digital Fiction Curios. This is an immersive experience, re-imagining selected Flash-based digital fiction by the One to One Development Trust in Virtual Reality, made in collaboration with Sheffield Hallam University. Here in the Library we are interested in their playful and innovative approach to preserving the experiences of reading their digital works, and last October the project team were invited to showcase this work to British Library staff for them to try in VR.

Dreaming Methods: Digital Fiction Curios Teaser from One to One Development Trust 

On to the main NMWP awards event: as in previous years, the 2019 competition had attracted strong entries from many parts of the world. With submissions from six continents, the event’s host Jim Pope pointed out that Antarctica was the only geographic area not to have participated yet.

Congratulations to all the 2019 winners:

  • The if:book UK New Media Writing Prize, the main category, was won by Maria Ivanova and her team of volunteers: Anna Gorovaya, Alexey Logvinov, Mike Stonelake, Anton Zayceve and Ekaterina Polyakova, from Belarus, for ‘The Life of Grand Duchess Elizabeth’. This is a stunning biographical narrative, featuring open source archive photographs and quotations from the memoirs of generous philanthropist Grand Duchess Elizabeth Feodorovna of Russia. A granddaughter of the English Queen Victoria, she lived through several key events in the history of Russia, including the Russo-Japanese War, the First World War, and the revolutions of 1905 and 1917. She became one of the brightest philanthropists of Russia.
  • The Future Journalism award was won by Mahmoud El Wakea’s ‘Made in Prison’, an investigation of Jihadi radicalisation in Egypt.
  • The Unicorn Training Student award was won by Kenneth Sanchez for ‘Escaping the Chaos’, an emotive portrayal of Venezuelan migrants in Peru, with video footage of individuals telling their personal stories.
  • The Dot award for 2019 went to Clare Pollard, editor of Modern Poetry in Translation; the award will enable them to digitise their magazine and to grow it internationally.
Still image from 'The Life of Grand Duchess Elizabeth', Winner of the 2019 if:book UK New Media Writing Prize.

It was gratifying to see that Lynda Clark featured on the main prize shortlist for her work ‘The Memory Archivist’, which was made during her Innovation Placement at the British Library in 2019. Also shortlisted was previous Eccles Centre Fellow J.R. Carpenter, for the hydro-graphic novel ‘The Pleasure of the Coast’, created in partnership with the Archives Nationales in Paris.

Full shortlists were: 

The 2019 if:book main prize shortlist:

 The Unicorn Student Award 2019 shortlist:

Still image from 'Escaping the Chaos', Winner of the 2019 Unicorn Training Student award

The Future Journalism Award 2019 shortlist for the best digital interactive journalism, awarded by Future PLC:

Still image from 'Made in Prison', Winner of the 2019 Future Journalism award 

If reading this blog post is inspiring you to consider entering the Prize in 2020, please do keep your eyes peeled for their call for submissions later in the year. You can follow NMWP on Twitter and Facebook. Also do check out the Competition Rules and the FAQs to make sure your creative output fits the competition's criteria.

This post is by Digital Curator Stella Wisdom (@miss_wisdom).

27 January 2020

How historians can communicate their research online

This blog post is by Jonathan Blaney (Institute of Historical Research), Frances Madden (British Library), Francesca Morselli (DANS), Jane Winters (School of Advanced Study, University of London)

This blog post will also be published in several other locations, including the FREYA blog and the IHR blog.

Large satellite receiver
Source: Joshua Hoehne, Unsplash

On 4 December 2019, the FREYA project in collaboration with UCL Centre for Digital Humanities, Institute of Historical Research, the British Library and DARIAH-EU organized a workshop in London on identifiers in research. In particular this workshop - mainly directed to historians and humanities scholars - focused on ways in which they can build and manage an online profile as researchers, using tools such as ORCID IDs. It also covered best practices and methods of citing digital resources to make humanities researchers' work connected and discoverable to others. The workshop had 20 attendees, mainly PhD students from the London area but also curators and independent researchers.

Presentations

Frances Madden from the British Library introduced the day, which was supported by the FREYA project, funded under the EU’s Horizon 2020 programme. FREYA aims to increase the use of persistent identifiers (PIDs) across the research landscape by building up services and infrastructure. The British Library is leading on the humanities and social sciences aspect of this work.

Frances described how PIDs are central to scholarly communication becoming effective and easy online. We will need PIDs not just for publications but for grey literature, for data, for blog posts, presentations and more. This is clearly a challenge for historians to learn about and use, and the workshop is a contribution to that effort.

PIDs: some historical context

Jonathan Blaney from the Institute of Historical Research said that there is a context to citation and the persistent identifiers which have grown up around traditional forms of print citation. These are almost invisible to us because they are deeply familiar. He gave an example of a reference to the gospel story of the woman taken in adultery:

John 7:53-8:11

There are three conventions here: the name ‘John’ (attached to this gospel since about the 2nd century), the chapter divisions (medieval and ascribed to the English bishop Stephen Langton) and the verse divisions (from the middle of the 16th century).

When learning new forms of referencing, such as the ones under discussion at the workshop, Jonathan suggested that historians should remember their implicit knowledge has been learned. He finished with an anecdote about Harry Belafonte, retold in Anthony Grafton’s The Footnote: A Curious History. As a young sailor Belafonte wanted to follow up on references in a book he had read. The next time he was on shore leave he went to a library and told the librarian:

“Just give me everything you’ve got by Ibid.”


Demonstrating the benefits

Prof Jane Winters from the School of Advanced Study introduced what she claimed was her most egotistical presentation by explaining her own choices in curating her online presence and also what was beyond her control. She showed the different results of web searches for herself using Google and DuckDuckGo and pointed out how things she had almost forgotten about can still feature prominently in results.

Jane described her own use of Twitter, and highlighted both the benefits and challenges of using social media to communicate research and build an online profile. It was the relatively rigid format of her institutional staff profile that led her to create her own website. Although Jane has an ORCID ID and a page on Humanities Commons, for example, there are many online services she has chosen not to use, such as academia.edu.

This is all very much a matter of personal choice, dependent upon people’s own tastes and willingness to engage with a particular service.

How to use what’s available

Francesca Morselli from DANS gave a presentation aiming to provide useful resources about identifiers for researchers as well as explaining in a simple yet exhaustive way how they "work" and the rationale behind them.

Most importantly PIDs ensure:

  1. Citability and discoverability (both for humans and machines)
  2. Disambiguation (between similar objects)
  3. Linking to related resources
  4. Long-term archiving and findability
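To make the point about machine-readability a little more concrete, here is a minimal sketch of one small way a PID can guard against mistyping. The last character of an ORCID iD is a check character calculated from the first fifteen digits using the ISO 7064 MOD 11-2 algorithm, as described in ORCID's public documentation. This example was not part of the workshop; the function names are our own illustrative choices rather than any official ORCID library, and the sample iD is the fictional one that ORCID uses in its documentation.

```python
def orcid_check_character(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for 15 base digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    # A result of 10 is written as the letter X.
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check the shape and check character of an iD like 0000-0002-1825-0097."""
    compact = orcid.replace("-", "").upper()
    if len(compact) != 16:
        return False
    base, check = compact[:15], compact[15]
    if any(ch not in "0123456789" for ch in base):
        return False
    return orcid_check_character(base) == check


if __name__ == "__main__":
    # Sample iD from ORCID's documentation (a fictional researcher).
    print(is_valid_orcid("0000-0002-1825-0097"))
    # The same iD with its final character mistyped.
    print(is_valid_orcid("0000-0002-1825-0098"))
```

A check like this only catches typing errors; it does not confirm that the iD has actually been registered or belongs to a particular person, which is what the ORCID registry itself is for.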

Francesca then introduced the support provided by projects and infrastructures: FREYA, DARIAH-EU and ORCID. Among the FREYA project pillars (PID Graph, PID Commons, PID Forum), the last is available for anyone interested in identifiers.

The DARIAH-EU infrastructure for Arts and Humanities has recently launched the DARIAH Campus platform which includes useful resources on PIDs and managing research data (i.e. all materials which are used in supporting research). In 2018 DARIAH also organized a winter school on Open Data Citation, whose resources are archived here.


 

A Publisher’s Perspective

Kath Burton from Routledge Journals emphasised how much use publishers make of digital tools to harvest content, including social media crawlers, data harvesters and third party feeds.

The importance of maximising your impact online when publishing was explained, both before publishing (filling in the metadata, giving a meaningful title) and afterwards (linking to the article from social media and websites), as well as how publishers can help support this.

Kath went on to give an example of Taylor & Francis’s interest in the possibilities of online scholarly communication by describing its commitment to publishing 3D models of research objects, which it does via its Sketchfab page.

Breakout Groups

After the presentations and a coffee break there were group discussions about what everyone had just heard. During the first part, the groups were asked what was new to them in the presentations. It was clear from discussions around the room that attendees had heard much which was new to them. For example, some attendees had ORCID IDs but many were surprised at the range of things for which they could be used, such as in journal articles and logging into systems. They were also struck by the range of things in which publishers were interested such as research data. Many were really interested in the use of personal websites to manage their profile.

When asked what tallied with their experiences, it became clear that they were keen to engage with these systems, setting up ORCID IDs and Humanities Commons profiles but that they felt that they were too early on in their careers to have anything to contribute to these platforms and felt they were designed for established researchers. Jane Winters stressed that one could adopt a broad approach to the term ‘publications’, including posters, presentations and blog posts and encouraged all to share what they had.

Lastly, discussion turned to how the group cites digital resources. This led to an interesting conversation around the citation of archived web pages and how to cite webpages which might change over time, with tools such as the Internet Archive being mentioned. There was also discussion about whether one can cite resources such as Wikipedia and it was clear that this was not something which had been encouraged. Jonathan, who has researched this subject, mentioned that he had found established academics are happier to cite Wikipedia than those earlier in their career.

Conclusions

The workshop effectively demonstrated the sheer range of online tools, social media forums and publishing venues (both formal and informal) through which historians can communicate their research online. This is both an opportunity and a problem. It is a challenge to develop an online presence - to decide which methods are most appropriate for different kinds of research and different personalities - but that is just the first step. For research communication to be truly valuable, it is necessary to focus your effort, manage your online activities and take control of how you appear to others in digital spaces. PIDs are invaluable in achieving this, and in helping you to establish a personal research profile that stays with you as you move through your career. At the start of the day, the majority of those who attended the workshop did not know very much about PIDs and how you can put them to use, but we hope that they came away with an enhanced understanding of the issues and possibilities, the awareness that it does not take much effort or skill to make a real difference to how you are perceived online, and some practical advice about next steps.

It was apparent that, with some admirable exceptions, neither higher education institutions nor PID organisations are successfully communicating the value and importance of PIDs to early career researchers. Workshop attendees particularly welcomed the opportunity to hear from a publisher and senior academic about how PIDs are used to structure, present and disseminate academic work. The clear link between communicating research online and public engagement also emerged during the course of the day, and there is obvious potential for collaboration between PID organisations and those involved with training focused on impact and public engagement. We ended the day with lots of ideas for further advocacy and training, and a shared appreciation for the value of PIDs for helping historians to reach out to a range of different audiences online.

20 January 2020

Using Transkribus for Arabic Handwritten Text Recognition

This blog post is by Dr Adi Keinan-Schoonbaert, Digital Curator for Asian and African Collections, British Library. She's on Twitter as @BL_AdiKS.

 

In the last couple of years we’ve teamed up with PRImA Research Lab in Salford to run competitions for automating the transcription of Arabic manuscripts (RASM2018 and RASM2019), in an ongoing effort to identify good solutions for Arabic Handwritten Text Recognition (HTR).

I’ve been curious to test our Arabic materials with Transkribus – one of the leading tools for automating the recognition of historical documents. We’ve already tried it out on items from the Library’s India Office collection as well as early Bengali printed books, and we were pleased with the results. Several months ago the British Library joined the READ-COOP – the cooperative taking up the development of Transkribus – as a founding member.

As with other HTR tools, Transkribus’ HTR+ engine cannot start automatic transcription straight away, but first needs to be trained on a specific type of script and handwriting. This is achieved by creating a training dataset – a transcription of the text on each page, as accurate as possible, and a segmentation of the page into text areas and lines, demarcating the exact location of the text. Training sets therefore consist of a set of images and an equivalent set of XML files, containing the location and transcription of the text.
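As a rough illustration of the "images plus equivalent XML files" structure, the sketch below checks that every page image in a folder has a matching XML file. It assumes, purely for illustration, that an image and its XML share the same base file name and sit in one folder; the function name and the list of image extensions are hypothetical and are not taken from Transkribus or Aletheia.

```python
from pathlib import Path

# Assumed image extensions; adjust to whatever the digitised pages actually use.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def pair_training_files(folder):
    """Pair each page image with the XML file that shares its base name.

    Returns (pairs, unmatched): a sorted list of (image_path, xml_path)
    tuples, and the sorted base names that lack either an image or an XML.
    """
    folder = Path(folder)
    images = {p.stem: p for p in folder.iterdir() if p.suffix.lower() in IMAGE_SUFFIXES}
    xmls = {p.stem: p for p in folder.iterdir() if p.suffix.lower() == ".xml"}

    # Base names present on both sides form complete training pages.
    pairs = [(images[stem], xmls[stem]) for stem in sorted(images.keys() & xmls.keys())]
    # Base names present on only one side are incomplete and need attention.
    unmatched = sorted(images.keys() ^ xmls.keys())
    return pairs, unmatched
```

Running a check like this before uploading a training set is one way to catch pages that were transcribed but never exported, or exported without their image.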

A screenshot from Transkribus, showing the segmentation and transcription of a page from Add MS 7474.

 

This process can be done in Transkribus, but in this case I already had a training set created using PRImA’s software Aletheia. I used the dataset created for the competitions mentioned above: 120 transcribed and ground-truthed pages from eight manuscripts digitised and made available through QDL. This dataset is now freely accessible through the British Library’s Research Repository.

Transkribus recommends creating a training set of at least 75 pages (between 5,000 and 15,000 words); however, I was interested to find out a few things. First, the methods submitted for the RASM2019 competition worked on a training set of 20 pages, with an evaluation set of 100 pages. Therefore, I wanted to see how Transkribus’ HTR+ engine dealt with the same scenario. It should be noted that the RASM2019 methods were evaluated using PRImA’s evaluation methods, and this is not the case with Transkribus’ evaluation method – therefore, the results shown here are not directly comparable, but give some idea of how Transkribus performed on the same training set.

I created four different models to see how Transkribus’ recognition algorithms deal with a growing training set. The models were created as follows:

  • Training set of 20 pages, and evaluation set of 100 pages
  • Training set of 50 pages, and evaluation set of 70 pages
  • Training set of 75 pages, and evaluation set of 45 pages
  • Training set of 100 pages, and evaluation set of 20 pages
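The four splits above all divide the same 120 pages, and can be sketched in a few lines of Python. This is only an illustration: the page identifiers are made up, and taking the first pages for training is an assumption, since the post does not say how pages were assigned to each set.

```python
TOTAL_PAGES = 120                  # size of the ground-truthed dataset
TRAINING_SIZES = [20, 50, 75, 100]  # the four training set sizes


def split_pages(pages, training_size):
    """Use the first `training_size` pages for training and the rest for evaluation.

    Assumption: pages are split in order; the real selection may have differed.
    """
    if not 0 < training_size < len(pages):
        raise ValueError("training_size must leave at least one evaluation page")
    return pages[:training_size], pages[training_size:]


# Hypothetical page identifiers, for illustration only.
pages = [f"page_{n:03d}" for n in range(1, TOTAL_PAGES + 1)]

for size in TRAINING_SIZES:
    training, evaluation = split_pages(pages, size)
    print(f"training: {len(training):3d} pages, evaluation: {len(evaluation):3d} pages")
```

Because each split uses every page exactly once, the evaluation set shrinks as the training set grows, so the later error rates are measured on fewer pages than the earlier ones.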

The graphs below show each of the four iterations, from top to bottom:

CER of 26.80% for a training set of 20 pages

CER of 19.27% for a training set of 50 pages

CER of 15.10% for a training set of 75 pages

CER of 13.57% for a training set of 100 pages

The results can be summed up in a table:

Training Set (pp.)    Evaluation Set (pp.)    Character Error Rate (CER)    Character Accuracy
20                    100                     26.80%                        73.20%
50                    70                      19.27%                        80.73%
75                    45                      15.10%                        84.90%
100                   20                      13.57%                        86.43%

 

Indeed the accuracy improved with each iteration of training – the more training data the neural networks in Transkribus’ HTR+ engine have, the better the results. With a training set of 100 pages, Transkribus managed to automatically transcribe the remaining 20 pages with an accuracy rate of 86.43% – which is pretty good for historical handwritten Arabic script.
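For readers unfamiliar with the metric, the sketch below shows one common way to compute a Character Error Rate: the edit distance between the ground-truth text and the recognised text, divided by the length of the ground truth. This definition is an assumption; the post does not describe how Transkribus calculates its figures, so the function names here are illustrative and the results are not claimed to match Transkribus’ own evaluation. Character accuracy is simply 100% minus the CER, which is how the two figures for each model relate.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance: fewest character insertions, deletions and substitutions."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,         # delete a reference character
                current[j - 1] + 1,      # insert a hypothesis character
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference, hypothesis):
    """CER as a percentage of the ground-truth (reference) length."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return 100.0 * edit_distance(reference, hypothesis) / len(reference)


def character_accuracy(cer_percent):
    """Character accuracy is the complement of the CER."""
    return 100.0 - cer_percent
```

Python strings are sequences of Unicode code points, so the same functions work on Arabic text, although combining marks such as vowel diacritics are counted as separate characters under this simple definition.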

As a next step, we could consider (1) adding more ground-truthed pages from our manuscripts to increase the size of the training set, and by that improve HTR accuracy; (2) adding other open ground truth datasets of handwritten Arabic to the existing training set, and checking whether this improves HTR accuracy; and (3) running a few manuscripts from QDL through Transkribus to see how its HTR+ engine transcribes them. If accuracy is satisfactory, we could see how to scale this up and make those transcriptions openly available and easily accessible.

In the meantime, I’m looking forward to participating in the OpenITI AOCP workshop entitled “OCR and Digital Text Production: Learning from the Past, Fostering Collaboration and Coordination for the Future,” taking place at the University of Maryland next week, and catching up with colleagues on all things Arabic OCR/HTR!