Digital scholarship blog

Introduction

Tracking exciting developments at the intersection of libraries, scholarship and technology.

18 July 2022

UK Digital Comics: More of the same but different? [1]

This is a guest post by Linda Berube, an AHRC Collaborative Doctoral Partnership student based at the British Library and City, University of London. If you would like to know more about Linda's research, please do email her at [email protected].

When I last wrote a post for the Digital Scholarship blog in 2020 (Berube, 2020), I was a fairly new PhD student, fresh out of the starting blocks, taking on the challenge of UK digital comics research. My research involves an analysis of the systems and processes of UK digital comics publishing as a means of understanding how digital technology has affected, and perhaps transformed, them. For this work, I have the considerable support of supervisors Ian Cooke and Stella Wisdom (British Library) and Ernesto Priego and Stephann Makri (Human-Computer Interaction Design Centre, City, University of London).

Little did I, or the rest of the world for that matter, know the transformations to daily life that the pandemic was about to bring. The impact was felt no less in the publishing sector, and certainly in comics publishing. Still, despite all the obstacles to meetings, people from traditional[2] large and small press publishers, media and video game companies publishing comics, as well as creators and self-publishers, gave generously of their time to discuss comics with me. I am currently speaking with comics readers and observing their reading practices, again all via remote meetings. To all these people, this PhD student owes a debt of gratitude for their enthusiastic participation.

British Comics Publishing: It’s where we’re at

Digital technology has had a significant impact on British comics publishing, but not as pervasively as expected from initial prognostications by scholars and the comics press. Back in 2020, I observed:

  This particular point in time offers an excellent opportunity to consider the digital comics, and specifically UK, landscape. We seem to be past the initial enthusiasm for digital technologies when babies and bathwater were ejected with abandon (see McCloud 2000, for example), and probably still in the middle of a retrenchment, so to speak, of that enthusiasm (see Priego 2011 pp278-280, for example). (Berube, 2020).

But ‘retrenchment’ might be a strong word. According to my research findings to date, and in keeping with those of the broader publishing sector (Thompson, 2010; 2021), the comics publishing process has most definitely been ‘revolutionized’ by digital technology. All comics begin life as digital files until they are published in print. Even those creators who still draw by hand must convert their work to digital versions that can be sent to a publisher or uploaded to a website or publishing platform. And, while print comics have by no means been completely supplanted by digital comics (in fact a significant number of those interviewed voiced a preference for print), reading on digital devices (laptops, tablets, smartphones) has become popular enough for publishers to provide access through ebook and app technology. Even those publishers I interviewed who were most resistant to digital felt compelled ‘to dabble in digital comics’ (according to one small press publisher) by at least providing pdf versions on Gumroad or some other storefront. The restrictions on print distribution and sales through bookstores during Covid lockdowns compelled some publishers not only to provide more access to digital versions, but even to sell digital-exclusive versions, in other words comics only offered digitally.

Everywhere you look, a comic

The visibility of digital comics across sectors including health, economics, education, literacy and even the hard sciences was immediately obvious from a mapping exercise of UK comics publishers, producers and platforms, as well as through interviews. What this means is that comics (the creation and reading of them) are used to teach and to learn about multiple topics, including archiving (specifically UK Legal Deposit) (Figure 1) and anthropology (specifically Smartphones and Smart Ageing) (Figure 2):

Cartoon drawing of two people surrounded by comics and zines
Figure 1: Panel from 'The Legal Deposit and You', by Olivia Hicks (British Library, 2018). Reproduced with permission from the British Library.

 

Cartoon drawing of two women sitting on a sofa looking at and discussing content on a smartphone
Figure 2: Haapio-Kirk, L., Murariu, G., and Hahn, A. (artist) (2022) 'Beyond Anthropomorphism Palestine', Anthropology of Smartphones and Smart Ageing (ASSA) Blog. Based on Maya de Vries and Laila Abed Rabho’s research in Al-Quds (East Jerusalem). Available at: https://wwwdepts-live.ucl.ac.uk/anthropology/assa/discoveries/beyond-anthropomorphism/. Reproduced with permission.

Moreover, comics in their incarnation as graphic novels have grabbed literary prizes, for example Jimmy Corrigan: the smartest kid on earth (Jonathan Cape, 2001) by Chris Ware won the Guardian First Book Award in 2001, and Sabrina (Granta, 2018) by Nick Drnaso was longlisted for the Man Booker Prize in 2018 (somewhat controversially, see Nally, 2018).

Just Like Reading a Book, But Not…

But when the definition of digital comics[3] is extended to include graphic novels mostly produced as ebooks, the ‘same-ness’ of reading in print became evident over the course of interviews with publishers and creators. Publishing a comic in pdf format, whether on a website, on a publishing platform, or as a book, is simply the easiest, most cost-effective way to do it:

  We’re print first in our digital workflow—Outside of graphic novels, with other types of books we occasionally have the opportunity to work with the digital version as a consideration at the outset, in which case the tagging/classes are factored in at the beginning stages (a good example would be a recent straight-to-digital reflowable ebook). This is the exception though, and also does not apply to graphic novels, which are all print-led. (Interview with publisher, December 2020)

Traditional book publishers have not been the only ones taking up comics: gaming and media companies have acquired the rights to comics and comics brands previously published in print. For more and different sectors, comics have increasingly become an attractive option, especially for their multimedia appeal. However, what these companies do with the comics is a mixture of the same, for instance remaining print-led as described in the comment above, and the different, for example converting comics into digital interactive versions or providing apps with more functionality than the ebook format.

It's How You Read Them

Comics formatted especially for reading on apps, such as 2000 AD, ComiXology, and Marvel Unlimited, vary in the types of reading experience they offer. While some have retained the ‘multi-panel display’ experience of reading a print comic book, others have gone beyond the ‘reads like a book’ experience. ComiXology, a digital distribution platform for comics owned by Amazon, pioneered the ‘guided view’ technology now used by the likes of Marvel and DC, where readers view one panel at a time. Some of the comics readers I have interviewed refer to this as ‘the cinematic experience’: readers page through the comic one panel or scene at a time, as if watching it on film or TV.

These reading technologies do tend to work better on a tablet than on a smartphone. Yet the act of scrolling required to read webcomics on the WEBTOON app (and others, such as Tapas), designed to be read on smartphones, produces that same kind of ‘cinematic’ effect: the readers I have interviewed who use either the ComiXology or the WEBTOON app describe exactly the same experience, a build-up of ‘anticipation’ and ‘tension’, being ‘on the edge of my seat’, as they page or scroll down to the next scene or panel. WEBTOON creators employ certain techniques to create that tension in the vertical format, for example the use of white space between panels: the more space, the more scrolling, the more ‘edge of the seat’ experience. Major comics publishers have also started creating ‘vertical’ comics (scrolled on phones): Marvel launched its Infinity Comics to appeal to the smartphone webcomics reader.

So, it would seem that good old-fashioned comics pacing, combined with publishing through apps designed for digital devices, provides a ‘same but different’ reading experience: a uniquely digital one.

Same But Different: I’m still here

So, here I am, still a PhD student, currently conducting research with comics readers as part of my research and of a secondment with the British Library supported by AHRC Additional Student Development funding. This additional funding has afforded me the opportunity to employ UX (user behaviour/experience) techniques with readers, primarily through conducting reading observation sessions and activities. I will be following up this blog with an update on this research, as well as a call for participation in further reader research.

References 

Berube, L. (2020) ‘Not Just for Kids: UK Digital Comics, from creation to consumption’, British Library Digital Scholarship Blog, 24 August 2020. Available at: https://blogs.bl.uk/digital-scholarship/2020/08/not-just-for-kids-uk-digital-comics-from-creation-to-consumption.html

Drnaso, N. (2018) Sabrina. London, England: Granta Books.

McCloud, S. (2000) Reinventing Comics: How Imagination and Technology Are Revolutionizing an Art Form. New York, N.Y.: Paradox Press.

Nally, C. (2018) ‘Graphic Novels Are Novels: Why the Booker Prize Judges Were Right to Choose One for Its Longlist’, The Conversation, 26 July. Available at: https://theconversation.com/graphic-novels-are-novels-why-the-booker-prize-judges-were-right-to-choose-one-for-its-longlist-100562.

Priego, E. (2011) The Comic Book in the Age of Digital Reproduction. [Thesis] University College London, pp. 278-280. Available at: https://doi.org/10.6084/m9.figshare.754575.v4

Ware, C. (2001) Jimmy Corrigan: the smartest kid on earth. London, England: Jonathan Cape.

Notes

[1] “More of the same but different”, a phrase used by a comics creator I interviewed in reference to what comics readers want to read.↩︎

[2] By ‘traditional’, I am referring to publishers who contract with comics creators to undertake the producing, publishing, distribution, selling of a comic, retaining rights for a certain period of time and paying the creator royalties. In my research, publishers who transacted business in this way included multinational and small press publishers. Self-publishing is where the creator owns all the rights and royalties, but also performs the production, publishing, distribution work, or pays for a third-party to do so. ↩︎

[3] For this research, digital comics include a diverse selection of what is produced electronically or online: webcomics, manga, applied comics, experimental comics, as well as graphic novels [ebooks].  I have omitted animation. ↩︎

27 June 2022

IIIF-yeah! Annual Conference 2022

At the beginning of June, Neil Fitzgerald, Head of Digital Research, and I attended the annual International Image Interoperability Framework (IIIF) Showcase and Conference in Cambridge, MA. The showcase was held in the Massachusetts Institute of Technology’s iconic lecture theatre 10-250, and the conference in the Fong Auditorium of Boylston Hall on Harvard’s campus. There was a stillness on the MIT campus; in contrast, Harvard Yard was busy with sightseeing members of the public and the dismantling of marquees from the end-of-year commencements of the previous weeks.

View of the Massachusetts Institute of Technology Dome
IIIF Consortium sticker reading IIIF-yeah!
Conference participants outside Boylston Hall, Harvard Yard


The conference atmosphere was energising, with participants excited to be back at an in-person event, the last one having been held in 2019 in Göttingen, with virtual meetings held in the meantime. During the last decade IIIF has been growing, as reflected by the fast-expanding community and IIIF Consortium, which now comprises 63 organisations from across the GLAM and commercial sectors.

The Showcase on June 6th was an opportunity to welcome those new to IIIF and highlight recent community developments. I had the pleasure of presenting the work of the British Library and Zooniverse to enable new IIIF functionality on Zooniverse to support our In the Spotlight project, which crowdsources information about the Library’s historical playbills collection. Other presentations covered the use of IIIF with audio, maps, and in teaching, learning and museum contexts, as well as the exciting plans to extend IIIF standards for 3D data. Harvard University gave an update on their efforts to adopt IIIF across the organisation; their IIIF resources webpage is a useful reference. I was particularly impressed by the Leventhal Map and Education Center’s digital maps initiatives, including their collaboration on Allmaps, a set of open source tools for curating, georeferencing and exploring IIIF maps.

The following two days were packed with brilliant presentations on IIIF infrastructure, collections enrichment, IIIF resources discovery, IIIF-enabled digital humanities teaching and research, improving user experience and more. Digirati presented a new IIIF manifest editor, which is being further developed to support various use cases. Ed Silverton reported on the newest features of the Exhibit tool, which we at the British Library have started using to share engaging stories about our IIIF collections.

Ed Silverton presenting a slide about the Exhibit tool
Conference presenters talking about the Audiovisual Metadata Platform
Conference reception under a marquee in Harvard Yard

I was interested to hear about Getty’s vision of IIIF as an enabling technology, how it fits within their shared data infrastructure, and their multiple use cases, including driving image backgrounds based on colour palette annotations and the Quire publication process. It was great to hear how IIIF has been used in digital humanities research, as in the Mapping Colour in History project at Harvard, which enables historical analysis of artworks through pigment data annotations, or how IIIF helps to solve some of the challenges of remote resources aggregation for the Paul Laurence Dunbar initiative.

There was also much excitement about the detektIIIF browser extension for Chrome and Firefox, which detects IIIF resources in websites and helps collect and export IIIF manifests. Zentralbibliothek Zürich’s customised version ZB-detektIIIF allows scholars to create IIIF collections in JSON-LD and link to the Mirador Viewer. There were several great presentations about IIIF players and tools for audio-visual content, such as Avalon, Aviary, Clover, the Audiovisual Metadata Platform and the Mirador video extension. And no IIIF Conference is ever complete without a #FunWithIIIF presentation by Cogapp’s Tristan Roddis, this one capturing 30 cool projects using IIIF content and technology!

We all enjoyed lots of good conversations during the breaks and social events, and some great tours were on offer. Personally, I chose to visit the Boston Public Library’s Leventhal Map and Education Center, with its exhibition about environment and social justice, and the BPL digitisation studio, the latter equipped with Internet Archive scanning stations and an impressive maps photography room.

Boston Public Library book trolleys
Boston Public Library Maps Digitisation Studio
Rossitza Atanassova outside Boston Public Library


I was also delighted to pay a visit to the Harvard Libraries digitisation team who generously showed me their imaging stations and range of digitised collections, followed by a private guided tour of the Houghton Library’s special collections and beautiful spaces. Huge thanks to all the conference organisers, the local committee, and the hosts for my visits, Christine Jacobson, Bill Comstock and David Remington. I learned a lot and had an amazing time. 

Finally, all presentations from the three days have been shared and some highlights captured on Twitter #iiif. In addition this week the Consortium is offering four free online workshops to share IIIF best practices and tools with the wider community. Don’t miss your chance to attend. 

This post is by Digital Curator Rossitza Atanassova (@RossiAtanassova)

16 June 2022

Working With Wikidata and Wikimedia Commons: Poetry Pamphlets and Lotus Sutra Manuscripts

Greetings! I’m Xiaoyan Yang, from Beijing, China, an MSc student at University College London. It was a great pleasure to have the opportunity to do a four-week placement at the British Library and Wikimedia UK under the supervision of Lucy Hinnie, Wikimedian in Residence, and Stella Wisdom, Digital Curator, Contemporary British Collections. I mainly focused on the Michael Marks Awards for Poetry Pamphlets Project and Lotus Sutra Project, and the collaboration between the Library and Wikimedia.

What interested you in applying for a placement at the Library?

This kind of placement, in world-famous cultural institutions such as the Library and Wikimedia, is a brand-new experience for me. Because my undergraduate major is economic statistics, most of my internships in the past were at commercial and Internet technology companies. The driving force of my interest in digital humanities research, especially in linked data, knowledge graphs, and visualisation, is to better combine information technologies with cultural resources, in order to reach a wider audience and promote the transmission of cultural and historical memory in a more accessible way.

Libraries are institutions for the preservation and dissemination of knowledge for the public, and the British Library is without doubt one of the largest and best libraries in the world. It has long been a leader and innovator in resource preservation and digitisation. The International Dunhuang Project (IDP), initiated by the British Library, is now one of the most representative transnational collaborative digital humanities resource projects in the field. I applied for a placement hoping to learn more about the use of digital resources in real projects and about the process of collaboration, from initial design through to delivery. I also wanted to have the chance to get involved in the practice of linked data, to accumulate experience, and to find directions for future improvement.

I would like to thank Dr Adi Keinan-Schoonbaert for her kind introduction to the British Library's Asian and African digitisation projects, especially the IDP, which has enabled me to learn more about the librarian-led practices in this area. At the same time, I was very happy to sit in on the weekly meetings of the Digital Scholarship Team during this placement, which allowed me to observe how collaboration between different departments is carried out and managed in a large cultural resource organisation like the British Library.

Excerpt from Lotus Sutra Or.8210 S.155. An old scroll of parchment showing vertical lines of older Chinese script.
Excerpt from Lotus Sutra Or.8210 S.155. Kumārajīva, CC BY 4.0, via Wikimedia Commons

What is the most surprising thing you have learned?

In short: it is so easy to contribute knowledge through Wikimedia. In this placement, one of my very first tasks was to upload information about the winning and shortlisted poems of the Michael Marks Awards for Poetry Pamphlets for each year from 2009 to the latest, 2021, to Wikidata. The first step was to check whether a poem and its author and publisher already existed in Wikidata; if not, I created an item page. Before I started, I thought the process would be very complicated, but once I started following the manual, I found it was actually really easy. I just needed to click "Create a new Item".

I will always remember that the first person item I created was Sarah Jackson, one of the shortlisted poets for this award in 2009. The unique QID was automatically generated as Q111940266. With such a simple operation, anyone can contribute to the vast knowledge world of Wiki. Many people who I have never met may read this item page in the future, a page created and perfected by me at this moment. This feeling is magical and full of achievement for me. There are also many useful guides, examples and batch-loading tools, such as QuickStatements, that help users start editing with joy; useful guides include the Wikidata help pages for QuickStatements and material from the University of Edinburgh.
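To give a flavour of what a batch-loading tool expects, here is a rough Python sketch (not part of the project's actual workflow) that assembles a QuickStatements V1 batch creating a new person item with an English label and description. P31 ('instance of') and Q5 ('human') are real Wikidata identifiers; the label and description values are just illustrative examples.

```python
def quickstatements_for_person(name: str, description: str) -> str:
    """Build a QuickStatements (V1) batch that creates a new
    Wikidata item, sets an English label and description, and
    marks it as an instance of (P31) human (Q5)."""
    lines = [
        "CREATE",                       # create a new item; LAST refers to it
        f'LAST\tLen\t"{name}"',         # Len = English label
        f'LAST\tDen\t"{description}"',  # Den = English description
        "LAST\tP31\tQ5",                # instance of: human
    ]
    return "\n".join(lines)

batch = quickstatements_for_person("Sarah Jackson", "British poet")
print(batch)
```

Pasting a batch like this into the QuickStatements interface performs all the edits in one go, which is much quicker than clicking "Create a new Item" for every poet, poem and publisher.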

Image of a Wikimedia SPARQL query to determine a list of information about the Michael Marks Poetry Pamphlet uploads.
An example of one of Xiaoyan’s queries - you can try it here!
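For readers who would like to adapt a query like Xiaoyan's, here is a hedged Python sketch that assembles a similar SPARQL request for the Wikidata Query Service. The award QID below is a placeholder, not the award's real identifier; P166 is Wikidata's 'award received' property, and the code only builds the request URL rather than executing it.

```python
from urllib.parse import urlencode

# Placeholder QID for the Michael Marks Awards; replace with the real one.
AWARD_QID = "Q00000000"

# wdt:/wd: are the standard Wikidata prefixes; P166 = "award received".
query = f"""
SELECT ?poet ?poetLabel WHERE {{
  ?poet wdt:P166 wd:{AWARD_QID} .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
"""

# The Wikidata Query Service accepts the query as a GET parameter:
url = "https://query.wikidata.org/sparql?" + urlencode(
    {"query": query, "format": "json"}
)
print(url[:60])
```

Opening the resulting URL (or pasting the query into query.wikidata.org) would return each matching item together with its English label.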

How do you hope to use your skills going forward?

My current dissertation research focuses on regional classical Chinese poetry in the Hexi Corridor. This particular geographical area is deeply bound up with the history of the Silk Road and has inspired and attracted many poets to visit and write. My project aims to build a proper ontology and knowledge map, then combine them with GIS visualisation and text analysis to explore the historical, geographic, political and cultural changes in this area from the perspectives of time and space. Wikidata provides a standard way to undertake this work.

Thanks to Dr Martin Poulter’s wonderful training and Stuart Prior’s kind instructions, I quickly picked up some practical skills in constructing Wiki queries. The layout design of the timeline and geographical visualisation tools offered by Wiki queries inspired me to improve my skills in this field further. What’s more, although I haven’t yet had a chance to experience Wikibase, thanks to Dr Lucy Hinnie and Dr Graham Jevon’s introduction I am now very interested in it, and I will definitely try it in future.

Would you like to share some Wiki advice with us?

Wiki is very friendly for self-learning: the Help page presents various manuals and examples, all of which are very good learning resources. I will keep learning and exploring in the future.

I do want to share my feelings and a little experience with Wikidata. In the Michael Marks Awards for Poetry Pamphlets Project, all the properties used to describe poets, poems and publishers can easily be found in the existing Wikidata property list. However, in the second project, on the Lotus Sutra, I encountered more difficulties: for example, it is difficult to find suitable items and properties to represent the paragraphs of a scroll’s text content or its binding design on Wikidata, and this information is at present better represented on Wikimedia Commons.

However, as I studied more Wikidata examples, I understood more and more about Wikidata and the purpose of these restrictions. Maintaining concise structured data and accurate correlations is one of the main purposes of Wikidata: it encourages the reuse of existing properties and imposes qualifications on long text descriptions. This feature of Wikidata therefore needs to be taken into account from the outset when designing metadata frameworks for uploading data.

In the end, I would like to sincerely thank my direct supervisor Lucy for her kind guidance, help, encouragement and affirmation, as well as the British Library and Wikimedia platform. I have received so much warm help and gained so much valuable practical experience, and I am also very happy and honored that by using my knowledge and technology I can make a small contribution to linked data. I will always cherish the wonderful memories here and continue to explore the potential of digital humanities in the future.

This post is by Xiaoyan Yang, an MSc student at University College London, and was edited by Wikimedian in Residence Dr Lucy Hinnie (@BL_Wikimedian) and Digital Curator Stella Wisdom (@miss_wisdom).

23 May 2022

Picture Perfect Platinum Jubilee Puddings on Wikimedia Commons

2022 is the year of the UK’s first ever Platinum Jubilee. Queen Elizabeth II is the first monarch in British history to serve for over 70 years, and the UK is getting ready to celebrate! Here at the Library we are marking the occasion in a number of ways. The UK Web Archive is inviting nominations for websites to be archived in a special Jubilee collection that will commemorate the event. You can read more about their project here and here, and nominate websites using this online form.

Inspired by Fortnum & Mason's Platinum Jubilee Pudding Competition, in Digital Scholarship we are encouraging you to upload images of your celebratory puddings and food to Wikimedia Commons.

Queen Elizabeth II in 1953, pictured wearing a tiara and smiling broadly. The image is black and white.
Queen Elizabeth II in 1953. Image from Associated Press, Public domain, via Wikimedia Commons

Wikimedia Commons is a collection of freely usable images that anyone can edit. We have created a simple set of Jubilee guidelines to help you upload your images: you can view and download it here. The most important thing to know about Commons is that everything you upload is then available under a Creative Commons license which allows it to be used, for free, by anyone in the world. The next time someone in Australia searches for a trifle, it may be yours they find! 

You may be asking yourself what you should upload. You could have a look at specific Wikipedia entries for types of pudding or cake. Wikipedia images come from Commons, so if you spot something missing, you can upload your image and it can then be used in the Wikipedia entry. You might want to think regionally, making barmbrack from Ireland, Welsh cakes, Scottish cranachan or parkin from northern England. If you’re feeling adventurous, why not crack out the lemon and amaretti and try your hand at the official Jubilee pudding?

How to make your images platinum quality:

  • Make sure your images are clear, not blurry.
  • Make sure they are high resolution: most phone cameras are now very powerful, but if you have a knack for photography, a real camera may come in useful.
  • Keep your background clear, and make sure the image is colourful and well-lit.
  • Ask yourself if it looks like pudding – sometimes an image that is too close up can be indistinct.
Image of a white cake with jigsaw shaped white icing, representing the Wikipedia logo.
Image of cake courtesy of Ainali, CC BY-SA 4.0, via Wikimedia Commons

NB: Please add the category 'British Library Platinum Pudding Drive' to your uploads. You can see instructions on how to add categories here, from 3:19 onwards.

We can’t wait to see your images. The Wikimedia Foundation recently ran a series of events for Image Description Week – check out their resources to help and support your uploads, making sure that you are describing your images in an accessible way. Remember to nominate any websites you’d like to see archived at the UK Web Archive, and if your local library is part of the Living Knowledge Network, keep an eye out for our commemorative postcards, which contain links to both the Web Archive drive and our Commons instructions.

We have events running at the Library to celebrate the Jubilee, such as the Platinum Jubilee Pudding at St Pancras on Monday 23rd May, and A Queen For All Seasons on Thursday 26th May. There is also a fantastic Food Season running until the end of May, with a wide array of events and talks. You can book tickets for upcoming events via the Events page.

Happy Jubilee!

20 April 2022

Importing images into Zooniverse with a IIIF manifest: introducing an experimental feature

Digital Curator Dr Mia Ridge shares news from a collaboration between the British Library and Zooniverse that means you can more easily create crowdsourcing projects with cultural heritage collections. There's a related blog post on Zooniverse, Fun with IIIF.

IIIF manifests - text files that tell software how to display images, sound or video files alongside metadata and other information about them - might not sound exciting, but by linking to them you can view and annotate collections from around the world. The IIIF (International Image Interoperability Framework) standard makes images (or audio, video or 3D files) more re-usable: they can be displayed on another site alongside the original metadata and information provided by the source institution. If an institution updates a manifest - perhaps adding information from updated cataloguing or crowdsourcing - any site that displays that image automatically gets the updated metadata.
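To make the "text files that tell software how to display images" idea concrete, here is a minimal Python sketch that walks the nested structure of a much-simplified IIIF Presentation 2.x manifest and lists each canvas label with its image URL. The manifest itself is an invented example (the example.org URL is a placeholder, not a British Library record), and real manifests carry far more metadata.

```python
import json

# A simplified IIIF Presentation 2.x manifest, following the spec's
# manifest -> sequences -> canvases -> images -> resource nesting.
manifest_json = """
{
  "@type": "sc:Manifest",
  "label": "A volume of playbills",
  "sequences": [{
    "canvases": [{
      "label": "f. 1r",
      "images": [{
        "resource": {"@id": "https://example.org/iiif/page1/full/full/0/default.jpg"}
      }]
    }]
  }]
}
"""

manifest = json.loads(manifest_json)
# List every canvas label and the image URL it points at.
for canvas in manifest["sequences"][0]["canvases"]:
    for image in canvas["images"]:
        print(canvas["label"], image["resource"]["@id"])
```

Because the manifest only *points* at images on the source institution's servers, any viewer or crowdsourcing platform that parses it this way picks up updated metadata automatically.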

Playbill showing the title after other large text

We've posted before about how we used IIIF manifests as the basis for our In the Spotlight crowdsourced tasks on LibCrowds.com. Playbills are great candidates for crowdsourcing because they are hard to transcribe automatically, and the layout and information present varies a lot. Using IIIF meant that we could access images of playbills directly from the British Library servers without needing server space and extra processing to make local copies. You didn't need technical knowledge to copy a manifest address and add a new volume of playbills to In the Spotlight. This worked well for a couple of years, but over time we'd found it difficult to maintain bespoke software for LibCrowds.

When we started looking for alternatives, the Zooniverse platform was an obvious option. Zooniverse hosts dozens of historical or cultural heritage projects, and hundreds of citizen science projects. It has millions of volunteers, and a 'project builder' that means anyone can create a crowdsourcing project - for free! We'd already started using Zooniverse for other Library crowdsourcing projects such as Living with Machines, which showed us how powerful the platform can be for reaching potential volunteers. 

But that experience also showed us how complicated the process of getting images and metadata onto Zooniverse could be. Using Zooniverse for volumes of playbills for In the Spotlight would require some specialist knowledge. We'd need to download images from our servers, resize them, generate a 'manifest' list of images and metadata, then upload it all to Zooniverse; and repeat that for each of the dozens of volumes of digitised playbills.

Fast forward to summer 2021, when we had the opportunity to put a small amount of funding into some development work by Zooniverse. I'd already collaborated with Sam Blickhan at Zooniverse on the Collective Wisdom project, so it was easy to drop her a line and ask if they had any plans or interest in supporting IIIF. It turned out they had, but hadn't previously had the necessary resources or an interested organisation.

We came up with a brief outline of what the work needed to do, taking the ability to recreate some of the functionality of In the Spotlight on Zooniverse as a goal. Therefore, 'the ability to add subject sets via IIIF manifest links' was key. ('Subject set' is Zooniverse-speak for 'set of images or other media' that are the basis of crowdsourcing tasks.) And of course we wanted the ability to set up some crowdsourcing tasks with those items… The Zooniverse developer, Jim O'Donnell, shared his work in progress on GitHub, and I was very easily able to set up a test project and ask people to help create sample data for further testing. 

If you have a Zooniverse project and a IIIF address to hand, you can try out the import for yourself: add 'subject-sets/iiif?env=production' to your project builder URL. e.g. if your project is number #xxx then the URL to access the IIIF manifest import would be https://www.zooniverse.org/lab/xxx/subject-sets/iiif?env=production
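The URL pattern above can be generated in a couple of lines of Python; the project number used here is, of course, just a placeholder for your own project's ID.

```python
def iiif_import_url(project_id: int) -> str:
    """Build the Zooniverse project-builder URL for the experimental
    IIIF manifest import, following the pattern described above."""
    return (
        f"https://www.zooniverse.org/lab/{project_id}"
        "/subject-sets/iiif?env=production"
    )

print(iiif_import_url(12345))
```

Visiting that URL while logged in as a collaborator on the project opens the manifest import screen.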

Paste a manifest URL into the box. The platform parses the file to present a list of metadata fields, which you can flag as hidden or visible in the subject viewer (the public task interface). When you're happy, you can click a button to upload the manifest as a new subject set (like a folder of items), and your images are imported. (Don't worry if it says '0 subjects'.)
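To give a feel for what the platform is parsing: IIIF Presentation 2.x manifests carry metadata as label/value pairs, so collecting the candidate fields might look roughly like this (a simplified sketch against a hand-made manifest fragment, not the importer's actual code):

```python
def metadata_labels(manifest):
    """Collect the distinct metadata labels from a IIIF 2.x manifest dict.

    Assumes metadata is a list of {'label': ..., 'value': ...} pairs at
    manifest level; a sketch only, not the importer's real logic.
    """
    labels = []
    for entry in manifest.get('metadata', []):
        if entry['label'] not in labels:
            labels.append(entry['label'])
    return labels

# A hand-made manifest fragment for illustration.
sample = {
    'metadata': [
        {'label': 'Title', 'value': 'A playbill'},
        {'label': 'Date', 'value': '1820'},
        {'label': 'Title', 'value': 'Another value, same label'},
    ]
}
```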

 

Screenshot of manifest import screen

You can try out our live task and help create real data for testing ingest processes at https://frontend.preview.zooniverse.org/projects/bldigital/in-the-spotlight/classify

This is a very brief introduction, with more to come on managing data exports and IIIF annotations once you've set up, tested and launched a crowdsourced workflow (task). We'd love to hear from you - how might this be useful? What issues do you foresee? How might you want to expand or build on this functionality? Email [email protected] or tweet @mia_out @LibCrowds. You can also comment on GitHub https://github.com/zooniverse/Panoptes-Front-End/pull/6095 or https://github.com/zooniverse/iiif-annotations

Digital work in libraries is always collaborative, so I'd like to thank British Library colleagues in Finance, Procurement, Technology, Collection Metadata Services and various Collections departments; the Zooniverse volunteers who helped test our first task and of course the Zooniverse team, especially Sam, Jim and Chris for their work on this.

 

12 April 2022

Making British Library collections (even) more accessible

Daniel van Strien, Digital Curator, Living with Machines, writes:

The British Library’s digital scholarship department has made many digitised materials available to researchers. This includes a collection of books digitised in partnership with Microsoft and processed using Optical Character Recognition (OCR) software to make the text machine-readable. There is also a collection of books digitised in partnership with Google.

Since being digitised, this collection of books has been used in many different projects. These include recent work to augment the dataset with genre metadata and a project using machine learning to tag images extracted from the books. The books have also served as training data for a historic language model.

This blog post will focus on two challenges of working with this dataset, size and documentation, and discuss how we’ve experimented with one potential approach to addressing them.

One of the challenges of working with this collection is its size. The OCR output is over 20GB. This poses some challenges for researchers and other interested users wanting to work with these collections. Projects like Living with Machines are one avenue in which the British Library seeks to develop new methods for working at scale. For an individual researcher, one of the possible barriers to working with a collection like this is the computational resources required to process it. 

Recently we have been experimenting with a Python library, datasets, to see if this can help make this collection easier to work with. The datasets library is part of the Hugging Face ecosystem. If you have been following developments in machine learning, you have probably heard of Hugging Face already. If not, Hugging Face is a delightfully named company focusing on developing open-source tools aimed at democratising machine learning. 

The datasets library is a tool that aims to make it easier for researchers to share and efficiently process large datasets for machine learning. Whilst this was the library’s original focus, there are other use cases where it may help make datasets held by the British Library more accessible.

Some features of the datasets library:

  • Tools for efficiently processing large datasets 
  • Support for easily sharing datasets via a ‘dataset hub’ 
  • Support for documenting datasets hosted on the hub (more on this later). 

As a result of these and other features, we have recently worked on adding the British Library books dataset to the Hugging Face hub. Making the dataset available via the datasets library has made it more accessible in a few different ways.

Firstly, it is now possible to download the dataset in two lines of Python code: 

from datasets import load_dataset
ds = load_dataset('blbooks', '1700_1799')

We can also use the datasets library to process large datasets. For example, suppose we only want to include data with a high OCR confidence score (this partially helps filter out text with many OCR errors):

ds.filter(lambda example: example['mean_wc_ocr'] > 0.9)

One of the particularly nice features here is that the library uses memory mapping to store the dataset under the hood. This means that you can process data that is larger than the RAM you have available on your machine. This can make the process of working with large datasets more accessible. We could also use this as a first step in processing data before getting back to more familiar tools like pandas. 

dogs_data = ds['train'].filter(lambda example: 'dog' in example['text'].lower())
df = dogs_data.to_pandas()

In a follow-up blog post, we’ll dig into the technical details of datasets in more depth. Whilst making the technical processing of datasets more accessible is one part of the puzzle, there are also non-technical challenges to making a dataset more usable.

 

Documenting datasets 

One of the challenges of sharing large datasets is documenting the data effectively. Traditionally, libraries have mainly focused on describing material at the ‘item level’, i.e. documenting one item at a time. However, there is a difference between documenting one book and 100,000 books. There are no easy answers to this, but one possible avenue libraries could explore is the use of Datasheets. Timnit Gebru et al. proposed the idea in ‘Datasheets for Datasets’. A datasheet aims to provide a structured format for describing a dataset, covering questions like how and why it was constructed, what the data consists of, and how it could potentially be used. Crucially, datasheets also encourage a discussion of the bias and limitations of a dataset. Whilst you can identify some of these limitations by working with the data, there is also a crucial amount of information known by curators of the data that might not be obvious to end-users. Datasheets offer one possible way for libraries to begin communicating this information more systematically.
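As a loose illustration of that structure (the section names below are paraphrased from the Gebru et al. paper, and the wording is ours, not a standard schema), a datasheet can start life as little more than a checklist:

```python
# Datasheet sections paraphrased from 'Datasheets for Datasets'
# (Gebru et al.); each heading stands for a group of questions.
DATASHEET_SECTIONS = [
    'Motivation',          # why and by whom was the dataset created?
    'Composition',         # what does the data consist of?
    'Collection process',  # how was the data acquired?
    'Preprocessing',       # what cleaning or labelling was done?
    'Uses',                # intended and inappropriate uses
    'Distribution',        # how is it shared, under what licence?
    'Maintenance',         # who maintains and updates it?
]

def missing_sections(draft):
    """List the sections a draft datasheet has not yet answered."""
    return [s for s in DATASHEET_SECTIONS if s not in draft]

# A hypothetical, barely started draft.
draft = {'Motivation': 'Digitised in partnership with Microsoft to broaden access.'}
```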

The dataset hub adopts the practice of writing datasheets and encourages users of the hub to write one for their dataset. For the British Library books dataset, we have attempted to write one of these datasheets. Whilst it is certainly not perfect, it hopefully begins to outline some of the challenges of this dataset and gives end-users a better sense of how they should approach it.

18 March 2022

Looking back at LibCrowds: surveying our participants

'In the Spotlight' is a crowdsourcing project from the British Library that aims to make digitised historical playbills more discoverable, while also encouraging people to closely engage with this otherwise less accessible collection. Digital Curator Dr Mia Ridge writes...

If you follow our @LibCrowds account on Twitter, you might have noticed that we've been working on refreshed versions of our In the Spotlight tasks on Zooniverse. That's part of a small project to enable the use of IIIF manifests on Zooniverse - in everyday language, it means that many, many more digitised items can form the basis of crowdsourcing tasks in the Zooniverse Project Builder, and In the Spotlight is the first project to use this new feature. Along with colleagues in Printed Heritage and BL Labs, I've been looking at our original Pybossa-based LibCrowds site to plan a 'graceful ending' for the first phase of the project on LibCrowds.com.

As part of our work documenting and archiving the original LibCrowds site, I'm delighted to share summary results from a 2018 survey of In the Spotlight participants, now published on the British Library's Research Repository: https://doi.org/10.23636/w4ee-yc34. Our thanks go to Susan Knight, Customer Insight Coordinator, for her help with the survey.

The survey was designed to help us understand who In the Spotlight participants were, and to help us prioritise work on the project. The 22-question survey was based on earlier surveys run by the Galaxy Zoo and Art UK Tagger projects, to allow comparison with other crowdsourcing projects and to contribute to our understanding of crowdsourcing in cultural heritage more broadly. It was open to anyone who had contributed to the British Library's In the Spotlight project for historical playbills. The survey was distributed to LibCrowds newsletter subscribers, on the LibCrowds community forum and on social media.

Some headline findings from our survey include:

  • Respondents were most likely to be a woman with a Masters degree, in full-time employment, in London or Southeast UK, who contributes in a break between other tasks or 'whenever they have spare time'.
  • 76% of respondents were motivated by contributing to historical or performance research

Responses to the question 'What was it about this project which caused you to spend more time than intended on it?':

  • Easy to do
  • It's so entertaining
  • Every time an entry is completed you are presented with another item which is interesting and illuminating which provides a continuous temptation regarding what you might discover next
  • simplicity
  • A bit of competitiveness about the top ten contributors but also about contributing something useful
  • I just got carried away with the fun
  • It's so easy to complete
  • Easy to want to do just a few more
  • Addiction
  • Felt I could get through more tasks
  • Just getting engrossed
  • It can be a bit addictive!
  • It's so easy to do that it's very easy to get carried away.
  • interested in the [material]

The summary report contains more rich detail, so go check it out!

 

Detail of the front page of libcrowds.com; Crowdsourcing projects from the British Library. 2,969 Volunteers. 265,648 Contributions. 175 Projects

16 March 2022

Getting Ready for Black Theatre and the Archive: Making Women Visible, 1900-1950

Following on from last week’s post, have you signed up for our Wikithon already? If you are interested in Black theatre history and making women visible, and want to learn how to edit Wikipedia, please do join us online, on Monday 28th March, from 10am to 1.30pm BST, over Zoom.

Remember the first step is to book your place here, via Eventbrite.

Finding Sources in The British Newspaper Archive

We are grateful to the British Newspaper Archive and Findmypast for granting our participants access to their resources on the day of the event. If you’d like to learn more about this Archive beforehand, there are some handy guides to how to do this below.

Front page of the British Newspaper Archive website, showing the search bar and advertising Findmypast.
The British Newspaper Archive Homepage

I used a quick British Newspaper Archive search to look for information on Una Marson, a playwright and artist whose work is very important in the timeframe of this Wikithon (1900-1950). As you can see, there were over 1,000 results. I was able to view images of Una at gallery openings and art exhibitions, and read all about her work.

Page of search results on the British Newspaper Archive, looking for articles about Una Marson.
A page of results for Una Marson on the British Newspaper Archive

Findmypast focuses more on legal records of people, living and dead. It’s a dream website for genealogists and those interested in social history. They’ve recently uploaded the results of the 1921 census, so there is a lot of material about people’s lives in the early 20th century.

Image of the landing page for the 1921 Census of England and Wales on Findmypast.
The Findmypast 1921 Census Homepage.

 

Here’s how to get started with Findmypast in 15 minutes, using a series of ‘how to’ videos. This handy blog post offers a beginner's guide to searching Findmypast's family records, and you can always use Findmypast’s help centre to seek answers to frequently asked questions.

Wikipedia Preparation

If you’d like to get a head start, you can download and read our handy guide to setting up your Wikipedia account, which you can access here. There is also advice available on creating your account, Wikipedia's username policy and how to create your user page.

The Wikipedia logo, a white globe made of jigsaw pieces with letters and symbols on them in black.
The Wikipedia Logo, Nohat (concept by Paullusmagnus), CC BY-SA 3.0, via Wikimedia Commons

Once you have done that, or if you already have a Wikipedia account, please join our event dashboard and go through the introductory exercises, which cover:

  • Wikipedia Essentials
  • Editing Basics
  • Evaluating Articles and Sources
  • Contributing Images and Media Files
  • Sandboxes and Mainspace
  • Sources and Citations
  • Plagiarism

These are all short exercises that will help familiarise you with Wikipedia and its processes. Don’t have time to do them? We get it, and that’s totally fine - we’ll cover the basics on the day too!

You may want to verify your Wikipedia account - this function exists to make sure that people are contributing responsibly to Wikipedia. The easiest and swiftest way to verify your account is to do 10 small edits. You could do this by correcting typos or adding in missing dates. However, another way to do this is to find articles where citations are needed, and add them via Citation Hunt. For further information on adding citations, watching this video may be useful.

Happier with an asynchronous approach?

If you cannot join the Zoom event on Monday 28th March, but would like to contribute, please do check out and sign up to our dashboard. The online dashboard training exercises will be an excellent starting point. From there, all of your edits and contributions will be registered, and you can be proud of yourself for making the world of Wikipedia a better place, in your own time.

This post is by Wikimedian in Residence Dr Lucy Hinnie (@BL_Wikimedian).