Digital scholarship blog

Enabling innovative research with British Library digital collections

Introduction

Tracking exciting developments at the intersection of libraries, scholarship and technology.

30 August 2019

Using Transkribus for automated text recognition of historical Bengali Books

In this post Tom Derrick, Digital Curator, Two Centuries of Indian Print, explains the Library's recent use of Transkribus for automated text recognition of Bengali printed books.

Are you working with digitised printed collections that you want to 'unlock' for keyword search and text mining? Maybe you have already heard about Transkribus but thought it could only be used for automated recognition of handwritten texts. If so, you might be surprised to hear it also does a pretty good job with printed texts. You might be even more surprised to hear it does an impressive job with printed texts in Indian scripts! At least that is what we have found from recent testing with a batch of 19th century printed books written in Bengali script that have been digitised through the British Library’s Two Centuries of Indian Print project.

Transkribus is a READ project and available as a free tool for users who want to automate recognition of historical documents. The British Library has already had some success using Transkribus on manuscripts from our India Office collection, and it was that which inspired me to see how it would perform on the Bengali texts, which provide an altogether different type of challenge.

For a start, most text recognition solutions either do not support Indian scripts, or do not reach close to the same level of recognition as they do with documents written in English or other Latin scripts. In part this is down to supply and demand. Mainstream providers of tools have prioritised Western customers, yet there is also the relative lack of digitised Indian texts that can be used to train text recognition engines.

These text recognition engines have also been well trained on modern dictionaries, whereas a collection of historical texts like the Bengali books will often contain words which are no longer in use. Their aged physicality also brings with it the delights of faded print, blotchy paper and other paper-based gremlins that keep conservationists in work yet disrupt automated text recognition. Throw in an extensive alphabet that contains more diverse and complicated character forms than English and you can start to piece together how difficult it can be to train recognition engines to achieve comparable results with Bengali texts.

So it was with more hope than expectation that I approached Transkribus. We began by selecting 50 pages from the Bengali books that represent, as far as possible, the variety of typographical and layout styles within the wider collection of c. 500,000 pages. Not an easy task! We uploaded these to Transkribus, manually segmenting paragraphs into text regions and automating line recognition. We then manually transcribed the texts to create a ground truth which, together with the scanned page images, was used to train the recurrent neural network within Transkribus to create a model from the 5,700 transcribed words.

Screenshot of a page from one of the British Library's Bengali books within the Transkribus viewer showing segmentation of the page by green bounding boxes around paragraphs and underlined text lines. Typed transcriptions of the text are shown below the page image.

The model was tested on a few pages from the wider collection and the results are clearly communicated via the graph below. The model achieved an average character error rate (CER) of 21.9%, which is comparable to the best results we have seen from other text recognition services. Word accuracy of 61% was based on the number of words that were misspelled in the automated transcription compared to the ground truth. Eventually we would like to use automated transcriptions to support keyword searching of the Bengali books online, and the higher the word accuracy, the greater the chances of users pulling back all relevant hits from their keyword search. We noticed the results often missed the upper zone of certain Bengali characters, i.e. the part of the character or glyph which resides above the matra line that connects characters in Bengali words. Further training focused on recognition of these characters may improve the results.

Screenshot of a graph showing the learning curve of the Bengali model using the Transkribus HTR tool which achieved 21.91% character error rate
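For readers who want to see how figures like these can be derived, here is a minimal Python sketch. It assumes the common definition of CER (character-level edit distance divided by the length of the ground truth) and treats word accuracy as one minus the word-level error rate. The post does not say exactly how Transkribus calculates its figures, so the function names and definitions below are illustrative assumptions rather than the tool's own method.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute x with y
        prev = curr
    return prev[-1]


def character_error_rate(ground_truth, recognised):
    """Character edits needed, as a fraction of the ground truth length."""
    return edit_distance(ground_truth, recognised) / len(ground_truth)


def word_accuracy(ground_truth, recognised):
    """Assumed definition: 1 minus word-level edits per ground truth word."""
    truth_words = ground_truth.split()
    errors = edit_distance(truth_words, recognised.split())
    return max(0.0, 1 - errors / len(truth_words))


if __name__ == "__main__":
    truth = "বাংলা বই"       # illustrative ground truth line
    output = "বাংল বই"       # illustrative output with one missing vowel sign
    print(character_error_rate(truth, output))
    print(word_accuracy(truth, output))
```

Note that Python counts Unicode code points, so a Bengali conjunct or a consonant with a vowel sign counts as several characters here. A tool that counts whole glyphs would report different numbers for the same page.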

Our training set of 50 pages is very small compared to other projects using Transkribus and so we think the accuracy could be vastly improved by creating more transcriptions and re-training the model. However, we're happy with these initial results and would encourage others in a similar position to give Transkribus a try.

 

 

21 August 2019

Chevening British Library Fellowship working with Chinese historical texts

Chevening is the UK government’s international awards programme aimed at developing global leaders. In 2015, the Foreign and Commonwealth Office (FCO) partnered with the British Library to offer professionals two new fellowships every year. These fellowships are unique opportunities for one-year placements at the Library, working with exceptional collections under the Library’s custodianship. Past and present Chevening Fellows at the Library have focused on geographically diverse collections, from Latin America through Africa to South Asia, with different themes such as Nationalism, Independence, and Partition in South Asia, 1900-1950 and Big Data and Libraries.

We are thrilled to announce that one of the two placements available for the 2020/2021 academic year will focus on automating the recognition of historical Chinese handwritten texts. This is a special opportunity to work in the Library’s Digital Scholarship Department, and engage with unique historical collections digitised as part of the International Dunhuang Project and the Lotus Sutra Manuscripts Digitisation Project. Focusing on material from Dunhuang (China), part of the Stein collection, this Fellowship will engage with new digital tools and techniques in order to explore possible solutions to automate the transcription of these handwritten texts.

Chinese Lotus Sutra scroll with Tibetan divination texts on the back (Shelfmark: Or.8210/S.155). Digitised as part of the Lotus Sutra Manuscripts Digitisation Project. © The British Library

The context for this fellowship is the Library’s efforts towards making its collection items available in machine-readable format, to enable full-text search and analysis. The Library has been digitising its collections at scale for over two decades, with digitisation opening up access to diversely rich collections. However, it’s important for us to further support discovery and digital research by unlocking the huge potential in automatically transcribing our collections. Until recently, Western language print collections have been the main focus, especially newspaper collections. A flagship collaboration with the Alan Turing Institute, a project called “Living with Machines,” is underway to apply Optical Character Recognition (OCR) to UK newspapers, design and implement new methods in data science and artificial intelligence, and analyse these materials at scale.

Taking a broader perspective on Library collections, we have started to explore opportunities with non-Latin collections too. Members of the Digital Scholarship team are engaging closely with the exploration of OCR and Handwritten Text Recognition (HTR) systems for Bangla and Arabic. Digital Curators Tom Derrick, Nora McGregor and Adi Keinan-Schoonbaert have teamed up with PRImA Research Lab and the Alan Turing Institute to run four competitions in 2017-2019, inviting providers of text recognition methods to try them out on our historical material. Another initiative which Tom is engaged with is exploring Transkribus for Bengali printed texts. He trained Transkribus’ HTR+ recognition engine, which ended up transcribing this material at 94% character accuracy! Tom and Adi’s recent blog post in EuropeanaTech Insight (issue on OCR) summarises these initiatives.

Regions and text lines demarcated as ground truth for RASM2019 ICDAR2019 Competition on Recognition of Historical Arabic Scientific Manuscripts (Shelfmark: Add MS 7474). Digitised and available on Qatar Digital Library.

The Chevening Fellow will contribute to our efforts to identify OCR/HTR systems that can tackle digitised historical collections. They will explore the current landscape of Chinese handwritten text recognition, look into methods, challenges, tools and software, use them to test our material, and demonstrate digital research opportunities arising from the availability of these texts in machine-readable format.

This fellowship programme will start in September 2020 for a 12-month period of project-based activity at the British Library. The successful candidate will receive support and supervision from Library staff, and will benefit from professional development opportunities, networking and stakeholder engagement, gaining access to a range of organisational training and development opportunities (such as the Digital Scholarship Training Programme), as well as staff-level access to unique British Library collections and research resources.

For more information and to apply, please visit the Chevening British Library Fellowship page: https://www.chevening.org/fellowship/british-library/, and the “Automating the recognition of historical Chinese handwritten texts” Fellow page: https://www.chevening.org/fellowship/british-library-chinese-handwritten-texts/.

Applications close at 12pm (GMT), 5 November 2019. Good luck!

 

This blog post is by Dr Adi Keinan-Schoonbaert, Digital Curator for Asian and African Collections, British Library. She's on Twitter as @BL_AdiKS.

20 August 2019

Innovation Labs and the digital divide

Guest posting by Milena Dobreva-McPherson, Associate Professor Library and Information Studies UCL Qatar with contributions from Tuesday Bwalya, Lecturer, Library and Information Science Department, The University of Zambia (UNZA) and Fidelity Phiri, Visiting Researcher, UCL Qatar.

Can you recall seeing an interesting digital cultural heritage object from Zambia lately? If you search the Europeana Collections portal, you will find some 2500 digital objects coming from European heritage institutions. Alongside these items, you can enjoy the sound recording of a grunting and splashing hippopotamus captured on 2 July 1985 on the Luangwa river in Zambia. This object was aggregated from the British Library’s sound collection.

Digitisation efforts of various Zambian institutions date back to 2002, for example at the National Archives of Zambia (which does not have its own website at the time of writing this post). Yet finding digital content originating from Zambian institutions is currently a challenge unless you visit these institutions in person. One possible reason is that institutions in Zambia, like many other organisations, digitise for the purposes of internal collection management, preservation, and on-site use. A rare exception is the digitised collection of the records of the United National Independence Party (UNIP) of Zambia, which was created in 2007 in collaboration with the Endangered Archives Programme of the British Library. While it cannot be accessed on any Zambian digital platform, it is available on the website of the British Library.

Is this situation (of very little accessible digital material online in the archives) common for all cultural sectors? Let us have a look at museums. In this domain, the Livingstone Museum was the first to carry out digitisation activities in 2009. The National Museums Board of Zambia, an umbrella organisation for 5 national and 2 community museums, also has an online presence with digitised images. However, trying to explore the Photo gallery or Audio/video files in the Multimedia section on the website returns the ominous 404 Page not found error, although the Board definitely has plenty of objects to share.

Certainly, one could argue that the poor institutional online digital presence is to be expected in a country within the Global South where a digital divide still exists. After all, even finding data to assess the scale of this digital divide is a challenge, and the body of publications on the digital divide in Africa has been quite limited, with some 100 identified works over a 12-year period (2000-2012). There is also a lack of recent estimates on the state of technological use in museums. Back in 2002, Lorna Abungu suggested that "[a]t present, out of 357 known museums throughout the African continent (including the Indian Ocean islands), only seventy-five have – on an institutional level – at least basic Internet access for e-mail."

And, while tackling the digital divide is one of the big challenges of the Global South, when we look at it specifically from the digital cultural heritage perspective it has a global effect. Those within the divide are not able to use modern information and communication technologies to their full advantage. This is one of the reasons digitisation is either delayed or caters only for on-site use in Zambia, for example. But for those on the other side of the divide it results in impaired access to the digital heritage currently being accumulated in the regions affected by the digital divide. This is why the users searching for the sounds of hippopotamus splashing will have a chance to discover them only if they are deposited in a collection on the other side of the divide. 

To foster a change in this current lack of accessibility to the digital cultural heritage of Zambia, UCL Qatar joined forces with the National Museums Board of Zambia to deliver a day-long workshop on Innovation Labs in Cultural Heritage Institutions, which was hosted on 1 August 2019 by the Livingstone Museum. You can read more about this event in the 'Reflections from the First Sub-Saharan African Workshop on Digital Innovation Labs in Cultural Heritage Institutions' blog post.

Fig. 1. After discussing how to overcome some of the disadvantages of the digital divide: Participants in the Innovation Labs in Cultural Heritage Institutions which was hosted on 1 August, 2019 by the Livingstone Museum

There was a clear message from Mahendra Mahey of British Library Labs that innovation in user engagement can start small, with the use of open source tools and popular web platforms. This event provided useful insights on the questions newcomers to the Innovation Lab community have to ask. In September, a Book Sprint to develop the first guide for setting up, running and maintaining a Digital Cultural Heritage Innovation Lab will be held in Doha, Qatar.

Here are some of these interesting questions for the wider labs community:

  • Keeping in mind that the level of technological innovation differs on the two sides of the divide, what should an innovation lab within the divide offer? Incremental innovation relative to the state of technology around it, or advanced innovation to match the global leaders?
  • How much can open platforms support innovation for these labs?
  • Can the route of using predominantly open tools and platforms for innovation labs be used also as a way to enhance open science in the Global South? 

Until a shift in the digital access happens, we will continue browsing some digital content on Zambian heritage coming from other cultural heritage organisations outside Zambia, beyond the digital divide.

Dr Milena Dobreva-McPherson is Associate Professor Library and Information Studies at UCL Qatar with international experience of working in Bulgaria, Scotland and Malta. Since graduating M.Sc. (Hons) in Informatics in 1991, Milena has specialised in digital humanities and digital cultural heritage at the Bulgarian Academy of Sciences, where she earned her PhD in Informatics and Applied Mathematics in 1999 and served as the Founding Head of the first Digitisation Centre in Bulgaria (2004); she was also a member of the Executive Board of the National Commission of UNESCO. Milena’s research interests are in the areas of innovation diffusion in the cultural heritage sector; citizen science; and users of digital libraries. Milena is a member of the editorial boards of the IFLA Journal (Sage) and the International Journal on Digital Libraries (IJDL, Springer), and a member of the steering committees of the three biggest conference series in digital libraries, IJDL, TPDL and ICADL. She is also a consultant to the Europeana Task Force on Research Requirements.

Mr Tuesday Bwalya is a Lecturer in the Library and Information Science Department, The University of Zambia (UNZA). He holds a Master’s Degree in Information Science from China. In addition, Mr. Bwalya has received training in India and Belgium in Library Automation with Free and Open Source Library Management Systems such as Koha and ABCD. His research interests include free and open source library management systems; open access publishing; database systems; web development; records management; cataloguing and classification.

Fidelity Phiri is currently employed as Librarian at Moto Moto Museum and is a visiting researcher at UCL Qatar. He has worked for the National Museums Board of Zambia since 2001. He holds a Bachelor's degree in Library and Information Science from the University of Zambia. Fidelity also graduated in April 2019 from UCL Qatar and is a holder of a Master’s degree in Library and Information Studies. His research interests are in bibliometric studies and digital humanities/units that provide access to digital collections.

Acknowledgements: We would like to thank Fred Nyambe for the photos and Dania Jalees for the editing.

Reflections from the First Sub-Saharan African Workshop on Digital Innovation Labs in Cultural Heritage Institutions

Guest posting by Milena Dobreva-McPherson, Associate Professor Library and Information Studies UCL Qatar with contributions from Tuesday Bwalya, Lecturer, Library and Information Science Department, The University of Zambia (UNZA) and Fidelity Phiri, Visiting Researcher, UCL Qatar.

Recently UCL Qatar joined forces with the National Museums Board of Zambia to deliver a day-long workshop on Innovation Labs in Cultural Heritage Institutions which was hosted on 1 August, 2019 by the Livingstone Museum, Zambia. This workshop was the first of its kind in Sub Saharan Africa and was made possible with the support of the Africa and the Middle East Teaching Fund of the UCL Global Engagement Office. Initially planned for 15 professionals from the cultural heritage sector, it attracted 27 participants (see Fig. 1) coming from six towns located in four out of the ten provinces in Zambia (see map).

Fig. 1. Participants by sector and gender in the First Sub Saharan Workshop on Innovation Labs in Cultural Heritage Institutions in Zambia, 1 August 2019

After two vibrant events about Digital Innovation Labs in Cultural Heritage organisations, this was the first event bringing together a higher proportion of participants from museums and archives in addition to the libraries represented. The Building Library Labs event was the first of its kind ever held at the British Library in September 2018, followed by a second workshop in Copenhagen (March, 2019); both attracted mostly library professionals though there were a few attendees from Archives, Galleries and Museums.  

Innovation Labs emerged in the mid 2000s as specialised library units supporting a variety of users in experimenting with digital content. However, engaging users with digital content is equally important for museums, archives and galleries. The exchange of institutional experience across the digital cultural heritage sector is essential for professionals who work there, especially as the number of Innovation Labs around the world is growing steadily. The presenters at the event in Zambia included Milena Dobreva-McPherson (UCL Qatar), Fidelity Phiri, Mr Tuesday Bwalya (University of Zambia), Mr Fred Nyambe (Registrar of Collections, Livingstone Museum) and Mr Brian Mwale (Chief Librarian, National Archives of Zambia). Fiona Clancy (Digitisation Workflow Manager, British Library), Mahendra Mahey (BL Labs Manager, British Library), and Somia Salim, who is an MA student in Library and Information Studies at UCL Qatar, also contributed online (see full programme with links to some of the presentations).

The call for innovation in the heritage sector was clearly communicated in the welcome address delivered on behalf of the Livingstone district acting commissioner Harriet Kawina; this was duly reported in several publications in Zambian national newspapers (see, for example, Fig. 2).

Fig. 2. Article on the event in the MAST independent newspaper, 5 August 2019

The mixture of presentations discussing the current trends in user engagement with digital content and local examples of digitisation projects, and how they work in reality, created a great opportunity to discuss the stumbling blocks in opening content for wider access and use. For some Zambian institutions, the main issue is a lack of coherent and systematic digitisation efforts, and there was a shared feeling amongst attendees that they need more guidance and clear policies about digitisation to follow, which are not currently in place. Other institutions have accumulated digital content but keep it available only internally, without systematically considering access and use by external audiences through online platforms.

The workshop discussions were lively and engaged; they identified that there is definitely more scope to learn from each other locally. In addition, there was a growing realisation amongst organisations that opening their digital content for use by an external audience is now the next step on the agenda of those who have already accumulated it. The feedback of one of the participants, which perhaps summarised this most clearly, suggested what needs to happen after this workshop in three steps:

  • Put the knowledge acquired in the workshop to use ASAP.
  • Conduct a follow up workshop to determine progress in the innovation labs created.
  • Organise a massive awareness campaign to introduce potential users to the innovation labs created.

The workshop participants also experienced the traditional scheduled power outage for the day which explains why the photo illustrating the presentation of certificates is a bit dark (but hey, in the digital world we can easily fix such glitches!)

Fig. 3. Participant receiving a certificate from Associate Professor Milena Dobreva

Bringing knowledge about innovation labs to the Sub Saharan region for the first time, fostering dialogue between representatives of different cultural heritage institutions, and discussing the issue of improving access to digital content is just a humble first step in what we hope will help local institutions to improve user engagement and overcome the current digital divide which keeps available digital content hidden from the world. Read more about Innovation Labs and the digital divide.

Acknowledgements: We would like to thank Fred Nyambe for the photos and Dania Jalees for the infographic and the editing.

15 August 2019

Creating Geo-located Digital Sound Walks

A few months ago, here at the British Library, we held an interesting Exploring with Sound Walks event that discussed digital projects connecting literature, sound recordings, place, technology and walking. Several digital tools were mentioned by the presenters at this event, so this post, by Marcin Barski, is a practical guide for creating geo-located sound walks.

We hope you are inspired to create your own walks, listen to sound walks, vote for your favourite (you need to be logged in to vote), and maybe attend one of the Sound Walk Sunday events taking place on the 1st and throughout the month of September 2019. If you can easily travel to London, you may also be interested in attending a Sound Walk Sunday walkshop in and around the British Library, taking place 10:00-13:00 on Saturday 31st August.

Man standing next to a tree, wearing headphones and listening to a sound walk experience
Image copyright Stefaan van Biesen

Over to Marcin for his advice on creating sound walks:

Let's start with some basic definitions. A sound walk is any activity that involves both walking and some form of listening. Listening is a much broader term than most of us would ever suspect. It can basically relate to the very act of giving attention to sound, but if we focus on details we soon realise that, as much as it happens mostly involuntarily, in certain circumstances and contexts we can direct it at the topics or phenomena that otherwise would have been lost in the very rich audiosphere that incessantly surrounds us.

It's rather important to understand that those topics and phenomena do not necessarily need to be of audial nature. Using pre-recorded sets of narratives, spoken word or studio-engineered music, we can make our audience aware of stories normally hidden from sight.

In recent years several tools have been developed that help sound walk artists, educators and creators place sounds in exact locations. Once the sounds are placed, we need to tell our audience how to find and experience them. This can be achieved by voice instructions, QR codes or, most commonly (and conveniently), by using mobile apps that determine the user's position via GPS and trigger sounds in the exact locations where we would like them to be heard.
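To make the idea of GPS triggering more concrete, here is a minimal Python sketch of the kind of check such an app might perform: it measures the distance between the listener and each placed sound, and reports the sounds whose trigger radius the listener has entered. The data layout, the names and the coordinates are all hypothetical, and real apps will have their own formats and logic.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def sounds_in_range(position, sound_points):
    """Names of the sounds whose trigger circle contains the listener."""
    lat, lon = position
    return [p["name"] for p in sound_points
            if haversine_m(lat, lon, p["lat"], p["lon"]) <= p["radius_m"]]


# Hypothetical sound points: name, centre and trigger radius in metres.
WALK = [
    {"name": "courtyard_voices", "lat": 51.5300, "lon": -0.1270, "radius_m": 25},
    {"name": "street_ambience", "lat": 51.5310, "lon": -0.1285, "radius_m": 40},
]

if __name__ == "__main__":
    # A listener standing roughly 11 metres north of the first point.
    print(sounds_in_range((51.5301, -0.1270), WALK))
```

Phone GPS readings can drift by several metres, which is one reason a generous trigger radius tends to be more forgiving than a very tight one.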

Below you will find a quick guide on how to start creating your own geo-located sound walk, along with descriptions of some of the tools that can make the process smooth and stress free.

  1. Know your subject

The very first thing you will need to do is to decide what story you want to tell. Do you know it well enough already, or is there a need to do some research? Is the story self-explanatory, or will you need to explain some or all details to your audience? And - actually, maybe the first question to ask - is it a story at all? Some sound walks can be based on natural recordings and music only, meaning the whole covered area changes its character without a single word.

Once your story is ready, decide how it should be conveyed to your audience. Do you prefer to tell the story yourself, or maybe it will be better to find and interview other people with significant knowledge of the topic? Creating a sound walk can sometimes be similar to working on a radio piece, in which important elements are delivered by experts or insiders. Will you need to record everything yourself or is it possible to find archival sounds that would add something valuable to your content?

  2. Choose the area

In the next step you will need to decide what area should be covered with your sounds and narratives. Would you like the sounds to be located only in the described and meaningful locations, or would you prefer to have the whole area covered with some more or less abstract background recordings? In the latter case, you need to take into consideration that the larger the area, the more sounds will be needed to fill all the silent spaces.

Bear in mind that your audience will most likely experience your piece on foot, hence the distances between the particular spots should not be too large. The best results are achieved when distances between the sounds allow for smooth and undistracted strolling.

Remember to consider safety! Don't force your audience to walk in hazardous or restricted areas. They will be using headphones, so choose a route away from busy roads.

  3. Choose your tools

Each app you are going to use will come with specific requirements for the format or duration of your recordings. In most cases these requirements will be easy to meet, but make sure you are doing it correctly right from the beginning to avoid the hassle of file conversion or additional editing in later stages. You will find helpful descriptions of some of the available apps below.

Creating a geo-located sound walk can be fun not only for you, but also for others. Consider working in a group in which each of you has assigned tasks or subjects to cover. You may be surprised how socially creative an activity it can become. Some people create sound walks with children, or with their local community groups.

  4. Recording and editing

We don't all carry high quality microphones in our pockets. Of course, if you want to create an audiophile experience, you will need to secure professional audio recorders and microphones. However, technology is no longer a barrier: you can even use your smartphone to make your recordings, and the quality will definitely be good enough to record speech.

If you'd like to use background sounds, and you have no means of making the recordings yourself, there are repositories of sounds available on the Internet. An impressive collection of sounds can be found, for example, on the British Library's SoundCloud channel. You can also search http://archive.org and http://freesound.org, both of which are available for you to use free of charge.

There are quite a few free and open-source sound editing programs available on the Internet. If you're not a professional sound designer, Audacity will most likely be enough for you. It's easy to use and has all the features you may need. It's also quite popular, so you will find many helpful tutorials online.

  5. Placing sounds

This is probably the most pleasant and at the same time the most challenging part of the work. Be prepared to spend hours in your chosen area and to have your patience tested. Although most of the apps allow for fairly accurate placing of sounds, you will need to test every single location yourself. Sometimes you will need to move a sound by a few metres, other times you will want to change the way in which two sounds interact with each other. Wear comfortable shoes and embrace the process of trial and error. Despite the challenges, trust us, it's fun!

  6. Go public and advertise

Once you are sure that all of your recordings are out there and in the exact places you want them to be, you can make your walk public. In most of the available apps you can publish with just one click. And when it's public, don't forget to tell everyone to try it. It's very rewarding to hear back from your audience - you will realise how much your work has re-shaped their perception of the chosen space.
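The advice in the 'Choose the area' step about keeping your spots within comfortable walking distance of each other can be checked before you head outside. Below is a minimal Python sketch, not taken from any of the apps listed in this post: it measures the straight-line distance between consecutive stops and flags the long gaps. The stop names, the coordinates and the 150 metre threshold are all assumptions made up for illustration.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def long_gaps(stops, max_gap_m=150):
    """Return (from_name, to_name, metres) for consecutive stops
    that are further apart than max_gap_m (an assumed threshold)."""
    gaps = []
    for (name_a, lat_a, lon_a), (name_b, lat_b, lon_b) in zip(stops, stops[1:]):
        d = haversine_m(lat_a, lon_a, lat_b, lon_b)
        if d > max_gap_m:
            gaps.append((name_a, name_b, round(d)))
    return gaps


# Hypothetical route: names and coordinates are invented for this example.
route = [
    ("Gate", 51.5300, -0.1270),
    ("Courtyard", 51.5305, -0.1270),
    ("Far corner", 51.5335, -0.1270),
]
print(long_gaps(route))  # lists only the Courtyard -> Far corner gap
```

Straight-line distance ignores walls, crossings and detours, so treat the result as a first check and still walk the route yourself.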

Person standing in front of a church building, they are wearing headphones and listening to a sound walk
Image copyright Stefaan van Biesen

Here is a list of digital tools and platforms available for making sound walks:

Echoes

Echoes gives you the freedom to explore breath-taking GPS-triggered audio tours wherever you are. With the Echoes Creator, you can quickly and easily upload audio, images, and text, geolocate them on the map, and publish them for the world to see. Just add shapes to the map, which create geofenced areas. These will trigger content when your listeners physically walk inside them.

Echoes is free to use and available at http://echoes.xyz
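To give a feel for how a geofenced shape can trigger content, here is a rough Python sketch of the general idea. It is not Echoes' own code or API: the function names, the zone and the GPS track are hypothetical. It tests whether a listener's position falls inside a polygon drawn on the map and fires a sound only at the moment they walk in.

```python
def point_in_polygon(lat, lon, polygon):
    """Ray-casting test: True if (lat, lon) lies inside the polygon.

    polygon is a list of (lat, lon) vertices. Treating degrees as flat
    coordinates is an approximation that is fine for small areas.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        lat1, lon1 = polygon[i]
        lat2, lon2 = polygon[(i + 1) % n]
        # Does this edge straddle the listener's latitude?
        if (lat1 > lat) != (lat2 > lat):
            # Longitude at which the edge crosses that latitude.
            cross_lon = lon1 + (lat - lat1) * (lon2 - lon1) / (lat2 - lat1)
            if lon < cross_lon:
                inside = not inside
    return inside


def newly_entered(zones, previous_inside, lat, lon):
    """Return (names of zones just entered, set of zones now occupied)."""
    now_inside = {
        name for name, poly in zones.items() if point_in_polygon(lat, lon, poly)
    }
    return sorted(now_inside - previous_inside), now_inside


# Hypothetical zone and GPS track, invented for this example.
zones = {
    "church_bells": [
        (51.5300, -0.1280),
        (51.5300, -0.1270),
        (51.5310, -0.1270),
        (51.5310, -0.1280),
    ],
}
track = [
    (51.5295, -0.1275),  # approaching
    (51.5305, -0.1275),  # walks into the zone
    (51.5306, -0.1274),  # still inside
    (51.5315, -0.1275),  # has left again
]

inside = set()
for lat, lon in track:
    entered, inside = newly_entered(zones, inside, lat, lon)
    for name in entered:
        print(f"play {name}")
```

Remembering which zones the listener is already in means a sound starts once on entry rather than restarting with every GPS update.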

Placecloud

Placecloud's mission is to reveal the cultural significance of everyday places. To achieve this, they have invented something called 'placecasts', or place-specific podcasts: short audio recordings with GPS coordinates attached to them. Users can listen to them while being physically present in the places they refer to.

Placecloud keeps the process simple: many of the steps described above won't be necessary when working with this tool. By adding your recordings you become part of a wider community of 'placecasters'.

http://www.placecloud.io

VoiceMap

VoiceMap is a tool for digital storytelling in public spaces. It's designed for storytellers and passionate locals all over the world who can - in an easy way - share their thoughts and narratives about the places they live in. As a creator you can guide your audience around your city - and get paid for this.

http://voicemap.me

Locosonic

Similar to Echoes, Locosonic is designed for creating "movies for your ears" - as they call it. Locosonic Soundscapes link sounds, music and stories to a location. While exploring an area, you will hear the Soundscape that matches your location. Like an additional sense, Locosonic allows you to experience places through their stories and music.

http://www.locosonic.com

CGeomap

CGeomap is a collaborative tool which allows people to work together on the same project. It is very easy to use and simultaneously creates an online map and a browser-based web app, geolocating audio, text and visual content, without the need to install anything on your device.

It is more limited in terms of sound than Echoes or Locosonic, but it adds media to your walk and simultaneously generates an online media map that is accessible to all on desktop. An extra feature allows the user to shift, while walking, from one map to the other, activating up to three layers of content in one place.

Info: http://bit.ly/300hpMS

Aporee

radio aporee ::: miniatures for mobiles is a platform for (creating) sound walks. These are created and organised with a web-based editing tool and listened to with a mobile phone app, while walking outside, at the site for which the piece was created. In addition to the phone apps, a (prototype) browser-based web app is also available, so there is no need to install the app on your device.

https://aporee.org/mfm/


We hope you have fun making and listening to sound walks! Sound Walk Sunday events are taking place on the 1st and throughout the month of September 2019. One of them, "Ecumenopolis – the whole world is one city" by Geert Vermeire, is being made for walkers around the British Library in London, the State Library of Moscow, the National Library of Greece and the City Library of Sao Paulo, and we can't wait to listen to this work.

This post is introduced by Digital Curator Stella Wisdom (@miss_wisdom) and Andrew Stuck from the Museum of Walking. Many thanks to Marcin Barski, curator, music publisher, sound and installation artist, co-founder of the Instytut Pejzażu Dźwiękowego (Polish Soundscape Institute) for writing this practical guide to creating sound walks.

13 August 2019

A New System for the Digital Paleography of Middle Eastern Manuscripts

This guest post is by Michael Penn, Teresa Hihn Moore Professor of Religious Studies and Classics, Stanford University.

For much of the Middle Ages, the world’s most geographically expansive church was one that few in the twenty-first century have ever heard of. Stretching from Turkey, throughout the Middle East, into Afghanistan, Tibet, India, and even China, the Syriac Church of the East reminds us that, for much of church history, the geographic center of Christianity was not in Rome, nor in Constantinople, but rather in Baghdad. But this particular church was only one of many ancient churches whose members did not write in the relatively well-studied languages of Greek and Latin but rather in a dialect of Aramaic called Syriac. The British Library has the world’s largest and most important collection of manuscripts from these Syriac churches. This corpus of almost 1100 Syriac codices includes the majority of the earliest surviving texts from these churches. But in order to properly analyze these works one must first figure out when and where they were written.

Because only a few libraries allow for the carbon dating of their materials, the most common way to localize a manuscript in space and time is the analysis of its handwriting, a practice called paleography. Paleography will always remain as much an art as a science. But at its base, much of paleography is shape comparison, a task that in recent years computers have become increasingly good at. Starting in 2011 I began assembling a forty-seven-person digital humanities team to explore the feasibility of computer-assisted paleography for Syriac. That is, could one use big data, visual analytics, and recent advances in the digital analysis of handwriting to better understand Syriac manuscripts and Syriac manuscript culture?

This June we launched the first public-facing part of our project, titled the “Digital Analysis of Syriac Handwriting” or DASH. DASH consists of digital page images from 152 manuscripts distributed across thirteen libraries, each with a scribal note telling us when it was written. Together these constitute 90% of the world’s surviving Syriac manuscripts securely dated to before the twelfth century. DASH also has just under 90,000 individually identified and trimmed early Syriac letters. This extensive data on Syriac handwriting allows scholars to better estimate the composition date of other Syriac manuscripts as well as to explore the overall development of Syriac script.
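To illustrate the general idea of dating by shape comparison, here is a toy Python sketch. It is not DASH's code, and this post does not describe DASH's actual method: the two-number "shape features" and the years are invented, and the nearest-neighbour approach is simply an assumption chosen for illustration. It estimates a date for an undated letter form from the securely dated forms that most resemble it.

```python
from math import dist
from statistics import median


def estimate_year(sample, dated_letters, k=3):
    """Median year of the k dated letter forms closest in shape to the sample."""
    nearest = sorted(dated_letters, key=lambda item: dist(item[0], sample))[:k]
    return median(year for _, year in nearest)


# Hypothetical data: (shape features, year from a scribal note).
# Real features would have to be measured from the trimmed letter images.
dated_letters = [
    ((0.20, 0.90), 510),
    ((0.25, 0.85), 540),
    ((0.30, 0.80), 600),
    ((0.70, 0.40), 900),
    ((0.75, 0.35), 950),
    ((0.80, 0.30), 1010),
]

# An undated letter form whose shape sits close to the later examples.
print(estimate_year((0.72, 0.38), dated_letters))  # -> 950
```

Taking the median of several neighbours, rather than the single closest match, keeps one unusual scribe from skewing the estimate.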

Using the DASH interface, one can view these manuscripts in a Stanford-designed manuscript viewer called Mirador. Below is a screenshot showing four British Library manuscripts, all written in the opening years of the eleventh century by a scribe named Jeshua son of Andrew. They provide a rare opportunity to observe how consistent a scribe’s handwriting may be across manuscripts written at different times.

British Library Syriac manuscripts by Jeshua son of Andrew, displayed on DASH using Mirador

DASH also creates instant script charts allowing one to rapidly compare individual letter forms. Below are the beginnings of such a chart tracing Jeshua’s handwriting across the same four manuscripts.

Script charts tracing Jeshua’s handwriting, displayed on DASH

Using DASH one can also look beyond a single scribe and instead trace the chronological trajectory of multiple letters and their relationships to each other. This allows scholars, for the first time, to fully visualize how Syriac scripts developed.

DASH used for visualizing the development of Syriac scripts

This data has already revolutionized our understanding of Syriac. The project is open-source, with its code on GitHub, and its most far-reaching goal is to serve as a template for digital paleography work in other language groups. So regardless of your own knowledge of Syriac, take a quick peek at dash.Stanford.edu. Play with some gorgeous manuscripts from the British Library and other depositories, leave a comment on the website, and contact us if our team can be of help as you develop your own research projects.

 

29 July 2019

Invitation to join ‘Digital Cultural Heritage Innovation Labs Book Sprint’, Doha, Qatar, 23-27 September 2019

Posted by Mahendra Mahey, Manager of BL Labs, and Milena Dobreva-McPherson, Associate Professor of Library and Information Studies, UCL Qatar.

Building Digital Cultural Heritage Innovation Labs

Calling all of you who work in and/or do research in Digital Cultural Heritage Innovation Labs! Join us in Doha, Qatar, 23-27 September 2019 for a week-long Book Sprint!

Apply by midnight on 5 August 2019 for one of the fully funded trips to take part!

We want to create a new guide for setting up, running and maintaining a Digital Cultural Heritage Innovation Lab. Let’s share our experiences (both awesome and challenging) far and wide so that other organisations don’t have to reinvent the wheel.

This is a fantastic opportunity to contribute to a legacy for the Cultural Heritage sector that is bigger than any of us individually. It’s going to be a lot of hard work, but it will also be a fun, creative, and rewarding process!

The idea for the sprint came from the Building Library Labs event we held at the British Library in September 2018, work which we built on in Copenhagen (March, 2019).

The event is generously sponsored by UCL Qatar, Qatar University Library and Book Sprints Ltd.

We will let applicants know by 8 August 2019 if they have been successful.

If you are not chosen, or simply can’t make it, don’t worry! We will find other ways to get you involved after the book is published. We intend to promote the work as part of ‘International Open Access Week’, which will take place from 21 to 27 October 2019. We also want to make sure the book is a ‘living publication’ that will be constantly updated and amended online to ensure its continued relevance and usefulness to the broader cultural heritage sector and possibly beyond.

If you have any specific questions before you apply, please feel free to email me at mahendra.mahey@bl.uk or Milena at m.dobreva@ucl.ac.uk

22 July 2019

Our highlights from Digital Humanities 2019: Nora and Giorgia

We've put together a series of posts about our experiences at the Digital Humanities conference in Utrecht this month. In this post, Digital Curator Nora McGregor and Dr Giorgia Tolfo from the British Library / Alan Turing Institute's Living with Machines project share their impressions. See also Mia and Yann's post, and Rossitza and Daniel's post.

Lunchtime at TivoliVredenburg music hall, viewed from Cloud Nine

Nora McGregor

My most exciting discovery was the Libraries & Digital Humanities Special Interest Group (@LibsDH) of the Alliance of Digital Humanities Organizations (ADHO) (@ADHOrg). I found my PEOPLE! This is a loosely joined cohort of folks from Libraries across the world with a peculiar passion for everything to do with supporting digital scholarship. We held a casual, brief and efficient gathering over lunch where talk turned to joining forces to develop a summer school (in the vein of the popular and prolific week-long Rare Books and Digital Humanities affairs) to address the specific digital skills training needs of Librarians.

Giorgia Tolfo

Which talk were you most looking forward to, and why?

DH2019 offered a plethora of panels and workshops to choose from. When I first read the program I felt like a hungry person at the supermarket, craving everything on the shelves. Since I couldn’t eat everything, I had to focus on the panels whose topics I knew were, or sounded, relevant to the Living with Machines project, an interdisciplinary project at the crossroads of historical research and artificial intelligence, run in collaboration with the Alan Turing Institute.

As my role involves an in-depth knowledge of digitisation strategies for newspapers and data models, my attention was immediately drawn to the Oceanic Exchanges panel, which focussed on case studies around the spread of news and/or the translation of concepts across the Atlantic Ocean as they emerged in newspapers. Among these studies, one I was particularly interested in was on the concept of italianità (Italianness) in Italian and US-based Italian ethnic newspapers at the time of the unification of Italy.

What did you learn?

What I found most interesting, beyond the content of the individual research cases presented, was that regardless of the focus of the project, the digital humanities community has an underpinning shared methodology, as well as commonly known concerns and issues that we are trying to face both independently and together.

Among the latter there is certainly a problem with the availability of, and access to, datasets. Due to copyright limitations or a lack of funds to digitise new material, some potentially relevant datasets aren’t available, which in some cases forces research questions to be reshaped according to what is available. The impact of this is the blurring of the distinction between historical research and storytelling. Which stories emerge from data analysis and visualisation? Are these universal or just some among the many possible ones? Are the sources biased or reliable? These are epistemological problems that need to be addressed carefully.

On the other hand, in terms of shared methodology, there is an increasing awareness of the need (and effort) to focus on integration, sustainability and shareability. Hence the interest of many research teams in common data models, linked open data, the use of standard languages and methodologies, and scalable, reusable components.

Anything else?

Well, the fun run! I was one of the 25 enthusiastic people who set the alarm clock for 6am just to run... for fun!