Digital scholarship blog

11 July 2016

Finding digitised books and images about Finland in a collection of 65,000 books

Posted by Ruby Dixon, currently a student at Graveney School and on work experience at BL Labs.

Background

The ‘Microsoft’ books are 65,000 digitised volumes - about 22.5 million pages - which were published between 1789 and 1914; they were digitised in partnership with Microsoft. They cover a wide range of subject areas, including philosophy, poetry, history and literature, and they include Optical Character Recognition (OCR) text from the millions of pages.

In discussion with Mahendra Mahey, Project Manager of BL Labs, we explored making a ‘sub collection’ from this larger set which will hopefully help researchers in the future. After thinking about making a collection of ‘works of fiction’, ‘bibles’ or titles about ‘slavery’ I decided that identifying a collection of books about Finland would be the most interesting and realistic thing to do as part of my mini-project at the Library.

The collection I am creating will hopefully help a project that the Library might be working on which celebrates the 100th year of independence of Finland in 2017.

Facts about Finland

When starting this mini-project, I thought it would be wise to do some background research about Finland. I thought this would be a great way to put my GCSEs in Geography and History to use. Knowing more about the history and geography of Finland would help me in my ‘detective’ hunt through the collection of books, because I would learn important keywords that I might need to identify relevant books in the digitised collection.

Here are some useful facts that you may not know about Finland:

  • Finland became an autonomous Grand Duchy within the Russian Empire on 29 March 1809.
  • Finland declared independence from Russia on 6 December 1917.
  • Finland joined the European Union on 1 January 1995.

These and more facts can be accessed online: https://en.wikipedia.org/wiki/Finland

A map showing Finland, taken from Wikipedia: https://en.wikipedia.org/wiki/Finland

This gave me a clue that there may in fact be several books in the collection in the Russian language that cover Finland, given that Finland was an autonomous part of the Russian Empire from 1809. Looking at the map of Finland, I also realised that bordering countries would most likely have books about Finland as well.

Approach

Analysing the collection spreadsheet 

A screenshot of a section of the spreadsheet containing 65,000 records of digitised books in the ‘Microsoft Books’ collection.

My first task was to examine the huge spreadsheet containing information about the 65,000 books in the collection.

There were several lines of ‘attack’ we could take in finding information about Finland in this collection, some of which involve using the ‘Filter’ function in Excel.

Screenshot from the Microsoft Books spreadsheet: 1. The 'Filter' function in Excel. 2. The filter has been applied to the language code for Finnish, ‘fin’.

We came up with the following strategy:

  1. Find words relating to 'Finland' in the Title field in the spreadsheet for the books.
  2. This task would have to be done in several languages as there are 28 languages listed in the language code field (column C). I decided I would prioritise English and the languages of nations bordering Finland and, if I had time, would look at the other languages too.
  3. I knew I would have to use Google translate (https://translate.google.co.uk/) to find equivalent words in that language relating to Finland to help me with filtering.
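The strategy above was carried out by hand with Excel's filter, but the same idea could be scripted. Here is a minimal sketch in Python, assuming the spreadsheet has been exported to CSV. The column names `Title` and `Language`, the sample rows and the non-English keyword lists are all made up for illustration; only the English keywords come from this post.

```python
import csv
import io

# Hypothetical column names; the real spreadsheet's headers may differ.
TITLE_COL = "Title"
LANG_COL = "Language"

# Keywords per language code. The English words are the ones used in
# this post; the Swedish and Finnish lists are illustrative only.
KEYWORDS = {
    "eng": ["finland", "finnish", "helsinki", "finn"],
    "swe": ["finland", "finska", "helsingfors"],
    "fin": ["suomi", "suomen", "helsinki"],
}


def find_books(rows, keywords=KEYWORDS):
    """Return rows whose title contains a keyword for that row's language."""
    matches = []
    for row in rows:
        words = keywords.get(row[LANG_COL].strip().lower(), [])
        title = row[TITLE_COL].lower()
        # Simple substring match, like a 'contains' text filter.
        if any(word in title for word in words):
            matches.append(row)
    return matches


# A tiny made-up sample standing in for the exported spreadsheet.
SAMPLE = """Title,Language
Travels in Finland and Lapland,eng
A History of France,eng
Suomen historia,fin
Resa i Finland,swe
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
found = find_books(rows)
for book in found:
    print(book[LANG_COL], "-", book[TITLE_COL])
```

Substring matching is deliberately loose, so a short keyword such as 'finn' can also match unrelated words. The results would still need checking by eye, just as with the Excel filter.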

In terms of thinking of what words I might use for the filtering, Mahendra suggested that it might be useful to create a word cloud about all things 'Finnish'; this might help me decide which words were the most important and to use first in filtering.

I used https://tagul.com/ and here is the word cloud I made using the Wikipedia page about Finland:

Word cloud created using Tagul, based on the Wikipedia page in English about Finland.

From this, we decided to use the following words (the number of words was limited due to time): Finland, Finnish, Helsinki and Finn.
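A word cloud is essentially a picture of word frequencies, so a rough equivalent can be produced by counting words directly. The sketch below is not how Tagul works internally; it is a simple illustration using Python's standard library, with a made-up sample text and a very short, illustrative stop-word list.

```python
import re
from collections import Counter

# A small, illustrative stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "of", "and", "in", "is", "a", "to", "was", "by", "it"}


def top_words(text, n=5):
    """Return the n most frequent words in text, ignoring stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return counts.most_common(n)


# Made-up sample text standing in for the Wikipedia article.
sample = (
    "Finland is a Nordic country. The capital of Finland is Helsinki. "
    "Finnish and Swedish are the official languages of Finland."
)
for word, count in top_words(sample, 3):
    print(word, count)
```

The most frequent words are natural first candidates for filtering, although a human still needs to decide which of them are actually useful search terms.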

We also filtered by language (Danish, Swedish, German, English, Finnish and Russian), using words related to Finland in each of those languages.

Below is a summary table showing the number of books we found by filtering the 'Title' field in the spreadsheet for words related to 'Finland'.

Table 1: The number of books I found using various filters in the digitised collection.

Please note that I didn’t have time to look further into the books we found in some of the non-English languages, as I am not a native speaker of any of them. More time would be needed to filter this collection. The spreadsheet is available here.

What is interesting, however, is that we know there are 582 books in the collection in the Russian language, details of which I sent to Katya Rogatchevskaia, Lead Curator of East European Collections. 

Images in the books about Finland

I learned how the images from the 'Microsoft' books were extracted and placed on The British Library’s Flickr page. This slide from a BL Labs presentation nicely summarises how it all happened: 

Taken from the BL Labs Slideshare account, http://www.slideshare.net/labsbl

More information is available from a blog post written by Ben O’Steen, Technical Lead of BL Labs, which explains this process in much more detail.

What I realised was that there must be images identified in these books which relate to Finland. Mahendra suggested that I first look at some work done by the Wikimedia community on trying to find maps within these images.

Wikimedia commons synoptic index

The Wikimedia Commons Synoptic Index for the Mechanical Curator images contains a really handy breakdown of the images by geographical place.

Image taken from British Library/Mechanical Curator collection/Synoptic index, Europe.

From this, I was able to find that there were 12 books that had been identified as having images which had something to do with Finland in them.

Image taken from Wikimedia Commons page.

This was a great way to start, but now I thought I would try the British Library’s Flickr Commons site to see if there were more images about Finland that had been tagged with Finland-related words.

British Library Flickr Commons

As of 07/07/16 there are 1,023,705 images on the British Library’s Flickr Commons page; a large proportion of these come from images snipped out of the digitised books that I have been working on.

The site has had an incredible 400,000,000 plus views and users have tagged over 100,000 images with around 500,000 tags. I am really looking forward to seeing what the winners of the Labs Competition 2016 will do on their SherlockNet project, as they are hoping to tag all the images using computer code!

For now, I wanted to use the tags already there to see if I could find images relating to Finland.

Here is an example image which has several tags added, some of which relate to Finland:

Tags added to an example image on the British Library Flickr Commons page.

Here you can see tags such as ‘Finland’, ‘Suomi’ (Finnish for ‘Finland’), ‘Helsinki’, ‘Helsingfors’ (Swedish for ‘Helsinki’) etc. which have been added by Flickr users (grey tags). Please note that tags in white are those added automatically by Flickr itself.
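If the tag data were available as a download, the same lookup could be done in code rather than by searching the site. The sketch below assumes a simple mapping from image identifier to a list of tags; the identifiers and tags are made up, and no real Flickr data or API is used.

```python
# Tags seen in the example above, lower-cased for comparison.
FINLAND_TAGS = {"finland", "suomi", "helsinki", "helsingfors"}

# Hypothetical records: image id -> tags. Ids and tags are made up.
images = {
    "img-001": ["Finland", "Helsinki", "map"],
    "img-002": ["France", "Paris"],
    "img-003": ["Suomi", "Finland"],
}


def images_by_tag(images, wanted=FINLAND_TAGS):
    """Map each wanted tag to the set of image ids carrying it."""
    result = {tag: set() for tag in wanted}
    for image_id, tags in images.items():
        for tag in tags:
            key = tag.lower()
            if key in result:
                result[key].add(image_id)
    return result


by_tag = images_by_tag(images)
# Union of the sets: each image is counted once, however many tags match.
distinct = set().union(*by_tag.values())
print(sorted(distinct))
```

Because one image can carry several matching tags, adding up the per-tag counts can count the same image more than once. Taking the union of the sets gives the number of distinct images instead.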

I have summarised the images I have found on the British Library’s Flickr Commons collection below:

Keyword(s) used and link to BL Flickr Commons    Number of images found
Finland                                          917
Helsinki                                         18
Suomi                                            3
Suomen                                           418
Suomalaiset                                      15
Finns                                            42
Finnish                                          352
Gulf of Finland                                  43
Kulturbilder ur Finlands historie                1
Turku                                            3
Pori                                             4
Tampere                                          1
Kuopio                                           2
Hanko                                            177
Lapland                                          148
Suomenlinna                                      2
Kemi                                             1
Total                                            1997

Table showing links and number of British Library Flickr Commons images about Finland

What is clear from this initial research is that there are definitely more books with images about Finland than the 12 identified through Wikimedia Commons. Much more work will be needed on this. I would also recommend that all the images I have found be downloaded so that they may be used for the project celebrating 100 years of Finnish independence.

In conclusion, I have really enjoyed taking part in this project. Although it has been quite challenging, it has been a new and very interesting experience. However, more time is certainly needed to find more books in the 65,000-book collection, as I only had a limited amount of time to spend on it. I would also recommend that more words relating to Finland be found and used in several languages to filter the master spreadsheet, in order to add more books to the Finnish collection. Lastly, the project could be developed further by working with the curators of other languages to help identify Finland-related books.

If you would like to find more sub collections in the Microsoft books collection, please email labs@bl.uk; they would love to hear from you!

Tomorrow I will blog about my work experience at the Library.

 

 
