
Digital scholarship blog

Enabling innovative research with British Library digital collections


21 December 2017

Cleaning and Visualising Privy Council Appeals Data

This blog post continues a recent post on the Social Sciences blog about the historical context of the Judicial Committee of the Privy Council (JCPC), useful collections to support research and online resources that facilitate discovery of JCPC appeal cases.

I am currently undertaking a three-month PhD student placement at the British Library, which aims to enhance the discoverability of the JCPC collection of case papers and to explore the potential of Digital Humanities methods for investigating questions about the court’s caseload and its actors. Two methods I’ll be using are creating visualisations to represent data about these judgments and converting this data to Linked Data. In today’s post, I’ll focus on the process of cleaning the data and creating some initial visualisations; information about the Linked Data conversion will appear in a later post.

The data I’m using refers to appeal cases that took place between 1860 and 1998. When I received the data, it was held in a spreadsheet where information such as ‘Judgment No.’, ‘Appellant’, ‘Respondent’, ‘Country of Origin’, ‘Judgment Date’ had been input from Word documents containing judgment metadata. This had been enhanced by generating a ‘Unique Identifier’ for each case by combining the judgment year and number, adding the ‘Appeal No.’ and ‘Appeal Date’ (where available) by consulting the judgment documents, and finding the ‘Longitude’ and ‘Latitude’ for each ‘Country of Origin’. The first few rows looked like this:

Spreadsheet

Data cleaning with OpenRefine

Before visualising or converting the data, some data cleaning had to take place. Data cleaning involves ensuring that consistent formatting is used across the dataset, there are no errors, and that the correct data is in the correct fields. To make it easier to clean the JCPC data, visualise potential issues more immediately, and ensure that any changes I make are consistent across the dataset, I'm using OpenRefine. This is free software that works in your web browser (but doesn't require a connection to the internet), which allows you to filter and facet your data based on values in particular columns, and batch edit multiple cells. Although it can be less efficient for mathematical functions than spreadsheet software, it is definitely more powerful for cleaning large datasets that mostly consist of text fields, like the JCPC spreadsheet.

Geographic challenges

Before visualising judgments on a map, I first looked at the 'Country of Origin' column. This column would more accurately be called 'Location', as many of the entries were actually regions, cities or courts rather than countries. To make this information more meaningful, and to allow comparison across countries (e.g. where previously only the city was recorded), I created additional columns for 'Region', 'City' and 'Court', and populated the data accordingly:

Country
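As a rough illustration of the kind of restructuring involved, here is a sketch in Python/pandas rather than the OpenRefine workflow actually used; the file name, column names and lookup values are all hypothetical:

```python
import pandas as pd

# Hypothetical lookup: maps a raw 'Country of Origin' value to the court,
# city, region and country it implies at the relevant period.
LOCATION_LOOKUP = {
    "Supreme Court of Ceylon": {"Court": "Supreme Court of Ceylon",
                                "City": "Colombo", "Region": None,
                                "Country": "Ceylon"},
    "Jamaica": {"Court": None, "City": None, "Region": None,
                "Country": "Jamaica"},
}

df = pd.read_csv("jcpc_judgments.csv")  # hypothetical export of the spreadsheet

for col in ["Court", "City", "Region", "Country"]:
    df[col] = df["Country of Origin"].map(
        lambda value, col=col: (LOCATION_LOOKUP.get(value) or {}).get(col))
```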

An important factor to bear in mind here is that place names relate to their judgment date, as well as geographical area. Many of the locations previously formed part of British colonies that have since become independent, with the result that names and boundaries have changed over time. Therefore, I had to be sensitive to each location's historical and political context and ensure that I was inputting e.g. the region and country that a city was in on each specific judgment date.

In addition to the ‘Country of Origin’ field, the spreadsheet included latitude and longitude coordinates for each location. Following an excellent and very straightforward tutorial, I used these coordinates to create a map of all cases using Google Fusion Tables:

While this map shows the geographic distribution of JCPC cases, there are some issues. Firstly, multiple judgments (sometimes hundreds or thousands) originated from the same court, and therefore have the same latitude and longitude coordinates. This means that on the map they appear exactly on top of each other, and it's only possible to view the details of the top 'pin', no matter how far you zoom in. As noted in a previous blog post, a map like this is already used by the Institute of Advanced Legal Studies (IALS); however, as that map displays a curated subset of judgments, the issue of multiple judgments per location does not apply there. Secondly, the map only includes modern place names, which it does not seem to be possible to remove.
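One open-source workaround for the overlapping-pin problem is to cluster markers that share coordinates. Below is a minimal sketch using the folium library (not one of the tools used here); the file and column names are assumptions:

```python
import pandas as pd
import folium
from folium.plugins import MarkerCluster

df = pd.read_csv("jcpc_judgments.csv")   # assumed columns: Latitude, Longitude, Unique Identifier

m = folium.Map(location=[20, 0], zoom_start=2)
cluster = MarkerCluster().add_to(m)      # groups pins that share coordinates

for _, row in df.iterrows():
    folium.Marker(
        location=[row["Latitude"], row["Longitude"]],
        popup=str(row["Unique Identifier"]),
    ).add_to(cluster)

m.save("jcpc_map.html")                  # interactive HTML map
```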

I then tried using Tableau Public to see if it could be used to visualise the data in a more accurate way. After following a tutorial, I produced a map that used the updated ‘Country’ field (with the latitude and longitude detected by Tableau) to show each country where judgments originated. These are colour coded in a ‘heatmap’ style, where ‘hotter’ colours like red represent a higher number of cases than ‘colder’ colours such as blue.

This map is a good indicator of the relative number of judgments that originated in each country. However, Tableau (understandably and unsurprisingly) uses the modern coordinates for these countries, and therefore does not accurately reflect their geographical extent when the judgments took place (e.g. the geographical area represented by ‘India’ in much of the dataset was considerably larger than the geographical area we know as India today). Additionally, much of the nuance in the colour coding is lost because the number of judgments originating from India (3,604, or 41.4%) is far greater than that from any other country. This is illustrated by a pie chart created using Google Fusion Tables:
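The counts behind a chart like this are straightforward to compute; a minimal pandas sketch (file and column names assumed):

```python
import pandas as pd

df = pd.read_csv("jcpc_judgments.csv")          # assumed column: Country
counts = df["Country"].value_counts()
shares = (counts / counts.sum() * 100).round(1)

print(shares.head())                            # e.g. India at roughly 41% of judgments
counts.plot(kind="pie", figsize=(6, 6))         # the same data as the pie chart
```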

Using Tableau again, I thought it would also be helpful to go to the level of detail provided by the latitude and longitude already included in the dataset. This produced a map that is more attractive and informative than the Google Fusion Tables example, in terms of the number of judgments from each set of coordinates.

The main issue with this map is that it still doesn't provide a way in to the data. There are 'info boxes' that appear when you hover over a dot, but these can be misleading as they contain combined information from multiple cases, e.g. if one of the cases includes a court, this court is included in the info box as if it applies to all the cases at that point. Ideally what I'd like here would be for each info box to link to a list of cases that originated at the relevant location, including their judgment number and year, to facilitate ordering and retrieval of the physical copy at the British Library. Additionally, each judgment would link to the digitised documents for that case held by the British and Irish Legal Information Institute (BAILII). However, this is unlikely to be the kind of functionality Tableau was designed for - it seems to be more for overarching visualisations than to be used as a discovery tool.

The above maps are interesting and provide a strong visual overview that cannot be gained from looking at a spreadsheet. However, they would not assist users in accessing further information about the judgments, and do not accurately reflect the changing nature of the geography during this period.

Dealing with dates

Another potentially interesting aspect to visualise was case duration. It was already known prior to the start of the placement that some cases were disputed for years, or even decades; however, there was no information about how representative these cases were of the collection as a whole, or how duration might relate to other factors, such as location (e.g. is there a correlation between case duration and distance from the JCPC headquarters in London? Might duration also correlate with the size and complexity of the printed record of proceedings contained in the volumes of case papers?).

The dataset includes a Judgment Date for each judgment, with some cases additionally including an Appeal Date (which started to be recorded consistently in the underlying spreadsheet from 1913). Although the Judgment Date shows the exact day of the judgment, the Appeal Date only gives the year of the appeal. This means that we can calculate the case duration to an approximate number of years by subtracting the year of appeal from the year of judgment.

Again, some data cleaning was required before making this calculation or visualising the information. Dates had previously been recorded in the spreadsheet in a variety of formats, and I used OpenRefine to ensure that all dates appeared in the form YYYY-MM-DD:

Date
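A rough Python equivalent of this normalisation (OpenRefine was what was actually used, and the list of input formats is an assumption):

```python
from datetime import datetime

# Some of the formats that might appear in a hand-entered spreadsheet (assumed).
KNOWN_FORMATS = ["%d/%m/%Y", "%d %B %Y", "%Y-%m-%d", "%d.%m.%Y"]

def to_iso(raw_date):
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(raw_date.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None

print(to_iso("21 December 1897"))   # -> 1897-12-21
```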


It was then relatively easy to copy the year from each date to a new ‘Judgment Year’ column, and subtract the ‘Appeal Year’ to give the approximate case duration. Performing this calculation was quite helpful in itself, because it highlighted errors in some of the dates that were not found through format checking. Where the case duration seemed surprisingly long, or had a negative value, I looked up the original documents for the case and amended the date(s) accordingly.
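A minimal sketch of the same calculation and sanity check in pandas (the actual work was done in OpenRefine; file and column names are assumed):

```python
import pandas as pd

df = pd.read_csv("jcpc_judgments_clean.csv")     # hypothetical cleaned export

# 'Judgment Date' is now YYYY-MM-DD; 'Appeal Year' holds just a year.
df["Judgment Year"] = pd.to_datetime(df["Judgment Date"]).dt.year
df["Duration"] = df["Judgment Year"] - df["Appeal Year"]

# Negative or implausibly long durations point to dates worth re-checking
# against the original case papers.
suspect = df[(df["Duration"] < 0) | (df["Duration"] > 30)]
print(suspect[["Unique Identifier", "Appeal Year", "Judgment Year"]])
```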

Once the above tasks were complete, I created a bar chart in Google Fusion Tables to visualise case duration – the horizontal axis represents the approximate number of years between the appeal and judgment dates (e.g. if the value is 0, the appeal was decided in the same year that it was registered in the JCPC), and the vertical axis represents the number of cases:

 

This chart clearly shows that the vast majority of cases were up to two years in length, although this will also potentially include appeals of a short duration registered at the end of one year and concluded at the start of the next. A few took much longer, but are difficult to see due to the scale necessary to accommodate the longest bars. While this is a useful way to find particularly long cases, the information is incomplete and approximate, and so the maps would potentially be more helpful to a wider audience.
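For completeness, here is a sketch of how the same distribution could be plotted with matplotlib rather than Google Fusion Tables; it assumes a 'Duration' column like the one calculated above:

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("jcpc_judgments_clean.csv")     # assumed to include 'Duration'
duration_counts = df["Duration"].value_counts().sort_index()

duration_counts.plot(kind="bar")
plt.xlabel("Approximate case duration (years)")
plt.ylabel("Number of cases")
plt.show()
```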

Experimenting with different visualisations and tools has given me a better understanding of what makes a visualisation helpful, as well as considerations that must be made when visualising the JCPC data. I hope to build on this work by trying out some more tools, such as the Google Maps API, but my next post will focus on another aspect of my placement – conversion of the JCPC data to Linked Data.

This post is by Sarah Middle, a PhD placement student at the British Library researching the appeal cases heard by the Judicial Committee of the Privy Council (JCPC).  Sarah is on twitter as @digitalshrew.    

18 December 2017

Workshop report: Identifiers for UK theses

Along with the Universities of Southampton and London South Bank, EThOS and DataCite UK have been investigating if having persistent identifiers (PIDs) for both a thesis and its data would help to liberate data from the appendices of the PDF document. With some funding from Jisc in 2014, we ran a survey and some case studies looking at the state of linking to research data underlying theses to see where improvements could be made. Since then, there has been some slow but steady progress towards realising its recommendations. Identifiers are now visible in EThOS itself (see image below) and a small number of UK institutions are now assigning Digital Object Identifiers (DOIs) to their theses on a regular basis. Many more are implementing ORCID iDs for their post-graduate students. We wanted to reignite the conversation around unlocking thesis data and see what was needed to progress it further.


On 4th December 2017, we ran a workshop to hear what progress is being made and what the remaining barriers are to applying persistent identifiers to theses and thesis data. We heard from both the University of Cambridge and the London School of Hygiene and Tropical Medicine, both of which are assigning DOIs to published theses on a regular basis. They gave an outline of how they got to this point, including the case made within each university to ensure DOIs were available for theses.

As institutions start to identify their theses with DOIs, we need to ensure that these identifiers are picked up and usable in EThOS. Heather Rosie (EThOS Metadata Manager) explained how the lack of any consistent identifier for theses up to this point hinders disambiguation – due to errors in titles and different representations of author names, we simply do not know how many theses have been published in the UK. But Heather also highlighted what institutions can do to help ensure any available identifiers make their way into EThOS – by making sure they are available for harvest, especially via OAI-PMH.
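As a sketch of what 'available for harvest via OAI-PMH' means in practice, the snippet below lists the identifiers exposed by a repository's OAI-PMH endpoint (the base URL is a placeholder, not a real repository):

```python
import requests
import xml.etree.ElementTree as ET

# Placeholder endpoint - every repository exposes its own OAI-PMH base URL.
BASE_URL = "https://repository.example.ac.uk/oai"

response = requests.get(BASE_URL, params={"verb": "ListRecords",
                                          "metadataPrefix": "oai_dc"})
root = ET.fromstring(response.content)

# dc:identifier is where a DOI or handle for the thesis would typically appear.
for record in root.iter("{http://www.openarchives.org/OAI/2.0/}record"):
    for identifier in record.iter("{http://purl.org/dc/elements/1.1/}identifier"):
        print(identifier.text)
```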

Based on the morning’s presentations, there was broad discussion around the remaining issues that institutions still have in applying DOIs or ORCID iDs to their published theses. These included barriers such as:

  • Low priority due to lack of buy-in or interest from both researchers and institutional decision-makers. Interest could be increased by improving understanding of what PIDs are and what they can do, particularly the tangible benefits they provide
  • A single institution may use multiple systems to manage different pieces of information about its researchers and their outputs. This creates internally competing systems that overlap, uneven resourcing, and a lack of clarity about which details go where
  • Further technical barriers include having to rely on the suppliers of non-open-source systems to make the appropriate changes. Where plug-ins for open-source systems are developed at one institution, the associated workflow might not be appropriate for all other users. Finally, technical support teams tend to be removed from library staff
  • Sustainability of using the identifiers, especially in terms of cost.

The second half of the workshop looked towards both the future and the past: whether the British Library digitising its large collection of legacy theses on microfilm might be a way to make them available to users, but also to ensure they are digitally preserved and assigned persistent identifiers. Paul Joseph from the University of British Columbia (UBC) gave us a great example to consider here: they have digitised 32,000 theses (both doctoral and master's level) and made them openly available through their repository, assigning DOIs as they did so. A major concern for UK universities undertaking a similar endeavour is the inability to confirm that third-party rights have been cleared in the thesis. It was therefore interesting to hear that, under their clear take-down policy, UBC receive only two or three take-down notices per year.

The final discussions of the day covered community needs for the future. This included two topics carried over from the morning’s session: how we make the case for wider application of identifiers to theses to researchers and senior management, and what can be done to make technical integration and workflow changes possible or easier. We also dug down into the other persistent identifiers related to theses that would support the needs of the UK community (such as organisation identifiers and funding identifiers), the potential for the Library to mass-digitise theses and assign DOIs to them, and the other steps that can be taken to break data out of the thesis.

Through these discussions we got a strong steer as to what we at the British Library need to do to help to support the community in using persistent identifiers as a way of encouraging greater availability of doctoral research. These include providing:

  • more advocacy for PIDs – for example to students & research managers. We heard that a message from BL goes a long way – ‘we have to ask you to claim an ORCID iD because the British Library says so’, or ‘DOIs are needed because national thesis policy says so’
  • metadata guidance for libraries. What we already provide is great but we could do more of it, e.g. best practice examples, support desk, engage with system suppliers on behalf of institutions
  • preservation of digital theses. This is urgently needed
  • a big piece of IPR work to give institutions the confidence to make legacy theses open access without express permission, including a press campaign to drive interest & support.

But it is not only the Library that attendees thought may influence developments. There was also a clear appetite for stronger mandates from funders to support the deposit of open theses and reduction of embargo periods. There was also interest in national-level activities such as a national strategy for UK theses or a Scholarly Communication Licence for theses.

It’s clear there’s still a lot to be done before we’re at a stage where we can rely on persistent identifiers to help us jail-break research data out of thesis appendices. But we’ll continue to work with the community on this through EThOS and DataCite UK. We hope to hold a webinar in 2018 to talk more about the outcomes of this workshop, but in the meantime you can direct any questions on this work to datasets@bl.uk.

This post is by Rachael Kotarski, the British Library's Data Services Lead, on twitter as @RachPK.

29 November 2017

Crowdsourcing using IIIF and Web Annotations

Alex Mendes from the Digital Scholarship team explains how the LibCrowds platform uses emerging standards for digitised images and annotations.

Our new crowdsourcing project, In the Spotlight, was officially launched at the start of November 2017. The project asks volunteers to identify and transcribe key data held in digitised playbills. Here we explore two of the key technologies we adopted to enable this: IIIF and Web Annotations.

Configuring a selection task using JSON

Commonly, when an institution began digitising a new type of content, or a particular project realised that the existing infrastructure didn’t fit its needs, a new image viewer would be built or commissioned, one that would probably be tightly coupled with custom metadata structures. This has led to an ever-growing collection of isolated data silos that, among other issues, do not allow the information they contain to be easily reused.

The International Image Interoperability Framework (IIIF) is a set of APIs (protocols for requests between computers) that aims to tackle this issue by allowing images and metadata to be requested in a standardised way. Via these APIs, particular regions of images can be requested in a specified quality, size and format. The associated metadata includes information about how the images should be displayed and in what order. As this metadata is standardised, different image viewers can be built that are all able to understand and display the same sets of images. The one increasingly used by the library for catalogue items is called the 'Universal Viewer'.
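For example, the IIIF Image API encodes the region, size, rotation, quality and format of the requested image directly in the URL. A quick sketch (the image identifier shown is a placeholder, not a real British Library item):

```python
import requests

# IIIF Image API URL pattern:
#   {server}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
base = "https://iiif.example.org/image/playbill-0001"   # placeholder identifier
url = f"{base}/full/512,/0/default.jpg"                 # whole image, 512px wide, as JPEG

response = requests.get(url)
with open("playbill.jpg", "wb") as f:
    f.write(response.content)
```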

Another IIIF-compliant viewer, called LibCrowds Viewer, has been developed for In the Spotlight. The viewer takes advantage of the flexibility enabled by the APIs described above. Images and metadata already held by the British Library can be requested, combined with some additional configuration details, and used to generate sets of crowdsourcing tasks. This means that we don’t need to host any additional image data, nor are we tied to any institution-specific metadata structures. In fact, the system could be used to generate crowdsourced annotations for any IIIF-compliant content.

Transcriptions are collected in the form of Web Annotations, a W3C standard that was published at the start of this year. This is another step towards future interoperability and reuse. By adopting this standard we can share our transcriptions more easily across the Web and incorporate them back into our core discovery systems.
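A transcription captured this way is, in essence, a small JSON-LD document. The sketch below shows the rough shape of such an annotation (the target URI, selector and transcribed text are illustrative, not taken from a real LibCrowds record):

```python
import json

# Rough shape of a W3C Web Annotation recording a single transcription.
annotation = {
    "@context": "http://www.w3.org/ns/anno.jsonld",
    "type": "Annotation",
    "motivation": "describing",
    "body": {
        "type": "TextualBody",
        "value": "The Tragedy of Jane Shore",    # the transcribed title (illustrative)
        "purpose": "describing",
    },
    "target": {
        # Placeholder IIIF canvas URI plus the region of the playbill transcribed
        "source": "https://iiif.example.org/manifests/playbill/canvas/1",
        "selector": {
            "type": "FragmentSelector",
            "value": "xywh=120,340,900,60",
        },
    },
}

print(json.dumps(annotation, indent=2))
```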

As well as making the crowdsourced transcriptions searchable via the library’s catalogue viewer, they will be made available via the IIIF Content Search API, further increasing the ways in which the data could be reused. For example, we could develop programmatic ways to search the collection for a particular person who performed in a certain play in a given location.

To enable such exciting functionality we first need to collect the data, and since launch volunteers have completed over 14,000 tasks, which is a fantastic start. Visit In the Spotlight to get involved.

20 November 2017

Heritage and Data: Challenges and Opportunities for the Heritage Sector

The AHRC Heritage Priority Area, Heritage Futures, the Alan Turing Institute and the British Library have recently released the report ‘Heritage & Data: Challenges and Opportunities for the Heritage Sector’.

This report captures key issues raised during the ‘Heritage and Data Workshop’ event that was held at the British Library in June 2017. The workshop, envisaged as the start of a sector-wide heritage data conversation, brought together key representatives from the UK heritage industry and academic community to discuss the challenges and opportunities arising as data becomes ever more significant in the heritage sector.

The workshop attempted to understand the existing capacity and data developments in larger heritage organisations through case studies from the National Archives, British Museum, Heritage Lottery Fund, Historic England and the British Library. It was acknowledged, however, that data opportunities are of great importance, and even more challenging, for smaller heritage organisations. This will require further discussion in the future.

Heritage and Data: Report of the Heritage Data Research Workshop held Friday 23 June 2017 at the British Library, London

Based on the case studies presented, it was evident that many organisations are actively developing their data capabilities, as well as a range of increasingly successful and innovative projects, such as the National Archives’ project Traces through Time, the Heritage Index developed by the Heritage Lottery Fund and the RSA, or the British Museum’s project ResearchSpace. It was encouraging to hear that data is increasingly seen as a cross-organisational and strategic issue, as in the examples of the British Library’s development of its first Data Strategy and Historic England’s work to develop a set of organisational priorities related to data.

The case studies presented at the workshop led to a series of discussions, which aimed to distil a set of shared themes, research questions and recommendations that would move heritage data developments forward.

The participants were united in their belief that data will have a transformative effect on heritage practice in collecting, curating and managing both natural and cultural heritage, as well as in engaging and understanding audiences. At the same time, it was recognised that further advancement of heritage data is dependent on the sector’s ability to address the issues of transparency and privacy, and to build trust among partners and the public.

An important principle defined by the workshop was a vital requirement for interdisciplinary research and cross-sector collaboration, with data specialists working in close collaboration with domain experts to ensure quality of data and relevant contextualisation and interpretation.

The workshop participants also emphasised the need for a stronger collective voice in shaping policy and investment, as well as a need for a joint effort to raise the profile of data with audiences.

The report captures a number of both long-term and more immediate recommendations. In the long term, one particularly challenging area is the range of infrastructure-related issues that need solving in order to enable interoperability and more integrated discovery of heritage data. This and the other recommendations all share a requirement for greater collaboration across the sector, hence an immediate recommendation is to explore the development of an exchange space, or hub, for sharing experiences and best practice.

This post is by Maja Maricevic, the British Library's Head of Higher Education, on twitter as @MajaMaricevic.

09 November 2017

You're invited to come and play - In the Spotlight

Mia Ridge, Alex Mendes and Christian Algar from the Library's Digital Scholarship and Printed Heritage teams invite you to take part in a new crowdsourcing project...

It’s hard for most of us to remember life before entertainment on demand through our personal devices, but a new project at the British Library provides a glimpse into life before electronic entertainment. We're excited to launch In the Spotlight, a crowdsourcing site where the public can help transcribe information about performance from the last 300 years. We're inviting online volunteers to help make the British Library's historic playbills easier to find while uncovering curiosities about past entertainments. You can step Into the Spotlight at http://playbills.libcrowds.com

The original playbills were handed out or posted outside theatres, and like modern nightclub flyers, they weren't designed to last. They're so delicate they can't be handled, so providing better access to digitised versions will help academic, local and family history researchers.

Playbills compiled into a volume
The Library’s collection has over a thousand volumes holding thousands of fragile playbills

 

What is In the Spotlight?

Individual playbills in the historical collection are currently hard to find, as the Library's catalogue contains only brief information about the place and dates for each volume of playbills. By marking up and transcribing titles, dates and genres, volunteers will make each playbill - and individual performances - findable online.

We’ve started with playbills from theatres in Margate, Plymouth, Bristol, Hull, Perth and Edinburgh. We think this provides wider opportunities for people across the country to connect with nationally held collections.

Crowdsourcing interface screenshot
Take a close look at the playbills whilst marking up or transcribing the titles of plays

 

But it's not all work - it's important to us that volunteers on In the Spotlight can indulge their curiosity. The playbills provide fascinating glimpses into past entertainments, and we're excited to see what people discover.

The playbills people can see on In the Spotlight provide a fabulous source for looking at British and Irish social history from the late 18th century through to the Victorian period. More than this, their visual richness is an experience in itself, and should stimulate interest in historical printing’s use of typography and illustrations. Over time, playbills included more detailed information, and the song titles, plot synopses, descriptions of stage sets and choreographed action they record help bring these past performances to life.

Creating an open stage 

You can download individual playbills, share them on social media or follow a link back to the Library's main catalogue. You can also download the transcribed data to explore or visualise as a dataset.

We also hope that people will share their discoveries with us and with other participants, either on our discussion forum, or social media. Jumping In the Spotlight is a chance for anyone anywhere to engage with the historical printed collections held at the British Library. We’ve created our very own stage for dialogue where people can share and discuss interesting or curious finds - the forum is a great place to post about a particular typeface that takes your fancy, an impressive or clever use of illustration, or an obscure unheard-of or little known play. It's also a great place to ask questions, like 'why do so many playbills announce an evening’s entertainment, ‘For the Benefit’ of someone or other?'. In the Spotlight’s open stage means anyone can add details or links to further good reads: share your growing knowledge with others!

We're also keen to promote the discoveries of project volunteers, and encourage you to get in touch if you'd like to write a short post for the Library’s Untold Lives blog, the English & Drama blog or here on our Digital Scholarship blog. If forums and twitter aren't your thing, you can email us at digitalresearch@bl.uk.

Playbill from Devonport, 1836
In the Spotlight is an ‘Open House’ – share your findings with others on the Forum, contribute articles to British Library blogs!

 

What's been discovered so far?

We quietly launched an alpha version of the interface back in September to test the waters and invite comments from the public. We’ve received some incredibly helpful feedback (thank you to all!) that has helped us fine-tune the interface design. We also received some encouraging comments from colleagues at other libraries who work with similar collections. We’ll take someone saying they are 'insanely jealous' of the crowdsourcing work we are doing with our historical printed collections as a good sign!

We've been contacted about some very touching human-interest stories too - follow @LibCrowds or sign up to our crowdsourcing newsletter to be notified when blog posts about discoveries go live. We're looking forward to the first post written by the In the Spotlight participant who uncovered a sad tale behind a Benefit performance for several actors in Plymouth in 1827.

What can you do?

Take on a part! Take a step Into the Spotlight at http://playbills.libcrowds.com and help record titles, dates and genres.

If you are interested in theatre and drama, in musical performance, in the way people were entertained, come and explore this collection and help researchers while you’re doing it. All you need is a little free time and it’s LOTS OF FUN! Help us make In the Spotlight the best show in town.

Join in, it'll be lots of fun!

04 November 2017

International Games Week 2017

Today at the British Library we are hosting a pop-up game parlour for International Games Week. So if you are in the Library between 10:00 and 16:00 come play some games!

We have our usual favourites, including Animal Upon Animal, Biblios, Carcassonne, Dobble, Pandemic, Rhino Hero, Scrabble and Ticket To Ride Europe.

Plus some new ones, including The Hollow Woods: Storytelling Card Game, which revives the Victorian craze for ‘myrioramas’, and Great Scott! - The Game of Mad Invention, a Victorian-themed card game for 3 to 5 players made by Sinister Fish Games, which uses images selected from the British Library’s Mechanical Curator collection on Flickr in its artwork.

Great Scott! - The Game of Mad Invention

It is always lovely to see the British Library’s digital collections being used in creative projects, and this week Robin David won the BL Labs Commercial Award for his game Movable Type, which also uses the Mechanical Curator images in its artwork for a card-drafting, word-building game that has been described as Scrabble crossed with Sushi Go. Movable Type had a successful Kickstarter campaign in 2016, which sold out quickly, but we understand a new Kickstarter will be launched very soon - we'll keep you posted!

Cassie Elle's explanation of Movable Type by Robin David

In addition to board and card games, we are also delighted to host Sally Bushell and James Butler from Lancaster University, with whom the British Library is working on the AHRC-funded project Creating a Chronotopic Ground for the Mapping of Literary Texts. They have been using Minecraft for The Lakescraft Project, which created a teaching resource that provides a fun and innovative means of introducing concepts centred around the literary, linguistic and psychological analysis of the Lake District's landscape. This is a fascinating initiative and I'm pleased to report Lakescraft has evolved into a broader project called Litcraft, which uses the approach to explore literature set in other locations.

Introduction to The Lakescraft Project

Introductory video for Litcraft's first public release: R.L.Stevenson's Treasure Island

So there are lots of exciting and fun games happening today at the British Library, and if you can't be here in person, do keep an eye on social media using the hashtag #ALAIGW. Also do check out what games clubs and events may be running in your local library.

This post is by Digital Curator Stella Wisdom, you can follow her on twitter @miss_wisdom

18 October 2017

Databeers Descends on Digital Scholarship!

Last week over 150 data enthusiasts descended on the British Library as the Digital Scholarship Team played host to Databeers – London, a global data-oriented networking group started in Spain and now in over twenty cities around the world. There is frankly nothing more fun for us than downing some beers, listening to great data talks, and introducing the British Library to a whole new audience – particularly those who may not have considered the Library as anything other than a staid place with a whole lot of old books, let alone ventured through the front doors.

DataDrunkards. Image courtesy of @DatabeersLDN

Home to the UK Web Archive, Turing Institute and a staggering amount of digital collections and data, the British Library is a thriving place for data-centric research, and our team is here to support innovative use of the Library’s digital content.

To get a sense of the many ways in which we do this, pop along to our upcoming BL Labs Symposium on Monday 30 October 2017 - our fifth annual networking and awards event, showcasing interesting projects which have used the British Library's digital content over the past year.

 

 

 This post is by Digital Curator Nora McGregor, on twitter as @ndalyrose

17 October 2017

Imaginary Cities – Collaborations with Technologists

Posted by Mahendra Mahey (Manager of BL Labs) on behalf of Michael Takeo Magruder (BL Labs Artist/Researcher in Residence).

In developing the Imaginary Cities project, I enlisted two long-standing colleagues to help collaboratively design the creative-technical infrastructures required to realise my artistic vision.

The first area of work sought to address my desire to create an automated system that could take a single map image from the British Library’s 1 Million Images from Scanned Books Flickr Commons collection and from it generate an endless series of ever-changing aesthetic iterations. This initiative was undertaken by the software architect and engineer David Steele, who developed a server-side program to realise this concept.

David’s server application links to a curated set of British Library maps through their unique Flickr URLs. The high-resolution maps are captured and stored by the server, and through a pre-defined algorithmic process are transformed into ultra-high-resolution images that appear as mandala-esque ‘city plans’. This process of aesthetic transformation is executed once per day, and is affected by two variables. The first is simply the passage of time, while the second is based on external human or network interaction with the original source maps in the digital collection (such as changes to metadata tags, view counts, etc.).


Time-lapse of algorithmically generated images (showing days 1, 7, 32 and 152) constructed from a 19th-century map of Paris

The second challenge involved transforming the algorithmically created 2D assets into real-time 3D environments that could be experienced through leading-edge visualisation systems, including VR headsets. This work was led by the researcher and visualisation expert Drew Baker, and was done using the 3D game development platform Unity. Drew produced a working prototype application that accessed the static image ‘city plans’ generated by David’s server-side infrastructure, and translated them into immersive virtual ‘cityscapes’.

The process begins with the application analysing an image bitmap and converting each pixel into a 3D geometry that is reminiscent of a building. These structures are then textured and aligned in a square grid that matches the original bitmap. Afterwards, the camera viewpoint descends into the newly rezzed city and can be controlled by the user.
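To give a feel for the idea, here is a conceptual sketch in Python; the production system uses Unity, and the mapping from pixel brightness to building height is my assumption, not a description of Drew's implementation:

```python
from PIL import Image

plan = Image.open("city_plan.png").convert("L")   # placeholder filename; greyscale bitmap
width, height = plan.size

buildings = []
for y in range(height):
    for x in range(width):
        brightness = plan.getpixel((x, y))        # 0-255
        buildings.append({
            "grid_x": x,                          # position in the square grid
            "grid_y": y,
            "height": brightness / 255 * 50,      # assumed maximum height of 50 units
        })

print(len(buildings), "building footprints generated")
```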

Analysis and transformation of the source image bitmap
View of the procedurally created 3D cityscape

At present I am still working with David and Drew to refine and expand these amazing systems that they have created. Moving forward, our next major task will be to successfully use the infrastructures as the foundation for a new body of artwork.

You can see a presentation from me at the British Library Labs Symposium 2017 at the British Library Conference Centre Auditorium in London, on Monday 30th of October, 2017. For more information and to book (registration is FREE), please visit the event page.

About the collaborators:

David Steele

David Steele is a computer scientist based in Arlington, Virginia, USA specialising in progressive web programming and database architecture. He has been working with a wide range of web technologies since the mid-nineties and was a pioneer in pairing cutting-edge clients to existing corporate infrastructures. His work has enabled a variety of advanced applications from global text messaging frameworks to re-entry systems for the space shuttle. He is currently Principal Architect at Crunchy Data Solutions, Inc., and is involved in developing massively parallel backup solutions to protect the world's ever-growing data stores.

Drew Baker

Drew Baker is an independent researcher based in Melbourne, Australia. Over the past 20 years he has worked in visualisation of archaeology and cultural history. His explorations in 3D digital representation of spaces and artefacts as a research tool for both virtual archaeology and broader humanities applications laid the foundations for the London Charter, establishing internationally-recognised principles for the use of computer-based visualisation by researchers, educators and cultural heritage organisations. He is currently working with a remote community of Indigenous Australian elders from the Warlpiri nation in the Northern Territory’s Tanami Desert, digitising their intangible cultural heritage assets for use within the Kurdiji project – an initiative that seeks to improve mental health and resilience in the nation’s young people through the use of mobile technologies.