Digital scholarship blog: May 2013

9 posts from May 2013

23 May 2013

Does crowdsourcing capture the crowd at its wisest? An interview with Nick Hiley

How best to make the non-textual “stuff” (to borrow Tim Hitchcock’s nomenclature) we digitise discoverable for users is one of the real challenges facing digitization projects. Whereas making printed text available to mining is relatively straightforward, the same cannot be said for images. Cartoons, a medium which typically contains text alongside implied meaning, foreground these challenges. It seems appropriate then that the British Cartoon Archive was chosen, as a partner in the Going Digital programme (which I blogged about last week), to host a workshop which touched on issues of image creation, management, metadata and discoverability. I was excited to have the opportunity to chat to Dr Nick Hiley, the Head of the British Cartoon Archive (whose website was deemed by curators and experts from the British Library and participating libraries as one of the top 100 for future researchers), to discuss those challenges, their impact on digital scholarship, and locating the real wisdom of the crowd.

James: Welcome to the British Library and thanks for coming along. First up, I wanted to ask how you got involved in Going Digital.

Nick: I was asked if I’d like to participate! Very early on in fact, at the sort of time when you think this will never take off the ground and it doesn’t matter if I say yes! Then suddenly it took off, and I thought really early on that if we can’t do something interesting about digitising images then what can we present because that’s what we do and have done for a long time. There was some sudden worry, because I realised I’ve been doing this for so long without actually knowing the difference between gifs and tiffs and jpegs and whatever else, but then I realised again that you just have to know the point at which you leave the technicality to someone else and you continue doing what you are good at, which I think in our case - at the Archive - is content. And that is what I’ve tried to get over in the workshop: that you need to know that all these technicalities exist but you also need to know the point at which you say that is where I stop and I hand over to somebody else to understand the bit depths and colour balance and exactly how to store this material safely, put it on the web, and so on. So, I was invited.

James: I’m glad you were invited - not only because you could then invite me to come along - but because I see the British Cartoon Archive as having a really interesting collection - not only because of my own research interests - but because of the wealth of description that the dataset contains.

Nick: Indeed. The collection is big, and in a sense 5-10 years ago it was bigger, comparatively. When I arrived in 1999 there were very few databases of this sort of size - even though it only contained 30,000 images. The other thing that interests me is that we’ve been around long enough to remake our images. Whereas if somebody is starting a project now you must produce your archival master and generate images from that - you digitise once and use many times - well we’ve digitised once many times because what that image is needed for and what we’ve defined as an archival image has changed so much over 10-15 years. So the tiny little images that we put in our catalogue in the beginning, because we needed the catalogue to tell us what was downstairs in the archive for us to bring up, are quite different from the 100MB tiffs that we might produce now. A 100MB tiff delivered over a dial-up landline is no use to anybody!

James: And a 100 MB tiff made for a very different purpose: that is the archive in a way. Whilst places such as the British Cartoon Archive were previously cataloguing what was downstairs with their digital images, you are now doing something different. In many cases you don’t even have it downstairs. The archive is the server.

Nick: The strange thing is that because we’ve been around so long we still don’t think enough of our digital collection that we have. We have a big digital collection. Probably a million images, if you break it down into the different sizes of images we deliver on the web. For instance most of these still have a number which is the number given to the physical object. So we essentially have two sets of collections which have the same catalogue numbers. We need to do something about that because we need to accept that this is a separate collection and that it has to be separately conserved and looked after.

Pick the implied meaning out of that... Mancunian Computer Science Joke art courtesy of Flickr user guitarfish / Creative Commons Licensed [apparently inspired by our very own Andy Jackson]

James: Given that change in the nature of the purpose of the images themselves how has the way they’ve been described changed over the life of the British Cartoon Archive?

Nick: I think that probably the ambitions have developed, initially I don’t think it was an ambition to put quite so much contextual information with the image: in terms of notes about the background, cataloguing not only the people shown but also the people referred to, the implied text - so if it visually refers to Alice in Wonderland but not textually, we put that in. And it is that which interestingly now is changing again, because of the feeling that it is better to get as many images out there as possible than to describe them in great detail with added metadata. It’s an interesting point that we’re at, because you can see the rise in that ambition to describe and to index through the description, and you can perhaps see that beginning to be undermined again by the feeling that this is work that should be done by the users, work that will be done if you throw all this stuff out. So we’re either at a terribly interesting and ambitious stage, or we’re peering into the dark ages. My feeling is that it is the whole idea of 20/80 - which is a completely bogus set of statistics - but the idea that 20% of the material in an archive gets used 80% of the time. And the way to break that down is through metadata, and we can ensure that people don’t go only to what they know and we can make sure that 20/80 is broken down, that people use different images - that’s something I’ve seen over the years, I think largely because of the work that Jane Newton has done with cataloging. You used to get people come along for images, cartoons they’ve seen in books, and then they’d do searches on our catalogue as the catalogue grew and they come with completely new images of Hitler which you’d never seen anybody use before and they’d want those for their books. 
And I fear what is going to happen now is that we go back to a digital 20/80, and we get people following the routes that other people have followed through the digital collection, and finding 20% of it again and again and again. Architects talk about paths of design, the foot routes that people take through buildings and across patches of ground, if you want to stop those developing and wearing away the grass - the digital grass - then I think you’ve got to do that with metadata, which is what makes me a sceptic when it comes to things like crowdsourcing.

James: Following on from that, from the idea that there is a concern that digital archives could become paths trodden over and over again, and adding on the fact that the fewer abilities to connect within a collection the lesser chance people will go off those trails, if you introduce something like crowdsourcing into the mix will the trails users are already reaching be those that are described? So perhaps the challenge with crowdsourcing is to get information which people wouldn’t expect to add detail on?

Nick: Much of my academic work has been on the history of audiences. I love audiences. They are very challenging to research because they don’t leave any records: they don’t have to. But they leave subtle records in the changes that they make in the media with which they interact. So look at the design of cinemas and you can tell something about what people do in them: it is very crude, like a river running through a landscape. You can see what audiences in the mass have done, so I’m very much on the side of the users and audiences. I just don’t have a great deal of faith in the present definition of crowdsourcing. I think if you look at crowdsourcing projects the results they produce are not specific to the digital media and they are not even characteristic of the digital media. They are very much like any small society that you might ever have been a member of: five people doing all the work, and every time the committee is re-elected the same people are unopposed for treasurer or chairman. However many members of the society you have, the work is done by a small number of people. And that it seems to me is what a lot of these crowdsourcing projects are. It’s a great ambition to capture what users are interested in, but what is happening in most of these projects is remote volunteers. And I think if you look at the characteristics of the web, even those websites we think of as being characterised by user creation - Flickr, Youtube - one user in five hundred creates material for those sites. The way that people use the web is they move freely, they don’t leave things behind naturally. And I think that that might be the best way to capture this extra sort of information, by capturing information about searches for instance. We’re in the British Library now. There might be temptation to archive Amazon and show the range of products that are available in 2013.
But historically I’d far more like to have records of how people searched Amazon in 2013, what all those people are actually looking for, what they’re trying to find, what they’re wanting, the way that they move around. And I think that is the wisdom of the crowd, that is the extra dimension. It is not that they’ll sign up and do some editing for you in a remote crowdsourcing project, I think we may be looking to capture the wrong sorts of things.

Was ist Crowdsourcing? art courtesy of Flickr user Hannes Treichl / Creative Commons Licensed

James: Final question then, so in a hypothetical world if you managed to persuade some research funder to give you some money to create a usable interface that would provide rich information on the patterns of use in your website, how would you envisage deploying that? Would it be through an algorithm which took patterns of use to drive other users toward similar content? Because the problem with the latter, which is common in say Amazon, is that it would do precisely what you don’t want: if people come in on traditional routes finding what they expect, you don’t want to keep narrowing the focus of what people get to. So how might you envisage using that kind of information? Or is that the great unknown in need of a solution?

Nick: If I was given resources I’d hope that we could put money into people cataloguing and creating metadata! Though I understand there are great tensions between building this great fortress of knowledge with careful, organised cataloguing by experts and the fact that this fortress becomes a forbidding thing, and I’d be the first out there trying to knock it down! But somewhere between the two I think there will be natural processes, I don’t think we have to make it happen - if I’m wrong I’ll retire and somebody else will take my place. What I’m really worried about is that behind every crowdsourcing initiative I see a group of managers in finger mittens trying to warm themselves over a single candle, because what they are trying to do is cut costs. It is being seized by people who want to dethrone the expert because expertise is expensive. And I do think there is an argument that expertise can be liberating and it can mean that all this material we are throwing out during digitization can be used in new and different ways. Whenever expertise narrows things down and narrows vistas and possibilities, we should get rid of it. But I don’t think that’s happened just yet.

James: Thank you Nick for taking the time to speak to me today.

James Baker

@j_w_baker

Posted by James Baker at 3:51 PM

Tags

Data, Events

22 May 2013

Round-Up: Upcoming UK Digital Scholarship Training Opportunities

While we run our own 15-course Digital Scholarship Training programme for staff at the Library* we are always on the look-out for further training opportunities for colleagues. The following is a small selection of what’s on our radar at the moment. If you can recommend others coming up please do let us know in the comments!

Note: This post is updated periodically as suggestions come in!

Conferences/Workshops

Digital Humanities @Oxford Summer School 2013
8-12 July 2013
DHOxSS delegates are introduced to a range of topics suitable for researchers, project managers, research assistants, and students who are interested in the creation, management, analysis, modelling, visualization, or publication of digital data for the humanities.

GIS in the Humanities Summer School 2013
15-18 July 2013
This free workshop, sponsored by the European Research Council's Spatial Humanities: Texts, GIS, Places project and hosted by Lancaster University, will provide a basic introduction to GIS both as an approach to academic study and as a technology.

DARIAH-DE International Digital Humanities School 2013
19-23 August 2013
A one-week crash course in using the scripting language Python and its Natural Language Toolkit to perform in-depth computational analysis of digital texts. [Ok, not UK based but looks like a great course!]

Free Online Tutorials

Going Digital
Going Digital is a unique training programme aimed at doctoral students who are new to digital research and who want to learn more.
The programme has been running since January and while applications to take part in the workshops have now closed many of the courses have already added their “How-to Guides” to the website. See for instance the excellent 30 January 2013 “Scraping the web” tutorial.

The Programming Historian 2
The Programming Historian 2 (PH2) is a tutorial-based open access textbook designed to teach humanists practical computer programming skills that are immediately useful to real research needs.
A few of us around the Library are working our way through these extremely well-written lessons at the moment and can highly recommend them!

Open Knowledge Foundation School of Data
School of Data works to empower civil society organizations, journalists and citizens with the skills they need to use data effectively – evidence is power!
The courses offered here cover a whole range of topics around working with data, from data fundamentals to extracting, cleaning, analysing and presenting data.

Institute of Historical Research Digital Training Online Courses
The IHR offers a really nice selection of free online courses for digital research including one covering Digital Tools for semantic mark-up and text-mining and the like. Thank you to Jonathan Blaney for the heads-up there!

*if you’re at #dh2013 this summer swing by my session to hear all about it!

-Nora McGregor, Digital Curator, @ndalyrose

Posted by Nora McGregor at 12:15 PM

Tags

Data, Events, Tools

16 May 2013

On metadata and cartoons

I love cartoons. And few collections of cartoons excite me more than those held by the British Cartoon Archive. Thanks to some meticulous cataloguing its digital archive is a pleasure to explore, so it seemed fitting to me that the BCA was chosen to host a 'Digitising the Image' workshop on 15 May as part of the AHRC-funded Going Digital doctoral training programme. This programme includes events at The Courtauld Institute of Art, Goldsmiths (University of London), the Open University, and the Universities of East Anglia, Essex, Kent, and Sussex, and runs until the end of July this year. I was invited along to this particular event to talk about how archives of digital images can be used in research, and I chose to focus on how metadata can provide novel opportunities for discovering large corpora of digital images - if perhaps through a less appealing door than by going directly to the cartoons themselves (slides here). The rest of the day covered creating images, file types, publishing images, copyright, and metadata, and provided an excellent opportunity to reflect on how these skills - perhaps even more importantly the knowledge of the possibility of acquiring these skills - can be brought to wider audiences. Going Digital is a good start to this process, but only really the first tentative steps into fully integrating 'the digital' into how budding historians, art historians and literary critics are trained in higher education.

Yes it is... Metadata is a love note to the future photograph courtesy of Flickr user sarah0s / Creative Commons Licensed

So, back to metadata and cartoons. A few weeks before the event I asked the BCA to provide me with a dump of metadata. Quite wisely they came back with some sample .xml which - after some tests - I realised I could do something with at a technical level. I was also advised that the metadata was strongest for the 1960s and 1970s. This then became my focus and having received the full dataset I set about doing some quick and dirty transformations and visualisations for demonstration purposes (warning: quick and dirty are the operative words).

The content includes nearly 400,000 lines of data, with date, title, subject, author and various archival data. After doing a little cleaning of the 'Date' field - and where necessary some judicious removing - in Open Refine, I poked around the data for useful fields (I'll admit that plenty more cleaning could be done). By far the most interesting were the 'Title' field - which is free text of any inscriptions within the cartoon - and the 'Subject' field - containing text entered by the BCA team in order to categorise the cartoon (so for a single cartoon the list of subjects might include 'backgardens', 'budgerigars', 'pigs', 'ballet', 'typewriters'). It is this latter field which makes the collection such a rich resource for researchers.

In order to force the data into Voyant - perhaps the easiest data discovery tool for newcomers to get to grips with - I had to sort the data by date and then remove the date column to create an artificial chronology: not ideal, but necessary as Voyant can only handle text not text vs. date. A fudged solution also had to be found to get the data into Zotero for use in Paper Machines. I wanted to demonstrate topic modeling given recent discussions on the subject in the Journal of Digital Humanities, yet getting the data into an easy to use tool such as Paper Machines proved troublesome: converting the data to bibtex made Zotero (on top of which Paper Machines sits) fall over, so instead I crudely chopped the textual data into annual text files for the years 1960 to 1979 and uploaded them for comparison. Again not ideal at all, but it got the point across, for at the event I was able to demonstrate manipulating the data in these tools live: risky perhaps, but if my object was for the audience to understand the power of the tools (which it was!) then static slides wouldn't do. And what more than justified the risk was the evident enthusiasm in the room for the tools and for the fresh discoveries this type of data driven analysis can enable. More evidence then - if any were needed - that doing trumps reading/hearing/seeing when it comes to encouraging critical tool use.
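The two workarounds described above (sorting by date and then dropping the date, and chopping the text into one file per year) can be sketched roughly as follows. This is only an illustration under stated assumptions: the record layout, the sample rows and the function names are hypothetical, and the real BCA export was cleaned in Open Refine rather than with a script like this.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sample rows; the real BCA export has more fields and a different layout.
records = [
    {"date": date(1979, 5, 4), "title": "Maggie moves in"},
    {"date": date(1960, 1, 12), "title": "Stop! Police!"},
    {"date": date(1960, 11, 3), "title": "Any new strikes?"},
]


def chronological_text(rows):
    """Sort rows by date, then drop the date so that order alone carries the chronology."""
    ordered = sorted(rows, key=lambda r: r["date"])
    return "\n".join(r["title"] for r in ordered)


def annual_texts(rows, first_year=1960, last_year=1979):
    """Group titles by year, giving one text per year for side-by-side comparison."""
    by_year = defaultdict(list)
    for r in sorted(rows, key=lambda r: r["date"]):
        if first_year <= r["date"].year <= last_year:
            by_year[r["date"].year].append(r["title"])
    return {year: "\n".join(titles) for year, titles in by_year.items()}


single_text = chronological_text(records)  # one long text, oldest cartoon first
per_year = annual_texts(records)           # {1960: "...", 1979: "..."}
```

Each value in `per_year` could then be written to its own text file and uploaded as a separate document.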

At this point you might be thinking, what did I actually discover in the data? In a sense I discovered what I expected to discover (and not for the first time). The themes of the cartoons in the corpus track the politics of the day, with for example clusters of words around 'Maggie' and 'Conservative' growing to a crescendo by the end of the 1970s. Equally expected, but nonetheless of interest, is the observation that textual content within cartoons during the same period tended toward natural language, with words such as 'british', 'harold', 'christmas' and 'strike' marginal (see below).

Word cloud of 'Title' field for data exported from British Cartoon Archive database for years between 1960 and 1979 (dataset, Voyant)

A more nuanced discovery, and one which I think suggests the potential both of the data and of the method, is revealed by comparing visualisations of the 'Title' field and of the 'Title' and 'Subject' fields combined. In the latter case, the subjects overwhelm the titles. This is to be expected: as the subjects are chosen by curators of the data at the point of digitisation we might expect these entries to form clusters and to reuse categories. Hence although the addition of the 'Subject' field to the 'Title' only increased the number of unique words from 30,621 to 33,178, it increased the total words from 660,981 words to 1,208,082 and the most frequent word from 2,877 occurrences for "it's" to 12,000 for "party" (note: all counts correct after the application of standard stop words - with a few manual additions - to the data).
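For readers who want to see how counts of this kind are arrived at, here is a minimal sketch of the arithmetic: total words, unique words and the most frequent word after stop words are removed. The stop word list, the tokenising rule and the sample strings are all assumptions made for illustration; Voyant's own tokeniser and stop list will differ, so this will not reproduce the figures above.

```python
import re
from collections import Counter

# Illustrative stop list only; Voyant ships its own, longer list.
STOP_WORDS = {"the", "a", "of", "and", "to", "in", "is"}


def word_stats(text, stop_words=STOP_WORDS):
    """Return total words, unique words and the most frequent word after stop word removal."""
    tokens = re.findall(r"[a-z']+", text.lower())
    kept = [t for t in tokens if t not in stop_words]
    counts = Counter(kept)
    top = counts.most_common(1)[0] if counts else None
    return {"total": len(kept), "unique": len(counts), "top": top}


# Hypothetical stand-ins for the 'Title' and 'Subject' fields.
titles = "It's the party of the year. It's a strike!"
subjects = "party party elections strikes"

title_only = word_stats(titles)
title_and_subject = word_stats(titles + " " + subjects)
```

In this toy example the repeated subject term 'party' overtakes "it's" as the most frequent word once the two fields are combined, which is the same kind of shift described above.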

Word trend graph of 'Title' and 'Subject' fields for data exported from British Cartoon Archive database for years between 1960 and 1979 (dataset, Voyant)

Word trend graph of 'Title' field for data exported from British Cartoon Archive database for years between 1960 and 1979 (dataset, Voyant)

This additional data also changes the trends within the corpus. So whilst comparing 'police', 'unions', and 'strikes' in the Subject+Title corpus shows 'police' and 'strikes' as occurring with relatively equal frequency over time (or across the length of the text), when we look at only the text within the cartoons 'police' occurs with far greater frequency across the period (see above). What is going on here demonstrates the value of capturing implied meaning in metadata as opposed to merely inscribed text. The word 'police' is simply more likely to appear in cartoons: think of stock phrases such as "Stop! Police!" (and derivations thereof) or the appearance of the words 'Police Station' above or around the door of a building. Words such as 'unions' and 'strikes' are more likely on the other hand to only appear in natural speech: "Who's still out? Any new strikes?", "We're not against pay strikes mate", "I dunno Denis - if these strikes go on". So whereas the word 'police' and the theme of policing might appear together, the theme of striking and unions is more likely to be implied within a cartoon and is then more available for this sort of corpus analysis when that implied meaning has been captured and translated into text.
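A trend 'across the length of the text' can be approximated by cutting the date-sorted text into equal segments and measuring how often a word occurs in each. The sketch below assumes that approach; the segment count, the tokenising rule and the sample text are hypothetical, and it is not a description of how Voyant itself computes its graphs.

```python
import re


def trend(text, word, segments=10):
    """Relative frequency of `word` in each of `segments` equal slices of the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return [0.0] * segments
    size = -(-len(tokens) // segments)  # ceiling division, so no token is dropped
    result = []
    for i in range(segments):
        chunk = tokens[i * size:(i + 1) * size]
        result.append(chunk.count(word) / len(chunk) if chunk else 0.0)
    return result


# Hypothetical date-ordered text: 'police' early on, 'strikes' later.
sample = "police stop police strikes pay strikes unions strikes"
police_trend = trend(sample, "police", segments=2)
strikes_trend = trend(sample, "strikes", segments=2)
```

Running the same function over the Title-only text and over the Title+Subject text would give two series per word that can be compared directly.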

In the case of the BCA 1960s and 1970s collections this capturing of implied meaning was undertaken by paid experts. Today some of this sort of work can be outsourced to a volunteering crowd: our own Picturing Canada project is an excellent example of how this could work for digital images. In a future post I will discuss with Nick Hiley, Head of the British Cartoon Archive, the challenges of creating high-quality descriptive metadata in an era where crowdsourcing is so in vogue.

James Baker

@j_w_baker

Posted by James Baker at 3:37 PM

Tags

Collaborations, Data, Events

15 May 2013

Remembering the Great War - the Harold Ward Letters

As part of our activities with Europeana 1914 – 1918 to remember the 100th anniversary of the outbreak of WW1, we have recently digitised a fascinating collection of letters sent by Captain Harold Ward to his wife Louise Ward and son Kenneth Martin Ward between 1917 and 1918. This collection, lent to the BL by Captain Ward’s granddaughter for digitisation, comprises some 260 items including 9 field service post cards, written on paper of various shapes, sizes, colours and conditions. In his correspondence, Captain Ward gives a very poignant account of life at the fronts where he was serving with the 2/4th and 2/5th Lincolnshire Battalions, offering a vivid image of the everyday life of his soldiers in the battlefields and the rough conditions created by war.

But perhaps one of the most striking aspects of the correspondence is the way Captain Ward expresses his personal view of the battlefields: the description of his experiences is always accompanied by comments of hope to return home and love towards his family, as we can see in the letter presented here which was written to his son in July 1917. By reading Captain Ward’s letters, rather than trying to understand the past through a mere description of events, one has the feeling of approaching history from a highly personal and human perspective as if we were transported to the very moment when these events were taking place. The material offers indeed a great resource for researchers and the general public interested in learning more about the Great War, especially for those keen to understand WW1 from the point of view of those fighting in the trenches. The full correspondence is available at http://bit.ly/10F80nU

Letter sent by Captain Harold Ward to his son Kenneth in July 1917

Posted by Aquiles Brayner at 10:20 AM

07 May 2013

Improved access to newspapers: The Europeana Newspapers Project

Image source: National Library of Estonia

This is a brief post to highlight the activities of The Europeana Newspapers Project (ENP), a network of 18 partners (and 11 associated partners) working together to make more than 18 million digitised newspaper pages (including 10 million pages of full-text content) available via the Europeana ecosystem of online services, with aggregation carried out by The European Library.

The project will improve discoverability of content through the application of refinement methods for Optical Character Recognition (OCR), Optical Layout Recognition (OLR), Named Entity Recognition (NER) and Page Class Recognition. It also addresses the challenges around quality evaluation for automatic refinement technologies, transformation of local metadata to the Europeana Data Model (EDM), and metadata standardisation in close collaboration with stakeholders from the public and private sector.

Demonstrations of the evaluation tools, OLR, NER tagging and the role of ground truth will take place at the ENP's first dissemination workshop on refinement and quality assessment at the University Library Svetozar Markovic, Belgrade, 13-14 June.

The British Library is a networking partner in the ENP and will be hosting an information day and a dissemination workshop in 2014.

For further information about the project, visit its website http://bit.ly/17WNlir and follow Europeana Newspapers on Facebook and @eurnews on Twitter.

Posted by Rossitza Atanassova at 5:28 PM

03 May 2013

A novel, a writing machine and a leafy square in London

Front cover of Len Deighton's classic of realistic fiction, a detailed and perceptive account of a bombing raid in 1943 through the eyes of protagonists on both sides of the Channel, published by Jonathan Cape in 1970. Image of Raymond Hawkey designed dust-jacket graciously supplied by Edward Milward-Oliver

According to Matthew Kirschenbaum of the Maryland Institute for Technology in the Humanities (MITH), the first novel to be written on a word processor was Bomber, published by Len Deighton in 1970. He explained his thinking in an article in Slate, 1 March 2013, “The Book-Writing Machine”. The subtitle reads: What was the first novel ever written on a word processor?

The machine was IBM’s Magnetic Tape Selectric Typewriter (MT/ST). Its primary unit weighed 200lbs (ca 91 kg) and when Len Deighton leased it in 1968, a crane had to be used to get it into his house on Merrick Square, just south of the Thames in London. It is not far from Borough Market, a mere six Underground stops on the Northern Line from where the British Library stands today.

Houses on Merrick Square as it is today. Screenshot image from Google Maps street view

It was Deighton’s personal assistant Ellenor Handley who mastered the new technology and there is a wonderful account in the article of the integral role that she played in bringing the writing to fruition, along with the author’s use of maps, weather charts, colour-coded and cross-reference notes, and tags in what was a very complex creative operation.

The article briefly explores the role of the MT/ST: “At the same instant a character was imprinted on the page from the Selectric’s typing mechanism, that keystroke was recorded as data on a magnetic tape cartridge. There was no screen...”.

When I visited Maryland a few weeks ago I was given the opportunity to see a similar machine that MITH has obtained. It is awaiting some technical care and restoration to bring it back into operation but it already sits in pride of place along with a copy of Bomber in the newly established home of the institute where some of its collections of computer hardware, disks, manuals and printouts are prominently displayed.

Matthew Kirschenbaum kindly sent me a photograph, indicating that this is a Model II, which has a single tape reel whereas Deighton’s would have had two tape reels.

Photograph of IBM Magnetic Tape Selectric Typewriter (MT/ST) Model II at Maryland Institute for Technology in the Humanities, shown with permission. Courtesy © Matthew G. Kirschenbaum

Photographs in the Science & Society Picture Library suggest that there is at least one in the Science Museum of the UK. In Europe the machine was known as the MT72 and it was built in the Netherlands. Some further information about typewriters can be obtained from IBM's website.

For the curator, the article is revealing in other ways. It shows how digital academics and scholars are advancing their research, their concern for the materiality of the intellectual process, their interest in the digital practices of writers, artists and scientists; and it is scholars like Kirschenbaum that libraries and archives will be serving in the coming years and decades.

We can look forward to much more. There will be a book entitled “Track Changes: A Literary History of Word Processing” from Harvard University Press. I cannot wait to see it.

Information about Len Deighton himself can be found on the delightful website The Deighton Dossier. He is someone who has always been technically oriented, as is evident in his diverse writings including topics such as the pen and aeroplane engines; no doubt his use of computer technologies for writing has continued to match ongoing developments over the years. A biography about Deighton is being prepared by Edward Milward-Oliver, who was recently interviewed by Jeremy Duns.

Jeremy Leighton John, @emsscurator

Posted by Jeremy John at 11:17 AM

Tags

Experiments, Tools

Labs Competition - Win £3000!

Details have been released for the first British Library Labs competition...

Calling all researchers and software developers!

We want you to propose an innovative and transformative project that answers a research question using the British Library's digital collections and if your idea is chosen, the Labs team will work with you to make it happen and you could win a prize of up to £3,000.

From the digitisation of thousands of books, newspapers and manuscripts, the curation of UK websites, bird sounds or location data for our maps, over the last two decades we’ve been faithfully amassing a vast and wide-ranging digital collection for the nation. What remains elusive, however, is understanding what researchers need in place in order to unlock the potential for new discoveries within these fascinating and diverse digital collections.

The Labs competition is designed to attract scholars, explorers, trailblazers and software developers who see the potential for new and innovative research and development opportunities lurking within these immense digital collections. Through soliciting imaginative and transformative projects utilising this content you will be giving us a steer as to the types of new processes, platforms, arrangements, services and tools needed to make it more accessible. We’ll even throw the Library’s resources behind you to make your idea a reality.

To find out more, visit the competition pages (deadline for submission of ideas is 26 June 2013), sign up to the wiki, express your interest and participate in one of the related events: virtually (17 May 2013, 1500 GMT), at one of our roadshow events, or at the hack event in London on 28 and 29 May 2013.

Good luck!

Posted by Mr Mahendra Mahey at 10:34 AM

Tags

Events, Projects

02 May 2013

Wikipedian in Residence: conclusions

My residency at the British Library is coming to an end today, and so it seemed a good chance to look back at what we've done over the past twelve months. It's been a very productive and very interesting year.

The residency was funded by AHRC, who aimed to help find ways for researchers and academics to engage with new communities through Wikipedia, and disseminate the material they were producing as widely as possible. To help with this, we organised a series of introductory workshops; these were mostly held at the British Library, with several more at the University of London (two at Birkbeck and three at Senate House) and others scattered from Southampton to Edinburgh. Through the year, these came to fifty sessions for over four hundred people, including almost a hundred Library staff both in London and at Boston Spa, and another fifty Library readers in London! Attendees got a basic introduction to Wikipedia - how it works, how to edit it, and how to engage with its community - as well as the opportunity to experiment with using the site.

As well as building a broad base of basic skills and awareness, we also worked with individual projects to demonstrate the potential for engagement in specific cases. At the Library, the International Dunhuang Project organised a multi-day, multi-language editing event in October; IDP staff, student groups, and Wikipedia volunteers worked on articles about central Asian archaeology, creating or improving around fifty articles.

At the Library, one of the most visible outcomes has been the "Picturing Canada" project, digitising around 4,000 photographs from the Canadian Copyright Collection, with funding from Wikimedia UK and the Eccles Centre for American Studies. We've released around 2,000 images so far, as JPEGs and as high-resolution TIFFs, with the full collection likely to be available by early June (we've just found enough left in the budget to do an extra batch of postcards). Other content releases have included digitised books, historic photographs, collection objects, and ancient manuscripts (below).

We also hosted the GLAM-Wiki conference in April, which was a great success, with over 150 attendees and speakers from around the world. Several of the presentations are now online.

While I'm leaving the Library, some of these projects I've been working on will be continuing - we still have another 2,000 of the Canadian photographs to be released, for example! We're also hoping to host some more workshops here in the future (possibly as part of the upcoming JISC program). I'll still be contactable, and I'm happy to help with any future projects you might have in mind; please do get in touch if there's something I can help you with.

—Andrew.

Library curators exploring a new world (or, Alexander the Great being lowered into the water in a submarine); BL Royal MS 15 E vi f20v.

Posted by Andrew Gray at 1:49 PM