Digital scholarship blog


02 April 2014

Unconferencing; or a digital scholarship training experiment

The British Library's Digital Scholarship Training programme aims to provide library colleagues with the skills and knowledge to best exploit the digital transformations taking place around us, both in and outside the research community.

To close the third semester of this programme we in Digital Research decided to embark on something of an experiment: to transform our one-day 'What is Digital Scholarship?' course (one of the sixteen one-day courses we offer) into a staff-only unconference on Digital Scholarship and Working Innovatively with Digital Collections.

For those out of the unconference loop, an 'unconference' brings together delegates under a particular theme, but the schedule for the day is entirely created by the attendees. Anyone can propose a session beforehand or on the morning of the event. The day begins with all who have proposed sessions having an opportunity to briefly pitch their ideas. A vote is then cast (at our event each attendee had three votes, cast as ticks/crosses/marks placed next to session names on a flip-chart, so relatively anonymous!) and the sessions with the most votes form the final schedule for the day (as we couldn't guarantee that there would be room for every proposed session, we asked colleagues not to over-prepare!). Pitchers then act as facilitators for their sessions. Not all participants are required to propose a session - most, in fact, come along for the ride!


As digital scholarship and innovation are happening across the British Library, we wanted to give colleagues the opportunity to share their interests and skills with others. We suggested that sessions could be on anything related to digital scholarship and innovation with digital cultural heritage collections in the broadest sense, and to avoid the event amounting to little more than a series of talks we asked colleagues to fit their proposals into one or more of the following categories:

  • Talk ... such as a presentation on a digital project.
  • Make ... such as a session where attendees collaboratively build or work on something, like tagging or geo-referencing a collection.
  • Teach ... such as a session where you show a group of people how to do something, such as how to update Wikipedia articles.
  • Play ... such as a discussion aimed at generating fresh, creative ideas for innovating with our digital collections or services.

We then put together a skeleton schedule (Pitching and Voting session 10-11.15 - First sessions 11.30-12.30 - Break 12.30-13.30 - Second sessions 13.30-14.30 - Third sessions 14.45-15.45 - Wrap-up, discussion & reflection 16.00-16.30) and put out a call for contributions via various internal channels.

On the day, every proposed session passed a threshold of interest and we hosted ten sessions across three rooms. These ranged from a discussion of open licensing, an introduction to editing Wikipedia, a talk about Chinese social media (who knew our one million Flickr images are blocked by the Great Firewall of China?), and a workshop on creative reuses of Europeana content, to an update on our web archiving and associated access activities, an informal survey on how we might engage with local history communities, and a lively session around what access to our digital content should and could look like in an ideal world.


As should now be clear, having no pre-set schedule did not equate to little organisation, let alone anarchy. Rather, a carefully constructed framework needed to be built around the day to ensure everything ran smoothly. And as it turned out, the event was as creative, provocative, and fun as we had hoped, as well as being enormously productive. In particular, what emerged was clear feedback on how the Digital Research team can develop our future training provision in line with staff needs, including not only a sense that colleagues valued a varied and creative programme but also ideas about how best to introduce colleagues to digital scholarship. And so the unconference will return, but likely as an external event embedded within our otherwise staff-only programme. A few eager twitterers have expressed an interest in this already, but if you'd like to collaborate with us on an unconference around 'Digital Scholarship and Working Innovatively with Digital Collections' (working title) sometime later in 2014, then get in touch via email ([email protected]) or Twitter (@j_w_baker or @ndalyrose). We'd love to hear from you.

James Baker

Curator, Digital Research

@j_w_baker

27 March 2014

Tracking Public Domain Re-use in the Wild

We folks over at #bldigital are excited to be partnering with the Technology Strategy Board and IC tomorrow on their next Digital Innovation Contest. £25K is up for grabs to encourage digital innovation in data.

Our challenge, should you choose to accept it, is to encourage and establish the necessary feedback loop for tracking and measuring the use and impact of the public domain content we make available online.

The Library seeks to enrich the cultural life of the nation and stimulate economic and social growth, and the release of one million images (and counting) into the public domain and on to Flickr Commons is one way in which we hope to fulfil that mission.

 

But what happens when this content is released into the wild? Once online, we have little way of following that content as it is re-used, which makes it difficult to measure the creative and economic benefits of having done so.  

At the moment we try to capture innovative re-use as best we can manually, primarily by scouring social media channels for mentions, but this is neither sustainable nor scalable, and we know there is much inspired activity we are missing.

Colouring-In Pages for Children by Zoe Toft

What we need is an innovative solution for enabling the sharing of this content on platforms which are popular but outside of our control, while retaining the ability to see how it has eventually been remixed and reused.

A formidable challenge we know, and one which is shared by anyone putting content online today. Think you might be able to crack it? Visit IC Tomorrow for more details on the call. Closing date is noon on May 7, 2014.

@ndalyrose

07 February 2014

The Metadata Quest

Posted on behalf of Sara Wingate Gray, originally posted here: http://artefacto.org.uk/content/metadata-quest-part-1

Recently, we've been involved in an exciting new project, which comes out of some exploratory work we produced during the British Library Labs May 2013 hack event, part of their inaugural 2013 digital collections competition. Here at artefacto, we were particularly excited when BL Labs launched in March last year, not least because we'd been following the pioneering digital and creative libraryings of Harvard Library Lab for several years, alongside the more recent developments of the Digital Public Library of America and the wonderful work that the New York Public Library has been getting up to (check out their historical menus project for a start). What's exciting about all these developments (and there are so many we could list in this vein: Europeana ... in fact, have a list, courtesy of The Open Knowledge Foundation) is the opening up of public access to this "digital reserve of knowledge", and the potential it brings, in the case of BL Labs, for instance, "to create new narratives from the British Library’s vast incredible digital collections from 19th Century books to archived websites and wildlife sounds to manuscripts to name but a few examples."1

For us, what's also intriguing, in this new world collision and collection of objects, and people, in digital space, is how we might go about piecing together, jigsaw-like, the underlying narratives which sit within: how do we help reveal and "unlock" each object's own story?

The May 2013 hackday event at BL Labs gave us the opportunity (and the excuse) to explore this question: with access to their 68,000 digitised volumes of text (from the 19th Century Books collection), sounds (e.g. the archive of Resonance FM, the Survey of English Dialects), Ordnance Survey maps, and much, much more, it promised a veritable feast of digital content and, importantly, metadata to get our hands on.

That metadata is finally a hot topic of discussion worldwide is not only music to the ears of all librarians out there (well, ok, not all of you guys, but you're the groundswell folks!) but it also means we don't have to give you a definition. Except we probably do, since all this consorting with the NSA is frankly giving metadata a bad rap right now (and no, Guardian newspaper, metadata is not just "information generated as you use technology"). Wikipedia provides the very vaguely straightforward term "data about data" as a definition, while Zeng and Qin (2008, p.7) note that "[b]roadly speaking, metadata encapsulates the information that describes any document or object in both digital and traditional formats."2 In the context of the British Library's digital content, for instance, this could mean information about a painting's date and artist, a map's geographic range, or a sound's physical placing (to name just a few instances, or rather, metadata elements; take a look at The Library of Congress's sample of metadata for an 1864 letter from Alexander Melville Bell to Alexander Graham Bell if you really want to explore metadata in more detail).
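To make this concrete, here is a toy metadata record for an imaginary digitised map, expressed as simple element/value pairs in Python. The values are invented for illustration, not drawn from any real catalogue entry:

```python
# A toy illustration only: metadata for an imaginary digitised map,
# expressed as simple element/value pairs. All values are invented.
map_metadata = {
    "title": "Plan of the Town of Dundee",  # descriptive metadata
    "creator": "Unknown surveyor",
    "date": "1888",
    "coverage": "Dundee, Scotland",         # a map's geographic range
    "format": "image/tiff",                 # technical metadata
    "rights": "Public Domain",
}

for element, value in map_metadata.items():
    print(f"{element}: {value}")
```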

Essentially, what excites us about metadata is that, by harnessing it in different ways, new surfaces and territories can suddenly open up in a digital object's narrative; by making explicit, textually and visually, an object's creation space, or time, new threads of connections are discovered and yarns newly spun.

The result of our brief two days at the BL Labs event was a quick build of an experimental version of our imagined platform, where digital content could sit waiting to be explored in these ways, through people navigating and thinking about these facets. Although we didn't ultimately win the Labs competition, we got some great feedback which suggested we should continue with our project and idea. We named it Curatorial and went on our merry way, content in our imaginings. It was great, therefore, to find ourselves some months later participating in the Data Tales project, which gave us the opportunity to develop the platform further: under the AHRC Digital Transformations network 'Data – Asset – Method: Harnessing the Infinite Archive', we've been able to spend time re-building and imagining further what is possible, and what stories can be told, when metadata, objects and people digitally collide. Partnering with the Horizon Digital Economy Research Institute based at the University of Nottingham, Loughborough University, and the British Library for the Data Tales project has meant a great team experiment, and we presented the first results of our work together at a workshop at the British Library (24 January 2014).

One of the first ports of call when approaching this project was: how can we get our hands on the metadata we want? What types of collections (and their 'owners' or 'content holders'?) are out there? Our previous BL Labs experience grappling with the vast range of data types available from the British Library was really helpful, not least for priming us for detective work (what format is that geolocation data in exactly?), and so our sleuthing, and structuring, commenced.

"Why does a man need to tell stories to others and himself? It is a way by which the mind uses fantasy to structure the chaos of the original experience. Complex and unpredictable, the vivid experience always lacks what fiction can provide: a closed time, a hierarchy of events, the value of people, effects and causes, the connections under the actions."3

This is where the quest for metadata begins.

Why not come and see us speak at Making the Most of Metadata, on Wednesday 12 February 2014, at the British Library?

TBC.

1. http://britishlibrary.typepad.co.uk/digital-scholarship/2013/03/bl-labs-launch-event.html

2. Zeng, M. L. and Qin, J. 2008. Metadata. New York: Neal Schuman Publishers.

3. Vargas Llosa, M. 1997. The Truth of Lies. In Making Waves. New York: Farrar, Straus and Giroux.

12 December 2013

A million first steps

We have released over a million images onto Flickr Commons for anyone to use, remix and repurpose. These images were taken from the pages of 17th, 18th and 19th century books digitised by Microsoft who then generously gifted the scanned images to us, allowing us to release them back into the Public Domain.

The images themselves cover a startling mix of subjects: there are maps, geological diagrams, beautiful illustrations, comical satire, illuminated and decorative letters, colourful illustrations, landscapes, wall-paintings and so much more that even we are not aware of.

Which brings me to the point of this release. We are looking for new, inventive ways to navigate, find and display these 'unseen illustrations'. The images were plucked from the pages as part of the 'Mechanical Curator', a creation of the British Library Labs project. Each image is individually addressable online, and Flickr provides an API to access it and the image's associated description.
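For the technically curious, here is a minimal sketch (in Python, using the requests library) of pulling one image's title and description through Flickr's standard API. The API key is a placeholder you would register for yourself, and the photo id is just an illustrative example:

```python
# Minimal sketch: fetch a single image's title and description from
# the Flickr API. YOUR_API_KEY is a placeholder; the photo id is an
# illustrative example of one image from the release.
import requests

resp = requests.get(
    "https://api.flickr.com/services/rest/",
    params={
        "method": "flickr.photos.getInfo",
        "api_key": "YOUR_API_KEY",  # obtain from flickr.com/services/api
        "photo_id": "11075039705",
        "format": "json",
        "nojsoncallback": 1,
    },
)
photo = resp.json()["photo"]
print(photo["title"]["_content"])
print(photo["description"]["_content"])
```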

We may know which book, volume and page an image was drawn from, but we know nothing about a given image. Consider the image below. The title of the work may suggest the thematic subject matter of any illustrations in the book, but it doesn't suggest how colourful and arresting these images are.

(Aside from any educated guesses we might make based on the subject matter of the book of course.)


See more from this book: "Historia de las Indias de Nueva-España y islas de Tierra Firme..." (1867)

Next steps

We plan to launch a crowdsourcing application at the beginning of next year, to help describe what the images portray. Our intention is to use this data to train automated classifiers that will run against the whole of the content. The data from this will be as openly licensed as is sensible (given the nature of crowdsourcing) and the code, as always, will be under an open licence.
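We haven't built this yet, but as a very rough sketch of the kind of pipeline that 'training automated classifiers' implies: the code below assumes a hypothetical labels.csv of crowdsourced descriptions, uses crude colour-histogram features via Pillow, and stands in scikit-learn's logistic regression for whatever classifier is eventually chosen. None of this is the Library's actual approach:

```python
# Sketch only: train a classifier on crowdsourced image labels.
# Assumes a hypothetical labels.csv with columns: filename,label
import csv

import numpy as np
from PIL import Image
from sklearn.linear_model import LogisticRegression

def colour_histogram(path, bins=8):
    """Crude feature vector: a joint RGB colour histogram."""
    img = Image.open(path).convert("RGB").resize((128, 128))
    pixels = np.asarray(img).reshape(-1, 3)
    hist, _ = np.histogramdd(pixels, bins=(bins, bins, bins),
                             range=((0, 256),) * 3)
    return hist.flatten() / pixels.shape[0]

X, y = [], []
with open("labels.csv", newline="") as f:
    for row in csv.DictReader(f):
        X.append(colour_histogram(row["filename"]))
        y.append(row["label"])  # e.g. "map", "portrait", "decoration"

clf = LogisticRegression(max_iter=1000).fit(np.array(X), y)
# The trained model can then be run against the unlabelled remainder:
print(clf.predict([colour_histogram("unseen_image.jpg")]))
```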

The manifests of images, with descriptions of the works that they were taken from, are available on github and are also released under a public-domain 'licence'. This set of metadata being on github should indicate that we fully intend people to work with it, to adapt it, and to push back improvements that should help others work with this release. 

There are very few datasets of this nature free for any use, and by putting this one online we hope to stimulate and support research concerning printed illustrations, maps and other material not currently studied. And this is only a beginning, given that the images are derived from just 65,000 volumes while the Library holds many millions of items.

If you need help or would like to collaborate with us, please contact us via email or Twitter (or me personally, on any technical aspects).

The Initial Layout

The images have been tagged to aid browsing and to provide new views on the works themselves. They are tagged by publication year (eg 1764, 1864, 1884), by book (eg 003927270000149253), by author (eg Charles Dickens) and by other means.
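As a rough illustration of browsing by tag, the sketch below queries Flickr's standard search method for images carrying a given year tag. The API key and account id are placeholders, not real credentials:

```python
# Sketch: list images carrying a given tag (e.g. a publication year)
# from the collection's Flickr account. Placeholders to fill in:
# YOUR_API_KEY, and the account's Flickr NSID.
import requests

resp = requests.get(
    "https://api.flickr.com/services/rest/",
    params={
        "method": "flickr.photos.search",
        "api_key": "YOUR_API_KEY",
        "user_id": "ACCOUNT_NSID",  # the uploading account's id
        "tags": "1864",             # tag to browse by, e.g. a year
        "per_page": 20,
        "format": "json",
        "nojsoncallback": 1,
    },
)
for photo in resp.json()["photos"]["photo"]:
    print(photo["id"], photo["title"])
```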

This structure is helpful but we can do better! We want to collaborate with researchers and anyone else with a good idea for how to mark up, classify and explore this set, with the aim of improving the data and adding to the tagging. We are looking to crowdsource information about what is depicted in the images themselves, as well as using analytical methods to interpret them as a whole.

We are very interested to hear what ideas and projects people use these images for and we would ideally like to collaborate with those who have been inspired to explore them.

Finally, while they have been released into the public domain, we would like to direct you to a post by Dan Cohen titled "CC0 (+BY)". There is no obligation for you to attribute anything to us, but we'd appreciate it. The dataset will develop and improve over time, after all!

Some examples


"Manners and Customs of the ancient Egyptians, ... Illustrated by drawings, etc. 3 vol. (A second series of the Manners and Customs of the Ancient Egyptians. 3 vol.)" by WILKINSON, John Gardner - Sir


"The United States of America. A study of the American Commonwealth, its natural resources, people, industries, manufactures, commerce, and its work in literature, science, education and self-government. [By various authors.] Edited by N. S. Shaler ... With many illustrations" by SHALER, Nathaniel Southgate.


"Comic History of Greece from the earliest times to the death of Alexander the Great ... Illustrated, etc" by SNYDER, Charles M.


"The Coming of Father Christmas" by MANNING, Eliza F.


"The Casquet of Literature, being a selection of prose and poetry from the works of the most admired authors. Edited with biographical and literary notes by C. Gibbon ... and M. E. Christie. Illustrated from original drawings by eminent artists" by GIBBON, Charles - Esq., and CHRISTIE (Mary Elizabeth) Miss

25 November 2013

Mixing the Library: Information Interaction & the DJ - Origins

Posted on behalf of Dan Norton - British Library Labs 2013 Competition Winner

Following the completion of my PhD at the University of Dundee, I spent a period of time as Artist in Residence at Hangar Centre for Art and Research, Barcelona. There, I collaborated closely with the Department of Library Science and Documentation at the University of Barcelona to explore the potential value of the DJ’s model of information interaction in the field of Library Science, particularly with the use of digital collections.

It was with this background that I decided to enter the British Library Labs 2013 competition, after participating in an online meeting:

British Library Labs Virtual Event (Google hangout), 17 May 2013

http://www.youtube.com/watch?v=RFt0NvbTFHs

My project idea was to apply the DJ's way of working with music, their model of information interaction (developed as part of my doctoral study entitled: "Mixing the Library: Information Interaction and the DJ”), in a prototype for working with multiple data-representations from the British Library digital collections. This practice-led investigation would be used to describe the interface requirements for collecting, enriching, mixing/linking, and visualizing information from large digital libraries.

The prototype would be built on previous work and would attempt to combine the two essential interface features of the DJ's model: continual visual presence of the collection as a writeable menu system; and a "mixing screen" that allows two (or more) data fragments from the collection to be presented, combined, and linked. These simple interface requirements would be used to build sequences through an archive by linking articles, and later be evaluated and extended as a tool for developing semantically rich data, to be incorporated into the RCUK SerenA project (serena.ac.uk) of which my doctoral study was a part.  
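As a toy illustration of those two interface features expressed as a data model: all the names below are invented for this sketch, not taken from the actual prototype.

```python
# Toy data model for the two interface features described above.
# All names are illustrative, not drawn from the prototype's code.
from dataclasses import dataclass, field

@dataclass
class Fragment:
    identifier: str              # e.g. a catalogue record id
    media_type: str              # "text", "image", "sound", ...
    annotations: list = field(default_factory=list)

@dataclass
class Link:
    left: Fragment
    right: Fragment
    note: str                    # why these fragments belong together

menu: dict = {}                  # the always-visible, writeable menu

def collect(fragment: Fragment) -> None:
    """Add a fragment to the persistent collection menu."""
    menu[fragment.identifier] = fragment

def mix(a: str, b: str, note: str) -> Link:
    """Bring two collected fragments together on the 'mixing screen'."""
    return Link(menu[a], menu[b], note)
```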

The tool would hopefully  operate as a powerful scholarly interface for learning and development in digital collections, for sharing annotated semantically rich data, and for communicative exchange of findings. A screencast demonstrating the interface is available here:

Prototype Mixing Interface

 

http://ablab.org/BLLabs/test/index2.html

It is worth giving some background and explanation as to why I think the DJ’s model of Information Interaction is valuable for working with multiple data types (images, video, sound, text, etc.).

Background

'The significance of the 'information explosion' may lie not in an explosion of quantity per se, but in an incalculably greater combinatorial explosion of unnoticed and unintended logical connections.' (Swanson 1996).

 

Dan Norton DJing: Wakanegra Sonidero and Greenpoint Reggae support Mungos HiFi, Palma, 2012

Creativity in the DJ’s system is apparently simple. It is entirely reliant upon bringing together sequences of material and exploring the possible combinations. This can be done simply, as a sequence of tracks one after another, or in more complicated ways involving overlaying, sampling, and mashups.

The DJ’s creativity enters the system in two fundamental information behaviours: selecting and mixing. With these two behaviours alone, personal expression and intent can enter an activity that always reuses stored content.

Selecting is the principal creative behaviour. It reduces information volume, builds useful groups of related material, and in the live event is done responsively to feedback and personal ideas.

Mixing is the second creative information behaviour. It combines the material and explores the connections between articles. Mixing formulates previously unnoticed connections from within the archive.

A Model of Learning

Learning is intrinsic to the DJ’s model of interaction. Learning and memory develop through retrieval, organisation, classification, and the addition of metadata. The association of human memory with the digital image of the collection (digital memory) creates a system in which the DJ can work in a creative flow with information, moving between idea and information.

The model incorporates retrieval and learning, with creative development and a publication workflow. Newly created texts are directly tested in a field of listeners, and may also be published and released as informational resources. The model is described in the image below.

 

DJ's Model of Information Interaction (Norton, 2013)


The next few blogposts will discuss my experiences of using the British Library’s collections, working with Labs to develop the first functioning iteration of the interface and the future developments of the work.

 

20 November 2013

The georeferencer is back!

Coinciding nicely with GIS Day 2013, the British Library's Lead Curator of Digital Mapping, Kimberly Kowal, has just released online the biggest batch yet of digitised maps needing georeferencing: 2,700 of them!

She explains, "We're asking the public to help 'place' them, i.e. identify their locations by assigning points using modern mapping. It can be a challenge, but it is an opportunity to discover familiar areas as they existed around one hundred years ago."
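Under the hood, georeferencing boils down to fitting a transformation from pixel coordinates to geographic coordinates using the points you assign. Here is a minimal sketch with numpy, using a simple affine fit and made-up control points (the Georeferencer itself is considerably more sophisticated):

```python
# Sketch: fit an affine transform pixel -> lon/lat from control points.
# All coordinates below are made-up placeholders.
import numpy as np

# (pixel_x, pixel_y) points a user has matched to (lon, lat)
pixel = np.array([(120, 80), (900, 95), (130, 640), (880, 660)], float)
geo = np.array([(-3.05, 55.98), (-2.90, 55.98),
                (-3.05, 55.90), (-2.90, 55.90)], float)

# Solve [x, y, 1] @ A = [lon, lat] in the least-squares sense
ones = np.ones((len(pixel), 1))
A, *_ = np.linalg.lstsq(np.hstack([pixel, ones]), geo, rcond=None)

def to_geo(x, y):
    """Map a pixel position on the scanned map to lon/lat."""
    return np.array([x, y, 1.0]) @ A

print(to_geo(500, 370))  # approximate lon/lat of a map pixel
```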

Read more about it over on the British Library Maps & Views Blog.

Or just dive right in and get georeferencing!

 

 

 

08 November 2013

The Sample Generator - Part 1: Origins

Posted on behalf of Pieter Francois.

Imagine being asked to describe the tool you always wanted when you were writing your PhD.

Imagine being asked, without having to worry too much about technical implementations, to make a case for a digital tool that would have:

  • saved you enormous time
  • allowed you to expand drastically the number of sources to study
  • allowed you to ask new and more relevant research questions 

Which digital tool would you choose?
What functionality seems crucial to you but is surprisingly lacking in your research area? 

It was with this frame of mind that I decided to enter the 2013 British Library Labs competition with the idea of creating a Sample Generator, i.e. a tool able to give me an unbiased sample of texts based on my search criteria. Being one of the chosen winners provided me with an opportunity to put together a small team of people from both within and outside the British Library to make it a reality.

When studying the world of nineteenth-century travel for my PhD I used the collections of the British Library extensively. Being able to look for relevant material in roughly 1.8 million records is a researcher's dream. It can also be a curse.


Snapshot of catalogue window with search word "travel"

How did the material I decided to look at fit into the overall holdings? Sure, my catalogue searches did produce plenty of relevant material, but how representative was the material I looked at of the overall nineteenth-century publication landscape? Even when assuming the British Library holdings are as good a proxy as any for the entire nineteenth-century British publication landscape, this is a very difficult question to answer. Historians and literary scholars have designed many clever methodological constructs to tackle such issues of representativity, to tackle potential biases of the studied sources and to deal with gaps in their source material. Yet very few attempts have been made to deal with these issues in a systematic way.

The ever-growing availability of large digital collections has changed the scale of this issue, but it has not changed its nature. For example, the wonderful digital '19th Century books' collection of the British Library provides access to approximately fifty thousand books in digital form, and to enthusiasts of text and sentiment mining, or scholars interested in combining distant and close reading, its potential is phenomenal. However, the impressive size of the collection does not deal with the crucial questions:

How do these books relate to the approximately 1.8 million nineteenth-century records the Library holds?

How do the digitized books of the '19th Century books' collection fit into the overall nineteenth-century publication landscape?

Large numbers can create a false sense of completeness.

The Sample Generator provides researchers with a way to understand more fully the relation between the studied sources and the overall holdings of the British Library. Whereas a traditional title-word search in the British Library Integrated Catalogue generates an often long list of hits, the Sample Generator allows you, with a few additional clicks, to generate structured, unbiased samples from this list. The key innovation is that these samples mimic the rise and fall in popularity of the searched terms over the nineteenth century, as found in the entire British Library holdings for this period.
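To illustrate the core idea (my own sketch in Python, not the project's actual code): given a hypothetical list of catalogue hits with publication years, draw a sample whose year-by-year proportions mirror those of the full result set.

```python
# Sketch of the core idea, not the Sample Generator's real code.
# `hits` is a hypothetical list of (record_id, year) catalogue results.
import random
from collections import defaultdict

def stratified_sample(hits, sample_size, seed=42):
    """Sample records so yearly proportions mirror the full result set."""
    by_year = defaultdict(list)
    for record_id, year in hits:
        by_year[year].append(record_id)

    random.seed(seed)  # a fixed seed keeps a sample reproducible
    sample = []
    for year in sorted(by_year):
        records = by_year[year]
        # each year contributes in proportion to its share of all hits
        quota = round(len(records) / len(hits) * sample_size)
        sample.extend(random.sample(records, min(quota, len(records))))
    return sample  # rounding may leave the total a record or two off
```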

Depending on the amount of research time available it is possible to change the sample size (or, for cross-validation purposes, to create several samples based on the same search criteria). Furthermore, as the Sample Generator not only works with the catalogue data (metadata) of all nineteenth-century books the British Library holds, but also keeps a special focus on the metadata of the digital '19th Century book' collection (see, http://britishlibrary19c.tumblr.com/ for a representative sample), it is possible to create samples of only digitized texts. These samples can then be further queried by using advanced text analysis and data mining tools (e.g. geo-tagging). As all the samples generated by the various searches will be stored with a unique URL, the samples become citable and they can be shared with peers and be more easily used in collaborative research.

Whereas in this phase of the project the Sample Generator has only been tried out on the nineteenth-century holdings of the British Library and on the digital '19th Century book' collection, its application is nearly universal. The Sample Generator can be implemented on any catalogue (or even bibliography) and, if relevant, links can be made to one or more digital collections.

Adding such a link with a digital collection allows users to make a different type of claim. For example, the finding 'I observed trend X in digital collection Y' is replaced by the finding 'I observed trend X in a structured unbiased sample Y which is representative of the entire catalogue/bibliography Z'. This adds an important functionality to the increasing number of large digital collections, as it removes the inherent, yet often poorly documented, biases of the digitization process (although it introduces the curatorial biases of the much larger collections, which are fortunately usually better documented and understood, as generations of scholars have come to terms with them).

Finally, the Sample Generator is a great hypothesis-testing tool. Its use allows scholars to cover a lot of ground fairly quickly by testing a range of hypotheses and ideas on relatively small sample sizes. This allows for a creative, yet structured and well documented, going back and forth between the conceptual drawing board and the data. Whereas such a structured dialogue is fundamental in the natural and social sciences, it is largely lacking in the humanities, where the dialogue between ideas and data has tended to happen in a more haphazard fashion.

The past four months were spent turning this general idea (which at times felt overly ambitious) into reality. We faced several challenges; for example, the catalogue data was incomplete and inconsistent. Furthermore, I firmly believed that it was essential to accompany the tool with some case studies highlighting its transformative potential. Given the amount of labour and the range of skill sets necessary to complete both tasks, the project had to be team-based. Without both the time and intellectual contributions of Mahendra Mahey, Ben O'Steen, Ed Turner and Justin Lane, the Sample Generator would still simply be the digital tool I always wanted to have.

Pieter Francois is one of the winners of the 2013 British Library Labs competition. He works at the Institute of Cognitive and Evolutionary Anthropology, University of Oxford, where he specializes in longitudinal analysis of archaeological and historical data.

The next blogposts of this short series, written by various members of the team, will focus on how to use the Sample Generator, on explaining the technical nuts and bolts at the back end of the tool, and on recounting the experiences of collecting the necessary data to test drive the tool.

30 October 2013

Guess the journal!

Over recent months I’ve been working on-and-off with a collection of metadata relating to articles published since 1995 in journals the Library has categorised under the ‘History’ subject heading: 382,497 rows of data (under CC0) about publication habits in the historical profession, which lend themselves to some interesting analysis and visualisation.

To recap from previous posts on this blog and on another, I started this work by extracting words which frequently occurred within journal article titles. Having filtered out words whose meaning was fuzzy (‘new’, ‘early’, ‘late’, ‘age’) or whose presence was not helpful (‘David’), I was left with this list of topwords (I’ve avoided ‘keywords’, I just don’t like the word at the moment):

africa america archaeology art britain british china chinese cultural culture development empire england europe france historical history identity life making medieval national policy political politics power revolution social society state study women world

Next I created a .csv where each row represented an occurrence of one of these 33 topwords in an article title. This totalled 209,210 rows; and though that is fewer than the original 382,497, some articles were represented more than once, as many titles contained more than one of these words.
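For those who want to reproduce something similar, here is a rough reconstruction of that step in Python. The input articles.csv, with journal, year and title columns, is an assumed format (not my actual data), and the topword list is abridged:

```python
# Rough reconstruction of the step above, not the original code.
# Assumes a hypothetical articles.csv with journal, year, title columns.
import csv

TOPWORDS = {"africa", "america", "archaeology", "art", "britain",
            "history", "power", "women"}  # abridged; the full set has 33

with open("articles.csv", newline="") as src, \
     open("occurrences.csv", "w", newline="") as dst:
    writer = csv.writer(dst)
    writer.writerow(["journal", "year", "topword"])
    for row in csv.DictReader(src):
        for word in row["title"].lower().split():
            word = word.strip(".,:;?!'\"()")
            if word in TOPWORDS:
                # one row per occurrence, so a title containing two
                # topwords yields two rows
                writer.writerow([row["journal"], row["year"], word])
```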

Before we get to the fun bit, there are a number of problems with the data that need pointing out:

  • There are some odd gaps and declines in article volume for some journals around 2005. This isn’t due to actual publication trends, so we are working on why the data isn’t accurate – huge thanks to the Metadata Services team (especially Corine Deliot) for their hard work.
  • The volume of English-language titles smothers the various Italian and – notably thanks to Zeitschrift für Geschichtswissenschaft – German titles, leaving us with very Anglophonic data. I’d like to do some translating, but for now I’ll restrict myself to trends in English language articles.
  • The data isn’t smoothed by articles per journal issue (or articles per journal per year), thus ‘power’ journals are created on sheer volume of output alone (and, as we all should know and should hope to be the mantra of future academic publication, less can be more…).
  • The data includes reviews, though this isn’t necessarily a bad thing as it adds book titles to the list of titles mined (hence why ‘David’ cropped up among the candidate topwords before filtering).
  • Some words have multiple meanings (china) or are ill-suited to simple text mining (art), but then corpus linguists have known this for years.
  • Some journals in the data are not really history journals, but rather politics and current affairs publications with a sprinkling of historical content. Archaeology is similarly problematic, but I’ve left these journals in for now out of a sense of GLAM solidarity.

Despite all of this, I’d like you to play a game of guess the journal from a network graph; a network graph representing data for the 30 highest-ranking English-language History journals in terms of article volume published between 1995 and 2013. On one hand, your doing this will help me validate that my data – and this particular way I’ve chosen to represent it (a force-directed ‘Force Atlas’ graph generated using Gephi) – has some value; Adam Crymble has a nice example of how this can be useful. On the other, it should be a bit of fun.
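If you fancy rebuilding something like this yourself, the sketch below shows one way to assemble such a bimodal journal/topword network with the networkx library before styling it in Gephi. File and column names follow the hypothetical CSV sketched earlier, not my actual data:

```python
# Sketch: build a bimodal (journal <-> topword) graph from the
# occurrences CSV sketched earlier, then export for Gephi.
import csv
from collections import Counter

import networkx as nx

edge_weights = Counter()
with open("occurrences.csv", newline="") as f:
    for row in csv.DictReader(f):
        edge_weights[(row["journal"], row["topword"])] += 1

G = nx.Graph()
for (journal, topword), weight in edge_weights.items():
    G.add_node(journal, kind="journal")
    G.add_node(topword, kind="topword")
    G.add_edge(journal, topword, weight=weight)  # thickness ~ occurrences

nx.write_gexf(G, "history_journals.gexf")  # open in Gephi for Force Atlas
```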

So, onto that long promised fun bit. Knowing the following:

  • That each number on the network represents a journal name,
  • that each word within square brackets is a topword from an article title,
  • that the thickness of the line between the word and the number represents the occurrence of that topword in the numbered journal,
  • and that the colouring represents the group (or modularity) the numbered journal has been assigned to based on the structure of the network;

can you guess which numbers the following journals are represented by? (Or is this whole thing meaningless?)

  • Antiquity
  • English Historical Review
  • International Journal of African Historical Studies
  • International Journal of Maritime History
  • Journal of American History
  • Journal of Asian Studies
  • Journal of Social History
Bimodal Force Atlas graph for History Journal Articles published 1995-2013. For more detail (and with apologies for the fuzzy compression above, you'll probably need it!), download the PNG or SVG version.

To start you off, I’ll gift you that American Historical Review is number 34 – right at the heart of the network, not surprising given its volume of output. I’ll also give you a little derived data to help you make up your mind.

Answers in the comments please!

@j_w_baker