Digital scholarship blog

01 March 2013

Phylogenetic Tree Visualisation and Annotation


Low resolution image of the file tree of a hard drive using FigTree

Forensic techniques make it possible to extract large volumes of information from the storage media of personal computers. A subsequent challenge lies in the effective presentation of this information. Paper archives are commonly represented by a set of arrangement records which, taken together, essentially map the way the papers (letters, notes and so on) were originally held: in envelopes, within folders, within bundles of folders. Computer files are similarly arranged in a logical hierarchy on digital media.

Phylogenetic software has been designed to show the evolutionary relationships between living organisms, and can enable annotation at each internal node (the nodes akin to folders) as well as at the leaf nodes (the species or entities at the tips of the branches). Annotation may take the form of core metadata created automatically by a program and supplementary metadata compiled by a scientist or curator, and there may be links to pertinent ancillary information or objects, including digital images of the species. There are a number of websites dedicated to the tree of life, such as ToL. An article in the journal Zootaxa sets out the rationale of the approach and points to some of the other tree of life websites.

The JavaScript library jsPhyloSVG makes it possible to create very attractive trees (both rectangular and circular) combined with binary and bar charts. The resulting maps can be interactive and dynamic, using SVG and HTML5; because SVG is vector based, they can be viewed at close resolution without pixelation.


The Personal Digital Manuscripts project at the British Library is exploring these and other techniques to see which are most suited for specific purposes.

The emphasis in recent years has been on increasing the number of species or entities to be handled. As phylogeneticists themselves have pointed out (e.g. in an article in Trends in Ecology and Evolution, February 2012), there are three approaches to viewing complex trees and networks: (i) multiscreen mosaic; (ii) pre-emptive tiling of very large images; and (iii) focus and context using special geometric effects. Each has its merits and is likely to be most effective in combination with one or more of the others.

Recently, fractal algorithms have been adopted by OneZoom Explorer and the DeepTree system, enabling interactive visualisation of the tree of life. One manifestation has been a multitouch table connected to a multiscreen system.

It is usually necessary first to get the structural information into a special form such as the Newick format (named after a celebrated seafood restaurant in New Hampshire where the standard was first agreed).
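To give a flavour of what getting a file hierarchy into Newick might involve, here is a minimal sketch in Python. It is an illustration only, not the procedure used in the project: the function names, the example paths and the choice of "/" as the path separator are all assumptions made for the example. Folders become internal nodes and files become leaves; labels containing spaces, underscores or Newick punctuation are wrapped in single quotes.

```python
import re


def quote_label(name):
    """Quote a label if it contains characters that are special in Newick.

    Underscores are included because many Newick readers treat an
    unquoted underscore as a space.
    """
    if re.search(r"[\s()\[\]':;,_]", name):
        return "'" + name.replace("'", "''") + "'"
    return name


def paths_to_newick(paths, root_name="root"):
    """Build a Newick string from '/'-separated file paths (hypothetical helper)."""
    # Nested dictionaries: each key is a folder or file name.
    tree = {}
    for path in paths:
        node = tree
        for part in path.strip("/").split("/"):
            node = node.setdefault(part, {})

    def render(name, children):
        label = quote_label(name)
        if not children:
            # A name with no children is written as a leaf.
            return label
        inner = ",".join(render(k, v) for k, v in sorted(children.items()))
        return "(" + inner + ")" + label

    return render(root_name, tree) + ";"


# Made-up paths, purely for illustration.
example_paths = [
    "letters/1975/draft.txt",
    "letters/1975/reply.txt",
    "notes/ideas.txt",
]
newick = paths_to_newick(example_paths)
print(newick)
# → (((draft.txt,reply.txt)1975)letters,(ideas.txt)notes)root;
```

Note that in this sketch an empty folder would be indistinguishable from a file, and a folder holding a single item produces an internal node with one child, which some viewers may display differently from others.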

The versatile FigTree software was used to create, from a Newick file, a circular tree of nearly 14,000 files from one of the hard drives of John Maynard Smith at the British Library (seen at the beginning of this blog). The software Archaeopteryx is able to handle large trees and may be used to convert Newick into other forms such as phyloXML, which can in turn be viewed and edited using an XML editor.
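The relationship between Newick and phyloXML can be illustrated with a short Python sketch. This is not how Archaeopteryx performs the conversion; it is a simplified stand-in that assumes unquoted labels and no branch lengths, and it emits only the nested clade and name elements of phyloXML without validating against the schema. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

PHYLOXML_NS = "http://www.phyloxml.org"


def parse_newick(text):
    """Parse a simple Newick string into nested (name, children) tuples.

    Assumes unquoted labels and no branch lengths.
    """
    text = text.strip().rstrip(";")
    pos = 0

    def parse_node():
        nonlocal pos
        children = []
        if pos < len(text) and text[pos] == "(":
            pos += 1  # step over "("
            children.append(parse_node())
            while pos < len(text) and text[pos] == ",":
                pos += 1  # step over ","
                children.append(parse_node())
            if pos >= len(text) or text[pos] != ")":
                raise ValueError("unbalanced parentheses in Newick string")
            pos += 1  # step over ")"
        # The label (possibly empty) follows the optional child list.
        start = pos
        while pos < len(text) and text[pos] not in "(),":
            pos += 1
        return text[start:pos], children

    node = parse_node()
    if pos != len(text):
        raise ValueError("unexpected trailing text in Newick string")
    return node


def newick_to_phyloxml(text):
    """Return a minimal phyloXML document for a simple Newick string."""
    # The namespace is set as a plain attribute on the root element.
    root = ET.Element("phyloxml", xmlns=PHYLOXML_NS)
    phylogeny = ET.SubElement(root, "phylogeny", rooted="true")

    def add_clade(parent, node):
        name, children = node
        clade = ET.SubElement(parent, "clade")
        if name:
            ET.SubElement(clade, "name").text = name
        for child in children:
            add_clade(clade, child)

    add_clade(phylogeny, parse_newick(text))
    return ET.tostring(root, encoding="unicode")


print(newick_to_phyloxml("((a,b)x,c)root;"))
```

Each pair of parentheses in Newick becomes a clade element containing further clade elements, which is what makes the XML form convenient to inspect and annotate in an ordinary XML editor.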







Three very low resolution images of the file tree of the same hard drive using Archaeopteryx. The top image of the three displays the tree in an unrooted form. The bottom one is the result of clicking on one of the nodes in the second one, which is one of several ways of quickly navigating the tree.


All of the software mentioned in this blog (except DeepTree) is up and running in Digital Scholarship Labs at the British Library, along with a high resolution multimonitor system. There is a lot more to explore and this will be one of a series of Digital Scholarship blogs about adapting existing technologies for showing large tree and network visualisations with annotations.

Jeremy Leighton John @emsscurator

