THE BRITISH LIBRARY

Digital scholarship blog

26 September 2014

Applying Forensics to Preserving the Past: Current Activities and Future Possibilities

First Digital Lives Research Workshop 2014 at the British Library
 


DL Workshop Holder
 

 

With more and more libraries, archives and museums manifestly adopting forensic approaches and tools for handling and processing born digital objects both in the UK and overseas it seemed a good time to take stock. Archivists and curators were invited (via professional email listservs) to submit a short paper for an inclusive and interactive workshop stretching over two days in London. 

Institutions are applying digital forensics across the entire lifecycle from appraisal through to content analysis, and have begun to establish workflows that embrace forensic techniques such as the use of write blockers for the formation of disk images, the extraction of metadata and the searching,  filtering and interpreting of digital data, notably the appropriate management of sensitive information. 

There are two sides to digital forensics for it begins with the protection of digital evidence and concludes with the retrospective analysis of past events and objects. Papers reflecting both aspects were submitted for the workshop (download DLRW 2014 Outline). 

It provided participants with opportunities to report on current activities, highlight gaps, constraints and possibilities, and to discuss and agree collective steps and actions. 

 

DLRW2014 delegates v3

 

As the following list demonstrates delegates came from a diverse range of institutions: universities, libraries, galleries and archives, and the private sector.

Matthew Addis, Arkivum

Fran Baker, John Rylands Library, University of Manchester 

Thom Carter, London School of Economics Library

Dianne Dietrich, Cornell University Library

Rachel Foss, British Library

Claus Jensen, Royal Library of Denmark and Copenhagen University Library

Jeremy Leighton John, British Library

Svenja Kunze, Bodleian Library, University of Oxford

John Langdon, Tate Gallery

Cal Lee, University of North Carolina at Chapel Hill

Caroline Martin, John Rylands Library, University of Manchester (contributor to paper)

Helen Melody, British Library

Stephen Rigden, National Library of Scotland

Elinor Robinson, London School of Economics Library

Susan Thomas, Bodleian Library, University of Oxford 

Dorothy Waugh, Emory University

 

  DLRWtables1

 

I gave an introduction to the original Digital Lives Research project and a brief overview of the ensuing internal projects at the British Library (Personal Digital Manuscripts and Personal Digital Archives), while Aquiles Alencar-Brayner gave an introduction to Digital Scholarship at the British Library including the award winning BL Labs project. 

Short talks presented overviews of current activities at the National Library of Scotland, University of Manchester and London School of Economics and the establishment of forensic and digital archiving at these institutions, including the value of a secure and dedicated workspace, the use of a forensic tool for examining large numbers of emails, the integration of forensic techniques within existing working environments and practices, and the importance of tailored training. 

Other talks were directed at specific applications of forensic tools in the preservation of complex digital objects in the Rose Goldsen Archive of New Media at Cornell University Library, the capture of computer games at the Royal Library of Denmark, and the challenges of capturing the floppy disks of poet and author Lucille Clifton at Emory University, these media being derived from a Magnavox Videowriter

 

  HKA scrnsht

 

My colleagues Rachel Foss and Helen Melody and I presented a paper on the Hanif Kureishi Archive, a collection of paper and digital materials, recently acquired by the British Library’s literary curators: specifically, outlining the use of digital forensics for appraisal and textual analysis.  

Prior to acquisition Rachel and I previewed the archive using fuzzy hashing (a technique for quickly identifying similar files). 

  

HKA1 fmpro scrnsht
 

After the archive was obtained and forensically captured, metadata were extracted from the digital objects and made available along with curatorial versions of the text documents, and Helen catalogued them using the British Library’s Integrated Archives and Manuscripts System

 

  HKA1 catalogue scrnsht2

 

HKA1 catalogue scrnsht3
  

HKA1 catalogue scrnsht1

One of the most exciting aspects of the archive is a set of 53 drafts of Hanif Kureishi’s novel Something To Tell You, which Rachel, Helen and I decided to explore as an example for the workshop. 

 

 

  HKA1 Graph Logical Size (log) vs Modif Date
              

Figure 1. Logical file size plotted against last modified date: an editing history

 

We used the sdhash tool (produced by Vassil Roussev of the University of New Orleans and incorporated within the BitCurator framework). Like the ssdeep fuzzy hashing tool (which has been incorporated into Forensic Toolkit, FTK), it identifies similarities among files but uses a distinct approach.

  Sdhash STTY scrnsht2

With BitCurator it is possible to direct sdhash at a set of files and ask the tool to first create the similarity digests and then to make pairwise comparisons across the similarity digests for all files: each pair of files being assigned a similarity digest score. 

 

  Sdhash apparent date diff modulus dots 2 crp

 

Figure 2. Similarity score (sdhash) plotted against absolute difference in indicated dates (days) between files (each point represents a pair of draft files): apparently and generally the greater the number of days between the files of a pair, the lower the similarity score

 

This is a preliminary analysis and readers of this blog entry who are familiar with statistical methods may recognise that it might be better to use partial regression or a similar statistical approach. A further small point, as Dr Roussev has emphasised, a 100% similarity does not mean that the files are identical; cryptographic hashes can serve this purpose and are to be incorporated in future versions of the sdhash tool which is still under active development.  

 

Following the more formal talks we began an open discussion with the aim to identify some priority topics, and subsequently we divided into three groups to address: metadata, access and sensitivity, respectively, concluding the first day. On the second day, we focussed the conversation even more and as two groups addressed cataloguing and metadata on the one hand, and tools and workflows on the other hand. 

Steps towards specific conclusions and recommended actions were made in preparation for publication and dissemination. 

The desire to continue and extend the collaboration was strongly expressed, and fittingly Cal Lee concluded the workshop by updating us on developments of the BitCurator platform and the launch of the BitCurator Consortium, an important invitation for institutions to participate and for individuals to collaborate. 

BitCurator is going from strength to strength: receiving an extension of the project, formally launching the BitCurator Consortium, and releasing Version 1.0 of the BitCurator software.  

 

Many congratulations to Fran and Caroline on their email project becoming a finalist for the Digital Preservation Awards 2014: the University of Manchester Library’s Carcanet Press Archive project which among many things explored the use of the forensic tool Email Examiner along with Aid4Mail (which incidentally has a forensic version). 

 

  IMGP6218crp

 

The workshop was jointly organised by me, Cal Lee (University of North Carolina at Chapel Hill) and Susan Thomas (Bodleian Library, University of Oxford).  

Very many thanks to the delegates for all of their participation over the two days. 

Jeremy Leighton John, Curator of eMSS 

@emsscurator

Comments

The comments to this entry are closed.