15 October 2019
Why we collect digital publications
Recently there have been a number of discussions on Twitter and elsewhere about access to e-books at the British Library. We know some Readers have concerns specifically about the way that EPUB format books are displayed on our computer terminals in the Reading Rooms. We also know that there is some confusion around how and why the Library is collecting some books in digital and some in print.
From our own testing of access to e-books in our Reading Rooms, we are aware that there are problems in some cases, particularly with certain e-book formats and for Readers who want to read a whole book or a significant portion of one.
Readers asked why the British Library did not also purchase print copies of books and give Readers the option to request a work in either print or digital form. Others pointed out the difficulty of citing eBooks created in the EPUB format that do not have page numbers, and the fact that print and digital versions have different citation standards. Readers asked why they cannot use their own devices to access content or take photographs or screen captures. There were also questions about whether the Library could improve the access service for Readers who have a disability and require a print copy or a more accessible digital experience.
I personally think it’s important for libraries to be transparent about how they collect, preserve, and make accessible their collections: libraries exist to provide people with access to information and to preserve a record of published works for the future. Libraries also benefit from feedback, which informs improvements to services. Additionally, Reader comments are a great way of demonstrating that preservation has been successful and of alerting library staff to potential issues.
As someone who works with Non-Print Legal Deposit (NPLD) and digital publications on a daily basis, I see this blog post as an opportunity to provide some context to the points raised by Readers on social media. It will also touch upon some broader points about digital publishing in relation to the UK’s legal deposit collection and the Legal Deposit Libraries' (LDLs) work to manage this collection.
What is legal deposit?
Legal deposit is not specific to the United Kingdom; many countries have legislation in place that allows memory organisations to collect, preserve, and provide access to published works. Legal deposit of print materials has existed in some form in the UK since 1662, although the libraries involved have changed over time.
The current UK Legal Deposit Libraries consist of the British Library; the National Library of Scotland; the National Library of Wales; the Bodleian Libraries at the University of Oxford; Cambridge University Libraries; and the Library of Trinity College, Dublin. Together they share responsibility for the national collection of print and digital works published or distributed in the UK.
In 2013, the UK Parliament updated existing legal deposit legislation to reflect the digital age and respond to the continual evolution of publishing, allowing published works to be deposited in digital formats and thereby ensuring their long-term preservation. This update also reduced the risk of a 'digital black hole' in the national collection for works created in digital-only formats. Where a publication exists in both print and digital form, the LDLs can agree with publishers to collect the digital publication instead of the print one.
In many cases, the LDLs and publishers have agreed to move from a model of print deposit to digital deposit, as this provides benefits for the LDLs and publishers. We believe that there are also significant benefits for Readers.
However, the LDLs do continue to collect some publications in print where a digital version also exists. We do this where the print medium is crucial for the delivery and understanding of the content. Examples include loose-leaf format books as well as books that have high-resolution images. Collection management representatives from each of the LDLs determine which publishers could be transitioned from print to digital deposit, and which works are best suited for deposit in print.
Benefits of digital deposit
There are benefits to both publishers and libraries in moving from deposit of print books to deposit of digital books. Many publishers have established digital distribution systems, to which the Legal Deposit Libraries can be added with relative ease. Digital deposit also requires only one copy to be deposited for all six libraries, which further simplifies the effort required from publishers.
For the LDLs, there are a number of benefits. Our experience has been that the supply of books increases following a move from print to electronic deposit. In some cases, we have been able to acquire digital copies of books that we had previously been unsuccessful in claiming in printed form.
The costs associated with managing digital collections are different to those for print collections, and direct comparisons are hard to make. It is not, as many people expect, always more cost-effective to collect digital publications than print. However, digital legal deposit has allowed all the LDLs to work more collectively and collaboratively to share resources and solve problems.
We believe that there are benefits to Readers too, beyond the increase in publications that we now receive. Publications that we receive in digital form are processed automatically and become discoverable and accessible much more quickly after receipt than print publications. Once in our digital repository, a publication can be delivered almost immediately, provided that at least seven days have passed since the publisher published it. For the British Library, this means immediate access for our Readers at Boston Spa as well as in London. We don’t need to restrict the number of items ordered in one day, so Readers can request as many digital publications as they need.
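As a minimal sketch of the seven-day availability rule described above, the function below checks whether a deposited publication could be delivered on a given date. The function name is hypothetical, and it assumes the rule simply means "seven or more calendar days after the publisher's publication date"; the Libraries' actual systems may apply it differently.

```python
from datetime import date, timedelta

# Assumption: "at least seven days after publication" is read as seven or more
# full calendar days after the publisher's publication date.
EMBARGO = timedelta(days=7)

def is_deliverable(publication_date: date, today: date) -> bool:
    """Hypothetical check: may a deposited publication be delivered on `today`?"""
    return today - publication_date >= EMBARGO

print(is_deliverable(date(2019, 10, 1), date(2019, 10, 8)))  # True
print(is_deliverable(date(2019, 10, 1), date(2019, 10, 7)))  # False
```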
Despite these benefits, we know that we haven’t solved all the problems with access to e-books in particular. We are working to improve access, as well as to anticipate the needs that Readers will have in the near future as research topics, practices, and tools change.
Finally, we know that publishing is changing, as is research practice. Digital technologies have changed what can be published, who can publish and how publications are distributed and read (including reading by machines). The collections we are building now, and the systems and services we develop to support those collections, need to be fit for purpose now and in the future. Collecting digital publications at large scale helps us to build our capabilities and understand how we need to respond to change.
Digital Infrastructure
There’s a lot of work that happens behind the scenes to allow Readers to access digital works within the collection. This section provides an overview of what’s needed to support the ongoing implementation of legal deposit.
The legal deposit collection grew rapidly once published works could be collected in digital formats, and implementation is ongoing. This work involves helping publishers make the transition from print to digital deposit; bringing on board new publishers that do not currently deposit their published works; ensuring that deposit is ongoing and sustainable, with no gaps; and building and maintaining the network of systems and knowledge needed to support the collection.
At the beginning of implementation, the LDLs prioritised collecting e-books and e-journals from larger publishers, tackling the “short tail” of available publications. Collecting the UK web domain as well as building thematic collections of websites was also in scope, and web archiving became a large focus for preservation and access. In recent years, the Libraries have started collecting other content types, including sheet music and geospatial data.
To support these digitally published works, the LDLs built a digital collection management infrastructure. This infrastructure is located primarily at the British Library and comprises a complex network of systems, workflows, processes, tools, and policies that work together to support an end-to-end collection management lifecycle. This lifecycle includes acquisition and deposit; ingest into a digital repository and active preservation once securely stored; cataloguing; discovery; and access at each of the LDLs.
The content files, as well as any associated files (e.g. metadata, cover images), are securely stored on four geographically separate nodes located at the British Library’s locations in London and Boston Spa as well as at the National Library of Scotland and the National Library of Wales.
Where possible, workflow steps are automated, but building and supporting them relies heavily on the knowledge of Library staff. This knowledge is growing as publishers create their works in an array of formats (some of which might only be suitable for certain software applications and hardware) and take a range of approaches to structuring and supplying metadata.
Non-Print Legal Deposit and its ongoing implementation speak to the changing nature of digital publishing, and specifically to how digital technology affects both the creation of digital publications and the way Readers consume and access content. As digital technology continues to evolve, and publishers adopt whatever technologies they choose to create their works, the Libraries will need to continue developing their digital collection management service.
File formats and content types
One main objective in allowing the Libraries to collect published works in digital formats was to ensure comprehensive collecting. Another main driver was preservation, to ensure the longevity of the collection.
For a publication to be deposited with the British Library, it must be the version that is made publicly available, and it must be in a format that is suitable for long-term preservation.
At present, the British Library accepts the following formats for content types collected under NPLD:
- eBooks (EPUB, PDF)
- eJournals (PDF)
- Web archive (WARC)
- Geospatial data (raster and vector formats)
- Sheet music (PDF)
This list of formats will change over time as the collection continues to grow and as file formats evolve or even become obsolete. The Libraries actively monitor the stability of file formats already represented in the collection and keep track of trends in digital publishing to understand what is on the horizon.
eBooks created in the EPUB file format
Digital preservation is a crucial discipline needed to inform how digital files can be preserved and made accessible to current and future generations of Readers.
Many of the comments we have received concern e-books and the EPUB file format, so it’s important to spend some time here explaining why the LDLs collect this content type in this format.
In comparison to PDF, EPUB is the more suitable preservation format. This is for a few reasons:
- It is widely used and supported within the publishing community
- It is based on open standards
- It is community-supported and its specification is openly available
- A considerable number of software applications and hardware devices support access
Whilst EPUB is preferred from a preservation perspective, Readers might find it challenging to cite content from publications in this format, since EPUBs commonly do not include page numbers. EPUBs can be reflowable: the content flows as one continuous document and its presentation adapts to the viewing software. Readers can change the font size, and in some cases the font itself. Even if a reflowable EPUB did support page numbers, they could differ with each individual viewing, because the layout adjusts to however the Reader has chosen to view the content within the viewer.
EPUB 3 also supports fixed layout, which more closely resembles the layout of a print book; metadata helps specify the orientation and position of the pages. This means that a page stays the same no matter how it is viewed and can carry a page number. A citation challenge remains, however, since the page numbers in the EPUB version might not match those in its print counterpart where both exist.
For all types of EPUB, citation guidance recommends referencing content by paragraph number, counting paragraphs from the beginning of the eBook chapter in which the cited content appears. While this solution is not ideal, the challenge of citing eBooks is not specific to Readers using legal deposit publications; it exists for all publications created in this format.
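To illustrate why page numbers in a reflowable EPUB are unstable, the sketch below uses a deliberately simplified, hypothetical layout model: text wraps at a fixed number of characters per line, and each page holds a fixed number of lines. A larger font means fewer characters per line and fewer lines per page. Real reading systems lay out text very differently; the function name and the layout figures are assumptions chosen only to show the effect.

```python
import textwrap

def page_of_paragraph(paragraphs, target, chars_per_line, lines_per_page):
    """Return the 1-based page on which paragraph `target` (0-based) begins.

    Toy layout model (an assumption, not how real reading systems work):
    each paragraph wraps at `chars_per_line`, and each page holds
    `lines_per_page` lines.
    """
    lines_so_far = 0
    for index, paragraph in enumerate(paragraphs):
        if index == target:
            return lines_so_far // lines_per_page + 1
        lines_so_far += max(1, len(textwrap.wrap(paragraph, chars_per_line)))
    raise IndexError("target paragraph is out of range")

# Thirty identical 40-word paragraphs stand in for a chapter.
chapter = ["word " * 40] * 30

print(page_of_paragraph(chapter, 20, chars_per_line=80, lines_per_page=40))  # 2 (small font)
print(page_of_paragraph(chapter, 20, chars_per_line=40, lines_per_page=20))  # 6 (large font)
```

The same paragraph lands on page 2 or page 6 depending only on the display settings, which is why a page number taken from one Reader's view would not help another Reader find the passage.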
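In EPUB 3, the fixed-layout choice is recorded in the package document's metadata using the `rendition:layout` property, whose value is `pre-paginated` for fixed layout and `reflowable` (the default) otherwise. The sketch below reads that package-level value from an EPUB file. The function and helper names are hypothetical, it ignores per-spine-item overrides (the `rendition:layout-pre-paginated` and `rendition:layout-reflowable` itemref properties), and the in-memory test file is a stripped-down stand-in, not a valid EPUB.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}

def epub_layout(epub_file) -> str:
    """Return the package-level rendition:layout: 'pre-paginated' or 'reflowable'."""
    with zipfile.ZipFile(epub_file) as zf:
        # META-INF/container.xml points to the package (OPF) document.
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", CONTAINER_NS)
        opf = ET.fromstring(zf.read(rootfile.get("full-path")))
    for meta in opf.findall("opf:metadata/opf:meta", OPF_NS):
        if meta.get("property") == "rendition:layout":
            return (meta.text or "").strip()
    return "reflowable"  # default when the property is absent

# Minimal stand-in files for testing only (a real EPUB needs much more).
CONTAINER_XML = (
    '<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">'
    '<rootfiles><rootfile full-path="OEBPS/content.opf" '
    'media-type="application/oebps-package+xml"/></rootfiles></container>'
)
OPF_TEMPLATE = (
    '<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="uid">'
    '<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">'
    '<dc:identifier id="uid">example-id</dc:identifier>{}'
    "</metadata></package>"
)

def make_test_epub(layout_meta: str = "") -> io.BytesIO:
    """Build a hypothetical in-memory zip containing just the two files read above."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("META-INF/container.xml", CONTAINER_XML)
        zf.writestr("OEBPS/content.opf", OPF_TEMPLATE.format(layout_meta))
    buf.seek(0)
    return buf

FIXED = '<meta property="rendition:layout">pre-paginated</meta>'
print(epub_layout(make_test_epub(FIXED)))  # pre-paginated
print(epub_layout(make_test_epub()))       # reflowable
```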
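The paragraph-counting approach can be sketched as follows. This is a hypothetical helper, not part of any citation style or Library system: it assumes a chapter is a single XHTML content document and that every `<p>` element counts as one paragraph (headings are not counted). Citation guides may define what counts as a paragraph differently, and a chapter can be split across several files in a real EPUB.

```python
import xml.etree.ElementTree as ET
from typing import Optional

XHTML_P = "{http://www.w3.org/1999/xhtml}p"

def paragraph_number(chapter_xhtml: str, quote: str) -> Optional[int]:
    """Return the 1-based number of the first <p> containing `quote`,
    counting from the beginning of the chapter, or None if it is not found."""
    root = ET.fromstring(chapter_xhtml)
    for number, paragraph in enumerate(root.iter(XHTML_P), start=1):
        # Join text across inline markup (e.g. <em>) and normalise whitespace.
        text = " ".join("".join(paragraph.itertext()).split())
        if quote in text:
            return number
    return None

chapter = """<html xmlns="http://www.w3.org/1999/xhtml"><body>
<h1>Chapter 2</h1>
<p>First paragraph of the chapter.</p>
<p>Second paragraph, with <em>some emphasis</em>.</p>
<p>Third paragraph, containing the passage we want to cite.</p>
</body></html>"""

print(paragraph_number(chapter, "the passage we want to cite"))  # 3
```

Because the count depends only on the document's structure, it gives the same result whatever font or screen size a Reader uses.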
Access to digital publications
The LDLs endeavour to provide as good an access experience as possible, but access must also comply with the restrictions set out in the Legal Deposit Libraries (Non-Print Works) Regulations 2013.
Access to Non-Print Legal Deposit publications is restricted to use onsite at computer terminals at each of the Libraries. There are additional restrictions that apply:
- Single concurrent access per item per library: Readers at different LDLs can access the same eBook at the same time, but not if they are at the same site
- No digital copies can be removed from a reading room
- No digital sharing or screenshots
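The first restriction above can be modelled as a simple register of open sessions keyed by library and item. The sketch below is a hypothetical illustration of that rule only; the class and method names are invented, it treats each library as a single access point, and it is not the Libraries' actual access system. The other restrictions are not modelled.

```python
class AccessRegister:
    """Toy model of 'single concurrent access per item per library'.

    Hypothetical: tracks at most one open session per (library, item) pair.
    """

    def __init__(self) -> None:
        self._sessions: dict = {}  # (library, item_id) -> reader_id

    def open_item(self, library: str, item_id: str, reader_id: str) -> bool:
        """Try to open an item; return False if it is already in use at that library."""
        key = (library, item_id)
        if key in self._sessions:
            return False
        self._sessions[key] = reader_id
        return True

    def close_item(self, library: str, item_id: str, reader_id: str) -> None:
        """Release the item, but only if this Reader is the one holding it."""
        key = (library, item_id)
        if self._sessions.get(key) == reader_id:
            del self._sessions[key]

register = AccessRegister()
print(register.open_item("British Library", "ebook-001", "reader-a"))            # True
print(register.open_item("British Library", "ebook-001", "reader-b"))            # False: same library
print(register.open_item("National Library of Wales", "ebook-001", "reader-c"))  # True
register.close_item("British Library", "ebook-001", "reader-a")
print(register.open_item("British Library", "ebook-001", "reader-b"))            # True
```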
Whilst some Readers might not find the access experience ideal, the Libraries must enforce access conditions that are, in most cases, unique to them. A bespoke access solution was built to accommodate the access restrictions set out in the Regulations. This solution comprises commercially available and open source software and browsers (where these exist), as well as software that has been either developed or configured by British Library staff or external specialists.
What's next for UK legal deposit?
The Libraries are actively reviewing access to legal deposit publications to identify how the access experience can be improved. The recent discussions are therefore timely, and comments from Readers help us understand what a better experience would look like.
In 2018, the UK’s Department for Digital, Culture, Media and Sport (DCMS) also undertook a post-implementation review of the NPLD Regulations and made the following recommendations:
- Accessibility for disabled users to be brought in line with the Equality Act 2010
- Ability to collect newspapers in the form of digital facsimiles
- Understand how access to the UK Web Archive can be increased while protecting rights holders
- Understand how NPLD regulations can better align with UK copyright law
This review, as well as responses from library and publisher representatives, is publicly available on the DCMS website, and these recommendations will be reviewed by library and publisher representatives, external subject matter experts and the general public in the coming months.
As I mentioned earlier in this post, the implementation of Non-Print Legal Deposit is ongoing. It is important for the Libraries to build this collection and ensure that it is preserved and made accessible now and in the future. The Libraries welcome feedback from Readers about their experience of using this collection; Readers can email [email protected]. The Twitter hashtags #UKNPLD and #UKLegalDeposit publicise news about this collection, including publications and ongoing research to support its collection management.
Caylin Smith
Legal Deposit Libraries Senior Project Manager