11 October 2013
What’s in a name?
This week Stephen Andrews discusses a problem he has with identifying authors' names.
So here’s the problem: it’s difficult, if not impossible, to find everything that a particular author produces, and sometimes even to know whether two similar names belong to the same person. This is because articles, books, etc. usually have unique identifiers (Digital Object Identifiers, or DOIs, for articles and datasets, and International Standard Book Numbers, or ISBNs, for books), but their authors do not. As more and more information becomes available through internet search engines such as Google, I often find myself staring at a list of very similar names and then seeking out personal web pages to work out which one belongs to the creator of the output I am interested in. It is a time-consuming process.
It appears I am not alone. Jackie Knowles of the Repository Support Project provided a good example of the kind of issue I face. Each of the variations below is a valid entry for one particular individual:
Collis, G P
Paddy Collis
G P Collis
Gerard Paddy Collis
Collis, Gerard P
Gerard P Collis
Collis, Gerard Paddy
Collis, Paddy
Are these the same person? Which variation should I be searching for? In my experience, the more common the name, the more difficult it is to identify an individual.
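To make the problem concrete, here is a minimal Python sketch of the kind of naive string matching a searcher might fall back on. It is not the Names project's algorithm, which is not described here. The functions `name_key` and `may_be_same_person` are hypothetical, and the matching rule (same surname, and one set of given-name initials contained in the other) is an assumption chosen purely for illustration.

```python
import re
from typing import FrozenSet, NamedTuple


class NameKey(NamedTuple):
    surname: str               # lower-cased family name
    initials: FrozenSet[str]   # first letters of the given names (order is ignored)


def name_key(raw: str) -> NameKey:
    # Handles the two layouts in the list above: "Surname, Given names"
    # and "Given names Surname". Other layouts (e.g. surname particles
    # such as "van") are NOT handled by this hypothetical sketch.
    if "," in raw:
        surname, given = raw.split(",", 1)
    else:
        *given_parts, surname = raw.split()
        given = " ".join(given_parts)
    tokens = re.findall(r"[A-Za-z]+", given)
    return NameKey(surname.strip().lower(), frozenset(t[0].upper() for t in tokens))


def may_be_same_person(a: str, b: str) -> bool:
    # Assumed naive rule: surnames match and one set of initials
    # is a subset of the other ("P" is compatible with "G P").
    ka, kb = name_key(a), name_key(b)
    if ka.surname != kb.surname:
        return False
    return ka.initials <= kb.initials or kb.initials <= ka.initials
```

Run over the eight variants above, every pair comes out as a possible match. But the same rule would also pair "Gerard P Collis" with a hypothetical "George P Collis", and it would reject "Collis, G" against "Paddy Collis" even though both could be the same man. String rules like this can narrow a search, but they cannot settle identity.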
Libraries have been at the forefront of identifying the authors of physical items such as books, but this approach doesn’t really scale to the plethora of heterogeneous outputs now published in the digital realm. What is needed is a means by which the authors and creators of these outputs can be disambiguated and assigned unique identifiers, which I could then use to retrieve all the known outputs of a particular individual. There have been attempts at this, such as the Dutch Digital Author Identifier, but until now the approach has been fragmented.
More recently we have seen two major international initiatives aimed at assigning identifiers to individuals: the International Standard Name Identifier (ISNI) and Open Researcher and Contributor ID (ORCID). The British Library has been involved in both from the outset and is on the board of the ISNI consortium. Through its involvement in projects such as ODIN (ORCID and DataCite Interoperability Network) and Names, the Library is exploring how these services might be extended to include links to research outputs. Both projects are looking at how published outputs can be linked to their creators, which is ultimately what I’m after. The EU-funded ODIN project aims to use ORCID identifiers to connect individuals with the datasets referenced in DataCite. In the JISC-funded Names project, the Library partnered with Mimas, a data centre based at the University of Manchester, to develop a pilot name authority service for UK repositories that would uniquely and persistently identify individuals active in research. The project developed an algorithm that automatically disambiguates names using data from a variety of sources. Once disambiguated, each individual is assigned an ISNI (see the collated record below).
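Unlike a name, an ISNI or ORCID identifier can be checked mechanically. Both are 16 characters long, and the last character is a check character (0 to 9, or X) computed with the ISO/IEC 7064 MOD 11-2 scheme, which is designed to catch common typing errors. The Python sketch below validates that check character. The function names are illustrative, and the sketch checks only the format: it cannot tell whether an identifier has actually been assigned to anyone.

```python
def mod11_2_check_char(base_digits: str) -> str:
    # ISO/IEC 7064 MOD 11-2 check character for the first 15 digits.
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_identifier(identifier: str) -> bool:
    # Accepts hyphens or spaces as separators, e.g. "0000-0002-1825-0097".
    chars = identifier.replace("-", "").replace(" ", "").upper()
    if len(chars) != 16 or not all(c in "0123456789" for c in chars[:15]):
        return False
    return mod11_2_check_char(chars[:15]) == chars[15]
```

For example, `is_valid_identifier("0000-0002-1825-0097")` returns `True`, while changing the final 7 to 8 makes it return `False`.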
The fact that the project aimed to cover grant information, learning materials, presentations and data, as well as papers, in both institutional and subject-based repositories suggests to me that, if it were ever implemented as a live service, the Names system could become a valuable resource. At present, around 50,000 individuals have been identified by the system.
So things are looking up. The question now is how these initiatives will interoperate. I don’t want to find myself back at square one, having to search multiple sources because an individual’s identifiers haven’t been integrated into the information resources I regularly use. But at least I can see a glimmer of light at the end of the tunnel.