Practical people, who believe themselves to be quite exempt from any interest in persistent identifiers, will soon find themselves the victim of broken website links.
Keynes (much modified)
Although seemingly a rather abstract topic, the question of how we assign persistent identifiers to objects, both physical (e.g. the object numbers museums have used for centuries) and digital (e.g. electronic catalogue identifiers for those objects), is a necessary part of a museum’s infrastructure on which many other services & technologies rely. It is the equivalent of building a house on sound foundations.
For a museum, the widespread use of digital persistent identifiers in a standard format to refer to a museum object (and/or the versioned museum object record, and/or the museum object’s digital derivatives) on-line and in academic literature would allow two major advances:
- Automated bibliographic compilation – any journal article or digital publication referencing a museum object could be added to that object’s bibliography as soon as the article is published
- Tracking the ‘reach’ of the objects – to discover how often museum objects are referenced in academic research
It should be noted, however, that the former, whilst appealing from a collection management point of view, could rapidly overwhelm an object’s bibliography with references. Entering references manually can sometimes be beneficial, as it tends to filter out non-credible research.
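To make the two advances concrete, here is a minimal Python sketch. It assumes, hypothetically, that citing articles can be harvested as simple pairs of an article DOI and a museum-object identifier; the `museum:O1`-style identifiers, the example DOIs and the function names are all invented for illustration. The optional `approved` set stands in for the manual filtering step mentioned above, so that only articles a curator has accepted reach an object’s bibliography.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    article_doi: str  # DOI of the citing article (hypothetical in the example below)
    object_pid: str   # hypothetical persistent identifier of the cited museum object


def compile_bibliographies(citations, approved=None):
    """Group citing-article DOIs by the museum object they reference.

    If `approved` is given, only article DOIs in that set are kept,
    mimicking a manual curatorial check before anything is added.
    """
    bibliographies = defaultdict(set)
    for c in citations:
        if approved is not None and c.article_doi not in approved:
            continue
        bibliographies[c.object_pid].add(c.article_doi)  # a set ignores repeat harvests
    # Sorted lists give each bibliography a stable order
    return {pid: sorted(dois) for pid, dois in bibliographies.items()}


def reach(bibliographies):
    """Count the distinct articles citing each object."""
    return {pid: len(dois) for pid, dois in bibliographies.items()}


citations = [
    Citation("10.1234/example.1", "museum:O1"),
    Citation("10.1234/example.2", "museum:O1"),
    Citation("10.1234/example.2", "museum:O2"),
    Citation("10.1234/example.1", "museum:O1"),  # the same citation harvested twice
]
print(reach(compile_bibliographies(citations)))  # {'museum:O1': 2, 'museum:O2': 1}
```

Both advances fall out of the same grouping step: the bibliography is the grouped list, and the ‘reach’ is simply its length.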
A similar use case exists within academic publishing. To allow a layer of management reporting tools to be built on top of a foundation of journal articles, citations and repositories, a scholarly digital infrastructure had to be built. This solves problems such as the discoverability of an article’s publication (identifying & recording where research work is published can be a major administrative task for an institution) and link rot (broken links causing articles to disappear from the internet), which would otherwise creep in over time and undermine the system. Fundamental to this infrastructure was the creation of the Digital Object Identifier (DOI) system. This allows a reference to an article to be resolved to wherever the digital version of the article is currently located. This is achieved by appending the article’s persistent DOI (e.g. DOI: 10.1057/palcomms.2016.105) to a DOI resolver (e.g. http://dx.doi.org/), which redirects to wherever the article is currently hosted (e.g. http://dx.doi.org/10.1057/palcomms.2016.105). If it is not an open access article or journal, the contents of the article may not then be retrievable without a subscription or one-off payment; open-access-only resolvers also exist, e.g. http://doai.io/.
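The resolution step described above is, on the citing side, just string construction. The Python sketch below normalises a DOI written as ‘DOI: 10.1057/palcomms.2016.105’ and appends it to a resolver base URL. The validation pattern is a simplified assumption rather than the full DOI syntax, and the function names are hypothetical. The resolver is a parameter so that another one could be substituted, assuming it accepts the DOI appended to its base URL in the same way.

```python
import re
from urllib.parse import quote

DEFAULT_RESOLVER = "http://dx.doi.org/"  # the resolver named in the text

# Simplified check (an assumption, not the full DOI syntax):
# "10." + registrant code (optionally subdivided) + "/" + non-empty suffix
DOI_PATTERN = re.compile(r"^10\.\d{4,9}(\.\d+)*/\S+$")


def normalise_doi(raw: str) -> str:
    """Strip whitespace and an optional 'DOI:' label, then sanity-check the result."""
    doi = raw.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[4:].strip()
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a recognisable DOI: {raw!r}")
    return doi


def resolver_url(raw: str, resolver: str = DEFAULT_RESOLVER) -> str:
    """Append the DOI to a resolver base URL, percent-encoding unsafe characters."""
    doi = normalise_doi(raw)
    return resolver.rstrip("/") + "/" + quote(doi, safe="/")


print(resolver_url("DOI: 10.1057/palcomms.2016.105"))
# http://dx.doi.org/10.1057/palcomms.2016.105
```

The redirect itself happens on the resolver’s side: the citing document only ever stores the DOI, which is why the link survives when the article moves.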
This approach has now been extended to cover the research data that forms the basis of journal articles, as well as the articles themselves. For example, if an article discusses the changes in the materials an artist used over their lifetime, it may be based on a spreadsheet or database of works and their materials compiled by the article’s author. The DataCite organisation provides DOIs for researchers and institutions to assign to their datasets in the same way that DOIs are assigned to articles; for example, the Archaeology Data Service (ADS) issues DOIs for archaeological investigations. In this way, the dataset behind the results published in the article can be retrieved via its DOI, and the results can then, in theory, be reproduced or re-analysed by other researchers using the same dataset and the methods described in the article.
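As an illustration of what registering such a dataset might involve, the sketch below assembles a minimal metadata record for the artist-materials example. The field names are loosely modelled on the JSON used by DataCite, and the check covers the properties the DataCite Metadata Schema treats as mandatory (identifier, creator, title, publisher, publication year and resource type); it is not a verified API payload. The DOIs, names and year are hypothetical placeholders. The related-identifier entry uses DataCite’s `IsSupplementTo` relation type to link the dataset to the article it underpins.

```python
# Properties treated as mandatory by the DataCite Metadata Schema, using
# field names loosely modelled on DataCite's JSON (not a verified payload)
MANDATORY = ("doi", "creators", "titles", "publisher", "publicationYear", "types")


def dataset_record(doi, creators, title, publisher, year, article_doi=None):
    """Build a minimal, DataCite-style metadata record for a dataset."""
    record = {
        "doi": doi,
        "creators": [{"name": name} for name in creators],
        "titles": [{"title": title}],
        "publisher": publisher,
        "publicationYear": year,
        "types": {"resourceTypeGeneral": "Dataset"},
        "relatedIdentifiers": [],
    }
    if article_doi:
        # Link the dataset to the article whose results it underpins
        record["relatedIdentifiers"].append({
            "relatedIdentifier": article_doi,
            "relatedIdentifierType": "DOI",
            "relationType": "IsSupplementTo",
        })
    return record


def missing_fields(record):
    """Return any mandatory properties that are absent or empty."""
    return [name for name in MANDATORY if not record.get(name)]


record = dataset_record(
    doi="10.1234/artist-materials",        # hypothetical dataset DOI
    creators=["Example, A."],              # hypothetical author
    title="Works and their materials",
    publisher="Example Museum",            # hypothetical publisher
    year=2017,                             # placeholder year
    article_doi="10.1234/artist-article",  # hypothetical article DOI
)
print(missing_fields(record))  # []
```

Keeping the article link inside the dataset’s own record means a reader who starts from either DOI can find the other.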
Although there have been some uses in the humanities, such as the ADS, these developments have mainly been driven by researchers in scientific subjects, for various reasons. For example, in the humanities, monographs and other routes for disseminating research are seen as more important than journal articles (see, for example, the HEFCE report on monographs and open access). For museums, which both publish research and provide the source material (objects, or should that be datasets? see the discussion below on terminology) for others’ research, integrating these developments into museum documentation and practice is at a very early, theoretical stage.
It was with excellent timing, then, that the British Library and DataCite (as part of the THOR project) organised a workshop before Christmas on this very issue: ‘Persistent Identifier Services for the Humanities’.
It was apparent from the discussions at the workshop that the implementation of this infrastructure in the humanities is still very much in its infancy in all institutions. Some of the basic concepts inherited from scientific research do not seem to map directly across. For example, do humanities researchers consider their source material to be ‘data’? Or should we even be referring to ‘data’ as a ‘dataset’? The distinction between the two terms is not immediately obvious. Is an individual museum object a dataset, or is a set of museum objects a dataset, in the same way that a set of data points in scientific research can be?
A separate point of discussion was how to distinguish between the physical object, its digitised version, its associated catalogue record and the different versions of this record (as knowledge is accumulated or revised), as this is not currently clear in DataCite. A similar situation was mentioned in the sciences with ice cores, where different digital datasets continue to be published from the same physical ice-core samples.
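One possible way to keep these entities apart is to give each its own identifier and record typed links between them. The Python sketch below does this with simple (source, relation, target) triples. The identifiers are hypothetical, and the relation names (`IsDerivedFrom`, `Describes`, `IsNewVersionOf`) are modelled on DataCite’s relationType vocabulary; whether they are the right fit for museum objects is exactly the open question raised at the workshop.

```python
# Hypothetical identifiers for one museum object and its digital counterparts
OBJECT = "museum:object/O1"        # the physical object
IMAGE = "museum:image/O1-001"      # a digitised photograph of it
RECORD_V1 = "museum:record/O1/v1"  # first catalogue record
RECORD_V2 = "museum:record/O1/v2"  # revised catalogue record

# (source, relation, target) triples; relation names modelled on DataCite's
links = [
    (IMAGE, "IsDerivedFrom", OBJECT),
    (RECORD_V1, "Describes", OBJECT),
    (RECORD_V2, "Describes", OBJECT),
    (RECORD_V2, "IsNewVersionOf", RECORD_V1),
]


def targets(links, source, relation):
    """Everything `source` points to via `relation`."""
    return [t for s, r, t in links if s == source and r == relation]


def sources(links, relation, target):
    """Everything pointing at `target` via `relation`."""
    return [s for s, r, t in links if r == relation and t == target]


def version_history(links, record):
    """Follow IsNewVersionOf links back from a record to its earliest version."""
    history, seen = [record], {record}
    while True:
        previous = targets(links, history[-1], "IsNewVersionOf")
        if not previous or previous[0] in seen:  # stop at the start or on a cycle
            return history
        history.append(previous[0])
        seen.add(previous[0])


print(version_history(links, RECORD_V2))
# ['museum:record/O1/v2', 'museum:record/O1/v1']
```

The same pattern would apply to the ice-core case: several datasets, each with its own identifier, linked back to one physical sample.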
It’s reassuring to attend a workshop and discover that everyone is in the same boat, even if it appears the boat has not been built yet. But this gives us the opportunity to experiment: working internally with our colleagues in Research, VARI and Collections Management, and externally continuing to participate in and follow potentially relevant research activity (e.g. the Scholix Framework).
Together, we will reach Valhalla!