For better or worse, English is the predominant language used by the international
scientific and medical communities to disseminate knowledge. The 26 characters of
the Latin alphabet are also arranged into names, which are non-unique patterns. At the time of
the origins of modern biomedical research, names may have been relatively unique,
at least within the biomedical research community. However, this is no longer the
case.1 We now possess the capacity to visualise atoms using atomic force microscopy.
We also possess the capacity to launch telescopes into space to peer into distant
galaxies. However, biomedical researchers do not possess the capacity to automatically
distinguish between two researchers who happen to share the same, or similar, names.
One decade after the publication of articles on this subject in PLOS Medicine and
PLOS Blogs,2–4 the embarrassment of this realisation is eclipsed perhaps only by the
continued need to plead for a solution to this ‘intractable’ problem.
Before the National Institutes of Health (NIH) of the USA and its National Library
of Medicine (NLM) launched the modern PubMed system, the math, physics and computer
science community solved this problem with the creation of arXiv in the early 1990s.
Like modern digital object identifiers (DOIs) for unique electronic documents, this
largely self-curated system linked non-unique, ‘clickable’ author names with unique
author identifiers. Although arXiv and self-curation are not without flaw, this problem
has plagued the biomedical research community since at least the inception of arXiv
over two decades ago. As a dearth of electronic archival technology is not the problem,5
what continues to drive this problem?
When the biomedical research community was relatively small (approximately one to
three authors per publication), the first–last/corresponding author paradigm sufficed.
At least as recently as the 1970s, biomedical researchers could still publish dozens
of pages meticulously describing how something seemingly as trivial as ‘dirt’ on electron
microscopy slides was actually a seminal scientific discovery.6 With the modern pressure
of word limits, it cannot be known how much insight into this process of discovery
of new knowledge is now lost to the need for concision. International collaborations
with thousands of physicists now relegate authorship to alphabetical appendices.7
In the case of one of the first genomics publications with >1000 authors,8 the archaic
first–last/corresponding author paradigm was maintained.
By the 1950s, it was ‘too much to expect a research worker to spend an inordinate
amount of time searching for the bibliographic descendants of antecedent papers’,
which led to the creation of an impact factor.9 Initially used in part by libraries
to select the best journals to purchase, the use of the term impact factor in this
context is different from its modern use by the Science Citation Index (Thomson Reuters).
By the 2000s, the need for an index to quantify individual researcher productivity
led one physicist to create the h-index.10 However, when the Royal Society of Chemistry
attempted to determine the most impactful chemist by h-index, this task was deemed
almost intractable due to the amalgamation of researchers with the name Tanaka K.11
This use of the Western-driven (surname/family name|given/first name|middle initial)
system is particularly problematic for Asian biomedical researchers, notably those from
Japan, China and especially Korea, where only a few surnames predominate and middle names
often do not exist.
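The amalgamation problem can be illustrated with a minimal sketch. The records below are invented for illustration (the author identifiers, given names and citation counts are all hypothetical), and the h-index function simply follows the standard definition: the largest h such that h papers each have at least h citations. Under these assumptions, two distinct researchers who collapse to ‘Tanaka K’ appear as one author with an inflated h-index.

```python
from collections import defaultdict


def h_index(citations):
    """Largest h such that h papers each have at least h citations."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


# Hypothetical records: (author_id, surname, given_name, citations).
# None of these values come from a real database.
papers = [
    ("A1", "Tanaka", "Kei", 50),
    ("A1", "Tanaka", "Kei", 30),
    ("A1", "Tanaka", "Kei", 2),
    ("A2", "Tanaka", "Ken", 20),
    ("A2", "Tanaka", "Ken", 10),
    ("A2", "Tanaka", "Ken", 1),
]


def name_key(surname, given_name):
    """Surname plus first initial, as in a conventional author index."""
    return f"{surname} {given_name[0]}"


by_name = defaultdict(list)  # grouped by the non-unique name string
by_id = defaultdict(list)    # grouped by the (hypothetical) unique identifier
for author_id, surname, given_name, cites in papers:
    by_name[name_key(surname, given_name)].append(cites)
    by_id[author_id].append(cites)

name_h = {name: h_index(c) for name, c in by_name.items()}
id_h = {author_id: h_index(c) for author_id, c in by_id.items()}

print(name_h)  # {'Tanaka K': 4}
print(id_h)    # {'A1': 2, 'A2': 2}
```

Grouping by name alone reports a single ‘Tanaka K’ with an h-index of 4, whereas grouping by identifier shows two researchers with an h-index of 2 each.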
The NIH recently announced a novel Relative Citation Ratio to better measure the true
impact of scientific articles.12 However, the NIH/NLM National Center for Biotechnology
Information (NCBI) SciENcv system, which allows biomedical researchers to link unique
‘My NCBI Bibliographies’ with NIH Biosketches, as well as automatically pull US federal
grant information from the NIH Electronic Research Administration system (‘eRA Commons’),
is still not fully linked with the PubMed Advanced Search Builder. Related to the
launch of the NLM ‘computed author display’ in 2012, these systems include algorithms
for ‘unique’ author searches.
This subject is not new.13,14 However, the solution to this problem requires innovation
and leadership.15 Many
unique author identifier systems already exist: ORCID, Google Scholar, Mendeley, Scopus,
ResearcherID, ResearchGate, etc. Some are open access. Others are proprietary. Some
are based largely on self-curation, but all contain some automated component. Several
are even linked together. However, no biomedical researcher can be expected to create and maintain
dozens of ‘unique’ identifiers. The time has come for ‘DOIs for authors’. Beyond peer-reviewed
publications, a universal unique author identifier system would allow researchers
to better track and document the totality of their true scientific productivity: textbooks,
textbook chapters, teaching, computer coding, Wikipedia editing and more. The implications
of such a system are self-evident,16 including everything from academic advancement
to research funding and plagiarism.
For the rare biomedical researcher with a truly unique last name, or at least last
name and first initial, perhaps this is not a major concern. However, for the Tanaka
Ks and Harrison AMs of this world, it is. As long as these researchers continue to
publish in differing academic fields, manual curation will continue to struggle in
the absence of unique author identifiers. However, we already know that this system
is fundamentally problematic.11 Maybe some biomedical researchers will eventually
add or invent additional middle names.6 (We will not even touch the subject of name
changes,17 which is a complex legal matter in the USA and can be a protracted process
of obtaining a ‘deed poll’ in the UK.) However, when the Tanaka Ks and Harrison AMs
of the biomedical research world begin to publish within similar fields,18,19 and/or
together in collaborative scientific endeavours, what will happen then?
The solution to this problem is for PubMed to shift to an arXiv-like, self-curation
system, which requires not only this continued plea but also vision and leadership
from the highest levels of the international biomedical research community. The pathway
to achieve this solution is not trivial and not unique. One pathway to reach this
solution is for PubMed to adopt an existing unique author identifier system, such
as ORCID, which is already used by many publishing groups. Another option is for PubMed
to create its own unique author identifier system, which already partially exists
in forms such as eRA Commons and SciENcv. No pathway will be free. Although self-curation
has worked well for arXiv, a comparatively greater amount of supervised curation,
which is already the case for proprietary systems such as Scopus, may be required
for biomedical researchers to mitigate some of the flaws of self-curation. It should
also be noted that the worldwide ‘PubMed research community’ is significantly larger
than the worldwide ‘arXiv research community’, which increases the challenge of implementation
of this solution.
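As one small illustration of what adopting an existing identifier might involve, ORCID iDs carry a final check character computed with the ISO 7064 MOD 11-2 algorithm, so a mistyped identifier can often be caught before it is attached to a record. The sketch below is not part of any PubMed or ORCID software; the function names are our own, and it checks only the format and check character, not whether the identifier is actually registered.

```python
def orcid_check_digit(base_digits):
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an ORCID iD."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid):
    """True if the string has 16 characters (ignoring hyphens) and a matching check character."""
    compact = orcid.replace("-", "")
    if len(compact) != 16:
        return False
    base, check = compact[:15], compact[15].upper()
    if not (base.isascii() and base.isdigit()):
        return False
    return orcid_check_digit(base) == check


# 0000-0002-1825-0097 is the sample iD used in ORCID's own documentation.
print(is_valid_orcid("0000-0002-1825-0097"))  # True
print(is_valid_orcid("0000-0002-1825-0098"))  # False (last character altered)
```

A check of this kind catches typographical errors only; deciding which researcher an identifier belongs to would still depend on self-curation or supervised curation.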
Any pathway to this solution should also optimise implementation time, which is already
an area of active informatics research. However, the complexity of the relationship
between clever biomedical researchers,20 publishing groups and funding organisations
continues to increase. Thus, a renewed and urgent push for this change is needed
from the increasingly fast-paced communities of science and medicine.