Unique Author Identification

November 10, 2006 Ian Hamilton Technology

I recently gave a brief presentation at the annual CrossRef member meeting on unique author identification in scientific publishing. I had gathered information for the presentation from conversations with PLoS staff and from online articles, but didn't put pen to paper until the night before the meeting. Given my procrastination and the rambling presentation that resulted, I think it's worth writing up my notes in a more understandable form.

The idea of unique author identification in scientific publishing has been around for some time (I'm not even going to broach the topic of author identification in libraries). The American Mathematical Society has been trying since 1940 to identify the authors of papers listed in the Mathematical Reviews Database. But as scientific publishing relies more and more on databases, it is getting harder to identify authors who publish across many different journals.

Some of the common problems:

Different authors share the same first and last name (e.g. John Smith).
A middle initial may or may not be included.
The same author's name varies from one paper to another.
Non-Roman names are transliterated with varying spellings.
Author names can be truncated or split.
Author names can change (e.g. women who change their last name after marrying).
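To show why simple text matching can't fix these problems, here is a minimal sketch of the kind of name-normalization key a publisher might try. It is a hypothetical heuristic (the function name `name_key` and the "last name plus first initial" rule are my assumptions, not anything PLoS or CrossRef actually uses): it strips accents, ignores middle initials and handles "Last, First" ordering.

```python
import unicodedata


def name_key(full_name: str) -> str:
    """Hypothetical heuristic: reduce a name to 'lastname firstinitial'."""
    # Strip accents: decompose characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", full_name)
    ascii_only = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Treat periods as separators and ignore case.
    cleaned = ascii_only.replace(".", " ").lower()
    if "," in cleaned:
        # "Smith, J." style: last name comes before the comma.
        last, _, given = cleaned.partition(",")
    else:
        # "John Q. Smith" style: assume the final token is the last name.
        parts = cleaned.split()
        if not parts:
            return ""
        last, given = parts[-1], " ".join(parts[:-1])
    given_tokens = given.split()
    # Keep only the first initial, so middle initials are ignored.
    initial = given_tokens[0][0] if given_tokens else ""
    return f"{last.strip()} {initial}".strip()


if __name__ == "__main__":
    for name in ["John Smith", "John Q. Smith", "Smith, J.", "José García"]:
        print(f"{name!r} -> {name_key(name)!r}")
```

The key merges "John Smith", "John Q. Smith" and "Smith, J.", but that is exactly the problem: it would also merge two different John Smiths, it assumes the last token is the surname (so split names are mishandled), and it gives a woman who changed her name two unrelated keys. Only an identifier assigned to the person, not derived from the name, avoids these failures.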

With the advent of electronic publishing, publishers now face the problem of relating author identities across the many systems and databases that hold author information. While most scientific publishers own the data in all of their systems, I doubt that many have a way to link the information stored in the submission/peer-review database to the information stored in the ePublishing content management database.

At PLoS, we use an end-to-end system but have no way to correlate an author in the peer-review system with a paper published online. This will pose significant problems with the launch of PLoS ONE, as we want to display annotations made by authors differently from public annotations. Since we don't have unique author identification across the systems, we're building methods to correlate registered users with author names, but we will have to verify and associate this information manually.

Although I am loath to suggest a Big Brother approach to identifying authors, a centrally administered, all-encompassing system would have some great benefits for both authors and publishers. Authors would be unmistakable no matter where they published or under what name. Search engines, browsers and applications could create references between authors and their published works. The system could be used to evaluate an author's productivity and seniority in their field. And it could be used to create new networks of data (e.g. friend-of-a-friend) in the semantic web.

Unique author identification numbers would also help smaller journals and individual authors, since one could then reliably measure the citations of individual papers or authors rather than relying on a journal's impact factor as the sole measure of impact (the PLoS Medicine editors have written a good editorial on the subject of impact factors). Publishers could also manage their data more easily, as they wouldn't have to store identity information in multiple applications/databases or worry about how that data would be preserved beyond the life of the publishing company.

Matthew E. Falagas wrote a letter in PLoS Medicine reviving the idea of a unique author identification number (UAIN) for electronic databases of scientific information, and Etienne Joly responded with some further advantages of a UAIN and arguments for applying the system retroactively.

Some questions

What would a unique author identifier (or DOI) look like, and could it encode other information (e.g. first publication date, institution, etc.)?
How would you make a system secure and verify author information?
How would the information be kept relevant?
Could the system be incorporated into other authentication protocols such as OpenID?

Many people have spent a great deal of time on this subject, but no system has been put in place yet. CrossRef has successfully promoted digital object identifiers (DOIs) for scientific publications and has an opportunity to push forward with unique author identification. It will take a lot of work, but I believe that if CrossRef took on this project, it would have the support of all scientific publishers.