Unique Author Identification
I recently gave a brief presentation at the yearly CrossRef member meeting on unique author identification in scientific publishing. I had gathered information for the presentation by speaking with PLoS staff and reading online articles, but didn’t put pen to paper until the night before the meeting. Given my procrastination and rambling delivery, I think it’s a good idea to write up my notes in a more understandable form.
The idea of unique author identification in scientific publishing has been around for some time (I’m not even going to broach the topic of author identification in libraries). The American Mathematical Society has been trying since 1940 to identify the authors of papers listed in the Mathematical Reviews Database. But as scientific publishing relies more and more on databases, it is getting harder to identify authors who publish across many different journals.
Some of the general problems encountered:
- Authors share first and last name (e.g. John Smith).
- The middle initial of a name may or may not be included.
- Author names vary from one paper to another.
- There are variable spellings of non-Roman names.
- Author names can be truncated or split.
- Author names can change (e.g. women who change their last name after getting married).
With the advent of electronic publishing, publishers are now encountering problems relating author identities across the many systems and databases that hold author information. While most scientific publishers own their data across all of their systems, I doubt that many have a way to associate the information stored in the submission/peer-review database with the information stored in the ePublishing content management database.
At PLoS, we use an end-to-end system but have no way to correlate an author in the peer-review system with a paper published online. This will pose significant problems for the launch of PLoS ONE, as we want to represent author annotations differently from public annotations. Since we don’t have unique author identification across the systems, we’re building in methods to correlate registered users with author names, but we will have to verify and associate this information manually.
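The post doesn’t describe how PLoS matches users to authors, so the following is only a hypothetical Python sketch of why name-based correlation can produce candidates for manual verification but never a reliable link. The helpers `name_key` and `candidate_pairs` are illustrative assumptions: they reduce each name to surname plus first initial, which absorbs spelling and middle-initial variation but also conflates different people, exactly as the problems listed earlier predict.

```python
import unicodedata
from collections import defaultdict


def name_key(name: str) -> str:
    """Reduce an author name to a crude key: lowercase surname + first initial.

    Handles "Smith, John A." and "John A. Smith", strips accents and drops
    middle initials. Deliberately lossy, which is the whole problem.
    """
    # Strip accents, e.g. "García" -> "Garcia"; non-Latin scripts vanish entirely
    ascii_name = (unicodedata.normalize("NFKD", name)
                  .encode("ascii", "ignore").decode("ascii"))
    if "," in ascii_name:
        surname, given = ascii_name.split(",", 1)
    else:
        parts = ascii_name.split()
        if not parts:
            return ""  # nothing left to match on
        surname, given = parts[-1], " ".join(parts[:-1])
    initial = given.strip()[:1].lower()
    return f"{surname.strip().lower()} {initial}".strip()


def candidate_pairs(registered_users, author_names):
    """Return (user, author) pairs sharing a key; each still needs a human check."""
    users_by_key = defaultdict(list)
    for user in registered_users:
        key = name_key(user)
        if key:
            users_by_key[key].append(user)
    return [(user, author)
            for author in author_names
            for user in users_by_key.get(name_key(author), [])]


for user, author in candidate_pairs(["John A. Smith"],
                                    ["J. Smith", "Jane Smith", "Jones, K."]):
    print(f"verify manually: {user!r} <-> {author!r}")
```

Note that "Jane Smith" is flagged as a candidate for "John A. Smith": no amount of cleverness in the key removes the need for the manual step.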
Although I am loath to suggest a Big Brother approach to identifying authors, a centrally administered, all-encompassing system would have some great benefits for both authors and publishers. Authors would be unmistakable no matter where they published or under what name. It would allow search engines, browsers and applications to create references between authors and their published works. It could be used to evaluate an author’s productivity and seniority in their field. And it could be used to create new networks of data (e.g. friend-of-a-friend) in the semantic web.
Creating unique author identification numbers would also help smaller journals, and authors too, as one could then reliably measure the citations of individual papers or authors, rather than relying on a journal’s impact factor as the sole measure of impact (the PLoS Medicine editors have written a good editorial on the subject of impact factors). Publishers could also manage their data more easily, as they wouldn’t have to worry about storing identity information in multiple applications/databases, or about how the data would be preserved beyond the life of the publishing company.
Matthew E. Falagas wrote a correspondence in PLoS Medicine again raising the idea of a unique author identification number (UAIN) for electronic databases of scientific information, while Etienne Joly responded with some further advantages of a UAIN and argued that the system should be applied retroactively. Several questions remain open:
- What would a unique author identifier (or DOI) look like, and could it represent other information (e.g. first publication date, institution, etc.)?
- How would you make a system secure and verify author information?
- How would the information be kept relevant?
- Could the system be incorporated into other authentication protocols such as OpenID?
Many people have spent a lot of time on this subject, but no system has been put into place yet. CrossRef has successfully promoted digital object identifiers (DOIs) for scientific publications and has an opportunity to push forward with the idea of unique author identification. It will take a lot of work, but I believe that if CrossRef were to take on this project, it would have the support of all scientific publishers.
We are developing a web application called SciLink to give every scientific author on the web a public resume and an advanced content-filtering engine. We are also developing a method to identify authors and assign each of them a unique identifier. Please see our website at:
We’d be delighted to give each and every PLOS author a link to their public resume from a PLOS article.
CEO & Co-Founder
>1. What would a unique author identification (or DOI) look like and could it represent other information (e.g. first publication date, institution, etc.)?
Why do you need this other information to be represented in the identifier? It seems that once you have the unique identifier, linking to all this other information (across various DBs, etc.) will be relatively painless.
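To illustrate the commenter’s point, here is a minimal Python sketch assuming each database simply stores the same opaque identifier next to its own data. The `Record` type, the `link_records` helper and the `A-0001`-style IDs are hypothetical. Once the key is shared, combining information from separate systems is an ordinary group-by, and nothing about the author has to be encoded in the identifier itself.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    author_id: str  # hypothetical opaque identifier, identical in every system
    source: str     # which database holds this record
    detail: str     # whatever that database knows about the author


def link_records(*databases):
    """Group records from any number of databases by their shared author_id."""
    profiles = defaultdict(list)
    for db in databases:
        for record in db:
            profiles[record.author_id].append(record)
    return dict(profiles)


# Hypothetical contents of two independent systems
peer_review = [Record("A-0001", "peer-review", "manuscript under review"),
               Record("A-0002", "peer-review", "reviewer for two journals")]
publishing = [Record("A-0001", "publishing", "published article")]

linked = link_records(peer_review, publishing)
for author_id, records in sorted(linked.items()):
    print(author_id, [r.source for r in records])
```

Each system keeps its own data current; the identifier is the only thing they have to agree on.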
>2. How would you make a system secure and verify author information?
Isn’t this a solved problem? I’m fairly certain existing identity systems (e.g. OpenID) already handle authentication well.
>3. How would the information be kept relevant?
What information? The various DBs that list information about the author? I think you have no choice but to leave it up to each individual DB to keep its data relevant; it’s not as if PLoS would want to make sure the information about me on MIT’s servers is up to date (nor could they if they wanted to).
>4. Could the system be incorporated into other authentication protocols such as OpenID?
This is pretty important. I think it is critical that this be an open (preferably widely adopted) standard, especially one that will be adopted outside specialty science and science-publishing sites. This will give much more flexibility down the line to people trying to build new science-related websites.
To provide a concrete example, MediaWiki already supports OpenID, so any science wiki could incorporate it tomorrow. The more specialized the standard, the bigger the barrier to adoption, since software packages won’t support it out of the box.
What would be the barriers to adopting, for instance, OpenID? What would be the next steps? What organization would make the most sense to lead the effort – NCBI? The nascent web-science community is in serious need of a unique identifier to allow aggregation of user comments / reviews / publications across disparate online communities. Without a unique, authenticated identifier, it will be really challenging to establish an online reputation that might eventually hold similar standing to the reputation derived from a scientist’s paper publication record.
We really need to get started on this now; I’m glad to see PLoS ONE taking the initiative in spreading the word.
A completely decentralized and human-friendly approach to unique identification has been proposed, studied, published with open access, reviewed, commented on, revised, and republished as RFC 4151 (“The ‘tag’ URI Scheme”) by Tim Kindberg and Sandro Hawke.
The essence of their approach is to combine existing human-centric IDs with human-readable dates. IDs such as email addresses and website URLs are mostly easy to use, but they stop being unique over time as people move, organizations change, and so on. Dated IDs overcome this problem of natural change without requiring a central authority or long, meaningless numbers.
In a recently published article (an e-letter in Science magazine online), we have addressed several issues regarding unique author identification. We hope that the discussion will continue and raise public awareness of this urgent problem to a level that motivates an independent organization (NCBI/CrossRef, why not PLOS?) to take the lead in establishing a database of unique author identifiers. The many advantages of such a database, as outlined in an increasing number of articles, are simply overwhelming.
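A tag URI, as defined in RFC 4151, has the shape `tag:<authority>,<date>:<specific>`, where the authority is an email address or DNS name and the date is `YYYY`, `YYYY-MM` or `YYYY-MM-DD`. The Python sketch below (the `tag_uri` helper is hypothetical) only assembles and lightly checks such a string; it is a simplification of the RFC grammar, not a full validator, and it assumes the caller really did control the authority name on the given date, which is what keeps the tag unique.

```python
import re
from urllib.parse import quote

# Simplified from the RFC 4151 grammar: date = year ["-" month ["-" day]]
_TAG_DATE = re.compile(r"\d{4}(-\d{2}(-\d{2})?)?")

# Characters left unescaped in the specific part (RFC 3986 sub-delims plus
# ":" "@" "/" "?"); quote() never escapes letters, digits or "-._~".
_SPECIFIC_SAFE = "!$&'()*+,;=:@/?"


def tag_uri(authority: str, minted: str, specific: str) -> str:
    """Build tag:<authority>,<date>:<specific>.

    authority: an email address or DNS name the minter controlled on `minted`.
    minted:    YYYY, YYYY-MM or YYYY-MM-DD.
    specific:  free text; other characters are percent-encoded.
    """
    if not authority or any(c in authority for c in ",: "):
        raise ValueError(f"unusable authority name: {authority!r}")
    if not _TAG_DATE.fullmatch(minted):
        raise ValueError(f"date must be YYYY, YYYY-MM or YYYY-MM-DD: {minted!r}")
    return f"tag:{authority},{minted}:{quote(specific, safe=_SPECIFIC_SAFE)}"


print(tag_uri("j.smith@example.org", "2006-11", "author"))
# tag:j.smith@example.org,2006-11:author
```

If the address later passes to someone else, the new owner mints tags with a later date, so the 2006 tag still points unambiguously at the original author without any central registry.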
With regards to your specific questions:
Ad 1) The unique author identifier should be as simple as possible and should not encode “other information”. “Other information” should only be linked to it, since it is likely to change; a unique author identifier containing “other information” would therefore not remain unique (or at least not remain adequate). Furthermore, contrary to E Joly’s suggestion as a
Several posts on this blog express a preference for a completely ‘neutral’ form of UAIN, and I beg to differ on this issue. I predict that UAINs will only be widely used if people can remember them, pass them to others on a piece of paper at conferences, and become somewhat sentimentally attached to them… This would work with a UAIN that contains all or part of your name and the year you first published anything; it will not work with a random string of 7 or 8 numbers and/or letters.
Cheers for now
Etienne Joly (JOLY-E-89-01)
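The comment gives only one example, JOLY-E-89-01, so the layout assumed here (surname, first initial, two-digit year of first publication, then a two-digit serial to separate people who would otherwise collide) is inferred from it, not specified by Etienne. The Python sketch and its `mnemonic_uain` helper are hypothetical; `taken` stands in for whatever registry would actually record issued IDs.

```python
import unicodedata


def _ascii_upper(text: str) -> str:
    """Strip accents and keep only ASCII letters/digits, upper-cased."""
    ascii_text = (unicodedata.normalize("NFKD", text)
                  .encode("ascii", "ignore").decode("ascii"))
    return "".join(ch for ch in ascii_text.upper() if ch.isalnum())


def mnemonic_uain(surname: str, first_name: str, first_pub_year: int, taken: set) -> str:
    """Issue the next free ID shaped like JOLY-E-89-01 and record it in `taken`."""
    stem = _ascii_upper(surname)
    initial = _ascii_upper(first_name)[:1]
    if not stem or not initial:
        # e.g. a name written only in a non-Latin script
        raise ValueError("no Latin-script characters to build an ID from")
    prefix = f"{stem}-{initial}-{first_pub_year % 100:02d}"
    for serial in range(1, 100):
        candidate = f"{prefix}-{serial:02d}"
        if candidate not in taken:
            taken.add(candidate)
            return candidate
    raise RuntimeError(f"all 99 serial numbers for {prefix} are in use")


issued = set()
print(mnemonic_uain("Joly", "Etienne", 1989, issued))  # JOLY-E-89-01
print(mnemonic_uain("Joly", "Emile", 1989, issued))    # JOLY-E-89-02
```

The sketch makes the trade-offs visible: the ID is memorable, but two-digit years repeat every century, the serial still has to come from a central registry, and names with no Latin-script form cannot be encoded at all.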
I have to say that I like the elegance of Etienne’s solution. It would make me:
Following the nicely put questions by Richard Cave, here are some suggestions:
1) LOOK and CONTENT OF DOI:
Like other identifiers (ISBN, PubMed ID, etc.), the “look” should be database-centred and thus consist only of a serial string of numbers and letters. This secure approach does not preclude including the year of submission as a 4-digit code (e.g. 2006).
Including the name or e-mail address would be far less reliable. Just think about the transliteration of Asian names…
This admittedly not human-friendly look should not hinder the use of author DOIs. Indeed, the identifier could be included in any address and would be easily accessible. After all, copying and pasting a phone number is not difficult!
As for the content, it should be adapted to need. I guess that current data on name, affiliation, address, phone, fax and e-mail would be enough to contact the authors.
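As a sketch of such a database-centred identifier, assume a 4-digit year, a zero-padded 7-digit serial, and a trailing check character, in the spirit of the ISBN’s check digit. The choice of ISO 7064 MOD 11-2 for the check character, the field widths, and the helper names are all assumptions made for illustration. The check character catches any single mistyped digit, which matters for IDs that people will copy by hand.

```python
def mod11_2_check(digits: str) -> str:
    """ISO 7064 MOD 11-2 check character for a string of decimal digits."""
    total = 0
    for ch in digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def make_author_id(year: int, serial: int) -> str:
    """Hypothetical format: YYYY-NNNNNNN-C (year, serial, check character)."""
    if not 1000 <= year <= 9999:
        raise ValueError("year must have four digits")
    if not 0 <= serial < 10**7:
        raise ValueError("serial must fit in seven digits")
    body = f"{year}{serial:07d}"
    return f"{year}-{serial:07d}-{mod11_2_check(body)}"


def is_valid_author_id(identifier: str) -> bool:
    """Check the layout and recompute the check character."""
    parts = identifier.split("-")
    if len(parts) != 3:
        return False
    year, serial, check = parts
    body = year + serial
    if len(year) != 4 or len(serial) != 7 or not all(c in "0123456789" for c in body):
        return False
    return mod11_2_check(body) == check


print(make_author_id(2006, 123))  # 2006-0000123-1
```

With this scheme, a transcription error such as `2006-0000128-1` fails validation instead of silently pointing at another author.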
2) SECURITY AND ACCURACY:
Security and accuracy could easily be achieved if an author DOI were created only upon submission of scientific data (a manuscript, a data submission to a database, etc.). This implies that assigning DOIs to authors would be prospective only, just as for any scientific submission.
3) UPDATE THE DATA:
The authors themselves would be responsible for updating their data, either by logging in or through submissions to the database authorities.
4) OPEN ID:
I guess this could be feasible.
This also reminds me of a recent post on Nature/Nautilus about “web visibility”…
It would be nice if the NCBI could store a unique identifier for each scientist (including those working in physics, the humanities…) and for each laboratory in the world, just like LinkedIn. The information would be available as semantic FOAF. Anyone could modify their own information file (who I am, who I know, my interests, my publications, where I am, my connotea/citeulike profile… etc.) and this unique identifier would be shown in articles.
Ask the NCBI ! 🙂
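The comment doesn’t say what such a FOAF file would contain, so here is a hedged Python sketch that writes a minimal profile in Turtle. The FOAF terms used (`foaf:Person`, `foaf:name`, `foaf:knows`, `foaf:interest`) are real vocabulary; carrying the author identifier in `dcterms:identifier`, the `foaf_profile` helper, and the example.org URIs are assumptions. It builds plain strings so it needs no RDF library.

```python
_BAD_IRI_CHARS = set('<>"{}|\\^` \t\n')


def _iri(uri: str) -> str:
    """Wrap a URI for Turtle, rejecting characters an IRI reference can't contain."""
    if not uri or any(c in _BAD_IRI_CHARS for c in uri):
        raise ValueError(f"not usable as a Turtle IRI: {uri!r}")
    return f"<{uri}>"


def turtle_literal(text: str) -> str:
    """Quote a string as a Turtle literal (backslash escaped first)."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def foaf_profile(person_uri, name, author_id, knows=(), interests=()):
    """Return a minimal FOAF description of one author as Turtle text."""
    lines = [
        "@prefix foaf: <http://xmlns.com/foaf/0.1/> .",
        "@prefix dcterms: <http://purl.org/dc/terms/> .",
        "",
        f"{_iri(person_uri)} a foaf:Person ;",
        f"    foaf:name {turtle_literal(name)} ;",
    ]
    lines += [f"    foaf:knows {_iri(other)} ;" for other in knows]
    lines += [f"    foaf:interest {_iri(topic)} ;" for topic in interests]
    # Assumption: the unique author identifier travels as dcterms:identifier
    lines.append(f"    dcterms:identifier {turtle_literal(author_id)} .")
    return "\n".join(lines)


print(foaf_profile("http://example.org/people/jsmith", "John Smith", "AUTHOR-0001",
                   knows=["http://example.org/people/jdoe"],
                   interests=["http://example.org/topics/semantic-web"]))
```

Because the identifier sits alongside the `foaf:knows` links, a crawler could join these files into the friend-of-a-friend network the post mentions.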