Evaluating the evaluators

Should we judge the quality of articles based on the journals that published them, and if so, how do we judge whether or not those journals are good at this evaluation? Put more briefly—how do we evaluate the evaluators?


(This is an opinion article.)

Today is “Impact Factor day.” Or more accurately, it is Journal Citation Reports day: the annual publication that includes, among other things, the Impact Factor.

The problematic Impact Factor.

Every year about this time, there is no shortage of think-pieces decrying not only how the Impact Factor (IF) gets misused, but the fact that it exists at all; and even the fact that researchers’ articles get judged by the IF of the journals that they’re in. To get a sense of the sound and fury, just try this canned search: https://www.google.com/#q=problems+with+Impact+Factor

So what exactly is this IF? For a given year, the IF is the number of citations made that year to articles a journal published in the previous two years, divided by the number of citable articles the journal published in those two years. It is calculated and published by a division of Thomson Reuters called ISI. So the numbers coming out this week count citations in 2014 to articles published in 2012 and 2013.
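To make the arithmetic concrete, here is a minimal sketch of the textbook two-year calculation: citations received in the target year to the previous two years' articles, divided by the number of citable items from those two years. The function name and the journal figures are hypothetical, invented purely for illustration. What counts as a "citable item" is decided by the publisher of the metric, so a sketch like this should not be expected to reproduce any published figure.

```python
def impact_factor(citations: int, citable_items: int) -> float:
    """Textbook two-year Impact Factor.

    citations     -- citations received in the target year to items
                     published in the two preceding years
    citable_items -- number of citable items published in those two years
    """
    if citable_items <= 0:
        raise ValueError("citable_items must be a positive number")
    return citations / citable_items


# Hypothetical journal (made-up numbers, not real data):
# citations received in 2014, keyed by the cited article's publication year
citations_in_2014 = {2012: 410, 2013: 290}
# citable items the journal published in each of those years
citable_items = {2012: 180, 2013: 170}

if_2014 = impact_factor(
    sum(citations_in_2014.values()),
    sum(citable_items.values()),
)
print(f"2014 Impact Factor: {if_2014:.3f}")  # 700 / 350 = 2.000
```

Note that the result is a journal-wide average: it says nothing about how the 700 citations are spread across the 350 individual articles.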

Eugene Garfield, who invented it in the 1960s, meant it as a way of helping librarians evaluate which journal subscriptions they should maintain (or acquire). It became a short-hand indicator of journal prestige, and in the last 20 years or so, administrators have used it, more and more, as a gauge of researchers’ work.

In addition, ISI seems to obscure the details of its source data, and it is impossible to accurately reproduce their calculations (and speaking personally, I’ve tried—repeatedly).

Article-level vs. journal-level

The one thread, though, that I want to pull on in this think-piece is the question of whether or not we should judge the quality of articles based on the journals that published them, and if so, how do we judge whether or not those journals are good at this evaluation? Put more briefly—how do we evaluate the evaluators?

One response to the IF has been the development of article-level metrics (also sometimes called “alternative metrics” or “altmetrics”; note, though, that “Altmetric” is actually the name of a specific company, and the source of a counterpoint to this piece). Article-level metrics are just that: they track citations, usage, social media traffic, etc., for individual articles, independent of the journal that published them.

To me (from my acknowledged perspective as someone who has worked in journal publishing for quite some time), this discounts a large part of the work that journal editors do. And I don’t only mean getting good reviewers to do substantive and timely reviews (essential as that is). I mean the subjective work of evaluating the articles.

I do strongly believe in that expert subjective evaluation. For one thing, journal editors see all the submissions to their journal, not just the ones that they choose to publish. Most importantly, this means that they see more than just the articles they would use for their own research; they also see the articles they reject.

This, to me, is a big weakness of the “article-level metrics” approach, because most researchers evaluating a single article, as they may come across it, focus on their own work. They consume and comment on articles important for their own research (or that they’re invited to review) and are much less likely to have the time to wade through, read in-depth, and compare the entire corpus of articles, as editors do. Whatever evaluation is happening there, from citing to tweeting, is independent of any comprehensive comparison and, while done by experts in the subject, is not done by experts in evaluation.

There is also the “10,000-hour rule”: the idea that one develops skill and expertise at a task with practice. This again argues the case for editors, who invest time in evaluating articles and in developing their skills at doing so. That is time other researchers haven’t invested.

There is one article-level exception to this, and it’s article-level evaluation (rather than metrics): services offered by companies like Peerage of Science and Rubriq, which offer full editorial evaluation, including a comprehensive report. (These still-new ventures, for what it’s worth, personally excite me.)

But how do we judge editorial performance?

This still doesn’t answer my first question though—how do we know how good editors are at evaluating research? Or as I formulated it above, who evaluates the evaluators? One can acknowledge the IF’s problems without throwing the journal-level baby out with the IF bathwater. Yet even doing so, the need for some sort of measure still exists. Part of me wants to look for a new “objective” measure, and yet part of me—going back to the subjective theme—thinks maybe we need some sort of subjective measure of editorial quality; and I’m not sure what that might look like.

(ETA: Updated with the link to Cat Chimes’ guest post June 25 11:26 AM EDT.)

View the latest posts on the SpringerOpen blog homepage



Well, I personally think that much of the debate lies in the question: what do we evaluate for? The good old idea in physics that one first seeks a model and only then starts measuring (otherwise the measurements will always match whatever needs one has) seems useful here, to me. If we want to evaluate journals, your point is definitely a good one, but if we want to evaluate researchers it is another story. And most of the criticism of the IF comes from that: it was not meant to evaluate researchers, but now it is misused as such.

Björn Brembs

You write:

“The IF is a calculation of citations”

No, the IF is negotiated and not calculated. If you try to reproduce the ‘calculation’ you fail [1].

“Or as I formulated it above, who evaluates the evaluators?”

We did – or at least we reviewed the literature of those who actually did [1]. It turns out, the higher the IF, the methodologically worse the science is. In other words, the editors at high ranking journals are worse than the editors at lower ranking journals.

That’s the data. Evidence-based policy should thus either get rid of journal rank or prevent hiring of people who have published in high ranking journals (which would include me). You see, your question has already been answered.

[1] Brembs B, Button K and Munafò M (2013) Deep impact: unintended consequences of journal rank. Front. Hum. Neurosci. 7:291. doi: 10.3389/fnhum.2013.00291

P.S.: The top journals explicitly acknowledge our findings: http://bjoern.brembs.net/2013/06/everybody-already-knows-journal-rank-is-bunk/
