Along with over 10,000 others, I signed the San Francisco Declaration on Research Assessment (DORA). Why? I believe that the impact factor was a useful tool for the paper age, but that we now have the capability to develop much more powerful tools to evaluate research.
For hundreds of years scientific discourse took place on paper: letters written and sent by post, research cited in one’s own articles printed and distributed by publishers. In many cases citation was the most direct way to respond to the research of another scientist. In the 1970s, as scientific output began to boom, the impact factor was developed as a measure for librarians to manage their collections. Impact factors are calculated exclusively by Thomson Reuters for specific journals: the impact factor for year X is the number of citations received in year X by articles published during the previous two years, divided by the number of citable articles the journal published in those two years. Therefore, an impact factor can reflect the changing status of a journal within a research area as its citations increase or decline, assuming that the number of citations an article receives is a measure of its “importance” for the scientific community.
But how did we get from there to a number that informs all hierarchies in the natural sciences, from professor to lab tech, from elite university to community college? For decades the impact factor was considered the single criterion to quantify the prestige of journals with which to attract submissions from researchers and subscriptions from libraries.
During my academic career as a physicist I published more than 20 papers in international journals between 1993 and 2001. I was eager to succeed and made sure that my papers were published in journals that had an impact factor; some of them were even submitted to high-impact “luxury” journals (to quote Randy Schekman). However, I also submitted one of my papers to a journal that had no impact factor because it had only recently been launched. The result was that this work was rarely read: the new journal had a small subscription base, so very few people could access it, which resulted in fewer citations, the coin of the realm.
While the impact factor may be useful in evaluating a journal, it can neither reflect the quality of a specific article nor evaluate the relevance of the research summarized in that article. George Lozano calculated in a London School of Economics blog post that the number of the most highly cited papers published in high-impact-factor journals has been dropping continually since 1991. Even less can an impact factor be used to evaluate the performance of an individual researcher. Nevertheless, that is still common practice, for example in evaluating the scientific research of an institution or in application procedures for young professors. Since no other metrics were available, impact factors were unfortunately used to evaluate things for which they were never intended. Many scientists are frustrated with this system and are speaking out. See, for example, the excellent writing by Stephen Curry, Bob Lalasz, Paul Wouters, Björn Brembs and Michael Eisen.
“The widely held notion that high-impact publications determine who gets academic jobs, grants and tenure is wrong.” (Michael Eisen 2012)
Another genuine problem with the impact factor is the static way in which it is calculated: the time span for citations is limited to at most 24 months. This does not reflect the very different average citation periods in different scientific disciplines. In mathematics, for example, the median citation appears only several years after publication, so the majority of citations are not reflected in the impact factor of the journal in which the article appeared. In other disciplines, such as biochemistry, citations accumulate much more quickly. As a result, impact factors cannot be compared between disciplines. In biology, four of the top five journals had an impact factor of about 6, whereas the impact factors of the top five medical journals ranged from 12 to above 50.
The rigid window of two calendar years also ignores the following effect: an article published in January 2011 has been available for a full 24 months before the citations counted for the 2013 impact factor begin. Had the same article appeared in December 2011, it would have had only 13 rather than 24 months to be read and considered for the new publications that cite it. Publishers know this and try to save review articles and work by prominent scientists for their January issues. Of course, authors can also push up the number of citations for their own articles by citing themselves elsewhere (66 journals were banned for this practice). Such manipulation is sometimes hard to discover, and specific niche disciplines can develop “citation circles” in which the members frequently cite one another.
Moreover, once one understands how an impact factor is calculated, it is an easy task for an experienced journal editor to systematically “boost” it, for example with editorials, conference proceedings, or letters. These items can generate further citations to the journal that increase the citation count, while they are not themselves included in the base of articles used to calculate the impact factor. Therefore, some editors like to invite editorials or special papers for their journal (or for a journal that is part of the same publishing house) in which other articles from the journal are cited. It has also happened to me and to some of my colleagues that an editor of a reputable high-IF journal asked us to include one or more additional references to that specific journal before accepting a paper for publication. So let me ask: is this really an appropriate measure for the quality of scientific work?
Some high-impact factor “journals aggressively curate their brands, in ways more conducive to selling subscriptions than to stimulating the most important research. The exclusive brands are then marketed with a gimmick called ‘impact factor’” (Randy Schekman in The Guardian)
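To make the arithmetic behind these effects concrete, here is a minimal sketch in Python. The function names and all numbers are hypothetical and chosen only for illustration; this is not the procedure Thomson Reuters actually runs. It shows the basic ratio, how citations from editorials can raise the numerator without enlarging the denominator, and how the publication month changes the time an article has had to be read before the counting year begins.

```python
def impact_factor(citations_in_year: int, citable_items: int) -> float:
    """Citations received in year X by items from years X-1 and X-2,
    divided by the number of citable items published in X-1 and X-2."""
    if citable_items <= 0:
        raise ValueError("citable_items must be positive")
    return citations_in_year / citable_items


def months_before_counting_year(pub_year: int, pub_month: int, counting_year: int) -> int:
    """Months an article has been available (publication month included)
    before January of the year whose citations are counted."""
    if not 1 <= pub_month <= 12:
        raise ValueError("pub_month must be between 1 and 12")
    return (counting_year - pub_year) * 12 - (pub_month - 1)


# Hypothetical journal: 200 citable articles over two years, 600 citations to them.
baseline = impact_factor(600, 200)               # 3.0

# Assumption: editorials contribute 60 extra citations to the journal,
# but are not counted as citable items, so the denominator stays at 200.
boosted = round(impact_factor(600 + 60, 200), 2)  # 3.3

# The January versus December example from the text, for the 2013 impact factor.
january_article = months_before_counting_year(2011, 1, 2013)    # 24
december_article = months_before_counting_year(2011, 12, 2013)  # 13
```

In this made-up case a handful of well-placed editorials lifts the figure by ten percent without a single additional research article being published.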
A fair and individual evaluation of science, and of those who do the research, requires transparent and flexible metrics. An alternative approach was suggested in 2005 by J. E. Hirsch, who defined an index to quantify an individual’s scientific research output that was later called the h-index. The h-index concentrates on the scientific output of an individual researcher and its citations rather than on a journal. However, one problem for developing new metrics is that citation information is proprietary, owned by the CrossRef member publishers, Thomson Reuters, Scopus or Google. A free, open access citation database would allow new experiments in evaluating articles and researchers. And of course there are further problems with concentrating solely on citations as the measure of influence, as papers are increasingly made public as preprints and working papers. Nobel prize winner Paul Krugman states that by the time his most important paper was finally published, there were already over 150 derivative papers that he knew of. A new quality metric should consider not only the citation of an article in the classical sense but also the different channels of feedback and citation that are used today to comment on a scientific article, for example Twitter, Facebook, Mendeley, CiteULike, blogs, or other social media.
“…an easily computable index, h which gives an estimate of the importance, significance, and broad impact of a scientist’s cumulative research contributions.” (J.E. Hirsch 2005)
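Hirsch’s definition is that a researcher has index h if h of his or her papers have at least h citations each. A minimal sketch of that calculation in Python follows; the function name and the citation counts are hypothetical examples, not data from any real researcher.

```python
def h_index(citations: list[int]) -> int:
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)  # most cited paper first
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank   # the top `rank` papers all have at least `rank` citations
        else:
            break      # counts only decrease from here, so h cannot grow further
    return h


# Hypothetical citation counts for one researcher's five papers.
steady_output = h_index([10, 8, 5, 4, 3])   # 4
# A single highly cited paper does little for the index.
one_hit = h_index([150])                    # 1
```

Unlike the impact factor, the result depends only on the researcher’s own papers, regardless of which journals they appeared in.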
Recently, the London-based start-up Altmetric began collecting mentions of academic research in all relevant social media channels on a day-to-day basis, which today appears to be a more appropriate way to display the feedback on a scholarly paper than counting citations in classical journals. Users, both readers and authors, can monitor, search, and measure all feedback and conversations concerning specific articles, their own articles, and those published by competitors. If, for example, a new paper has been tweeted three times within the first week after publication, Altmetric shows these tweets explicitly and links to the original comment. In comparison, the first “classical” citation of the same article in another journal will probably be published several months (or years) later. And two years later these citations will count toward the impact factor of the journal in which that article appeared. So why should we wait for that feedback and use a metric that is based on the journal brand rather than on the individual relevance of an article? This is also the reason why we integrated Altmetric as an article-level metrics feature in our new venture ScienceOpen, which I initiated and created from my experiences in the publishing industry and as a researcher.
It seems quite obvious that the impact factor is not the best or even a good measure for the quality of a scientific article, a researcher or a research program. But it will take a concerted effort by universities, funding bodies, and researchers themselves to change habits. A collective change in evaluation procedures and personal behavior in academic publishing could push development in the right direction. And it will take new experiments and new tools to develop optimal quality metrics. Both researchers and librarians have a powerful key role in this movement, and I am interested to see how seriously and quickly they will make use of this role in the near future.