Are We Publishing Too Many Articles?

Michael Hochberg
Dec 22, 2021

Updated: Aug 20, 2022

You will not be surprised to learn that the number of journal articles published yearly is growing. What you probably don’t know is that this increase is exponential. Can the growth of something associated with our freedoms and the epitome of knowledge, understanding and progress be anything but good?

Exponential Growth

If you are an early-career scientist, then consider this. Current estimates put annual article growth at about 4%. You might be saying, “What’s 4%? I only read about 50 new articles per year and wouldn’t be too inconvenienced if it were 52 or 53 next year, and 54 or 55 the year after that.”

True, but:

1. Actual reading is only a subset of potential interest. Everyone has their own reading ceiling, and going beyond it means either encroaching on other things in life or being even choosier about which papers you read.

2. Single-digit growth could translate into a huge number of additional to-read papers over a career. At 4%, you would need to read 240 new papers (rather than 50) during the 40th year of your career. What about the total number of to-read papers in your collection? The total over X years is the sum, for N = 1 to X, of 50*(1+GROWTH)^N. If growth were 0%, the 40-year total would just be 2000 (which is a large number!). At 4% annual growth the total is 4941, or about 124 papers per year on average (more than double the starting rate of 50). But now imagine that we consider the total number of papers you need to sort through to decide what you read. Assume that this too grows at 4% and you start with 1000 interesting papers to obtain your 50 reads in year 1. By year 40 this becomes 4801 per year, for a cumulative library of 98827 papers. So, you start with a library of 1000 papers and end up with a whopping total of nearly 100 times that number by year 40!

3. Knowledge growth is what makes science tick and evolve. Non-exposure to potentially important findings negatively impacts (1) the scholarship and scientific quality of one’s own work and (2) transmission of important work to readers. The reverberations for scientific progress, even if certain, are difficult if not impossible to quantify.
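The arithmetic in point 2 can be checked with a few lines of Python. This is only a minimal sketch: the function names are mine, and it assumes, as the formula in point 2 does, that career year N carries a count of START*(1+GROWTH)^N.

```python
def yearly_count(start: float, growth: float, year: int) -> float:
    """Papers in career year `year`, following START * (1 + GROWTH)^N."""
    return start * (1 + growth) ** year


def cumulative_count(start: float, growth: float, years: int) -> float:
    """Sum of the yearly counts over career years 1..years."""
    return sum(yearly_count(start, growth, n) for n in range(1, years + 1))


if __name__ == "__main__":
    # 50 = papers actually read, 1000 = papers sorted through (figures from point 2)
    for label, start in (("papers read", 50), ("papers screened", 1000)):
        for growth in (0.0, 0.04):
            final = yearly_count(start, growth, 40)
            total = cumulative_count(start, growth, 40)
            print(f"{label}, growth {growth:.0%}: year 40 = {final:.0f}, total = {total:.0f}")
```

Changing the growth rate or the career length shows how sensitive the totals are to what looks like a modest annual percentage.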

Information Tragedy

Whether there can ever be too much of something good is a matter of opinion. If good is per unit and there are too many units to experience, then yes, good could conceivably turn to bad. For example, I love fine dining, but I do not want to be Monsieur Creosote in Monty Python’s The Meaning of Life. Exponentially growing scientific article numbers, beyond a point, must lead to smaller percentages read. What is not known is whether percentages decrease at the same rate as, or faster than, article growth. That is, there may be some degree of abandonment, such that despite growing libraries, fewer and fewer articles are actually read.

Growth

Let’s look a bit closer at growth and then at how it can generate challenges. Growth can be seen in two interrelated ways. First, output per unit time can be (1) a constant number of new papers per year, (2) a number that increases by a constant proportion from year to year, or (3) a number that increases by a growing proportion from year to year. Second, output translates into a linear increase in the total number of papers for (1), an exponential increase for (2) and a hyper-exponential increase for (3). The sciences are currently in scenario (2).
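To make the three scenarios concrete, here is a rough Python sketch. The starting output of 100 papers and the yearly 0.1 percentage-point rise in the growth rate for scenario (3) are invented for illustration; only the 4% rate comes from this essay.

```python
def cumulative_totals(first_year_output, years, growth_rate=0.0, growth_acceleration=0.0):
    """Running total of papers, year by year.

    growth_rate: proportional increase in yearly output (0 gives scenario 1).
    growth_acceleration: yearly increase in growth_rate itself (above 0 gives scenario 3).
    """
    totals = []
    output, rate, total = first_year_output, growth_rate, 0.0
    for _ in range(years):
        total += output
        totals.append(total)
        output *= 1 + rate           # next year's output
        rate += growth_acceleration  # the growth rate itself may creep upward
    return totals


if __name__ == "__main__":
    # All parameter values below are hypothetical, chosen only to contrast the shapes.
    scenarios = {
        "(1) constant output": cumulative_totals(100, 40),
        "(2) constant growth rate": cumulative_totals(100, 40, growth_rate=0.04),
        "(3) growing growth rate": cumulative_totals(
            100, 40, growth_rate=0.04, growth_acceleration=0.001
        ),
    }
    for name, totals in scenarios.items():
        print(f"{name}: total after 40 years = {totals[-1]:.0f}")
```

The three running totals start out close together and diverge with time, which is why the difference between the scenarios is easy to miss over a few years and hard to ignore over a career.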

Scientific articles from the distant and not-too-distant past, even if rarely or never read, are always available. They and their findings accumulate in the scientific commons. Accumulation is not so much a problem insofar as growth in knowledge renders some past work more historical than relevant. But although the rate at which past work loses actual relevance would be difficult to quantify, it stands to reason that this is lower than the rate at which viable articles become obsolete. This could mean that as the mass of newly published articles grows, the expected lifetimes of previously published papers shrink.

Making the Mark

Science evolves – it is never set in stone for long. New material in the form of live presentations, media and written documents is how findings are communicated, come under scrutiny and contribute to understanding. On the face of it, scientific growth is conducive to progress, because a process akin to natural selection will favor increments in understanding, but will also tend to weed out results not considered interesting or useful.

A consequence of papers being read less than (perhaps) they should is that some are rarely or never cited. Of course, some would rarely or never be cited irrespective of how much they are read[1]. I’ve called this “disposable science”, not because papers did not merit being read, but rather because some don’t get the attention they deserve simply because we do not make the time[2]. Worse, it is possible that reading effort is becoming increasingly biased towards those articles published in the most prestigious scientific journals or by the most highly cited or productive scientists.

There are several easy-sounding solutions to the dual realities of not enough time to read and lowered impact of under-read articles:

Have improved ways to identify what we should read. This helps but does not solve the problem.
Spend more time reading. Idem.
Hire fewer scientists. No.
Publish less. Heresy! ... Heresy?

Every Manuscript has a Home

We do science for various reasons including curiosity, passion, the craft, careers. Regardless of why, we have considerable freedom to explore. Our papers metaphorically have lives, and who doesn’t want their infant article to have a great future? But not all futures are equal, since science has another natural regulation mechanism: the choosiness of peer-reviewed journals. So, if it were only a world of the relatively small number of journals covered by highly selective platforms such as Clarivate’s Web of Science, then perhaps this essay would be irrelevant. But this is far from so. The supply of science is not only the sheer number of papers, but an ever-growing market for new journals, notably including so-called ‘predatory’ journals. Publishers respond to manuscript supply and author demand by creating new journals, and new journals make publishing easier, which creates incentives for some scientists to produce even more papers, and so on.

Thus, our addiction to publishing dopes journal growth, which, in turn, provides homes for each and every paper, whatever its importance or quality.

But what is behind this loop that appears to be spiraling out of control?

The Elephant in the Room

Have you ever really thought about why you publish? Discovery? Communication? Teamwork? Career goals? Student mentoring? The next vista?... I would guess, like me, you don’t give it much thought beyond “science only exists if published” ... and that’s good enough! The more we publish and the higher the impact, the more we can publish in the future. This positive feedback is driven, in part, by competition for jobs, funding, stellar students and, importantly, the acceptance of our work in the top journals. More research[3], more results, higher impact and back again.

This positive feedback supercharges an ‘evaluation culture’. Even if you don’t know the term, you know what it is: Journals, granting agencies and academic departments making decisions based on indices such as publication number, journal impact factors, the H-index and article citations.

How does this work? Supply is less, often much less, than demand. Top journals in the biological sciences are turning away the majority of submissions, many through desk rejection without review. The stakes are high and evaluation committees are confronted with hard choices. Their decisions can sometimes make or break careers and, more modestly, influence research programs and personal well-being. Candidates either succeed or fail – there is no in-between. Evaluation committees are confronted with complex decisions but obliged to render a dichotomy: “this is better than that”, “…more interesting than that”, and so on. The fittest tend to prevail in an open market where publishing becomes an end unto itself. The adage “publish or perish” is a hyperbole of a more general reality.

This all sounds bleak, but understanding these variables can help us think about why we publish and possibly lead to more fulfilled science and happier careers. Thinking about why we publish may reveal the underestimated importance of other facets of science including reading, teaching and giving seminars, mentoring, participating in discussion groups and workshops, and public outreach. Many of us do value these, but frankly it is uncommon for scientists to express a burning desire to, for example, increase their teaching commitments.

I or We?

The heart of the problem is overly equating individual scientific value with publication number.

Consider this. Doing and publishing science takes a lot of time. I have clocked this myself on papers where I did all or the vast majority of the work, and come up with 500-600 person-hours to research, write and revise a Review-type article. I can only presume that a primary research, data-based study would take more time. But beyond the initial completion of a manuscript there’s what is often an obstacle course in getting the paper published, sometimes involving revisions and submission to many different journals. These are not reasons not to publish, but rather real parameters that should be taken into consideration when deciding how to dedicate time to various facets of science, and dare I say to having a personal life! Indeed, the "I" question itself can be huge.

Figure: Hypothetical set of possible productivity-quality strategies (solid line and area below it) for a given researcher. If the scientist is currently at point A, she maximizes quality by reducing productivity and attaining B, or increases productivity at the expense of quality to reach C. She could also seek an optimal combination (for example, D) that increases both quality and productivity. From Hochberg 2019.

And there is a subtler facet to publishing less: who should publish less? Publishing is so fundamental to what we do that the very notion of publishing less to improve an intangible science commons is anathema for the vast majority of scientists. “If others are not ready to commit to publishing less, then why should I?” The logical but naïve reply is that if we published less, a little less, then both the quality of what is published would increase and the mass of articles that merit being read would grow at a slower rate. But there are many problems with this proposal. First and foremost, as above, who’s going to go first? Second, although the number of articles a given person publishes and the quality per article are subject to an inevitable tradeoff, some high-functioning scientists can avoid the quality downturn and still publish tens of excellent and important papers per year. Should they, or even the majority working at high scientific standards, curtail their productivity? Clearly not[4],[5].

A Proposal

Here’s an idea. The next time I appear as an author on a peer-reviewed published article, I add a footnote or say in the acknowledgements something to the effect: “I adhere to a publication policy of submitting only my finest work to peer-reviewed journals and post other work worthy of attention as preprints only”. This is useful as it signals policy, but could be embarrassing if, for example, the person were to publish in a journal what was ostensibly substandard (e.g., retracted) work. Less emphatic would be simply to adhere to the content of the statement and say nothing. More emphatic would be to publish less altogether, for example through living documents or slow science.

A preprint-forever policy, if widely adopted, could have positive effects on the science commons. It assorts science in a more categorical way than does journal impact alone, makes the science commons more approachable, preserves the reviewer commons [6], curtails predatory journal growth and addresses certain excesses in the evaluation culture.

Could it Work?

Despite what would appear to be benefits of shifting science to preprints, there are numerous reasons why it is highly unlikely to ever happen, at least on the massive scale necessary to have a noticeable effect on the science commons:

1. Many don’t see a problem at all and sometimes quite the contrary. Any "problem" is subjective and philosophical. Suggestion of self-limitation or rules and regulations in how much one can publish is against the spirit of discovery. The raison d’être of a scientist is doing and publishing science. Readership and impact are not essential reasons to publish.

2. Co-authors may not agree that their work be relegated to a preprint server only. In particular, this penalizes students, postdocs and young faculty who are ironically the most affected by the evaluation culture.

3. Academic departments and funding agencies will wonder if their faculty and grantees are doing substandard work in the form of preprints-only. This could affect careers, future research funding and therefore science itself. There is nevertheless a growing push for preprints to be recognized and valued as scientific products and included in academic assessments.

4. One may underestimate the current importance of their work and cannot possibly know its future influence. The rule of thumb ‘publish less and put more effort into each paper’ is good in promoting quality, but not to the point of sacrificing discovery.

5. Similarly, should the preprinted work be deemed important at a future date, the authors may have moved on and/or are unwilling to embark on the demands of publication in peer reviewed journals.

6. Should one’s scientific output decrease at some future time, one may regret having consigned manuscripts to preprint-only status. This could have reverberations for science careers.

7. Being overly self-critical and therefore preprinting too many manuscripts may eventually sap motivation.

8. The journal vs preprint dichotomy may shift efforts to those studies most likely to be published in peer-reviewed journals. This has its advantages, namely fewer published units and higher quality science, but may also lower the standards of those studies relegated to preprints.

9. As earlier in this essay: Who’s going to go first? The impact of publishing less in peer-reviewed journals on the science commons accrues with community adherence. Individuals will be reluctant to pay the costs of adherence if there’s little collective effort.

10. What merits peer review vs preprint-only is hardly obvious. Your ‘good’ is most everyone else’s ‘great’, etc. Community reactions to a preprint could be used as a gauge in approaching a journal. But should this require more reviewers (preprint + journal) than going for peer review in journals only, it may negatively impact the reviewer commons.

11. Perhaps the most damning reason is the fact that to counteract exponential growth in published articles, allocation to preprint servers would need to grow exponentially as well. There are many problems here. For example, researchers whose article output is low would ‘pay the price’ of paper growth stemming from new scientists entering research. We can hardly blame early-career scientists for doing science!
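Point 11 can be put in numbers with a small Python sketch. It rests on an assumption of mine that goes beyond the text: that "counteracting" growth means holding the number of journal articles at its starting level while total manuscript output keeps growing at 4% per year. The starting figure of 1000 manuscripts and the function names are likewise hypothetical.

```python
def preprint_share_needed(growth: float, years: int) -> float:
    """Share of a year's manuscripts that must stay preprint-only so that
    journal articles remain at the year-0 level (an assumed goal)."""
    return 1 - (1 + growth) ** -years


def preprints_per_year(initial_output: float, growth: float, years: int) -> float:
    """Absolute number of preprint-only manuscripts in a given year."""
    total_output = initial_output * (1 + growth) ** years
    return total_output * preprint_share_needed(growth, years)


if __name__ == "__main__":
    # 1000 manuscripts in year 0 is a hypothetical starting point
    for years in (10, 20, 40):
        share = preprint_share_needed(0.04, years)
        count = preprints_per_year(1000, 0.04, years)
        print(f"year {years}: {share:.1%} preprint-only, {count:.0f} preprints")
```

Under this assumption the preprint-only pile absorbs all of the growth, so the number of preprints per year ends up growing at essentially the same exponential rate as total output.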

Simple Answers?

There is no magic wand for the question posed in the title of this essay. Posting preprints just 'kicks the can down the road', since like full-fledged articles, preprints exist to be read, and like articles, they also accumulate. Making things even more complicated, many clamor to get rid of the classic journal model altogether. Paywalls and excessive costs to publishing borne by authors would be addressed by Open Science, but not the problems of seeing, reading and citing an exponentially growing literature. Indeed, it stands to reason that easier publishing as preprints and open access in journals would just make things worse.

Discovery

What about technology helping to find what we could/should/need to read? This may take the form of machine learning, where an algorithm follows what you read, why you read it, how you read it, etc. Such algorithms have actually existed for some time, a notable example being Google’s PageRank.

A platform tailored to your science would build a custom search engine that continuously learns and evolves to your reading preferences. But delegating 100% of searches to an algorithm has its limitations, and we see this clearly when we conduct a keyword search on Google Scholar. We cannot compare the consequences of shortcomings in article search technology to those of sophisticated robots, such as autopilots in aircraft or self-driving cars, but the nature of learning complex, contextual information is broadly similar. The main difference perhaps is the added layers of assessment and prevention in critical systems, faced with the unfathomable consequences of failure. Nevertheless, quality control – and information control full stop – are real concerns, since big data analytics companies play a major and growing role in the selective access of the scientific literature and data.

Challenges notwithstanding, an enriching and accomplished future for scientists will depend on us, that is, on learned societies, academic institutes and scientists themselves assembling to discuss, foresee and act.

And Then...

The most important progress we can make is thinking about and discussing possible issues in the growth of scientific information. This will require both uncomfortable self-assessments and, more obliquely, how we view the science of others. The associated questions are complex and philosophical. Nevertheless, in practical terms, we should consider thinking more carefully about possible products before actually embarking on a study. Do I choose to evolve towards a slow science and therefore fewer publications, a mixed article / preprint-forever science, or simply a fewer papers policy? And to what extent do I actually adopt such policies?

So, concretely

I adhere to a policy of authoring fewer publications and of pushing for some of these to be submitted to scientist-run outlets such as the Peer Community Journal.

[1] Although, it could be argued that in being more read, lower-standard papers will be more cited, thereby hampering scientific progress.
[2] An objective of publishing science is not necessarily to maximize getting read and cited. For instance, some articles are read and cited little for reasons such as small audiences and timeliness.
[3] Notably pressure to publish the smallest unit possible, so-called ‘salami slicing’.
[4] Indeed, the rich get richer appears to apply to productivity.
[5] This said, a model indicates that high output can lead to poorer methods and increased false discovery rates – natural selection for bad science.
[6] Note that as I write, the notion of ‘preprint’ is giving way to generic journals, with options for preprints only or peer review to become articles. This could potentially reduce the burden on the reviewer commons. See for example https://peercommunityin.org/pc-journal/
