4 reasons to publish software articles – even if you’re not a computer scientist

Data shows citations surging for software articles, and the formats are becoming more innovative and user-friendly

All too often, researchers find that the dataset they want to analyze is so novel or so huge that regular software packages no longer suffice – so they decide to write the code themselves. For a long time, such code was seen as little more than a necessary step: a customized piece of software developed as a byproduct on the way to great and novel scientific results.

But things have shifted over the years. The time when this scientific software was hidden away on a developer’s personal website or published in highly specialized computer science journals is past. Why not also get credit for the many hours you have put in developing your code? After all, that’s usually a big chunk of work, and the actual scientific output is “just” the result of all the hours put into the code.

It’s no surprise that we see more and more software articles being published in regular scientific journals. Moreover, these articles are well-read and cited (as you’ll see below, that’s something of an understatement). All in all, software publications are here, and they’re here to stay! Here are four reasons you should check them out, or consider writing one yourself.

1. Software articles are booming

While we have seen scientific output growing for many years across virtually all disciplines, software articles seem to outpace even this growth rate. Areas like diabetes, solar or climate research – which are considered “hot” academic and societal topics – already show a steep increase in published articles, but the increase in software-related publications is even steeper.

Software publications are also no longer strictly the domain of computer science journals; we now see them in all scientific disciplines, from engineering and physics to biochemistry, genetics, molecular biology and the medical sciences.

Published articles in various scientific fields (Source: Scopus)

The recently published Elsevier report on artificial intelligence paints a similar picture: AI research has grown annually by 5.3 percent in the last decade, and at an astonishing 12.9 percent in the last 5 years.

Published articles on artificial intelligence (Source: Scopus)

2. Software articles are citation magnets

Publishing software in a scientific journal significantly increases the chance of your software being discovered, used and cited. Readers can be made aware of the article in various ways: by the article recommender functionality on platforms like ScienceDirect, by having the article shared through Mendeley by peers, or simply because they set an alert for themselves to receive a notification when a new and relevant article is published.

This works. Of the 10 most-cited articles published in the past 30 years, four are software articles, according to Scopus – among them, an article that has received a staggering 69,203 citations. This is the work of Prof. George Sheldrick, the developer of SHELX, a program used for small-molecule refinement. He developed an early version of this software back in the 1970s, and over the years, he wrote about it in various computer science book chapters and in the occasional journal article, each describing only certain features or improvements to the code. But in 2008, he took a different tack, tying all the information together in a review article titled “A short history of SHELX.” The citations to that article made him one of the most highly cited authors in his field.

Another attractive feature of software articles is that citations tend to keep coming in. Look, for example, at the article describing BLAST (basic local alignment search tool), published in Elsevier’s Journal of Molecular Biology. This article was published back in 1990 and has received over 54,000 citations so far. Because the software is still widely used in genome research, it gathered more than 900 citations in the early months of 2019.

3. Software articles are pushing the boundaries

Recently, we have been seeing more innovative ways of publishing software articles. For example, the open access journal SoftwareX focuses solely on software publications for all scientific areas. It does not ask its authors for the standard scientific article format but instead publishes so-called original software publications (OSPs), for which it has created a special template. Authors are requested to submit just a brief description of the novelty and scientific impact of their code; most of the emphasis is on the code itself, with its specifications listed in the code metadata. Authors are also required to share their software by posting it in a repository or archive where the code is publicly available. In addition, the journal maintains a dedicated SoftwareX page on GitHub that contains all the code. On top of all this, the software is refereed.

Wait – refereed software? Yes, this journal is indeed asking its reviewers to referee the code and has developed some novel guidelines for that. To step up this review process, the journal has recently rolled out a pilot with CodeOcean: a “reproducibility platform that provides researchers and developers an easy way to share, discover and run code published in academic journals and conferences.” Authors of SoftwareX are requested to upload their code to CodeOcean, and reviewers can either just have a peek at it to see whether it runs and what the output looks like or – for the real enthusiasts – check out the actual lines of code and rerun it. The fact that the software is viewable on CodeOcean is already a proof of concept, as CodeOcean would simply not allow non-functioning code to be hosted on its platform.

4. Software users are loyal to their code

Analysis of well-known software packages that are widely used in a specific research community shows an interesting pattern almost every time: users are very loyal to their software package. Not only do researchers keep using the same program for their research – which may be considered obvious to some extent, assuming the package remains the default program in a given field – but citations to the package also shift from the previous version to the newest one, reflecting the actual adoption and usage of the new release by its users.

This loyalty is nicely illustrated by three slightly different but equally interesting examples: from the software journal SoftwareX (publishing GROMACS), the biology journal Bioinformatics (publishing CLUSTAL) and the computational physics journal Computer Physics Communications (publishing PYTHIA).

GROMACS is a frequently used simulation program for proteins, lipids and nucleic acids, while CLUSTAL consists of a series of computer programs used for multiple sequence alignment. PYTHIA is a computer simulation program for particle collisions. For all packages, the list of official publications and instructions on which version to use and how to cite these are clearly indicated on the organization website. And it seems users do read these well!

In the case of GROMACS, we see a clear shift of citations from one version to the next each time a new version is released. The “old” versions for which citations keep climbing are either those for which no version number is mentioned or those that include functionality that is still in use and therefore continues to gather citations in parallel with newer versions. In other words, users behave very well: they start citing the new version as soon as it is released, and if they do continue to cite an old version, they usually have good reason to do so.

With GROMACS, we see a clear shift of citations from one version to the next each time a new version is released.

The CLUSTAL package shows this community behavior even better. The plot nicely illustrates that researchers used (and cited) CLUSTAL W according to the instructions on its website. Once services were discontinued, three different packages – CLUSTAL OMEGA, MUSCLE and MAFFT (indicated by their makers as successors of the CLUSTAL package) – started gaining more traction and citations.

Community behavior in the CLUSTAL package, 1994-2018.

The community behavior for the PYTHIA package shows an analogous trend, with a clear shift in usage toward newer versions. In addition, the authors of PYTHIA published four of their five major releases in the same journal, giving users the advantage of knowing exactly where to turn for any updates and new releases of the software.

The community behavior for the PYTHIA package shows a clear shift in the usage to newer versions.

In all three examples, as well as for the other software packages that we looked at, the overall number of citations keeps growing, showing an increase throughout the years in both the development and usage of software (see graphs below).

The overall number of citations keeps growing for software articles. (Source: GROMACS, CLUSTAL and PYTHIA)

Conclusion

Software articles appear everywhere these days and dominate the lists of most-cited articles. Elsevier’s journal teams have additional initiatives in the pipeline to ensure that software publications continue to meet the needs of the people who publish and read them. As a result, it will be easier than ever for you to get credit for your hard work.

Citation goldmines

Top-cited articles since 1990

Articles in bold are software-related.

Publication year | Article title | Authors | Journal | Lifetime citations
1996 | Generalized gradient approximation made simple | Perdew J.P., Burke K., Ernzerhof M. | Physical Review Letters | 79,481
2001 | Analysis of relative gene expression data using real-time quantitative PCR and the 2^(-ΔΔCT) method | Livak K.J., Schmittgen T.D. | Methods* | 73,459
1993 | Density-functional thermochemistry. III. The role of exact exchange | Becke A.D. | The Journal of Chemical Physics | 72,691
2008 | A short history of SHELX | Sheldrick G.M. | Acta Crystallographica Section A: Foundations of Crystallography | 69,203
1990 | Basic local alignment search tool | Altschul S.F., Gish W., Miller W., Myers E.W., Lipman D.J. | Journal of Molecular Biology* | 54,760
1997 | Gapped BLAST and PSI-BLAST: A new generation of protein database search programs | Altschul S.F., Madden T.L., Schaffer A.A., Zhang J., Zhang Z., Miller W., Lipman D.J. | Nucleic Acids Research | 51,669
1994 | CLUSTAL W: Improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice | Thompson J.D., Higgins D.G., Gibson T.J. | Nucleic Acids Research | 49,512
1996 | Efficient iterative schemes for ab initio total-energy calculations using a plane-wave basis set | Kresse G., Furthmuller J. | Physical Review B - Condensed Matter and Materials Physics | 43,229
2004 | Electric field in atomically thin carbon films | Novoselov K.S., Geim A.K., Morozov S.V., Jiang D., Zhang Y., Dubonos S.V., Grigorieva I.V., Firsov A.A. | Science | 35,309
1997 | Processing of X-ray diffraction data collected in oscillation mode | Otwinowski Z., Minor W. | Methods in Enzymology | 34,446

*Elsevier journal
