All too often, researchers find that the dataset they want to analyze is so novel or so huge that regular software packages no longer suffice, so they decide to write the code themselves. For a long time, this may have been unavoidable: a customized piece of software was simply a byproduct of getting to great and novel scientific results.
But things have shifted over the years. The days when scientific software was hidden away on a developer’s personal website or published in highly specialized computer science journals are over. Why not also get credit for the many hours you have put into developing your code? After all, that’s usually a big chunk of work, and the actual scientific output is “just” the result of all those hours spent on the code.
It’s no surprise that we see more and more software articles being published in regular scientific journals. Moreover, these articles are widely read and cited (as you’ll see below, that’s something of an understatement). All in all, software publications are here, and they’re here to stay! Here are four reasons you should check them out, or consider writing one yourself.
1. Software articles are booming
While scientific output has been growing for many years across virtually all disciplines, software articles seem to outpace even this growth rate. Areas like diabetes, solar or climate research, which are considered “hot” academic and societal topics, already show a steep increase in published articles, yet the number of software-related publications is growing even faster.
Software publications are also no longer strictly the domain of computer science journals; we now see them in all scientific disciplines, from engineering and physics to biochemistry, genetics, molecular biology and the medical sciences.
The recently published Elsevier report on artificial intelligence paints a similar picture: AI research has grown by 5.3 percent annually over the last decade, and by an astonishing 12.9 percent annually over the last five years.
2. Software articles are citation magnets
Publishing software in a scientific journal significantly increases the chance that it will be discovered, used and cited. Readers can learn about the article in various ways: through the article recommender on platforms like ScienceDirect, because peers have shared it through Mendeley, or simply because they have set an alert to be notified when a new and relevant article is published.
And this works. Of the 10 most-cited articles published in the past 30 years, four are software articles, according to Scopus; one of them has received a staggering 69,203 citations. This is the work of Prof. George Sheldrick, the developer of SHELX, a program used for small-molecule refinement. He developed an early version of the software back in the 1970s, and over the years he wrote about it in various computer science book chapters or published the occasional journal article, each describing only certain features of, or improvements to, the code. But in 2008, he took a different tack, tying all the information together in a review article titled “A short history of SHELX.” The resulting citations made him one of the most highly cited authors in his field.
Another attractive feature of software articles is that the citations keep coming in. Take, for example, the article describing BLAST (Basic Local Alignment Search Tool), published in Elsevier’s Journal of Molecular Biology. It appeared back in 1990 and has received over 54,000 citations so far. Because the software is still widely used in genome research, the article gathered more than 900 citations in the early months of 2019 alone.
3. Software articles are pushing the boundaries
Recently, we have seen more innovative ways of publishing software articles. For example, the open access journal SoftwareX focuses solely on software publications across all scientific areas. Instead of the standard scientific article format, it publishes so-called original software publications (OSPs), for which it has created a special template. Authors submit only a brief description of the novelty and scientific impact of their code; most of the emphasis is on the code itself, with its specifications listed in the code metadata. Authors are also required to make their software publicly available in a repository or archive, and the journal maintains a dedicated SoftwareX page on GitHub that hosts all the code. On top of all this, the software is refereed.
Wait, refereed software? Yes: this journal asks its reviewers to referee the code and has developed novel guidelines for doing so. To step up this review process, the journal recently rolled out a pilot with CodeOcean, a “reproducibility platform that provides researchers and developers an easy way to share, discover and run code published in academic journals and conferences.” SoftwareX authors are asked to upload their code to CodeOcean, where reviewers can simply take a peek to see whether it runs and what the output looks like or, for the real enthusiasts, inspect the actual lines of code and rerun it. The fact that the software can be viewed on CodeOcean already serves as a proof of concept, as CodeOcean would simply not allow non-functioning code to be hosted on its platform.
4. Software users are loyal to their code
Analysis of several well-known software packages that are widely used within a specific research community shows an interesting pattern almost every time: users are very loyal to their software package. Not only do researchers keep using the same program, which is perhaps to be expected as long as a package remains the default tool in its field, but their citations also shift from the current version to the newest one, reflecting actual adoption and use of each new release.
This loyalty is nicely illustrated by three slightly different but equally interesting examples: GROMACS, published in the software journal SoftwareX; CLUSTAL, published in the biology journal Bioinformatics; and PYTHIA, published in the computational physics journal Computer Physics Communications.
GROMACS is a frequently used simulation program for proteins, lipids and nucleic acids, while CLUSTAL consists of a series of computer programs used for multiple sequence alignment. PYTHIA is a computer simulation program for particle collisions. For all three packages, the official publications, along with instructions on which version to use and how to cite it, are clearly listed on the project website. And it seems users read these carefully!
In the case of GROMACS, we see a clear shift of citations from one version to the next each time a new version is released. The “old” versions whose citations keep climbing are either those whose publications do not mention a version number, or those with functionality that remains in use and therefore keeps gathering citations alongside newer versions. In short, users behave very well: they start citing the new version as soon as it is released, and when they do keep citing an older version, they usually have good reason to.
The CLUSTAL package shows this community behavior even better. The plot nicely illustrates that researchers used (and cited) CLUSTAL W according to the instructions on its website. Once its services were discontinued, three different packages, CLUSTAL OMEGA, MUSCLE and MAFFT (indicated by their makers as successors of the CLUSTAL package), started gaining more traction and citations.
The community behavior for the PYTHIA package shows an analogous trend, with a clear shift in usage to newer versions of the package. In addition, the authors of PYTHIA published four of their five major releases in the same journal, giving users the advantage of knowing exactly where to turn for updates and new releases of the software.
In all three examples, as well as for the other software packages that we looked at, the overall number of citations keeps growing, showing an increase throughout the years in both the development and usage of software (see graphs below).
Software articles appear everywhere these days and rank prominently among the most-cited articles. Elsevier’s journal teams have additional initiatives in the pipeline to ensure that software publications continue to meet the needs of the people who publish and read them. As a result, it will be easier than ever for you to get credit for your hard work.
- Basic local alignment search tool (1990), Journal of Molecular Biology – 54,152 citations so far
- GEANT4 - A simulation toolkit (2003), Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment – 11,253 citations so far
- EMBOSS: The European Molecular Biology Open Software Suite (2000), Trends in Genetics – 4,782 citations so far
- A brief introduction to PYTHIA 8.1 (2008), Computer Physics Communications – 2,338 citations so far (an earlier version, “High-energy-physics event generation with PYTHIA 5.7 and JETSET 7.4,” also published in CPC in 1994, has almost 2,700 citations)
- GADGET: a code for collisionless and gasdynamical cosmological simulations (2001), New Astronomy – 1,057 citations so far
- GROMACS: High performance molecular simulations through multi-level parallelism from laptops to supercomputers (2015), SoftwareX – 1,600 citations so far
- Automated docking using a Lamarckian genetic algorithm and an empirical binding free energy function (1998), Journal of Computational Chemistry – 7,069 citations so far
- CLUSTAL W: Improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice (1994), Nucleic Acids Research – 49,147 citations so far
Top-cited articles since 1990
Articles in bold are software-related.
| Publication year | Article title | Authors | Journal | Lifetime citations |
|---|---|---|---|---|
| 1996 | Generalized gradient approximation made simple | Perdew J.P., Burke K., Ernzerhof M. | Physical Review Letters | 79,481 |
| 2001 | Analysis of relative gene expression data using real-time quantitative PCR and the 2^(−ΔΔCT) method | Livak K.J., Schmittgen T.D. | Methods* | 73,459 |
| 1993 | Density-functional thermochemistry. III. The role of exact exchange | Becke A.D. | The Journal of Chemical Physics | 72,691 |
| 2008 | A short history of SHELX | Sheldrick G.M. | Acta Crystallographica Section A: Foundations of Crystallography | 69,203 |
| 1990 | Basic local alignment search tool | Altschul S.F., Gish W., Miller W., Myers E.W., Lipman D.J. | Journal of Molecular Biology* | 54,760 |
| 1997 | Gapped BLAST and PSI-BLAST: A new generation of protein database search programs | Altschul S.F., Madden T.L., Schaffer A.A., Zhang J., Zhang Z., Miller W., Lipman D.J. | Nucleic Acids Research | 51,669 |
| 1994 | CLUSTAL W: Improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice | Thompson J.D., Higgins D.G., Gibson T.J. | Nucleic Acids Research | 49,512 |
| 1996 | Efficient iterative schemes for ab initio total-energy calculations using a plane-wave basis set | Kresse G., Furthmuller J. | Physical Review B - Condensed Matter and Materials Physics | 43,229 |
| 2004 | Electric field in atomically thin carbon films | Novoselov K.S., Geim A.K., Morozov S.V., Jiang D., Zhang Y., Dubonos S.V., Grigorieva I.V., Firsov A.A. | Science | 35,309 |
| 1997 | Processing of X-ray diffraction data collected in oscillation mode | Otwinowski Z., Minor W. | Methods in Enzymology | 34,446 |