The Importance of Reviewing the Code


Contributors to journals, as well as researchers and politicians, are currently focused on subjects such as open access, data mining, and the growth of the Internet and information technologies, together with their associated problems and solutions. Yet one extremely significant topic in scientific research and computing is rarely addressed: the importance of the code.

Software is now essential in many research fields. Computers, software, and storage facilities give access to vast amounts of research data. If you work in the geosciences, as I do, you probably rely on satellite data that was collected for governmental or intergovernmental agencies and has undergone rigorous testing. Normally, there is a peer-reviewed paper that you can cite when you use the data. You can write your own scripts and code to work with the data, study the results, and formulate a hypothesis about the cause(s) of a phenomenon. In some cases, you might also use software packages developed and released by others, such as spreadsheets and statistical programs, or the built-in functions of a commercially released high-level programming language that make daily programming tasks easier. Once you have computed your results, you might publish them in a paper. Yet how often do reviewers or editors ask about the software used during the research? You might receive a great deal of criticism about the statistics, methods, and data when you submit a paper for publication, but how often do you receive comments about the software? Who cares about that?

Given the lack of comments on software, the question arises as to whether we are systematically violating basic principles of the scientific method. One such principle is that experiments should be reproducible. Yet reviewers, editors, and other scientists who read your paper often cannot reproduce your experiment because they lack essential information about the software you used. To address this problem, we must think beyond merely citing programs and their version numbers. Different programs, or different versions of the same program, can perform the same computation in different ways, that is, by using different algorithms, some of which yield results with different degrees of precision or accuracy. People are generally too willing to believe the results of computations, which is worrying given how frequently bugs are present in most commonly used programs. Indeed, it is arguable that relying on the results of proprietary software is a question of faith, because it is not possible to check the code (see http://www.gnu.org/philosophy/categories.html#ProprietarySoftware).
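The point that different algorithms can give different answers to the same computation is easy to illustrate. The following is a minimal sketch in Python, not taken from any particular research code: it adds the same ten numbers with a plain left-to-right loop (the hypothetical helper `naive_sum`) and with the standard library's `math.fsum`, which tracks intermediate rounding errors.

```python
import math


def naive_sum(values):
    """Add floats left to right, rounding after every addition."""
    total = 0.0
    for value in values:
        total += value
    return total


# Ten copies of 0.1, which has no exact binary floating-point representation.
values = [0.1] * 10

naive = naive_sum(values)       # accumulates a rounding error at each step
careful = math.fsum(values)     # compensates for intermediate rounding errors

print(naive)             # 0.9999999999999999
print(careful)           # 1.0
print(naive == careful)  # False
```

Both results are defensible outputs of "a sum", yet they differ in the last digit. A reader who knows only that the values were summed cannot tell which result to expect. Python's own built-in `sum` is an example of the version issue: as far as I am aware, its handling of floats was made more accurate in Python 3.12, so the same script can print different digits on different interpreter versions.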


In light of the foregoing, we may well ask whether we should call for software specifications and code reviewers in scientific publishing. Publishing the software specifications should be a requirement for authors and journal editors. The author's own source code should be published, at least on the Internet, along with the research results, and that source code should be accessible to referees. This does not mean that reviewers should be required to study the code in detail before accepting a paper, because that would require too much work to be viable. However, having the source code available to those who are interested would be a big step forward. Even a relatively quick check of the code by an expert would be beneficial and would encourage authors to place greater emphasis on the reliability of the software they use. This principle should clearly apply to code that one writes oneself. In addition, prepackaged software (whether commercial or not) should be tested, verified, and certified, with its code filed and accessible, and checked in detail by independent programmers or agencies. If such certification were available, it would suffice when submitting a paper for publication to indicate that certified software had been used.
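As one illustration of what filing the code alongside the results might look like in practice, the sketch below builds a small manifest that records the interpreter version and a SHA-256 fingerprint of each analysis script. It is only an assumption about one workable convention, not an established standard; the function names `file_sha256` and `build_manifest` and the file name `analysis.py` are hypothetical.

```python
import hashlib
import json
import platform
from pathlib import Path


def file_sha256(path):
    """Return the SHA-256 hex digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(source_files):
    """Describe the interpreter and fingerprint each source file."""
    return {
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "sources": {Path(p).name: file_sha256(p) for p in source_files},
    }


# Hypothetical usage: serialize the manifest for publication with the paper.
# print(json.dumps(build_manifest(["analysis.py"]), indent=2, sort_keys=True))
```

A referee who downloads the published source can recompute the digests and confirm that the code on offer is the code that produced the results. A manifest of this kind does not replace publishing the source itself; it only ties a specific copy of the code to a specific set of results.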

To realize the state of affairs described here, the most desirable choice is to use free software (see http://www.gnu.org/philosophy/free-sw.html). Free software lets you go into the code and check it. Using free software also follows the spirit of science, in that scientists can disseminate any modifications they make to the code within the scientific community.

Clearly, the challenges involved in applying the framework described in this Viewpoint will vary between different fields of research. However, the amount of work entailed should not be seen as an excuse for not doing it. Furthermore, it can be argued that in some fields of study, the possibility of investigating a phenomenon using different approaches and theories, obtaining similar results, and testing similar hypotheses should be sufficient to render the type of software used unimportant. Yet to argue in this way would be to miss the point. What if the results differ? How do we explain the discrepancy? One possibility is that the difference lies in the software code used. Thus, doing things in the right way, by using free software, will bear fruit. At least it is something we should aspire to, along with what we could call the scientific ideal.

Acknowledgments

The author would like to thank Richard M. Stallman from the Free Software Foundation, Michael McIntyre from the Department of Applied Mathematics and Theoretical Physics at the University of Cambridge, Gerald J. Sussman from the Computer Science and Artificial Intelligence Laboratory at MIT, and Brian Gough and José E. Marchesi from the GNU Project for their useful comments and suggestions.

Footnotes

This Viewpoint was accepted for publication in February 2010; in the intervening time prior to publication other material addressing this topic has appeared in Communications.

DOI: http://doi.acm.org/10.1145/1941487.1941502