Computer science is both a science and an art. Its scientific aspects range from the theory of computation and algorithmic studies to code design and program architecture. Yet, when it comes time for implementation, there is a combination of artistic flare, nuanced style, and technical prowess that separates good code from great code.
Like art, code is simultaneously subjective and non-subjective. The non-subjective aspects of coding include "hard" ideas that must be followed to create good code: design patterns, project structures, the use of common libraries, and so on. Although these concepts lay the foundation for developing high-quality, maintainable code, it is the nuances of a programmer's technique and toolsalignment, naming, use of white space, use of context, syntax highlighting, and IDE choicethat truly make code clear, maintainable, and understandable, while also giving code the ability to clearly communicate intent, function, and usage.
This separation between good and great code occurs because every person has an affinity for his or her own particular coding style based on his or her own good (or bad) habits and preferences. Anyone can write code within a design pattern or using certain "hard" techniques, but it takes a great programmer to fill in the details of the code in way that is clear, concise, and understandable. This is important because just as every person may draw a unique meaning or experience from a single piece of artwork, every developer or reader of code may infer different meanings from the code depending on naming and other conventions, despite the architecture and design of the code.
From another angle, programming may also be seen as a form of "encryption." In various ways the programmer devises a solution to a problem and then encrypts the solution in terms of a program and its support files. Months or years later, when a change is called for, a new programmer must decrypt the solution. This is usually not an enviable task, which can mainly be blamed on a failure of clear communication during the initial "encryption" of the project. Decrypting information is simple when the necessary key is present. So, too, is understanding old code when special attention has been paid to what the code itself communicates.
To address this issue, some works have defined a single coding standard for an entire programming language,7 while others have acquiesced to accepting naming conventions as long as they are consistent.6 Beautiful code has been defined in general terms as readable, focused, testable, and elegant.1 The more extreme case is the invention of an entire programming language built around a concrete set of ideals, such as Ruby or Python. Ruby emphasizes brevity, simplicity, flexibility, and balance.4 The principles behind Python are clear in the Zen of Python,5 where the focus lies on beauty, simplicity, readability, and reliability.
Our approach to this issue has been to develop a system of coding guidelines (available online3). While these guidelines come from an educational environment, they are designed to be useful to practitioners as well. The guidelines are based on a few broad principles that capture some fundamental principles of communication and elevate the notion of coding conventions to a higher level. The use of these conventions will also improve the sustainability of a code base. This article looks at these underlying principles.
One area not considered here is the use of syntax highlighting or IDEs. While either one may make code more readable (because of syntax highlighting or code folding, among others) and easier to manage (for example, quickly looking up or refactoring functions and/or variables), our guidelines have been developed to be IDE and color neutral. They are meant to reflect foundational principles that are important when writing code in any setting. Also, while IDEs can help improve readability and understanding in some ways, the features found in these tools are not standard (consider the different features found in Visual Studio, Eclipse, and VIM, for example). Likewise, syntax highlighting varies greatly among environments and may easily be changed to match personal preference. The goal of the following principles is to build a foundation for good programming that is independent of the programming IDE.
Consider a Program as a "Table"
In a recent ACM Queue article, Poul-Henning Kamp2 makes the fascinating point that much of the style of programming languages stems from the ASCII character set and typewriter-based terminals. Programming languages make no use of the graphical properties and options of modern devices. While code must be written with the clarity of good English grammar, it is not English text. Instead it is more like math and tables.
This is a far-reaching principle. First, it speaks directly to the use of fonts. Do not use a variable-width (proportional) font for program code, as code is not text. Fixed-width fonts (for example, Courier and Data Gothic) look appealing and allow easy alignment of code. Proportional (variable-width) fonts prevent proper alignment, and even more importantly, do not "look like" code.
While one should continue to think of a program as a sequence of actions or as an algorithm at a high level, each section of code should also be thought of as a presentation of a chart, table, or menu. In figures 1, 2, and 3 notice the use of vertical alignment to show symmetry. This is a powerful method of communication.
In the case when a long line of code spills into multiple lines, we suggest breaking and realigning the code.a For example, instead of
Let Simple English be Your Guide
A programmer creates a name for something with full knowledge of its use, and often many names make sense when one knows what the name represents. Thus, the programmer has this problem: creating a name based on a concept. The true challenge, however, is precisely the opposite: inferring the concept based on the name! This is the problem that the program reader has.
Consider the simple name
taken from the common C++ header file
<iostream.h>. An inexperienced or unfamiliar programmer may suddenly be mentally barraged with a bout of questions such as: Is it an integer? A pointer? An array or a structure? A method or a variable? Does sp stand for saved pointer? Is sput an operation to be done n times? Do you pronounce it sputn or s-putn or
sput-n or s-put-n?
We advocate basing names on conventional English usagein particular, simple, informal, abbreviated English usage. Consider the following more specific guidelines:
- Variables and classes should be nouns or noun phrases;
- Class names are like collective nouns;
- Variable names are like proper nouns;
- Procedure names should be verbs or verb phrases;
- Methods used to return a value should be nouns or noun phrases;
- Booleans should be adjectives;
- For compound names, retain conventional English syntax; and
- Try to make names pronounceable.
Some examples of this broad principle are shown in Figure 4.
There is an interesting but small issue when considering examples such as:
countFiles is a good name, it is not an optimal name since it is a verb. Verbs should be reserved for procedure calls that have an effect on variables. For functions that have no side effects on variables, use a noun or noun phrase. One does not usually say
We suggest that
is a slight improvement. More importantly, this enforces the general rule that verbs denote procedures, and nouns or adjectives denote functions.
Rely on Context to Simplify Code
All other things being equal, shorter programs are always better. As an example, local variables that are used as index variables may be named
i, j, k, and so on. An array index used on every line of a loop need not be named any more elaborately than
elementNumber obscures the details of the computation through excessive description. A variable that is rarely used may deserve a long name: for example,
MaxPhysicalAddr. When variable names are long, especially if there are many of them, it quickly becomes difficult to see what's going on. A variable name can often be shortened by relying on the context in which it is used. For example, the variable
Store in a stack implementation rather than
Major variables (objects) that are used frequently should be especially short, as seen in the examples in Figure 5. For major variables that are used throughout the program, a single letter may encourage program clarity.
Use White Space to Show Structure
While written and spoken communication may reach a high level of clarity, it is often left wanting of meaning if not accompanied by the personal touch of nonverbal cues and tendencies. An individual's body language helps clarify the spoken word. In a similar sense, the programmer relies on white spacewhat is not said directlyin the code to communicate logic, intent, and understanding.
An example is the use of blank lines between conceptually different sections of code. Blank lines should improve readability as they separate logically different segments of the code and thus provide the literary equivalent of a section break. Appropriate places to use blank lines include:
- When changing from preprocessor directives to code;
- Around class and structure declarations;
- Around a function definition of some length;
- Around a group of logically connected statements of some length; and
- Between declarations and the executable statements that follow.
Consider the code listing in Figure 6. Individual blank spaces should also be used to show the logical structure within a single statement. Strategic blank spaces within a line simplify the parsing done by the human reader. At a minimum, blank spaces should be included after the commas in argument lists and around the assignment operator
"=" and the redirection operators
On the other hand, blank spaces should not be used for unary operators such as unary minus
(), address of
(*), member access
(++), and decrement
Also, if it makes sense, put two to three statements on one line. This practice has the effect of simplifying the code, but it must be used with discretion and only where it is sensible to do so.
Let Decision Structures Speak for Themselves
The case statement used in Figure 1 brings up a general point: very simple decision statement structures can be tersely presented, showing the alternative code simply, and, if possible, without braces, as in the example in Figure 7.
It is not uncommon for simple conditions to be mutually exclusive, creating a kind of generalized case statement. This, as is common practice, can be printed as a chain, as in Figure 8.
Of course, it may be that the structures are truly nested, and then one must use either nested spacing or functions to indicate the alternatives. Again, the general point is to let the structure drive the layout, not the syntax of the programming language.
In the brace wars, we do not take a strong stand on the various preferences shown in Figure 9, but we do feel strongly that the indent is vital, as it is the indent that shows the structure.
Focus on the Code, Not the Comments
The ability to communicate clearly is an issue that is faced in all facets of the human experience. Programmers must achieve a level of clarity, continuity, and beauty when writing code. This means focusing on the code and its clarity, balance, and symmetry, not on its length or comments. While this concept does not advocate the removal of comments or negate their use and importance in appropriate situations, it does suggest that programmers must use comments wisely and judiciously. The focus should be on developing code that, for the most part, clearly communicates intent and functionality. This practice will automatically reduce the need for many comments.
Although the guidelines presented here are used in an educational setting, they also have merit in industrial environments. Students who are educated using these guidelines will most likely use them (or some variant) as they enter industry. To demonstrate this, we have developed an example that applies these guidelines to two very different styles. The first is the Unix style. It is terse, often making use of vowel deletion, and is often found in realistic applications such as operating-system code. This is not to imply that all or most system programmers use this style, only that it is not unusual. Figure 10 shows a small example of this style.
We call the second style the textbook style, as illustrated in Figure 11. Again, this in no way means to imply that all or most textbooks use this style, only that the style in the example is not unusual. In this style the focus is on learning. This means that there is frequent commenting, and the code is well spread out. For the purposes of learning and understanding the details of a language, this style can be excellent. From a practical perspective or for any program of some scale, this style does not work well as it can be overwhelming to use or to read. Moreover, this style makes it difficult to see the overall design, as if one is stuck under the trees and cannot see the forest around.
Figure 12 is a rework of the function in figures 10 and 11, using the guidelines discussed here to make a smooth transition between academic and practical code. This figure shows a balance of both styles, relying more directly on the code itself to communicate intent and functionality clearly. Compared with the textbook style, the resultant code is shorter and more compact while still clearly communicating meaning, intent, and functionality. When compared with the Unix style, the code is slightly longer, but the meaning, intent, and functionality are clearer than the original code.
Figure 13 illustrates the guidelines presented here in another setting. This is a function taken from a complex program (10,000 lines) related to power-system reliability and energy use regarding PHEVs (plugin hybrid electric vehicles). The program makes numerous calculations related to the effect that such vehicles will have on the current power grid and the effect on generation and transmission systems. This program attempts to evaluate the reliability of power systems by developing a model for reliability evaluation using a Monte Carlo simulation.
While the previous examples show the merit of the guidelines presented here, one argument against such guidelines is that making changes to keep a certain coding style intact is time consuming, particularly when a version-control system is used. In the face of a time-sensitive project or a project that most likely will not be updated or maintained in the future, the effort may not be worthwhile. Typical cases include class projects, a Ph.D. thesis, or a temporary application.
If, however, the codebase in question has a long lifespan or will be updated and maintained by others (for example, an operating system, server, interactive Web site, or other useful application), then almost any changes to improve readability are important, and the time should be taken to ensure the readability and maintainability of the code. This should be a matter of pride, as well as an essential function of one's job.
The authors would like to thank David Marcus and Poul-Henning Kemp for their insightful comments while completing this work, as well as the software engineering students who have contributed to these guidelines over the years.
Beautiful Code Exists, if You Know Where to Look
Software Development with Code Maps
Robert DeLine, Gina Venolia and Kael Rowan
Reading, Writing, and Code
1. Heusser, M. Beautiful code. Dr. Dobb's (Aug. 2005); http://www.ddj.com/184407802.
2. Kamp, P-H. Sir, please step away from the ASR-33! ACM Queue 8, 10 (2010); http://queue.acm.org/detail.cfm?id=1871406.
3. Ledgard, H. Professional coding guidelines. 2011 Unpublished report, University of Toledo; http://www.eng.utoledo.edu/eecs/faculty_web/hledgard/softe/upload/.
5. Peters, T. The Zen of Python. PEP (Python Enhancement Proposals). Aug. 20, 2004; http://www.python.org/dev/peps/pep-0020/.
7. Sun Developer Network. Code conventions for the Java programming language, 1999; http://java.sun.com/docs/codeconv/.
©2011 ACM 0001-0782/11/1200 $10.00
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and full citation on the first page. Copyright for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or fee. Request permission to publish from [email protected] or fax (212) 869-0481.
The Digital Library is published by the Association for Computing Machinery. Copyright © 2011 ACM, Inc.