Static program analysis is a key component of many software development tools, including compilers, development environments, and verification tools. Practical applications of static analysis have grown in recent years to include tools by companies such as Coverity, Fortify, GrammaTech, IBM, and others. Analyses are often expected to be sound in that their result models all possible executions of the program under analysis. Soundness implies the analysis computes an over-approximation in order to stay tractable; the analysis result will also model behaviors that do not actually occur in any program execution. The precision of an analysis is the degree to which it avoids such spurious results. Users expect analyses to be sound as a matter of course, and desire analyses to be as precise as possible, while being able to scale to large programs.
Soundness would seem essential for any kind of static program analysis. Soundness is also widely emphasized in the academic literature. Yet, in practice, soundness is commonly eschewed: we are not aware of a single realistic whole-program[a] analysis tool (for example, tools widely used for bug detection, refactoring assistance, programming automation, and so forth) that does not purposely make unsound choices. Similarly, virtually all published whole-program analyses are unsound and omit conservative handling of common language features when applied to real programming languages.
The typical reasons for such choices are engineering compromises: implementers of such tools are well aware of how they could handle complex language features soundly (for example, assuming that a complex language feature can exhibit any behavior), but do not do so because this would make the analysis unscalable or imprecise to the point of being useless. Therefore, the dominant practice is one of treating soundness as an engineering choice.
In all, we are faced with a paradox: on the one hand we have the ubiquity of unsoundness in any practical whole-program analysis tool that has a claim to precision and scalability; on the other, we have a research community that, outside a small group of experts, is oblivious to any unsoundness, let alone its preponderance in practice.
We introduce the term soundy for such analyses. The concept of soundiness attempts to capture the balance, prevalent in practice, of over-approximated handling of most language features, yet deliberately under-approximated handling of a feature subset well recognized by experts. Soundiness is in fact what is meant in many papers that claim to describe a sound analysis. A soundy analysis aims to be as sound as possible without excessively compromising precision and/or scalability.
Our message here is threefold:
- We bring forward the ubiquity of, and engineering need for, unsoundness in the practice of static program analysis. For static analysis researchers, this may come as no surprise. For the rest of the community, which expects to use analyses as a black box, this unsoundness is less understood.
- We draw a distinction between analyses that are soundy (mostly sound, with specific, well-identified unsound choices) and analyses that do not concern themselves with soundness.
- We issue a call to the community to identify clearly the nature and extent of unsoundness in static analyses. Currently, in published papers, sources of unsoundness often lurk in the shadows, with caveats only mentioned in an off-hand manner in an implementation or evaluation section. This can lead a casual reader to erroneously conclude the analysis is sound. Even worse, elided details of how tricky language constructs are handled could have a profound impact on how the paper's results should be interpreted, since an unsound handling could lead to much of the program's behavior being ignored (consider analyzing large programs, such as the Eclipse IDE, without understanding at least something about reflection; most of the program will likely be omitted from analysis).
Unsoundness: Inevitable and, Perhaps, Desirable?
The typical (published) whole-program analysis extols its scalability virtues and briefly mentions its soundness caveats. For instance, an analysis for Java will typically mention that reflection is handled "as in past work," while dynamic loading will be (silently) assumed away, as will be any behavior of opaque, non-analyzed code (mainly native code) that may violate the analysis's assumptions. Similar "standard assumptions" hold for other languages. Indeed, many analyses for C and C++ do not support casting into pointers, and most ignore complex features such as setjmp/longjmp.
Could all these features be modeled soundly? In principle, yes. In practice, however, we are not aware of a single sound whole-program static analysis tool applicable to industrial-strength programs written in a mainstream language! The reason is that sound modeling of all language features usually destroys the precision of the analysis, because such modeling is usually highly over-approximate. Imprecision, in turn, often destroys scalability, because analysis techniques end up computing huge results: a typical modern analysis achieves scalability by maintaining precision, thus minimizing the datasets it manipulates.
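A tiny, hypothetical TypeScript fragment illustrates the dilemma (the function name and scenario are ours, for illustration only): the argument to `eval` may be constructed at runtime, so a sound analysis must assume the call can do almost anything, while a soundy analysis typically just ignores it.

```typescript
// Hypothetical sketch: the code a JavaScript/TypeScript analysis must
// reason about may not even exist until runtime.
function runConfiguredHook(expr: string): unknown {
  // `expr` could come from a config file, the network, or user input.
  // Sound handling must assume this call can read or write any reachable
  // state and invoke any function -- an over-approximation that swamps
  // the analysis with spurious facts. Soundy analyses commonly skip it.
  return eval(expr);
}
```

A call such as `runConfiguredHook("1 + 2")` merely evaluates to 3, but nothing constrains the string to such benign expressions, which is precisely what makes sound modeling so costly.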
Soundness is not even necessary for most modern analysis applications, however, as many clients can tolerate unsoundness. Such clients include IDEs (auto-complete systems, code navigation), security analyses, general-purpose bug detectors (as opposed to program verifiers), and so forth. Even automated refactoring tools that perform code transformation are unsound in practice (especially when concurrency is considered), and yet they are still quite useful and implemented in most IDEs. Third-party users of static analysis results (including other research communities, such as software engineering, operating systems, or computer security) have been highly receptive to program analyses that are unsound, yet useful.
Evaluating Sources of Unsoundness by Language
While an unsound analysis may take arbitrary shortcuts, a soundy analysis that attempts to do the right thing faces some formidable challenges. In particular, unsoundness frequently stems from difficult-to-model language features. In the accompanying table, we list some of the sources of unsoundness, which we segregate by language.
All features listed in the table can have significant consequences on the program, yet are commonly ignored at analysis time. For the language features that are most often ignored in unsound analyses (reflection, dynamic loading, eval), the effect is not mere imprecision: entire portions of the program's behavior can become invisible to the analysis.
We strongly feel that:
- The programming language research community should embrace soundy analysis techniques and tune its soundness expectations. The notion of soundiness can influence not only tool design but also that of programming languages or type systems. For example, the type system of TypeScript is unsound, yet practically very useful for large-scale development.
- Soundy is, de facto, the new sound, given the research literature of the past decades.
- Papers involving soundy analyses should both explain the general implications of their unsoundness and evaluate the implications for the benchmarks being analyzed.
- As a community, we should provide guidelines on how to write papers involving soundy analysis, perhaps varying per input language, emphasizing which features to consider handling, or not handling.
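The TypeScript remark above can be made concrete with a standard (textbook, not article-specific) example of that type system's deliberate unsoundness: arrays are treated covariantly, so a program that type-checks can still place a value of the wrong type into a typed array.

```typescript
class Animal { name = "generic"; }
class Dog extends Animal { bark(): string { return "woof"; } }
class Cat extends Animal { meow(): string { return "meow"; } }

const dogs: Dog[] = [new Dog()];
// Arrays are covariant in TypeScript: a Dog[] may be used as an Animal[].
const animals: Animal[] = dogs;
// This write type-checks, since a Cat is an Animal...
animals.push(new Cat());
// ...but now dogs[1] holds a Cat at runtime, despite its static type Dog,
// so "bark" in impostor is false at runtime.
const impostor: Dog = dogs[1];
```

The type system is unsound by design here; as with soundy analyses, the trade-off buys usability for large-scale development at the cost of airtight guarantees.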
[a] We draw a distinction between whole-program analyses, which need to model shared data, such as the heap, and modular analyses (for example, type systems). Although this space is a continuum, the distinction is typically well understood.
The Digital Library is published by the Association for Computing Machinery. Copyright © 2015 ACM, Inc.