Opinion
Computing Profession Letters to the editor

A Division in Computer Science

Posted
  1. Introduction
  2. Author's response:
  3. Keeping CALM
  4. Authors' response:
  5. References
  6. Footnotes
letters to the editor, illustration

The January 2022 Editor-in-Chief's column "Is the Global Computing Community Irrevocably Divided?" raises an important question. From a computing perspective, it is difficult to attempt a complete answer, let alone a solution, as the column is also related to political science; more precisely, its subfield of international relations.

However, there is a division in the computing literature, which is not irrevocable. Speedup laws and models for parallel and distributed computing are merely ignored by theoretical publications, but widely relied on by engineering publications. For example, in a highly influential paper, cited by more than 1,700, Hill and Marty1 rely on Amdahl's Law to propose asymmetric (heterogeneous) chip multi-processor architectures. Amdahl's Law, imposing a constant upper limit on speedup by infinitely many processors, was stated in 1969.

Since then, more than 6,800 engineering papers rely on or refer to Amdahl's Law. During the same time period, polynomial speedup by a polynomial number of processors has also been shown by numerous theoretical papers2 but none of them is even mentioning Amdahl's Law, let alone attempting to clarify the contradiction. This magazine could help to overcome this huge division, in particular because it contributed to that division.

Experimental results by the first recipients of the Gordon Bell Prize already contradicted Amdahl's Law. However, in the May 1988 Communications article, "Reevaluating Amdahl's Law," Gustafson inappropriately claimed Amdahl's model assumes a fixed problem size, and proposed a new speedup model, based on fixed time instead. Therefore, Amdahl's model still remains widely accepted as the fixed-size, while Gustafson's model got accepted as the fixed-time speedup model.

Unfortunately, both Amdahl's and Gustafson's models are wrong. Even if we fix the problem size, speedup can increase with increasing number of processors. Strong-scaling results of 67% efficiency reported by the 2016 winners of the Gordon Bell Prize contradict Amdahl's Law even as the fixed-size speedup law. Gustafson's Law is contradicted by established theoretical results.

As the Gordon Bell Prize is an award presented by ACM, there is a division in ACM itself. The best way to overcome the division would be to clarify the contradiction between Amdahl's Law and the Gordon Bell Prize.

Ferenc Dévai, London, U.K.

Back to Top

Author's response:

The statistician George Box said, "Essentially, all models are wrong, but some are useful." Amdahl's Law, our extension,1 and Gustafson's Law are models of complex reality. Only reality is correct, but often provides little insight. Models are never completely correct and should be judged on their balance of error and insight. Calling these models "laws" is unfortunate—as it is for Moore and Newton—but difficult to change.

Amdahl's "model" and our extensions are wrong as computation is not perfectly serial or thread-level parallel (whither vectorization?) and the serial portion is rarely fixed. However, we extend Amdahl's insight on how serial-like computation limits overall performance in different ways for homogeneous, heterogeneous, and dynamic multicore systems. Many have found this valuable.

Mark D. Hill, Madison, WI, USA

Back to Top

Keeping CALM

Distributed consistency is a hard problem in the presence of process failures, network unreliability, and asynchronous communication ("Keeping CALM: When Distributed Consistency Is Easy," Sept. 2020). The difficulty arises from the co-ordination of multiple independent processes in an unpredictable manner. Negative results such as the FLP impossibility result and the CAP theorem helped us to realize what we cannot do in specific systems. On the other hand, the CALM theorem paves the way for building systems that achieve consistency without (or with minimal) coordination. The CALM theorem is a positive result that potentially enables engineers to not give up in the face of impossibility results, and instead, try to find a way (or make one) to build consistent systems without the burden of comprehensive coordination. There are two discussions about the CALM article that might benefit from clarification: the role of quorums and the possibility of availability under partition.

On two occasions, the article states coordination requires all machines to participate in the action. This gives the impression that comprehensive coordination is only possible if all machines know each other's current state. However, fault-tolerant systems that employ consensus algorithms have shown that usually, a quorum of machines would suffice to reach an arbitrary agreement in a distributed environment. One important question is the size of the quorum that is sufficient to solve the problem. Certain optimizations could also yield a smaller quorum, which reduces the cost of coordination. The curious reader might also claim that the size of the quorum depends on the types of failures in a system. For instance, if machines are unreliable but the network infrastructure is capable of sophisticated operations such as atomic broadcasts, then coordination becomes even cheaper without requiring all machines to participate in the process.

Observation 1 in the article states "Coordination-freeness is equivalent to availability under partition." This observation raises the following question: Are all coordination-free problems available under partition, and vice versa? The article states "a coordination-free program is by definition available under partition." This is true if the problem at hand is reducible to a simpler problem. For instance, consider the problem of distributed deadlock detection. If the question is whether or not a machine has observed a deadlock, then the answer is simple and it does not require coordination. However, if the problem is to determine a deadlock for a given transaction, then the problem becomes much more difficult.

The CALM theorem is an intriguing result that brings the best out of engineers by reminding them that they do not have to succumb to impossibility results if the problem at hand allows it. CALM also takes one step further than mere optimization by showing us that if we can treat the problem flexibly, then we can enjoy building consistent systems without the comprehensive coordination of processes, which is extremely valuable. We hope that these discussions will elucidate the topic and engage Communications readers by creating new discussions.

Mohsen Sharifi, Tehran, Iran, and Amirhossein Sayyadabdi, Isfahan, Iran

Back to Top

Authors' response:

Thanks for bringing up these interesting issues. You mention two: the potential relationship between quorums and CALM, and concerns about the stated equivalence in the article between coordination-freeness in the CALM Theorem and availability in the CAP Theorem.

On the first point, quorums are indeed used in many distributed agreement protocols, and not mentioned in the article. Note that quorum consistency is only helpful in a global sense if it brings non-quorum members into alignment—post-hoc—with a quorum decision. (At a gloss, "majority rules" are not helpful if the minority will not comply!) To illustrate, consider the non-monotonic example in the article—distributed garbage collection—which looks for objects in a pointer graph that have no connection to root. A quorum of nodes may only contain a subset of all the objects, with outgoing pointers to objects in other nodes. The quorum could in principle declare that any objects they do not actually store are "garbage," but doing so would break the typical semantics of the garbage collector, and thereby cause program crashes.

What this illustrates is that quorums are sufficient to guarantee distributed agreement, but agreement alone does not provide consistency—some kind of compensation actions are required to bring the non-quorum nodes into consistency with the quorum. There is a long tradition of compensations ("inverses," "apologies") in distributed systems, and it is interesting to consider how to model them formally in concern with the CALM results.

Regarding your second point on coordination-freeness and availability, the discussion requires some care. You actually pose two changes to the deadlock example in the article a variant of the problem restricted to a given transaction's perspective, and the phrasing "whether or not" a deadlock exists. On the first point, you pose the query "report if transaction T is in a deadlock" (that is, the node T appears on a cycle in the waits-for graph). This is, like the original example in the paper, monotone and coordination-free: once a positive answer is established, additional information will not change the outcome. Changing the perspective to a single transaction does not make the program non-monotonic. On the second point, you are in fact complicating the discussion with the wording "whether or not." The query "report if transaction T is not in a deadlock" (that is, no cycles pass through T) is indeed non-monotone: a positive answer can be refuted by additional information, so coordination is required to ensure no such information could appear. Hence "whether or not" is indeed more difficult—the "or not" aspect is nonmonotonic. Returning to the larger point, monotonic programs ("whether") can come to a conclusion during partition, whereas the non-monotonic questions ("or not") cannot.

However, your framing of this in the context of availability does illuminate an important aspect of the CALM Theorem: Ameloot's formalism distinguishes possible stalls for data dependencies (which affect all distributed programs) from required stalls for coordination (which only affect non-monotonic programs). Returning to the CAP frame of reference, during a partition a monotonic program can safely complete whatever useful work it can do on the data it can observe, whereas a non-monotonic service must stall even on the data it can observe, just in case data it cannot observe would invalidate its conclusions.

Joseph M. Hellerstein, Berkeley, CA, USA, and Peter Alvaro, Santa Cruz, CA, USA

Join the Discussion (0)

Become a Member or Sign In to Post a Comment

The Latest from CACM

Shape the Future of Computing

ACM encourages its members to take a direct hand in shaping the future of the association. There are more ways than ever to get involved.

Get Involved

Communications of the ACM (CACM) is now a fully Open Access publication.

By opening CACM to the world, we hope to increase engagement among the broader computer science community and encourage non-members to discover the rich resources ACM has to offer.

Learn More