Systems Abstractions

Abstractions, the mental constructs that organize thinking to expedite the design and construction of reliable computations, are often heralded as bedrock principles of computational thinking.³ We lamented that the CT curricula for K–12, being intended for beginners, give few hints at the kinds of abstraction that computing professionals use in their work. A step in the direction of exposing more advanced abstractions was taken when the 2021 Turing Award celebrated the programming language and compiler abstractions devised by Alfred Aho and Jeff Ullman.¹ In this column, I would like to continue with examples of abstractions that were invented for operating systems and spread into the rest of computing.

Computing professionals invented these advanced “systems abstractions” to deal with large systems that are too big for lone programmers. I begin with some of the important ones from operating systems.

Hello, Operating Systems

Perhaps the most fundamental abstraction in operating systems is the “process,” meaning a program in execution. This idea was invented to solve reliability problems in early operating systems.

Between 1960 and 1965 operating system designers undertook to build computer utilities—powerful computing systems that would distribute computing power cheaply across large networks of users. These systems aimed to integrate a host of new inventions in time sharing, virtual memory, input-output streams, shared file systems, directory systems, and programming support interfaces. These inventions maximized information sharing, minimized development time, and spread the costs of expensive CPU and memory resources across many users.

The main abstraction for large programs circa 1960 was “modules and interfaces.” It called for decomposing the complex system into simple modules that would exchange information via their interfaces. For example, an operating system would be organized with modules for job and CPU scheduling, memory management, input-output, files, directories, and programming interface support. Unfortunately, this approach did not work. No matter how carefully designers specified module functions and interfaces, the systems would invariably crash when the modules were plugged together and subjected to the user workload. Debugging was fiendishly difficult.

The problem was that modules are a control structure for directing a CPU to work tasks one at a time. Operating systems, however, had to manage many computations for many users. There was no easy way to visualize the work of many users simultaneously flowing through the modules and their interfaces. Large systems were not simply small systems with more users; multiple users created new dynamics when implementing private memory, sharing files and memory, and contending for limited resources of CPU, devices, and memory. These dynamics included race conditions, deadlocks, busy waiting, data circulating between memory levels, access to files, users extending the system by creating new autonomous services, and predicting throughput and response time. A new kind of thinking was needed. This new kind of thinking came to be called concurrency control.

The process abstraction, which emerged by 1964, unlocked elegant solutions to the other problems. A process is more than a program in execution. It is an autonomous agent that performs services for other processes on request. Processes are the entities that demand CPU time and memory space, synchronize with other processes, create and access files, search directories, respond to events, and mesh with other processes to make dynamic computation structures.

The process idea spawned another important abstraction—the nonterminating computation. Service processes were designed as endless loops. After completing a request, a service process would return to a “homing position” and await the next incoming request. Daemon processes hidden in the background performed beneficial housekeeping functions such as memory reclamation or writing modified contents of memory back to the disk. Designers learned to think in terms of continuously running computing systems. By the late 1960s, most OS designers saw the operating system as a society of cooperating, mostly nonterminating processes rather than a mountain of modules. Today, your laptop’s activity monitor will typically show your operating system is running anywhere from 200 to 500 processes.

In contrast, most programming classes to this day teach only standalone terminating programs: those that start with an input and stop with an output. In this context, a nonterminating program looks like a bug—the infinite loop.

Systems abstractions are essential for building large complex systems with large numbers of processes, users, devices, and network connections. Every major domain of computing systems has its own characteristic abstractions.² The Internet, for example, has the IP protocol for addressing hosts, the TCP protocol for overcoming noisy transmissions, domain naming, URLs, Web pages, markup languages, and more. The cloud has universal unlimited name space, unforgeable pointers to stored files, datacenters, redundancy to prevent data loss, and more. Database systems have records, fields, tables, projections, joins, queries, atomic transactions, persistent storage, permanent commitment of files to the storage, and more. The list goes on.

A process is more than a program in execution.

A major source of complexity in a computing system is a large number of digital objects. System abstractions simplify this complexity in two ways. First, they lump all objects of the same kind into a class and design a single manager for all of them. The manager provides a single interface for the allowable operations processes can perform on those objects. Second, the class manager assigns unique names to objects and verifies every access for authorization. The pointers containing those names and access codes must be protected from alteration. The sidebar “Files Abstraction” illustrates these ideas for a Files Manager.

Capabilities

The need for unalterable pointers to objects is met in operating systems and cloud storage with a lower-level abstraction called “capability.” A capability is a bit-bundle consisting of (type, access, handle) fields. The type field indicates which type of object is pointed to. The access field is a multi-bit code specifying which subset of the class operations may be performed on that object. The handle field is a unique code for the object that distinguishes it from all others of the same type.

As long as capabilities are kept in kernel space, they are protected because no user process can alter anything in kernel space. When they are passed outside, they are augmented with a cryptographic checksum that enables recipients to verify they have not been altered since their creation.

Capabilities were introduced in 1966 and became a principle for implementing object oriented programming languages.⁴ They are used in some cloud storage systems, such as TAHOE-LAS, to guarantee privacy of files.

Many Levels of Abstraction

A systems abstraction for a class of objects can be described as an “abstract machine” whose instruction set is the operations provided at the interface and whose hidden internal data structures keep track of all the objects. The Files Manager is an example.

In operating systems and networks we can stack up abstractions into a series of levels. Each level can be composed from abstractions defined at lower levels, but cannot use any information about abstractions at higher levels.

One of the first working examples of this layering was the THE multiprogramming system, designed by Edsger Dijkstra approximately 1965.⁵ The idea has been extended to modern operating systems. A typical rendering into levels is:

Interrupts and stacks
Processes and semaphores
Virtual memory
Message exchange
Internet naming
I/O streams (files, pipes, devices)
Directories
Virtual machines
Shell (user interface)
Services

Levels 1–5 are the microkernel; they operate in kernel mode and can access all memory. Levels 6–10 are the user kernel; they operate in user mode and can only access the private memories of the processes calling them. Each level is an abstract machine that manages the class of objects of that level.

Level 10 is a collection of user services outside the kernel, such as graphical user interface, libraries of application programs (apps), and performance analysis tools. Each user service has its own system abstractions.

Objects and operations may be composed from lower-level objects and operations. In effect, the abstract machine of a level is nested inside the abstract machine for the next level up. The user interface for an abstract machine consists of the union of the interfaces for all the nested machines. This nesting hides lower-level details from higher levels.

In this stack of levels, programmers must design so they only call downward and never call upward. This prevents circular waits (deadlocks) and self-referential code loops, and it enables the system to be proved and tested one level at a time.⁶ This constraint requires some reorientation of thinking. As an example, consider a programmer of the file manager wanting to use a file as a container of a directory. This might seem to require an upward call from the file manager (Level 6) to the directory manager (Level 7) requesting creation of a directory that the file manager could then populate. To avoid the upward call, we move the responsibility for creating the directory and populating it with files to the shell (Level 9). The shell can call down to Level 7 to create the directory and then to level 6 to load files into it. This reorientation of thinking simplifies the code and eliminates any problems with circularity.

The layered system abstraction has an unexpected payoff. Many designers reacted against Dijkstra’s proposal, fearing it oversimplified, overconstrained, and lost function. The constraints would introduce complexity to get around them. This is not what happened. Without fail, layered systems consistently led to smaller kernels. The smaller kernels were faster and more easily tested and verified. Over the years, the only provably secure OSs have been layered structured. The modern Sel4 kernel is the latest example. It is small, compact, and fits into the constrained memories of IoT devices.

Conclusion

Much of the amazing progress of computing has been enabled by designers creating and using system abstractions. The OS abstractions outlined here are just a few of the systems abstractions used to organize large systems. It is a shame that few of the abstractions for large programs and compilers, and none of the systems abstractions, are discussed in the CT curricula offered to newcomers. CT for professionals is meaningless without these advanced abstractions.

Do abstractions, system or otherwise, fully characterize computing? I think not. Computing is the study of information processes, natural and artificial. Systems abstractions are one of many tools we bring to design and study large-scale, complex information processes.

Sidebar: Files Abstraction

Files are a common digital object. A file is a container for a string of bits, with a name and a size. The file itself is an abstraction of a set of fixed-size records stored in disk memory as a tree structure rooted at an index table. Users of files do not see the records, the tree, and the index table.

A Files Manager is the software subsystem that manages files, providing six basic operations—see the figure here. CREATE creates a new file and DELETE removes a file. OPEN gathers copies of all a file’s component records into a linear file-image in a main memory buffer. CLOSE copies the buffer back to its component records and deletes the buffer. The buffer is needed because direct reads and writes to the memory structure would be intolerably slow. File buffers are a second type of object managed by the Files Manager.

The READ operation copies the contents of the opened buffer to the caller’s memory region of length L starting at address S, denoted (S,L). The WRITE operation copies data from the caller’s memory region (S,L) into the buffer, replacing the former contents of the buffer.

This is of course a simplified view of the interface to the file system. Real file systems may have a few more operations, use more complex tree structures on the disk, and allow incremental reads and writes of the buffer.

The left side of the accompanying figure shows the capabilities involved at the user interface. Here, type F means file capability and type B buffer capability. The CREATE command creates a file capability that can be used as a parameter for the DELETE and OPEN commands. The OPEN command creates a buffer capability that can be used by the CLOSE, READ, and WRITE commands. Hidden within the Files Manager are tables for mapping file handles (fh) to the disk and buffer handles (bh) to main memory buffers.

Systems Abstractions

Hello, Operating Systems