Optimizing programs to run efficiently on modern parallel hardware is hard but crucial for many applications. The predominantly used imperative languages force the programmer to intertwine the code that describes functionality with the code that describes optimizations. This results in a portability nightmare that is particularly problematic given the accelerating trend toward specialized hardware devices to further increase efficiency.
Many emerging domain-specific languages (DSLs) used in performance-demanding domains such as deep learning attempt to simplify or even fully automate the optimization process. Using a high-level, often functional, language, programmers focus on describing functionality in a declarative way. In some systems, such as Halide or TVM, a separate schedule specifies how the program should be optimized. Unfortunately, these schedules are not written in well-defined programming languages. Instead, they are implemented as a set of ad hoc predefined APIs that the compiler writers have exposed.