High-performance code is essential to many of the most exciting applications of computing. Without core kernels that run near the peak potential of the best hardware we can build, from CPUs to GPUs to DSPs and neural accelerators, we wouldn't have seen the past decade's breakthroughs in deep learning, smartphone photography, virtual reality, or self-driving cars.
But writing high-performance code is difficult. On today's hardware, the best-optimized code is often orders of magnitude faster than a clean, straightforward implementation, even in a "fast" language like C.