Proteins are a class of large molecules that are involved in the vast majority of biological functions, from cell replication to photosynthesis to cognition. The chemical structure of proteins is very systematic5—they consist of a chain of atoms known as the backbone, which consists of three-atom (nitrogen-carbon-carbon) repeats known as residues, each of which features a sidechain of atoms emanating from the first carbon. In general, there are 20 different options for sidechains, and a residue with a particular type of sidechain is known as an amino acid (so there are also 20 different amino acid types). For billions of years, the process of evolution has optimized the sequence of amino acids that make up naturally occurring proteins to suit the needs of the organisms that make them. So we ask: Can we use computation to design non-naturally occurring proteins that suit our biomedical and industrial needs?
This question is a combinatorial optimization problem, because the output of a protein design computation is a sequence of amino acids. Due to the vast diversity of naturally occurring proteins, it is possible—and very useful—to begin a protein design computation with a naturally occurring protein and then to modify it to achieve the desired function. In this article, we focus on protein design algorithms that perform this optimization using detailed modeling of the 3D structure of the protein.5,8 Thus, they will begin with a starting structure, a 3D structure of a (typically naturally occurring) protein we wish to modify.