Technical Perspective: SQL on an Encrypted Database

Cloud services are very popular today. One can rent platforms, software, or applications from companies like Amazon, Google, Microsoft, or Salesforce. But whenever we rent their services, we trust these companies with our confidential data, ranging from benign personal email messages and pictures to highly sensitive financial data or medical records. There is some risk in trusting the cloud providers with sensitive data. A curious administrator may peek inside the data for amusement or for financial profit; a hacker may break into the cloud server and steal all of the data. So far, cloud service companies have been spared a major disaster and their users still trust them with data, but these companies are only one headline story away from a trust crisis. Currently, the only means for cloud companies to earn our trust is to have very strict internal policies for managing and restricting access to users’ data, and to use conventional system security to resist hackers and external adversaries.

Why not encrypt the data stored in cloud services? Once encrypted with the user’s key, the data is safe from the curious administrator because he does not have the decryption key. Similarly, if a malicious attacker breaks into the system, he still does not have the decryption key. All data in the cloud system, persistent or transient, is encrypted, and the system never receives the secret key. Only after the data is returned to the user can it be decrypted by the user with her secret key. This sounds like an ideal solution for ensuring the confidentiality of data in the cloud. The problem is that without decryption keys, the cloud provider cannot perform general computations on the encrypted data. Thus, if the data is encrypted and the cloud service provider does not have the key, the service it can provide is very limited.

The following paper by Popa, Redfield, Zeldovich, and Balakrishnan describes a system that allows encrypted data to be processed without the decryption keys. Their solution is to have the data accessed only through a database system, and to use some specialized encryption techniques that perform limited computations directly on the ciphertext. The standard database query language SQL is strictly more limited than general-purpose languages like Java. Although it is impossible to execute a general Java program on encrypted data, computing over encrypted data is possible if the language is some fragment of SQL. The idea of a database-as-a-service emerged more than 10 years ago,² long before the cloud, and the possibility of processing data without the decryption key was suggested then. But implementing it turned out to be challenging.
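As a rough illustration of what "limited computations directly on the ciphertext" can mean, the sketch below uses a toy version of Paillier's additively homomorphic cryptosystem. The choice of Paillier is an assumption made here for illustration, since the text above does not name a specific scheme; the function names and the tiny primes are likewise hypothetical and far too small to be secure.

```python
import math
import secrets

def keygen(p=1009, q=1013):
    """Toy key generation; p and q are tiny illustrative primes (assumption)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse; valid because we use g = n + 1
    return n, (lam, mu)   # public key n, private key (lam, mu)

def encrypt(n, m):
    """Randomized encryption of an integer 0 <= m < n under public key n."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(n, priv, c):
    """Decryption; only the key holder (the user) can do this."""
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n

def add_encrypted(n, c1, c2):
    """What the server can do without the key: multiplying two ciphertexts
    yields a ciphertext of the sum of the two plaintexts."""
    return (c1 * c2) % (n * n)

n, priv = keygen()
c_total = add_encrypted(n, encrypt(n, 1200), encrypt(n, 345))
total = decrypt(n, priv, c_total)  # computed by the user, not the server
```

The server only ever calls `add_encrypted`, which needs the public modulus but not the private key; this is the sense in which one specific operation (here, addition) can be carried out over encrypted values.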

Commercial database systems like DB2, Oracle, and SQL Server allow the database to be encrypted, but either require the application to be rewritten, or require the database server to have access to the encryption key.¹ In the first case, the encryption is done at the cell level, and the user must modify the database schema to change the type of every encrypted attribute to binary, and must manually encrypt/decrypt each value. Performance degrades because the database system can no longer use indexes on the encrypted columns. In the second case, the encryption is done at the physical storage level, where data is immediately decrypted when read from disk; the application and most of the database engine remain unchanged, but now the database system needs the key in order to encrypt/decrypt pages. For many years, the goal of transparent and efficient processing of encrypted data has remained elusive. The CryptDB system described in the following paper is the first that demonstrates how this goal can be achieved.

Consider a simple strawman example of a table with an attribute city; the data is encrypted at the cell level, so every city is encrypted with the user’s key. In order to select all records where city = 'Seattle', one does not need to decrypt all values of the city attribute; instead, it suffices to encrypt the constant 'Seattle' and to modify (automatically) the SQL query to test for equality with the ciphertext. Thus, the database can answer the query on the encrypted data without the key. This may sound simple, but it is actually challenging. Strong encryption schemes are randomized, so that different occurrences of the same value have different ciphertexts. If our application filters on city, then for this attribute the system must choose a deterministic encryption, which is slightly weaker than a randomized encryption, but allows us to test for equality. To complicate matters, if we perform other operations on an attribute, for example, inequality predicates or addition, then the system must use some other specialized encryption method that commutes with that operation. How can the system automatically choose the right encryption method for each attribute? CryptDB has an ingenious solution based on onions of encryption, where it dynamically selects the right encryption for each attribute. The solution is both elegant and quite efficient, as the authors demonstrate on five different applications. Curious? Read their paper.
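The strawman above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not CryptDB's implementation: the cipher is a toy built from HMAC-SHA256 in the standard library, and the key, the `people` table, and the column names `name_rnd` and `city_det` are all hypothetical. It shows only the contrast between randomized and deterministic encryption, and how the equality query is rewritten to compare ciphertexts.

```python
import hashlib
import hmac
import os
import sqlite3

KEY = b"user-secret-key-held-by-client"  # hypothetical; never sent to the server

def _keystream(key, nonce, length):
    """Toy keystream: HMAC-SHA256 in counter mode (illustration only)."""
    out = b""
    counter = 0
    while len(out) < length:
        block = nonce + counter.to_bytes(4, "big")
        out += hmac.new(key, block, hashlib.sha256).digest()
        counter += 1
    return out[:length]

def encrypt(key, plaintext, deterministic):
    data = plaintext.encode("utf-8")
    if deterministic:
        # Nonce derived from the plaintext: equal values give equal ciphertexts.
        nonce = hmac.new(key, b"nonce" + data, hashlib.sha256).digest()[:16]
    else:
        # Fresh random nonce: equal values give different ciphertexts.
        nonce = os.urandom(16)
    stream = _keystream(key, nonce, len(data))
    return nonce + bytes(a ^ b for a, b in zip(data, stream))

def decrypt(key, ciphertext):
    nonce, body = ciphertext[:16], ciphertext[16:]
    stream = _keystream(key, nonce, len(body))
    return bytes(a ^ b for a, b in zip(body, stream)).decode("utf-8")

# "Server" side: it stores and compares ciphertexts and never sees KEY.
server = sqlite3.connect(":memory:")
server.execute("CREATE TABLE people (name_rnd BLOB, city_det BLOB)")
for name, city in [("Ann", "Seattle"), ("Bob", "Boston"), ("Cy", "Seattle")]:
    server.execute(
        "INSERT INTO people VALUES (?, ?)",
        (encrypt(KEY, name, False), encrypt(KEY, city, True)),
    )

# Client side: rewrite  WHERE city = 'Seattle'  into a ciphertext comparison.
token = encrypt(KEY, "Seattle", True)
rows = server.execute(
    "SELECT name_rnd FROM people WHERE city_det = ?", (token,)
).fetchall()
names = sorted(decrypt(KEY, row[0]) for row in rows)
```

Because `city_det` is deterministic, the server can match rows by comparing bytes, but it also learns which rows share a city; that leak is the sense in which deterministic encryption is weaker. The `name_rnd` column, never filtered on, keeps the randomized scheme.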

September 2012 Issue

Communications of the ACM (CACM) is now a fully Open Access publication.
