Arthur Musgrove Home

Protect Vital Data End-to-End With Interior Encryption

Synopsis: Twitter exposed the plaintext passwords of 330 million users because those passwords were being written to log files. Stepping back, the design flaw is larger than injudicious logging: in most systems, only the 'last hop' is actually encrypted. This article explores the problem and a solid solution to it. Example code is included.

Updated: 19/August/2018

Overview

In May 2018, Twitter warned that the plaintext passwords of all 330 million users had been exposed. How did this happen, given that Twitter was following best practices, including storing passwords as hashes and using TLS encryption between clients and servers?

In most installations, only a small part of the traffic path is encrypted: data is generally decrypted at the first entry point into the service delivery network and handled in plaintext from there on. Below is a typical architecture.

As you can see, only the single hop from the client into the load balancer is actually encrypted. Everything is plaintext from there on. Twitter was good to come clean about the situation, but you can be sure it is very common. Instead of workarounds such as more rigorous reviews, let's look at a solid solution.

Introduction to Asymmetric Key Cryptography

This section is a lightning-fast introduction to asymmetric cryptography, also called public/private key cryptography, and in particular RSA. It is very basic and covers only enough to understand this security technique. If you understand asymmetric cryptography, you can safely skip this section.

Encryption is the science of coding messages to frustrate those who may try to spy on your communications. Whenever you communicate over HTTPS, you are encrypting your communication with the server. This is shown to the user with the little lock symbol in the address bar.

The encryption type that most people are familiar with, and that is used in so many aspects of security, is the shared secret method of cryptography. In this mode, both parties have agreed on the key in advance. Remember those secret decoder rings in boxes of cereal from when you were a kid? Those are shared secret decoders. This type of cryptography is called symmetric cryptography.

Shared secret encryption systems are the most commonly used because they are secure and they are fast. The only real problem is that both parties must keep the secret, well, secret. This is a problem in the world of the Internet, because we want encryption coming from the client machine, and that is by definition an insecure environment.
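To make the shared secret idea concrete, here is a minimal sketch using only the standard Java crypto classes. The choice of AES in GCM mode, the 128-bit key size, and the class and method names (SharedSecretSketch, newSharedKey, and so on) are assumptions made for illustration; they are not taken from the article's example code.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Hypothetical illustration: both parties must hold the same secret key.
class SharedSecretSketch {

    // Generates the single key that sender and receiver both need.
    static SecretKey newSharedKey() throws GeneralSecurityException {
        KeyGenerator generator = KeyGenerator.getInstance("AES");
        generator.init(128);
        return generator.generateKey();
    }

    // A fresh 12-byte nonce; it is not secret but must not repeat for a key.
    static byte[] newIv() {
        byte[] iv = new byte[12];
        new SecureRandom().nextBytes(iv);
        return iv;
    }

    static byte[] encrypt(SecretKey key, byte[] iv, String message)
            throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
        return cipher.doFinal(message.getBytes(StandardCharsets.UTF_8));
    }

    static String decrypt(SecretKey key, byte[] iv, byte[] sealed)
            throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, iv));
        return new String(cipher.doFinal(sealed), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws GeneralSecurityException {
        SecretKey shared = newSharedKey();
        byte[] iv = newIv();
        byte[] sealed = encrypt(shared, iv, "meet at noon");
        // Works only because the receiver holds the very same key.
        System.out.println(decrypt(shared, iv, sealed));
    }
}
```

The sketch shows why this model is awkward for a browser client: whoever can encrypt can also decrypt, so the key would have to be shipped to an environment you do not control.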

Asymmetric cryptography uses two keys: one public and one private. One key encrypts data and the other decrypts it. So you can give the public key to the world to encrypt data destined for you, with the confidence that only you, holding the private key, will be able to decrypt and read the data.
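As a rough illustration of the public/private split, the following sketch generates an RSA key pair with the standard Java security classes, encrypts with the public key, and decrypts with the private key. The 2048-bit key size, the OAEP padding choice, and the names (AsymmetricRoundTrip, newKeyPair, and so on) are assumptions for this sketch, not details from the article.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import javax.crypto.Cipher;

// Hypothetical illustration of encrypting with one key and decrypting with the other.
class AsymmetricRoundTrip {

    // Assumed padding scheme; the article does not name one.
    static final String TRANSFORMATION = "RSA/ECB/OAEPWithSHA-256AndMGF1Padding";

    static KeyPair newKeyPair() throws GeneralSecurityException {
        KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
        generator.initialize(2048);
        return generator.generateKeyPair();
    }

    // Anyone holding the public key can call this.
    static byte[] encrypt(PublicKey publicKey, String message)
            throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance(TRANSFORMATION);
        cipher.init(Cipher.ENCRYPT_MODE, publicKey);
        return cipher.doFinal(message.getBytes(StandardCharsets.UTF_8));
    }

    // Only the holder of the private key can call this successfully.
    static String decrypt(PrivateKey privateKey, byte[] sealed)
            throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance(TRANSFORMATION);
        cipher.init(Cipher.DECRYPT_MODE, privateKey);
        return new String(cipher.doFinal(sealed), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws GeneralSecurityException {
        KeyPair pair = newKeyPair();
        byte[] sealed = encrypt(pair.getPublic(), "for your eyes only");
        System.out.println(decrypt(pair.getPrivate(), sealed));
    }
}
```

Note that the encrypting side never sees the private key, which is what makes it reasonable to hand the public key to an untrusted client.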

A full explanation of asymmetric cryptography and all of its various uses, including confidentiality, integrity and nonrepudiation, is well outside the scope of this article. Suffice it to say there are many other uses than the simple case I am including here.

A Virtual Tunnel

While vital data is passed around the various parts of your architecture, it is usually only needed by a specific component. For instance, the username and password may pass through several components but it is only a single component - the authentication server - that actually needs the password. Even then, it needs the password only for a fleeting instant, and the plaintext password should not be persisted anywhere.

Wouldn't it be ideal if vital information could be sealed at the point it is entered and unsealed only by the component that needs it, when it needs it, without being accessible to any intermediate system? Well, you can do exactly that.

The example here uses RSA encryption, but it can equally be done with any asymmetric algorithm, including the newer elliptic curve cryptography (ECC).

If you are unfamiliar with ECC, here is a quick explanation. RSA depends on the difficulty of factoring the product of two large prime numbers, whereas ECC depends on the difficulty of back-solving a point multiplication (the discrete logarithm problem) on an elliptic curve of the standard form:
$y^2 = x^3 + ax + b$.
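To make the curve equation a little more tangible, here is a small sketch that checks whether a point satisfies it. Practical ECC works over a finite field, so the check is done modulo a prime p. The toy parameters (a = 2, b = 3, p = 97) and the names CurvePointCheck and isOnCurve are assumptions chosen for readability; real curves use standardized parameters hundreds of bits long.

```java
import java.math.BigInteger;

// Hypothetical toy example; not suitable for real cryptography.
class CurvePointCheck {

    // Returns true when (x, y) satisfies y^2 = x^3 + ax + b (mod p).
    static boolean isOnCurve(BigInteger x, BigInteger y,
                             BigInteger a, BigInteger b, BigInteger p) {
        BigInteger left = y.multiply(y).mod(p);
        BigInteger right = x.pow(3).add(a.multiply(x)).add(b).mod(p);
        return left.equals(right);
    }

    public static void main(String[] args) {
        BigInteger a = BigInteger.valueOf(2);
        BigInteger b = BigInteger.valueOf(3);
        BigInteger p = BigInteger.valueOf(97);

        // 6^2 = 36 and 3^3 + 2*3 + 3 = 36, so (3, 6) lies on the curve.
        System.out.println(isOnCurve(BigInteger.valueOf(3), BigInteger.valueOf(6), a, b, p)); // true
        // 7^2 = 49, which does not match 36, so (3, 7) does not.
        System.out.println(isOnCurve(BigInteger.valueOf(3), BigInteger.valueOf(7), a, b, p)); // false
    }
}
```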

The basic process is simple.

  1. Generate an RSA key pair
  2. Distribute the public key to clients, usually within the JavaScript file served to them
  3. The client encrypts the vital subset of the data and includes it within the payload (perhaps only a few JSON fields are encrypted while most are 'normal')
  4. The encrypted vital data is passed around the system, and can only be decrypted by the ultimate component that needs the data.
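The four steps above can be sketched end to end in Java. This is an illustration under stated assumptions, not the article's example code: the client side is simulated in Java rather than JavaScript, the payload is a plain Map standing in for JSON, and the class name, method names, field names, and padding choice are all hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyFactory;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.spec.X509EncodedKeySpec;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.crypto.Cipher;

// Hypothetical walk-through of the four steps; names are illustrative only.
class InteriorEncryptionSketch {

    // Assumed padding scheme; a real client library must use matching parameters.
    static final String TRANSFORMATION = "RSA/ECB/OAEPWithSHA-256AndMGF1Padding";

    // Step 1: the component that needs the data generates the key pair.
    static KeyPair generateKeyPair() throws GeneralSecurityException {
        KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
        generator.initialize(2048);
        return generator.generateKeyPair();
    }

    // Step 2: export the public key as Base64 text that can be shipped to clients.
    static String exportPublicKey(PublicKey key) {
        return Base64.getEncoder().encodeToString(key.getEncoded());
    }

    // Step 3 (client, simulated): seal only the vital field, leave the rest as-is.
    static Map<String, String> buildPayload(String publicKeyBase64,
                                            String username, String password)
            throws GeneralSecurityException {
        byte[] encoded = Base64.getDecoder().decode(publicKeyBase64);
        PublicKey key = KeyFactory.getInstance("RSA")
                .generatePublic(new X509EncodedKeySpec(encoded));
        Cipher cipher = Cipher.getInstance(TRANSFORMATION);
        cipher.init(Cipher.ENCRYPT_MODE, key);
        byte[] sealed = cipher.doFinal(password.getBytes(StandardCharsets.UTF_8));

        Map<String, String> payload = new LinkedHashMap<>();
        payload.put("username", username);
        payload.put("password", Base64.getEncoder().encodeToString(sealed));
        return payload;
    }

    // Step 4: only the holder of the private key can unseal the field.
    static String unsealPassword(PrivateKey key, Map<String, String> payload)
            throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance(TRANSFORMATION);
        cipher.init(Cipher.DECRYPT_MODE, key);
        byte[] sealed = Base64.getDecoder().decode(payload.get("password"));
        return new String(cipher.doFinal(sealed), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws GeneralSecurityException {
        KeyPair pair = generateKeyPair();
        String published = exportPublicKey(pair.getPublic());
        Map<String, String> payload = buildPayload(published, "alice", "s3cret!");

        // What an intermediate component (or its log file) would see:
        System.out.println("in transit: " + payload);
        // What the authentication component recovers:
        System.out.println("unsealed:   " + unsealPassword(pair.getPrivate(), payload));
    }
}
```

If an intermediate component logs this payload, the log contains only the Base64 ciphertext for the password field. One caveat with this sketch: RSA can directly encrypt only small inputs (roughly 190 bytes with the assumed key size and padding), which is fine for a password but would call for a hybrid scheme for larger fields.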

Working Example

To illustrate this design, I've put together a quick code example for you. On the client side, it uses an excellent RSA library from a Stanford University student, which is available here. Other than that, straight Java (server) and JavaScript (client) are used. Explore the example. There is a README.txt file to explain the contents.

Summary

The traditional design, encrypted up front and then plaintext all through the backend, has shown its weakness and violates the defence-in-depth principle. By implementing interior encryption, you greatly improve the security of the system and dramatically reduce its attack surface.