First of all, a web application is a two-part program running in a browser (client side) and on a web server (server side). It can also be regarded as a finite state machine that moves from one state to another according to user input and computations processed by the server. The communication between the client and the server therefore plays an important role; however, it can also lead to substantial information leakage.
Let's illustrate this with an example, say online tax payment. The classic scenario is that a user sends a request to the server, the server responds with the address of a form, the user fills it in and sends it back, and finally the server performs its computations and responds with a new page based on the information provided. These steps can be repeated any number of times, depending on each user's circumstances and the level of detail required.
Suppose the first form the user has to fill in contains a question about income with two possible answers: under 40k, which implies no tax, and over 40k, which implies taxation. Although the communication itself is protected by protocols such as TLS/SSL, the user's choice, and thus their income, can still be inferred: an eavesdropper can work out which state the web app has entered by analysing the traces, i.e. the size and number of packets exchanged between the client and the server. Basically, these traces differ for the two choices: the data sent for the over 40k option is one byte larger than for the other. The response page also depends on the user input, so the packets sent by the server will again differ in number and size. One way to protect against this is to randomize packet sizes. Another option is to increase the entropy of the input (for example, by offering more possible choices), but this may come at a cost in usability.
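The attack can be illustrated with a toy simulation. All sizes below are made-up numbers, not measurements from a real tax site, and the defence shown is padding every message up to a fixed bucket size, a simple deterministic relative of the randomized padding mentioned above rather than the same technique. The function names are hypothetical.

```python
# Toy model of the income-question side channel (all sizes are hypothetical).
REQUEST_SIZE = {"under_40k": 600, "over_40k": 601}     # "over 40k" request is one byte larger
RESPONSE_SIZE = {"under_40k": 2100, "over_40k": 3400}  # the follow-up pages differ in size


def observe(choice, pad_to=None):
    """Return the (request, response) sizes an eavesdropper sees for a choice.

    TLS hides the content but not the lengths. If pad_to is given, both
    messages are padded up to the next multiple of that bucket size.
    """
    sizes = (REQUEST_SIZE[choice], RESPONSE_SIZE[choice])
    if pad_to is None:
        return sizes
    return tuple(-(-s // pad_to) * pad_to for s in sizes)  # round up to the bucket


def infer_choice(observed, pad_to=None):
    """Return every choice whose trace matches the observed one."""
    return [c for c in REQUEST_SIZE if observe(c, pad_to) == observed]


# Without padding the trace identifies the answer uniquely.
print(infer_choice(observe("over_40k")))              # ['over_40k']
# Padding both messages to 4096-byte buckets makes the two traces identical.
print(infer_choice(observe("over_40k", 4096), 4096))  # ['under_40k', 'over_40k']
```

Padding removes the size difference only if the bucket is at least as large as the biggest message, and it costs extra bandwidth on every exchange.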
Other real-life examples include healthcare, investment and web search. The first paper we looked at, "Side-Channel Leaks in Web Applications: a Reality Today, a Challenge Tomorrow" (Chen et al., May 2010) [pdf], reports information leaks in several real-world web applications, which it refers to under pseudonyms.
The challenge this paper raises is assessing how vulnerable a web app is, and the main criterion is how distinguishable the possible options are from their traces. As a first attempt, the authors introduce a rough heuristic, the density of a trace, given by:
density(T) = |T| / (max(T)-min(T))
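A minimal sketch of the density heuristic follows. It assumes that T is a collection of observed packet sizes in bytes and that |T| counts the distinct sizes in it; the post does not pin these details down, so treat them as assumptions.

```python
def density(trace):
    """Rough density heuristic from Chen et al.: |T| / (max(T) - min(T)).

    Assumption for this sketch: `trace` is an iterable of packet sizes in
    bytes, and |T| counts the distinct sizes it contains.
    """
    sizes = set(trace)
    spread = max(sizes) - min(sizes)
    if spread == 0:
        # A single distinct size makes the denominator zero.
        raise ValueError("density is undefined when all sizes are equal")
    return len(sizes) / spread


print(density([100, 110, 120, 130]))  # 4 distinct sizes over a 30-byte range
print(density([100, 1000]))           # 2 distinct sizes over a 900-byte range
```

A higher value means many sizes are packed into a narrow range; the value depends on the unit used for sizes, which is one reason the heuristic is only rough.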
The same authors propose a more principled way of quantifying leaks in "Sidebuster: Automated Detection and Quantification of Side-Channel Leaks in Web Application Development" (Chen et al., Oct. 2010) [pdf]. Here they introduce a tool designed to work out which inputs leak information and to quantify the entropy loss caused by an observation, where entropy is a measure of the uncertainty of a random variable.
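For reference, these are the standard Shannon definitions, with A the user's secret answer and O the attacker's observation. The entropy loss is the drop from H(A) to H(A|O), which equals the mutual information I(A;O). This is textbook notation, not necessarily the exact formulation used in the Sidebuster paper.

```latex
H(A) = -\sum_{a} p(a)\,\log_2 p(a)

H(A \mid O) = \sum_{o} p(o)\, H(A \mid O = o)
            = -\sum_{o} p(o) \sum_{a} p(a \mid o)\,\log_2 p(a \mid o)

\Delta H = H(A) - H(A \mid O) = I(A; O)
```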
Going back to our example, suppose the user's income is over 40k. By selecting the corresponding option, the user moves the web application from state S to state S', where they again have to answer a question with two possible answers, e.g. "Are you self-employed?". Assuming both answers are equally likely, the entropy before side-channel analysis (SCA) is H(A)=1 bit; if the two answers produce distinguishable traces, the conditional entropy after observing is H(A|O)=0, i.e. no secrecy remains. Even if the attacker sees the same observation O1 whichever answer the user selects at this step, observations accumulate along the path: if seeing O1 twice is only possible on one path, the attacker learns which path was taken and again gains information.
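The numbers above can be checked with a small sketch. It assumes a uniform prior over the two answers and that each answer always produces the same observation (or the same sequence of observations along its path); the answer labels and the observation "O2" are hypothetical.

```python
from collections import defaultdict
from math import log2


def entropy(dist):
    """Shannon entropy in bits of a dict mapping outcomes to probabilities."""
    return sum(-p * log2(p) for p in dist.values() if p > 0)


def conditional_entropy(prior, observation_of):
    """H(A|O) when each answer a always produces the observation observation_of[a]."""
    # Group answers by the observation they produce.
    by_obs = defaultdict(dict)
    for answer, p in prior.items():
        by_obs[observation_of[answer]][answer] = p
    # H(A|O) = sum over o of p(o) * H(A | O = o).
    total = 0.0
    for answers in by_obs.values():
        p_o = sum(answers.values())
        posterior = {a: p / p_o for a, p in answers.items()}
        total += p_o * entropy(posterior)
    return total


prior = {"self_employed": 0.5, "employed": 0.5}  # assumed uniform prior

print(entropy(prior))  # 1.0
# Distinguishable traces: each answer yields its own observation.
print(conditional_entropy(prior, {"self_employed": "O1", "employed": "O2"}))  # 0.0
# Identical traces at this step: this step alone reveals nothing.
print(conditional_entropy(prior, {"self_employed": "O1", "employed": "O1"}))  # 1.0
# Hypothetical paths: one produces O1 twice, the other once, so the sequence leaks.
print(conditional_entropy(prior, {"self_employed": ("O1", "O1"), "employed": ("O1",)}))  # 0.0
```

Treating the whole sequence of observations as a single observation is what makes the path-level leak visible: a defence that equalises traces only step by step can still leave H(A|O) at zero.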
However, their white-box approach to leakage quantification requires that all traffic data be invariant and that the complete input space be known. An attacker in practice works in a black-box setting where this is usually not the case, so the measurements they can make are likely to be noisy.
Finally, the third article we discussed was "Automated Black-Box Detection of Side-Channel Vulnerabilities in Web Applications" (Chapman and Evans, Oct. 2011) [pdf]. Here, the authors describe a black-box tool for detecting and quantifying the severity of side-channel vulnerabilities by analyzing network traffic over repeated crawls of a web application.
Four distance metrics are introduced, namely:
- Total-Source-Destination - returns the summed difference in bytes transferred in each direction, i.e. the difference in the number of bytes transferred to the client plus the difference in the number of bytes transferred to the server.
- Size-Weighted-Edit-Distance - adds robustness by tracking the control flow of the transferred information. Every transfer is treated as a symbol in a string representing the sequence of transfers, so the order of the transferred data matters.
- Edit-Distance - like the previous metric, but every edit has unit cost regardless of transfer size; it therefore reveals how well an attacker can do against a perfect packet-padding strategy.
- Random - serves as a baseline in order to judge the distinguishability gained from the distance metrics beyond the assumption that the adversary can distinguish page breaks.
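The four metrics can be sketched as follows. This is not the paper's code: a trace is assumed to be a list of (direction, size) pairs with hypothetical direction labels "c2s" and "s2c", the size-weighted cost model (insert or delete costs the transfer's size, substitution costs the size difference, or both sizes if the directions differ) is our assumption, and the Random baseline here simply returns a uniform random number.

```python
import random


def total_source_destination(t1, t2):
    """Sum over both directions of the difference in total bytes sent."""
    def totals(trace):
        to_server = sum(size for direction, size in trace if direction == "c2s")
        to_client = sum(size for direction, size in trace if direction == "s2c")
        return to_server, to_client

    (s1, c1), (s2, c2) = totals(t1), totals(t2)
    return abs(s1 - s2) + abs(c1 - c2)


def _edit_distance(t1, t2, ins_del_cost, sub_cost):
    """Generic dynamic-programming edit distance over sequences of transfers."""
    n, m = len(t1), len(t2)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + ins_del_cost(t1[i - 1])
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + ins_del_cost(t2[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j] + ins_del_cost(t1[i - 1]),             # delete
                d[i][j - 1] + ins_del_cost(t2[j - 1]),             # insert
                d[i - 1][j - 1] + sub_cost(t1[i - 1], t2[j - 1]),  # substitute
            )
    return d[n][m]


def size_weighted_edit_distance(t1, t2):
    """Edit distance where edits cost bytes (assumed cost model)."""
    def sub(a, b):
        if a[0] != b[0]:         # different directions: delete one, insert the other
            return a[1] + b[1]
        return abs(a[1] - b[1])  # same direction: pay the size difference
    return _edit_distance(t1, t2, lambda t: t[1], sub)


def edit_distance(t1, t2):
    """Unit-cost edit distance: sizes are ignored, only direction and order count."""
    return _edit_distance(t1, t2, lambda t: 1, lambda a, b: 0 if a[0] == b[0] else 1)


def random_distance(t1, t2, rng=random):
    """Baseline that ignores the traffic entirely."""
    return rng.random()


a = [("c2s", 600), ("s2c", 2100)]
b = [("c2s", 601), ("s2c", 3400)]
c = [("c2s", 601), ("s2c", 3400), ("s2c", 500)]

print(total_source_destination(a, b))     # 1301
print(size_weighted_edit_distance(a, b))  # 1301
print(edit_distance(a, b))                # 0: same shape, so padding would hide it
print(edit_distance(a, c))                # 1: the extra transfer still shows
```

Because Edit-Distance treats all same-direction transfers as equal, it behaves as if every packet had been padded to one size, which is why it measures what an attacker can still learn from the number and order of transfers alone.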