## Sunday, December 7, 2014

### 52 Things: Number 9: What are Shannon's definitions of entropy and information?

This is the latest in a series of blog posts to address the list of '52 Things Every PhD Student Should Know To Do Cryptography': a set of questions compiled to give PhD candidates a sense of what they should know by the end of their first year. This blog post tries to introduce some fundamental concepts in Information Theory: What are Shannon's definitions of entropy and information?

Information Theory was founded by Claude E. Shannon in 1948. It was originally developed to study signal processing but its application has been broadened to various disciplines through decades. This blog is intended to briefly introduce two fundamental concepts, entropy and information. If you are interested in more details, I personally suggest you to find more in [1].

Entropy

Entropy is a measurement to evaluate the uncertainty[3] of one or more variables.

Assume we are investigating the first web page people visit when they open a web browser. We use samples from two groups of people: 4 cryptographer from Bristol Cryptogroup and 4 passenger grabbed at Bristol coach station. Let's make a more ambitious assumption that the cryptographers always visit http://bristolcrypto.blogspot.co.uk/ first when they open a web browser, while the others each visits a different portal.

Now let's evaluate the uncertainty of their answers: apparently, the answers from cryptographers are quite certain (low uncertainty) whilst it can hardly be guessed if an answer is from a passenger (high uncertainty). In other words, we  say the answers in the cryptographers group has a low entropy but  that in the passengers group has a high entropy.

So one of the most remarkable inventions of Shannon's is his definition of entropy(Shannon's Entropy):

H = -{\sum}_{i}{p_i \log_b{p_i}}

where $p_i$ is the probability of a message (an answer in previous example) appears. In computer science, we usually use $b = 2$ (bits).

If we compute the entropy in our example, we will have:

\begin{eqnarray}

H_{cryptographer} = -{\sum}_{1}^{4}{1 \log_2{1}} = 0\\

H_{passenger} = -{\sum}_{1}^{4}{{1 \over 4} \log_2{1 \over 4}} = 2

\end{eqnarray}

So the passengers' answer do have a higher entropy than the cryptographers'!

Information

Formally, the definition of Shannon's information is given in [2] as:

" Information is a measure of one's freedom of choice when one selects a message."

To explain this, let's make a minor modification to our previous example. Let's grab another 4 passengers from Bristol train station and assume their answers are also random portals just like the passengers' in coach station.

Here is the question: Given an answer $y$, can you tell which group the answer is from?

We can instantly tell that the answer is from our cryptographer group if $y$ is http://bristolcrypto.blogspot.co.uk/. However, we will be struggling if $y$ is a portal; therefore, we could say the message of http://bristolcrypto.blogspot.co.uk/ contains more information (less freedom) than a portal (more freedom).

So how does it relate to entropy?

Extending the definition of entropy, we define the Conditional Entropy as:

$$H(Y|X) = \sum_{x \in X}{p(x)H(Y|X = x)}$$

which describes the entropy of $Y$ when given condition $X=x$. To make it more explicitly, since entropy is the uncertainty of a variable; hence the previous definition of conditional entropy is in fact the uncertainty of $Y$ when given the "clue"(condition) $X$.

Observation: consider two variables $X$ and $Y$. If $X$ contains only a minimal information of $Y$, then given an exact value of $X$ should not help us much on deducing the value of $Y$, that is, it does not obviously reduce the uncertainty of $Y$; on the other hand, if $X$ contains essential information of $Y$, then the entropy of $Y$ is expected to be low when $X$ is given. Therefore, the conditional entropy can be viewed as a rational measurement to the information $X$ has of $Y$!

Another important measurement is called Mutual Information. It is a measure of mutual dependency among two variables. One way to define it is the loss of entropy(uncertainty) when given a condition:

$$I(X;Y) = H(X) - H(X|Y) = H(Y) - H(Y|X)$$

Cryptographic Example
The concepts of information theory is widely used in cryptography. A classic example is to view a cryptographic process as a channel with plaintext and ciphertext being input and output respectively. Research of side channel analysis also benefits from the usage of information theory.

References

[1] Thomas M. Cover and Joy A. Thomas. Elements of Information Theory
2nd Edition. Wiley-Interscience, 2 edition, July 2006.

[2] S. Vajda, Claude E. Shannon, and Warren Weaver. The mathematical
theory of communication. The Mathematical Gazette, 34(310):312+,
December 1950.

[3] http://en.wikipedia.org/wiki/Entropy_%28information_theory%29