
Discrete Information Sources

Exercise 1: DMS entropy

Compute the entropy of the following distribution:

$$S: \begin{pmatrix} s_1 & s_2 & s_3 & s_4 & s_5 \\ \frac{1}{2} & 0 & \frac{1}{8} & \frac{1}{4} & \frac{1}{8} \end{pmatrix}$$

Solution

The entropy of a DMS is given by:

$$H(S) = - \sum_{i=1}^{n} p(s_i) \log_2 p(s_i)$$

In this case, we have:

$$\begin{aligned} H(S) & = - \frac{1}{2} \log_2 \frac{1}{2} - 0 \log_2 0 - \frac{1}{8} \log_2 \frac{1}{8} - \frac{1}{4} \log_2 \frac{1}{4} - \frac{1}{8} \log_2 \frac{1}{8} \\ & = - \frac{1}{2} (-1) - 0 - \frac{1}{8} (-3) - \frac{1}{4} (-2) - \frac{1}{8} (-3) \\ & = \frac{1}{2} + \frac{3}{8} + \frac{1}{2} + \frac{3}{8} \\ & = 1.75 \, \text{bits} \end{aligned}$$

where the term $0 \log_2 0$ is taken to be 0, by convention.
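The calculation can be checked numerically; a minimal sketch in Python, with the probabilities taken from the distribution above and the $0 \log_2 0$ term skipped by convention:

```python
import math

# Probabilities of the five messages s1..s5
p = [1/2, 0, 1/8, 1/4, 1/8]

# H(S) = -sum p_i * log2(p_i), skipping p_i = 0 (the 0*log2(0) -> 0 convention)
H_S = -sum(pi * math.log2(pi) for pi in p if pi > 0)
print(H_S)  # 1.75
```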

Exercise 2: Optimal questions

Consider the following game: I think of a number between 1 and 8 (each number has equal chances), and you have to guess it by asking yes/no questions.

Answer the following questions:

Solution

a). How much uncertainty does the problem have?

The problem can be modeled as a DMS with 8 equiprobable messages:

$$S: \begin{pmatrix} s_1 & s_2 & s_3 & s_4 & s_5 & s_6 & s_7 & s_8 \\ \frac{1}{8} & \frac{1}{8} & \frac{1}{8} & \frac{1}{8} & \frac{1}{8} & \frac{1}{8} & \frac{1}{8} & \frac{1}{8} \end{pmatrix}$$

The uncertainty of the problem is given by the entropy of this source:

$$H(S) = - \sum_{i=1}^{8} \frac{1}{8} \log_2 \frac{1}{8} = 3 \, \text{bits}$$

b). What is the best way to ask questions? Why?

The best way is to ask questions that split the set of possible numbers into two sets of equal probability. This is because every answer you obtain behaves like a binary source, and a binary source provides maximum information when the two probabilities are equal:

$$S: \begin{pmatrix} \text{Yes} & \text{No} \\ \frac{1}{2} & \frac{1}{2} \end{pmatrix}$$

Thus, the best way to ask questions is to maximize the entropy of the source at each step.

Figure 1: Optimal decision tree for guessing a number between 1 and 8

c). On average, what is the number of questions required to find the number?

With the optimal questions, the number of questions required to find the number is always 3. This is because the entropy of the source is 3 bits, and the entropy of an answer is always 1 bit.
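This can be verified with a short simulation; a sketch assuming the halving strategy, where `questions_needed` is a hypothetical helper and each midpoint comparison stands for one yes/no question:

```python
def questions_needed(secret, lo=1, hi=8):
    """Count the yes/no questions used by the halving strategy."""
    count = 0
    while lo < hi:
        mid = (lo + hi) // 2
        count += 1              # one question: "is the number <= mid?"
        if secret <= mid:
            hi = mid
        else:
            lo = mid + 1
    return count

counts = [questions_needed(n) for n in range(1, 9)]
print(counts)  # every number takes exactly 3 questions
```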

d). What if the questions are not asked in the best way?

This is just discussion during lecture.

If questions are not asked in the best way, the Yes/No answers will not have equal probabilities, and therefore the answers will provide, on average, less than 1 bit of information. Therefore the number of questions required to find the number will be more than three.

Exercise 3: Optimal decision tree

What is the optimal decision tree for guessing a number chosen according to the following distribution:

$$S: \begin{pmatrix} s_1 & s_2 & s_3 & s_4 \\ \frac{1}{2} & \frac{1}{4} & \frac{1}{8} & \frac{1}{8} \end{pmatrix}$$

Solution

The difference with the previous case is that the probabilities are not equal. But we still use the same principle: we want to maximize the entropy at each step, which means splitting the set of possible numbers into two sets of equal probability. This corresponds to the following optimal decision tree:

Figure 2: Optimal decision tree for guessing a number between 1 and 4
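The average number of questions for such a tree can be compared against the entropy; a sketch assuming the equal-probability splits described above give $s_1$, $s_2$, $s_3$, $s_4$ depths 1, 2, 3, 3 (one question isolates $s_1$, a second isolates $s_2$, a third separates $s_3$ from $s_4$):

```python
import math

p     = [1/2, 1/4, 1/8, 1/8]   # message probabilities
depth = [1, 2, 3, 3]           # questions needed to identify each message

avg_questions = sum(pi * di for pi, di in zip(p, depth))
H_S = -sum(pi * math.log2(pi) for pi in p)

print(avg_questions, H_S)  # both equal 1.75: the tree achieves the entropy
```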

Exercise 4: Optimal decision tree

What is the optimal decision tree for guessing a number chosen according to the following distribution:

$$S: \begin{pmatrix} s_1 & s_2 & s_3 & s_4 \\ 0.14 & 0.29 & 0.4 & 0.17 \end{pmatrix}$$

Solution

We still use the same principle as before. However, we cannot split the set of probabilities into equal amounts every time. Instead, we try to split them into sums which are as close to each other as possible, even though a perfect split is not always possible.

A good way to ask questions is shown below:

Figure 3: Optimal decision tree for guessing a number between 1 and 4
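The entropy of this source gives a lower bound on the average number of yes/no questions for any decision tree; a quick numerical check (the values are computed here, not taken from the figure):

```python
import math

p = [0.14, 0.29, 0.4, 0.17]

# Entropy: lower bound on the average number of yes/no questions
H_S = -sum(pi * math.log2(pi) for pi in p)
print(round(H_S, 2))  # about 1.88 bits
```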

Exercise 5: DMS

A DMS has the following distribution

$$S: \begin{pmatrix} s_1 & s_2 & s_3 & s_4 & s_5 \\ \frac{1}{2} & 0 & \frac{1}{8} & \frac{1}{4} & \frac{1}{8} \end{pmatrix}$$

Solution

a). Compute the information of messages $s_1$, $s_2$ and $s_3$

The information of a message is given by $i(s_i) = - \log_2 p(s_i)$, so we have:

$$\begin{aligned} i(s_1) & = - \log_2 \frac{1}{2} = 1 \, \text{bit} \\ i(s_2) & = - \log_2 0 = \infty \, \text{bits} \\ i(s_3) & = - \log_2 \frac{1}{8} = 3 \, \text{bits} \end{aligned}$$

b). Compute the average information of a message of the source

The average information of a message is the entropy:

$$H(S) = - \sum_{i=1}^{5} p(s_i) \log_2 p(s_i) = 1.75 \, \text{bits}$$

c). Compute the efficiency, absolute redundancy and relative redundancy of the source

For a DMS with 5 messages, the maximum entropy is $H_{max} = \log_2 5 \approx 2.32 \, \text{bits}$.

Therefore we have:

$$\begin{aligned} \eta &= \frac{H(S)}{H_{max}} = \frac{1.75}{2.32} = \dots \\ R &= H_{max} - H(S) = 2.32 - 1.75 = 0.57 \, \text{bits} \\ \rho &= \frac{H_{max} - H(S)}{H_{max}} = \frac{0.57}{2.32} = \dots \end{aligned}$$
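The omitted numerical values can be filled in with a few lines of Python; a sketch using the definitions above:

```python
import math

H_S   = 1.75            # entropy computed at point b)
H_max = math.log2(5)    # maximum entropy for 5 messages, about 2.32 bits

eta = H_S / H_max       # efficiency
R   = H_max - H_S       # absolute redundancy, in bits
rho = R / H_max         # relative redundancy

print(round(eta, 4), round(R, 4), round(rho, 4))  # about 0.7537, 0.5719, 0.2463
```

Note that $\eta + \rho = 1$ always holds, which is a quick sanity check on the results.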

d). Compute the probability of generating the sequence $s_1 s_4 s_3 s_1$

Since the source is memoryless, the messages are independent, and the probability of generating the sequence is the product of each message probability:

$$P(s_1 s_4 s_3 s_1) = p(s_1) \cdot p(s_4) \cdot p(s_3) \cdot p(s_1) = \frac{1}{2} \cdot \frac{1}{4} \cdot \frac{1}{8} \cdot \frac{1}{2} = \frac{1}{128}$$

Exercise 6: Kullback-Leibler distance

Compute the KL distance between the following two probability distributions:

$$P = [0 \;\; 0 \;\; 0 \;\; 1], \qquad Q = [0.1 \;\; 0.05 \;\; 0.05 \;\; 0.8]$$

Solution

Recall the definition of the KL distance:

$$D_{KL}(P,Q) = \sum_{i=1}^{n} p_i \log_2 \frac{p_i}{q_i}$$

In our case, we have:

$$\begin{aligned} D_{KL}(P,Q) & = 0 \log_2 \frac{0}{0.1} + 0 \log_2 \frac{0}{0.05} + 0 \log_2 \frac{0}{0.05} + 1 \log_2 \frac{1}{0.8} \\ & = 0 + 0 + 0 + 1 \cdot 0.3219 \\ & = 0.3219 \, \text{bits} \end{aligned}$$

Again, we have used the limit

$$\lim_{p \to 0} p \log_2 \frac{p}{q} = 0$$

for the terms where $p_i = 0$.
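The computation can be reproduced with a short function; a sketch that applies the same $p \log_2 (p/q) \to 0$ convention for the terms with $p = 0$:

```python
import math

def kl_divergence(P, Q):
    """KL distance in bits; terms with p = 0 contribute 0 by convention."""
    return sum(p * math.log2(p / q) for p, q in zip(P, Q) if p > 0)

P = [0, 0, 0, 1]
Q = [0.1, 0.05, 0.05, 0.8]
print(round(kl_divergence(P, Q), 4))  # 0.3219 bits
```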

Exercise 7: Source with memory

This is a single exercise with many questions.

Consider a discrete, complete, information source with memory, with the graphical representation given below.

Figure 4: Graphical representation of the source

The states are defined as follows:

| State | Definition |
| ----- | ---------- |
| $S_1$ | $s_1 s_1$ |
| $S_2$ | $s_1 s_2$ |
| $S_3$ | $s_2 s_1$ |
| $S_4$ | $s_2 s_2$ |

Questions:

Solution

a. What are the values of $x$ and $y$?

We find the values of $x$ and $y$ from the condition that the sum of all probabilities leaving a state must be 1, because the source must go to some state after emitting a message (including the case of a self-loop).

Therefore we have $x = \frac{3}{4}$ and $y = \frac{5}{8}$.

b. What are the memory order, $m$, and the number of messages of the source, $n$?

We take a look at the graphical representation of the source and the definition of the states. From the state definitions it is clear that there are only two messages, $s_1$ and $s_2$, so $n = 2$.

The memory order is the number of messages which define a state, so it is also 2: $m = 2$.

c. Write the transition matrix $[T]$;

Since there are 4 states, $[T]$ is a $4 \times 4$ matrix, with every element $p_{ij}$ being the transition probability from state $S_i$ to state $S_j$.

$$[T] = \begin{bmatrix} \frac{1}{2} & \frac{1}{2} & 0 & 0 \\ 0 & 0 & \frac{1}{4} & \frac{3}{4} \\ \frac{5}{8} & \frac{3}{8} & 0 & 0 \\ 0 & 0 & \frac{1}{2} & \frac{1}{2} \end{bmatrix}$$

Note how the sum of every row is equal to 1.
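This completeness property can be checked programmatically; a minimal sketch with the matrix entered as nested lists:

```python
# Transition matrix [T]: T[i][j] = probability of moving from S_{i+1} to S_{j+1}
T = [
    [1/2, 1/2, 0,   0  ],
    [0,   0,   1/4, 3/4],
    [5/8, 3/8, 0,   0  ],
    [0,   0,   1/2, 1/2],
]

row_sums = [sum(row) for row in T]
print(row_sums)  # [1.0, 1.0, 1.0, 1.0]
```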

d. What is the probability of generating $s_1$ if the current state is $S_3$?

If the current state is $S_3$, it means that the last two messages were $s_2$ followed by $s_1$ ($s_1$ is the last one, and $s_2$ is the one before). If we generate a new message $s_1$, the last two messages become $s_1 s_1$, i.e. state $S_1$.

Therefore, generating $s_1$ in state $S_3$ means transitioning from state $S_3$ to state $S_1$. We read this value from the matrix (or graph) as $\frac{5}{8}$.

e. If the initial state is $S_4$, what is the probability of generating the sequence $s_1 s_2 s_2 s_1$?

This is an extension of the previous question. We start in state $S_4$: generating $s_1$ moves to state $S_3$, then generating $s_2$ moves to state $S_2$, then generating $s_2$ moves to state $S_4$, and finally generating $s_1$ moves to state $S_3$. Thus we go through the sequence of states $S_4 \rightarrow S_3 \rightarrow S_2 \rightarrow S_4 \rightarrow S_3$.

Figure 5: Sequence of states when generating $s_1 s_2 s_2 s_1$ from state $S_4$

Each transition has a probability, found in the transition matrix, and we multiply them together to obtain the probability of the chain:

$$\begin{aligned} P &= P(S_4 \rightarrow S_3) \cdot P(S_3 \rightarrow S_2) \cdot P(S_2 \rightarrow S_4) \cdot P(S_4 \rightarrow S_3) \\ &= P(S_3|S_4) \cdot P(S_2|S_3) \cdot P(S_4|S_2) \cdot P(S_3|S_4) \\ &= \frac{1}{2} \cdot \frac{3}{8} \cdot \frac{3}{4} \cdot \frac{1}{2} \\ &= \frac{9}{128} \end{aligned}$$

f. Compute the entropy in state $S_4$;

In state $S_4$, the probabilities are the ones in the 4th row of the transition matrix. Only two messages can be generated, $s_1$ and $s_2$; the two zero entries contribute nothing to the entropy.

$$[0 \quad 0 \quad \frac{1}{2} \quad \frac{1}{2}]$$

The entropy in state $S_4$ is therefore the entropy computed for this distribution:

$$H(S_4) = - \frac{1}{2} \log_2 \frac{1}{2} - \frac{1}{2} \log_2 \frac{1}{2} = 1 \, \text{bit}$$

g. Compute the global entropy of the source;

First we compute the entropy in every state, then we compute the stationary probabilities, and then we compute the average.

The entropy in every state is:

$$\begin{aligned} H(S_1) &= - \frac{1}{2} \log_2 \frac{1}{2} - \frac{1}{2} \log_2 \frac{1}{2} = 1 \, \text{bit} \\ H(S_2) &= - \frac{1}{4} \log_2 \frac{1}{4} - \frac{3}{4} \log_2 \frac{3}{4} = 0.8113 \, \text{bits} \\ H(S_3) &= - \frac{5}{8} \log_2 \frac{5}{8} - \frac{3}{8} \log_2 \frac{3}{8} = 0.9544 \, \text{bits} \\ H(S_4) &= - \frac{1}{2} \log_2 \frac{1}{2} - \frac{1}{2} \log_2 \frac{1}{2} = 1 \, \text{bit} \end{aligned}$$

Next, we compute the stationary probabilities from the system:

$$[p_1, p_2, \dots, p_N] \cdot [T] = [p_1, p_2, \dots, p_N]$$

which, in our case, means the following system of equations, written either in matrix form or as separate equations:

$$[p_1 \quad p_2 \quad p_3 \quad p_4] \cdot \begin{bmatrix} \frac{1}{2} & \frac{1}{2} & 0 & 0 \\ 0 & 0 & \frac{1}{4} & \frac{3}{4} \\ \frac{5}{8} & \frac{3}{8} & 0 & 0 \\ 0 & 0 & \frac{1}{2} & \frac{1}{2} \end{bmatrix} = [p_1 \quad p_2 \quad p_3 \quad p_4]$$

i.e.:

$$\begin{cases} \frac{1}{2}p_1 + \frac{5}{8} p_3 = p_1 \\ \frac{1}{2}p_1 + \frac{3}{8} p_3 = p_2 \\ \frac{1}{4}p_2 + \frac{1}{2} p_4 = p_3 \\ \frac{3}{4}p_2 + \frac{1}{2} p_4 = p_4 \end{cases}$$

However, the equations are not linearly independent: one of them depends on the others. Therefore, we eliminate one of the equations (any one, whichever looks hardest) and replace it with the completeness condition $p_1 + p_2 + p_3 + p_4 = 1$.

Eliminating the second equation in the system results in:

$$\begin{cases} \frac{1}{2}p_1 + \frac{5}{8} p_3 = p_1 \\ \frac{1}{4}p_2 + \frac{1}{2} p_4 = p_3 \\ \frac{3}{4}p_2 + \frac{1}{2} p_4 = p_4 \\ p_1 + p_2 + p_3 + p_4 = 1 \end{cases}$$

Solving this system through any method gives the stationary probabilities:

$$\begin{aligned} p_1 &= \dots \\ p_2 &= \dots \\ p_3 &= \dots \\ p_4 &= \dots \end{aligned}$$

Finally, we compute the global entropy of the source as the average of the state entropies, weighted by the stationary probabilities:

$$\begin{aligned} H(S) &= p_1 H(S_1) + p_2 H(S_2) + p_3 H(S_3) + p_4 H(S_4) \\ &= \dots \end{aligned}$$
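The omitted stationary probabilities and global entropy can be computed numerically; a sketch using power iteration (repeated multiplication by $[T]$), which converges here because the chain is irreducible and aperiodic:

```python
import math

T = [
    [1/2, 1/2, 0,   0  ],
    [0,   0,   1/4, 3/4],
    [5/8, 3/8, 0,   0  ],
    [0,   0,   1/2, 1/2],
]

def step(p, T):
    """One step of the chain: row vector p times matrix T."""
    return [sum(p[i] * T[i][j] for i in range(4)) for j in range(4)]

# Power iteration from the uniform distribution
p = [0.25, 0.25, 0.25, 0.25]
for _ in range(200):
    p = step(p, T)

# Entropy of each state = entropy of its row of T (zero entries skipped)
H_state = [-sum(q * math.log2(q) for q in row if q > 0) for row in T]

# Global entropy: state entropies weighted by the stationary probabilities
H_global = sum(pi * hi for pi, hi in zip(p, H_state))

print([round(x, 4) for x in p])   # stationary distribution
print(round(H_global, 4))         # global entropy, in bits
```

Solving the linear system by hand gives the same stationary distribution that the iteration converges to.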

h. If the source is initially in state $S_2$, in what states and with what probabilities will the source be after 2 messages?

We use the general formula which gives the probabilities at time $(n+1)$ based on the probabilities at time $n$:

$$[p_1^{(n)}, p_2^{(n)}, \dots, p_N^{(n)}] \cdot [T] = [p_1^{(n+1)}, p_2^{(n+1)}, \dots, p_N^{(n+1)}]$$

In our case, the initial probabilities are $[0, 1, 0, 0]$, because we are told the source is initially in state $S_2$. After 2 messages, we multiply twice with $[T]$:

$$\begin{bmatrix} 0 & 1 & 0 & 0 \end{bmatrix} \cdot [T] \cdot [T] = [p_1^{(2)}, p_2^{(2)}, p_3^{(2)}, p_4^{(2)}]$$

Therefore we have:

$$\begin{aligned} [p_1^{(2)}, p_2^{(2)}, p_3^{(2)}, p_4^{(2)}] &= \begin{bmatrix} 0 & 1 & 0 & 0 \end{bmatrix} \begin{bmatrix} \frac{1}{2} & \frac{1}{2} & 0 & 0 \\ 0 & 0 & \frac{1}{4} & \frac{3}{4} \\ \frac{5}{8} & \frac{3}{8} & 0 & 0 \\ 0 & 0 & \frac{1}{2} & \frac{1}{2} \end{bmatrix} \begin{bmatrix} \frac{1}{2} & \frac{1}{2} & 0 & 0 \\ 0 & 0 & \frac{1}{4} & \frac{3}{4} \\ \frac{5}{8} & \frac{3}{8} & 0 & 0 \\ 0 & 0 & \frac{1}{2} & \frac{1}{2} \end{bmatrix} \\ &= \begin{bmatrix} 0 & 0 & \frac{1}{4} & \frac{3}{4} \end{bmatrix} \begin{bmatrix} \frac{1}{2} & \frac{1}{2} & 0 & 0 \\ 0 & 0 & \frac{1}{4} & \frac{3}{4} \\ \frac{5}{8} & \frac{3}{8} & 0 & 0 \\ 0 & 0 & \frac{1}{2} & \frac{1}{2} \end{bmatrix} \\ &= \begin{bmatrix} \frac{5}{32} & \frac{3}{32} & \frac{3}{8} & \frac{3}{8} \end{bmatrix} \end{aligned}$$

The most likely states in which the source can be after 2 messages are $S_3$ and $S_4$ (tied, each with probability $\frac{3}{8}$), and the least likely state is $S_2$.
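The result can be double-checked by applying the chain step twice; a sketch where `step` is a hypothetical helper implementing the row-vector multiplication from the formula above:

```python
T = [
    [1/2, 1/2, 0,   0  ],
    [0,   0,   1/4, 3/4],
    [5/8, 3/8, 0,   0  ],
    [0,   0,   1/2, 1/2],
]

def step(p, T):
    """Row vector p times matrix T: one emitted message."""
    return [sum(p[i] * T[i][j] for i in range(4)) for j in range(4)]

p = [0, 1, 0, 0]         # start in state S2
p = step(step(p, T), T)  # two messages
print(p)  # [0.15625, 0.09375, 0.375, 0.375] = [5/32, 3/32, 3/8, 3/8]
```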