
Error control coding

1Introduction

Remember the block scheme of a communication system:

Figure 1: Communication system

Error control coding is the second task of coding in a communication system, and refers to the protection of information against channel errors.

During any transmission, the data (symbols, bits) go through a transmission channel. The transmission channel is not ideal: in particular, it introduces errors. For example, some bit values may be changed by electrical noise: a bit is sent as 0 but received as 1, or vice versa. This is a problem in all communication systems and, to some degree, it is unavoidable, because electrical noise is unavoidable.

However, we usually require that all data is received correctly. For example, when you download a file from the internet, you want it to be exactly the same as the original: a single erroneous bit can make a zip file unreadable.

Error control coding refers to all the algorithms which have been developed with the purpose of protecting data against errors.

1.1Error model

We first need to clarify the assumptions and the scope of the current discussion.

First, we consider only binary codes and binary channels, i.e. all the data is represented using binary symbols $\left\lbrace 0,1 \right\rbrace$.

An error is defined as a bit that has changed from 0 to 1 or from 1 to 0 while going through the channel. In these lectures we consider only this type of error, and not, for example, situations where bits are lost.

Errors can appear in different ways:

  • as independent errors, where each bit is affected independently of the others;

  • as packets (bursts) of errors, where many consecutive bits are affected at once.

Packets of errors are much harder to correct than independent errors, because the whole context in that region of the bitstream is lost.

When considering that a bit changes its value due to an error, it is very helpful to treat this as a mathematical operation. This will allow us to use mathematical tools to design error control mechanisms.

We therefore introduce the modulo-2 sum operation, denoted $\oplus$, with the following properties:

$$0 \oplus 0 = 0, \quad 0 \oplus 1 = 1, \quad 1 \oplus 0 = 1, \quad 1 \oplus 1 = 0$$

Besides the modulo-2 sum, we also have the usual product operation between bits (i.e. $0 \times 1 = 0$, $1 \times 1 = 1$). Together, they form a modulo-2 arithmetic system known as the Galois field GF(2), which supports operations with bits in a way similar to normal arithmetic:

When a bit changes its value, we may think of it as undergoing a modulo-2 sum with 1. Given a bit $b$ and its complement $\overline{b}$, we have:

$$b \oplus 1 = \overline{b}$$

irrespective of the value of $b$ (0 or 1).

When a bit remains the same, we may think of it as undergoing a modulo-2 sum with 0, since:

$$b \oplus 0 = b$$

for any bit $b$.

We can extend this to a sequence of bits, by performing the modulo-2 sum bit by bit. Suppose we transmit a bit sequence $\mathbf{c}$:

$$\mathbf{c} = 0100100111$$

which is received, with errors on the 2nd, 3rd, 4th and 9th positions, as the sequence $\mathbf{r}$:

$$\mathbf{r} = 0011100101$$

We can consider $\mathbf{r}$ as being obtained by the modulo-2 sum of $\mathbf{c}$ with an error sequence $\mathbf{e}$:

$$\mathbf{r} = \mathbf{c} \oplus \mathbf{e}$$

The error sequence $\mathbf{e}$ has a value of 1 wherever there is an error, and 0 otherwise:

$$\mathbf{e} = 0111000010$$
Figure 2: Channel error model

Therefore, we model the effect of the channel as this modulo-2 sum with the error sequence. This constitutes our model for the transmission channel, for the purpose of error control coding.
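This channel model is easy to try out in code. A minimal Python sketch (the function name is ours), using the sequences from the example above:

```python
# Model the channel: the received word is the bitwise modulo-2 sum (XOR)
# of the transmitted codeword and the error sequence.

def xor_bits(a, b):
    """Bitwise modulo-2 sum of two equal-length bit strings."""
    return "".join(str(int(x) ^ int(y)) for x, y in zip(a, b))

c = "0100100111"       # transmitted sequence
e = "0111000010"       # 1s mark the errors (positions 2, 3, 4 and 9)
r = xor_bits(c, e)     # received sequence
print(r)               # -> 0011100101

# Adding the same error sequence again recovers the original:
print(xor_bits(r, e))  # -> 0100100111
```

Note that the operation is its own inverse: summing the error sequence twice cancels it out, which is exactly what error correction will exploit later.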

1.2Error detection vs error correction

What can we do about errors? In general, there are two approaches:

  1. Error detection means finding out if there is any error in the received sequence, but not exactly which bits have the errors.

Since we don’t know exactly where the errors are, we cannot correct the bits. Instead, the whole sequence is discarded, and the sender is asked to retransmit it.

    This is possible in live communication systems, where the sender can retransmit the data, as for example in the TCP/IP protocol used on the internet.

  2. Error correction means finding out whether there are errors in the received sequence and exactly which bits are affected. Once we locate the errors, we can correct them by inverting the bits, and thus recover the original data.

    This is especially useful when we cannot retransmit the data, for example when the data is stored on a physical medium (e.g. on a disk).

Error detection is easier to achieve than error correction, because it does not require knowing exactly where the errors are. However, there exist situations where error correction is necessary, for example when retransmission is not possible.

1.3Definitions and overview of error control coding

We first give a general schematic of the error control process, with the purpose of fixing general definitions and notation. The process is tailored to block codes, which are the most common type of error control codes.

  1. The sender needs to send a sequence of $k$ bits, which is the information word:

    $$\mathbf{i} = i_1 i_2 ... i_k$$

  2. For each possible information word $\mathbf{i}$, the coder assigns a codeword $\mathbf{c}$ of length $n > k$:

    $$\mathbf{c} = c_1 c_2 ... c_n$$

  3. The codeword is sent on the channel instead of the original information word.

  4. The receiver receives a sequence $\mathbf{r} \approx \mathbf{c}$, known as the received word, with possible errors:

    $$\mathbf{r} = r_1 r_2 ... r_n$$

  5. The decoding algorithm detects and/or corrects these errors:

    • If it performs only detection, it reports whether or not there are errors in $\mathbf{r}$.

    • If it performs correction, it tries to find the correct codeword $\hat{\mathbf{c}}$ and information word $\hat{\mathbf{i}}$ from $\mathbf{r}$.

    Both detection and correction may fail if the errors are too numerous. This is why we denote the output of the decoding algorithm as $\hat{\mathbf{c}}$ and $\hat{\mathbf{i}}$: they are not necessarily the same as the original $\mathbf{c}$ and $\mathbf{i}$.

We provide below a list of definitions.

Further examples: at blackboard

1.4Naive examples

A first example: parity bit

One of the simplest ways to do error control coding is to add a parity bit to the information word.

The parity bit is defined as the modulo-2 sum of all the bits in the information word, i.e. it is 0 if there is an even number of 1’s in the information word, and 1 if there is an odd number.

The codeword is obtained by appending the parity bit at the end of the information word.

Consider an information word of length $k = 8$ bits, to which we append a parity bit, resulting in a codeword of length $n = 9$ bits.

The receiver checks if the parity bit is still equal to the modulo-2 sum of the information bits.
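A minimal Python sketch of this scheme (the function names and the 8-bit information word below are ours, chosen for illustration):

```python
def parity_bit(info):
    """Modulo-2 sum of the information bits: 0 for an even number of 1s."""
    return sum(info) % 2

def encode(info):
    """Append the parity bit at the end of the information word."""
    return info + [parity_bit(info)]

def detect_error(word):
    """A valid codeword always has an even total number of 1s."""
    return sum(word) % 2 != 0

i = [1, 0, 1, 1, 0, 0, 1, 0]   # k = 8 information bits
c = encode(i)                  # n = 9 bits
print(detect_error(c))         # -> False (no errors)

c[3] ^= 1                      # a single bit error
print(detect_error(c))         # -> True (detected)

c[5] ^= 1                      # a second error
print(detect_error(c))         # -> False (two errors go undetected!)
```

The last line illustrates the limitation of a single parity bit: any even number of errors restores the overall parity and passes the check.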

What can we say about this coding procedure:

The coding algorithm could be improved by adding more parity bits (see example at blackboard).

A second example: repetition code

Let’s discuss another basic way of doing error control coding, the repetition code.

The codeword is created by repeating the information word $N$ times in a row. As an example, for $N = 5$, to transmit $\mathbf{i} = 100$ we actually send $\mathbf{c} = 100100100100100$ on the channel.

The receiver uses a majority rule for decoding: knowing that each bit is repeated $N$ times on certain locations, it decides that the correct value of each bit is the one that appears most often among those locations.

For example, if it receives $\mathbf{r} = 101100100110000$, the decoder views the sequence as 5 groups of 3 bits, and takes the majority value in each column:

$$\begin{aligned} \mathbf{r} = \;&101 \\ &100 \\ &100 \\ &110 \\ &000 \\ \hline \hat{\mathbf{i}} = \;&100 \end{aligned}$$
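The majority-rule decoder can be sketched in Python as follows (function names are ours), using the received word from the example:

```python
def encode_repetition(info, N):
    """Repeat the whole information word N times in a row."""
    return info * N

def decode_repetition(received, k, N):
    """Majority vote over the N copies of each of the k bits."""
    decoded = []
    for pos in range(k):
        copies = [received[pos + j * k] for j in range(N)]
        decoded.append(1 if sum(copies) > N // 2 else 0)
    return decoded

# Received word from the example: 5 noisy copies of a 3-bit word.
r = [1,0,1, 1,0,0, 1,0,0, 1,1,0, 0,0,0]
print(decode_repetition(r, k=3, N=5))   # -> [1, 0, 0]
```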

Let’s analyze this code:

Redundancy

We can see in these examples that error control coding works by adding redundancy to the data, i.e. adding additional bits, artificially created according to some rule.

The decoding algorithm will then check if the rule still holds:

Note that error control coding adds redundancy to the data, while source coding aims to reduce redundancy. Is there a contradiction?

Well, no, because the two types of redundancy are different. Source coding reduces existing redundancy from the data, which served no purpose. Error control coding adds redundancy in a controlled way, with a purpose. The purpose is for the decoder to be able to check if the data is correct or not.

In practice, source coding and error control coding are done sequentially, in an independent manner.

  1. First perform source coding, eliminating redundancy in representation of data

  2. Then perform error control coding, adding redundancy for protection

1.5Coding rate and channel capacity

This is a preview of something which will be discussed in the next chapters, but relates to the coding rate of a code, so we can introduce it here.

In Chapter IV we will study transmission channels, which are mathematical models of how information is carried from the sender to the receiver. In that chapter we will find that each channel has a certain capacity value.

The channel capacity is the maximum amount of information that can be sent over the channel with one symbol, on average.

For example, a binary channel (working with bits 0 and 1) may have capacity $C = 0.8$ bits, i.e. for each physical bit delivered, the mathematical information actually transmitted is 0.8 bits.

The channel capacity is related to the coding rate of error control codes via Shannon’s noisy channel coding theorem.

The rigorous proof of the theorem is too complex to present here. Some key ideas of the proof are:

In layman terms, the theorem says that:

As a numerical example, suppose we send bits on a binary channel with capacity 0.7 bits/message.

The theorem makes it clear when it is possible to fix all errors, giving a clear limit on the coding rate required for this to happen. Moreover, it guarantees that such a code exists, but it gives no clue about how to actually find the code in practice, only some general principles derived from the proof:

In practice, we cannot use infinitely long codewords, and random codewords are not efficient to implement, so we settle for codes that are merely good enough.

2Analyzing linear block codes with the Hamming distance

2.1The Hamming distance

We start from a practical idea which we can deduce from what we discussed until now.

If a codeword $\mathbf{c_1}$ has errors and, as a consequence, becomes identical to another codeword $\mathbf{c_2}$, any decoder will be fooled into thinking it has received a correct codeword $\mathbf{c_2}$. Thus the errors go undetected.

Therefore, we want the codewords to be as different as possible from each other. The Hamming distance is a way of computing how different two codewords are.

The Hamming distance of two binary sequences $\mathbf{a}$ and $\mathbf{b}$ having the same length $n$, denoted $d_H(\mathbf{a}, \mathbf{b})$, is the total number of bit differences between them:

$$d_H(\mathbf{a}, \mathbf{b}) = \sum_{i=1}^{n} a_i \oplus b_i$$

The Hamming distance shows how many bit changes are needed to convert one sequence into the other one.

The Hamming distance satisfies the 3 properties of a metric function:

  1. $d_H(\mathbf{a},\mathbf{b}) \geq 0 \;\;\; \forall \mathbf{a},\mathbf{b}$, with $d_H(\mathbf{a},\mathbf{b}) = 0 \Leftrightarrow \mathbf{a} = \mathbf{b}$

  2. $d_H(\mathbf{a},\mathbf{b}) = d_H(\mathbf{b},\mathbf{a}), \forall \mathbf{a},\mathbf{b}$

  3. $d_H(\mathbf{a},\mathbf{c}) \leq d_H(\mathbf{a},\mathbf{b}) + d_H(\mathbf{b},\mathbf{c}), \forall \mathbf{a},\mathbf{b},\mathbf{c}$

The minimum Hamming distance of a code, denoted ${d_H}_{min}$, is the minimum Hamming distance between any two codewords $\mathbf{c_1}$ and $\mathbf{c_2}$. We find it by taking all pairs of codewords, computing the Hamming distance between them, and taking the minimum value.

The minimum Hamming distance of a code guarantees that all codewords are at least ${d_H}_{min}$ bits apart from each other.
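Both quantities are straightforward to compute. A Python sketch (the 4-codeword code below is a hypothetical example, not one from the text):

```python
from itertools import combinations

def hamming_distance(a, b):
    """Number of positions in which the two sequences differ."""
    return sum(x != y for x, y in zip(a, b))

print(hamming_distance("0110", "1110"))     # -> 1

# Minimum Hamming distance: minimum over all pairs of codewords.
codewords = ["000", "011", "101", "110"]
d_min = min(hamming_distance(a, b) for a, b in combinations(codewords, 2))
print(d_min)                                # -> 2
```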

2.2Nearest-neighbor decoding scheme

We introduce a simple decoding scheme, called nearest-neighbor decoding, which serves as the underlying approach for many error control codes.

Coding

Decoding

The performance of this scheme, in terms of how many errors it can detect and correct, is related to the minimum Hamming distance of the code. As the next theorem shows, a larger ${d_H}_{min}$ means better performance. Note, however, that increasing ${d_H}_{min}$ usually means longer codewords, i.e. a smaller coding rate, i.e. more redundancy, so there is a price to pay for better performance.

The theorem gives the maximum number of errors for which the decoding scheme is guaranteed to work. If the number of errors is higher than these limits, the decoding scheme may or may not work, depending on the specific errors. We may have:

2.3Computational complexity

The nearest-neighbor decoding scheme is simple to understand and implement, and it underpins many error control codes (if not all). However, it is not always used in practice, because of a fundamental problem: it requires a lot of computational resources, especially for the large codes used in practice.

To understand this, we need to use the concept of computational complexity of an algorithm, coming from computer science.

The computational complexity of an algorithm is the amount of computational resources required by the algorithm, expressed as a function of the size of the input data. In general, we are interested in the order of magnitude of the computational complexity, i.e. the dominant term in the expression, neglecting the other terms and the coefficients.

The computational complexity of an algorithm is expressed using the big-O notation, for example $\mathcal{O}(n^2)$, which means that the computational complexity is proportional to $n^2$, or $\mathcal{O}(n \log n)$, which means that it is proportional to $n \log n$. This gives an idea of how the computational complexity grows with the size of the input data, $n$.

For the nearest-neighbor decoding scheme, if the number of information bits is $k$, the computational complexity is

$$\mathcal{O}(2^k)$$

which is absolutely huge for large $k$.

This is because both error detection and correction require comparing the received word with all the possible codewords, and there are $2^k$ codewords in total, one for each possible information word. Thus, the algorithm needs to go through an array of size $2^k$ to find the correct codeword.

A computational complexity of $\mathcal{O}(2^k)$ means that the algorithm is very inefficient and not feasible in practice:

Therefore, we need ways to make decoding simpler and more efficient.

3Analyzing linear block codes with matrix algebra

3.1Vector algebra in a binary world

A short review of basic vector algebra

A vector space is a set with two operations, sum and multiplication by a constant, having the following properties:

In other words, we cannot escape from the set by summing or multiplying by a constant.

The elements of a vector space are called “vectors”.

Intuitive examples of vector spaces are Euclidean (geometrical) vector spaces: a line, points in 2D, 3D. For example, adding two vectors in a 2D plane gives another vector in the same 2D plane, and multiplying a vector by a constant gives another vector in the same 2D plane.

A basis of a vector space $V$ is a set of $n$ independent vectors $\mathbf{e_1}, ..., \mathbf{e_n}$, such that any vector $\mathbf{v}$ can be expressed as a linear combination of the basis elements:

$$\mathbf{v} = \mathbf{e_1} \cdot \alpha_1 + ... + \mathbf{e_n} \cdot \alpha_n$$

A subspace is a lower-dimensional vector space embedded inside a larger vector space.

For example, a line in a 2D plane is a subspace of the 2D plane:

Another example is a 2D plane in 3D space:

How to look at matrix-vector multiplications

When we multiply a matrix with a vector (in this order), the output is a column vector which is a linear combination of the columns of the matrix. The multiplying vector gives the coefficients of the linear combination.

When we multiply a vector with a matrix (in this order), the output is a row vector which is a linear combination of the rows of the matrix. The multiplying vector gives the coefficients of the linear combination.

TODO: Explain with a drawn picture

Vector spaces can be perfectly described with matrix-vector multiplications

Any vector $\mathbf{v}$ can be expressed as a linear combination of the basis elements:

$$\mathbf{v} = \mathbf{e_1} \cdot \alpha_1 + ... + \mathbf{e_n} \cdot \alpha_n$$

$$\mathbf{v} = \begin{bmatrix} \mathbf{e_1} & \mathbf{e_2} & ... & \mathbf{e_n} \end{bmatrix} \begin{bmatrix} \alpha_1 \\ \alpha_2 \\ ... \\ \alpha_n \end{bmatrix}$$

where $\mathbf{e_1}, ..., \mathbf{e_n}$ are column vectors.

The equation can be transposed, which would mean that all vectors become row vectors, but the usual mathematical convention is to use column vectors wherever possible.

Vector spaces of binary codewords

Furthermore, the set of codewords of a linear block code with code length $n$ and information length $k$ is a vector subspace of the larger vector space of all binary sequences of length $n$.

You can imagine the set of codewords as forming a 2D plane embedded in a 3D space. All codewords live in this plane, and all points on this plane are codewords. The larger space contains many other points which are not on the plane, and thus are not codewords, just as there are many sequences of length $n$ which are not codewords.

Since all codewords of a linear block code form a subspace, it follows that all codewords can be expressed as matrix-vector multiplications.

3.2The generator matrix

All codewords for a linear block code can be generated via a vector-matrix multiplication as follows:

$$\mathbf{i} \cdot [G] = \mathbf{c}$$
Figure 3: Codeword construction with generator matrix

The matrix $[G]$ is known as the generator matrix of the code. Its size is $k \times n$, and it is a “wide” matrix since $k < n$. The generator matrix fully defines the code, i.e. if we know the generator matrix, we know everything about the code.

The vector $\mathbf{i}$ is the information word, of size $k$.

Such a vector-matrix multiplication can be interpreted as follows: the resulting codeword $\mathbf{c}$ is a linear combination of the rows of the matrix $[G]$, with the coefficients given by the information word $\mathbf{i}$. This means the rows of $[G]$ are a basis for the linear block code. All the vector space theory we discussed before applies here:

Note that the equation could also be transposed, i.e. use column vectors instead of row vectors, but it is more customary to use row vectors here.

This way of generating codewords guarantees that the code is linear. Indeed, we can show that the sum of two codewords is also a codeword, by construction. If we have two codewords $\mathbf{c_1}$ and $\mathbf{c_2}$, corresponding to two information words $\mathbf{i_1}$ and $\mathbf{i_2}$, then the sum of the two codewords is still something multiplied with $[G]$, so it is still a codeword:

$$\mathbf{i_1} \cdot [G] = \mathbf{c_1}$$

$$\mathbf{i_2} \cdot [G] = \mathbf{c_2}$$

$$\mathbf{c_1} \oplus \mathbf{c_2} = (\mathbf{i_1} \oplus \mathbf{i_2}) \cdot [G] = \text{codeword}$$
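A sketch of codeword generation in Python. The matrix below is a hypothetical systematic generator matrix for a (6,3) code, chosen for illustration (it is not a matrix from the text):

```python
# Codeword generation c = i · [G] over GF(2).

G = [
    [1, 0, 0, 1, 1, 0],
    [0, 1, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 1],
]

def encode(i, G):
    """Linear combination (mod 2) of the rows of G selected by i."""
    n = len(G[0])
    return [sum(i[row] * G[row][col] for row in range(len(G))) % 2
            for col in range(n)]

i1, i2 = [1, 0, 1], [0, 1, 1]
c1, c2 = encode(i1, G), encode(i2, G)
print(c1)   # -> [1, 0, 1, 1, 0, 1]

# Linearity: the sum of two codewords is the codeword of the summed words.
i_sum = [a ^ b for a, b in zip(i1, i2)]
c_sum = [a ^ b for a, b in zip(c1, c2)]
print(encode(i_sum, G) == c_sum)   # -> True
```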

3.3The control (parity-check) matrix

Every generator matrix $[G]$ has a related control matrix $[H]$ (also known as a parity-check matrix) such that

$$\mathbf{0} = [H] \cdot [G]^T$$

The size of $[H]$ is $(n-k) \times n$.

The two matrices $[G]$ and $[H]$ are related, and one can always be deduced from the other.

The control matrix $[H]$ is very useful to check whether a binary word is a codeword or not (i.e. for nearest-neighbor error detection), based on the following theorem.

The theorem shows that all codewords generated with $[G]$ will produce 0 when multiplied with $[H]$. On the other hand, all binary sequences that are not codewords will produce something different from 0 when multiplied with $[H]$. In this way, multiplication with $[H]$ can be used as a quick way to check if a binary sequence is a codeword or not. This is what error detection is all about.

The generator matrix $[G]$ and $[H]$ are related. In linear algebra terminology, $[G]$ is the matrix which generates the $k$-dimensional subspace of the codewords, and the rows of $[H]$ are the “missing” dimensions of the subspace (the “orthogonal complement”).

In terms of matrix shape, $[G]$ and $[H]$ together form a full square matrix $n \times n$:

3.4Matrices [G] and [H] for systematic codes

For systematic codes, $[G]$ and $[H]$ have special forms (known as “standard” forms).

The generator matrix can be decomposed into two parts:

The control matrix $[H]$ is related to $[G]$ as follows:

Thus, one matrix can be derived from the other.

3.5Interpretation as parity bits

Multiplication with $[G]$ in standard form produces a codeword composed of two parts:

Each column of $[G]$ corresponds to one parity bit in the codeword, and the values in the column define which bits are combined to create the parity bit.

The control matrix in standard form $[H]$ checks whether the parity bits correspond to the information bits.

If all parity bits match the data, the result of multiplying with $[H]$ is 0. Otherwise it is $\neq 0$, signaling that at least one parity bit does not check out.

Therefore, the generator and control matrices are just mathematical tools for easy computation and checking of parity bits. We are still just computing and checking parity bits, but the matrices make it easier.

3.6Syndrome-based error detection and correction

When considering error detection and correction, we multiply the received word $\mathbf{r}$ with the control matrix $[H]$:

$$\mathbf{z} = [H] \cdot \mathbf{r}^T$$

The vector $\mathbf{z}$ is called the syndrome of the received word $\mathbf{r}$.

The syndrome is a linear combination of the columns of $[H]$, according to the column-wise interpretation of the multiplication.

Codeword checking with control matrix

The process of error detection with matrices is therefore as follows:

  1. generate the codeword with the generator matrix:

    $$\mathbf{i} \cdot [G] = \mathbf{c}$$

  2. send the codeword $\mathbf{c}$ on the channel

  3. a random error word $\mathbf{e}$ is applied on the channel

  4. receive the word $\mathbf{r} = \mathbf{c} \oplus \mathbf{e}$

  5. compute the syndrome of $\mathbf{r}$:

    $$\mathbf{z} = [H] \cdot \mathbf{r}^T$$

  6. Decide:

    • If $\mathbf{z} = 0$ => $\mathbf{r}$ has no errors

    • If $\mathbf{z} \neq 0$ => $\mathbf{r}$ has errors

For error correction, we need to locate the errors. For this, observe that the syndrome $\mathbf{z}$ is the effect only of the error word $\mathbf{e}$, and not of the codeword $\mathbf{c}$ (considering that $\mathbf{r}$ is the sum of $\mathbf{c}$ and $\mathbf{e}$, and that $[H] \cdot \mathbf{c}^T = \mathbf{0}$):

$$\mathbf{z} = [H] \cdot \mathbf{r}^T = [H] \cdot (\mathbf{c}^T \oplus \mathbf{e}^T) = [H] \cdot \mathbf{e}^T$$

We identify the error word $\mathbf{e}$ by searching exhaustively through all possible error words, until we find the first one that produces the same syndrome $\mathbf{z}$. Once we find it, we know where the errors are, and we can correct them.

The process is summarized as follows:

  1. Create a syndrome table:

    • for every possible error word $\mathbf{e}$, compute the syndrome $\mathbf{z} = [H] \cdot \mathbf{e}^T$

    • start with error words with 1 error (most likely), then with 2 errors (less likely), and so on

  2. Locate the syndrome $\mathbf{z}$ in the table, and read the corresponding error word $\hat{\mathbf{e}}$

  3. Find the correct word:

    • adding the error word again will invert the errored bits back to their original values:

      $$\hat{\mathbf{c}} = \mathbf{r} \oplus \hat{\mathbf{e}}$$

Example: at blackboard

3.7Computational complexity

When using matrices, the computational complexity is much lower than the scheme based on codeword tables.

Error detection requires a multiplication with $[H]$, so the complexity is $\mathcal{O}(n^2)$ (the size of $[H]$ is $(n-k) \times n$).

Error correction still needs to check all possible error words, which means bad performance, but at least we don’t need to check all codewords. In practice, other tricks are used to make it much faster (see Hamming codes, for example).

3.8Conditions on [H] for error detection and correction

The performance of the error detection and correction scheme, i.e. how many errors can be detected and corrected, can be deduced from the properties of the control matrix $[H]$.

For error detection, we can detect errors only if the syndrome is non-zero. Whenever an error word $\mathbf{e}$, multiplied with $[H]$, produces a zero syndrome, the algorithm is fooled into thinking there are no errors. Therefore, we need to make sure that the syndrome is non-zero. Considering that the syndrome is a linear combination of the columns of $[H]$, using only the columns corresponding to values of 1 in the error word $\mathbf{e}$, we can say:

For error correction, we use the syndrome table to identify the error word. When two different error words produce the same syndrome, the algorithm is unable to identify which is the true one. Therefore, error correction works only if the syndrome is unique:

4Hamming codes

4.1Structure of a Hamming codeword

From the structure of $[H]$, we observe that Hamming codes are almost systematic: they are systematic up to a rearrangement of bits.

$[H]$ is not systematic according to our previous definition. However, all the columns of the identity matrix $I$ are among the columns of $[H]$, just not in the right locations to form the identity matrix at the beginning or at the end. We could therefore obtain a systematic code by simply rearranging the columns of $[H]$, which corresponds to rearranging the bits in the codeword, a fairly benign operation.

For Hamming codes we use the notation (n,k) Hamming code, where:

Examples:

Just like for any linear block code, a Hamming codeword has length $n$ equal to the width of $[H]$ ($n = 2^r - 1$), and consists of information bits and control bits (also known as “parity bits”).

For example, the codeword of the Hamming(7,4) code, with the $[H]$ matrix above, consists of three control bits ($c_x$) and four information bits ($i_x$), with the subscript indicating the position:

$$c_1 \; c_2 \; i_3 \; c_4 \; i_5 \; i_6 \; i_7$$

Similarly, the codeword for Hamming(15,11) has the following structure:

$$c_1 \; c_2 \; i_3 \; c_4 \; i_5 \; i_6 \; i_7 \; c_8 \; i_9 \; i_{10} \; i_{11} \; i_{12} \; i_{13} \; i_{14} \; i_{15}$$

4.2Construction of Hamming codewords

Every Hamming code has a generator matrix $[G]$, just like any other linear block code, but we don’t provide it explicitly, because it is hard to remember. (We could deduce it by rearranging the columns of $[H]$ to make the code systematic first.)

Instead, we can deduce the values of the control bits from the equation system of the parity-check matrix $[H]$. We know that every codeword satisfies the relation:

$$\mathbf{0} = [H] \cdot \mathbf{c}^T$$

Writing this as an equation system, we obtain, for example for Hamming(7,4):

$$\begin{cases} 0 = c_4 \oplus i_5 \oplus i_6 \oplus i_7 \\ 0 = c_2 \oplus i_3 \oplus i_6 \oplus i_7 \\ 0 = c_1 \oplus i_3 \oplus i_5 \oplus i_7 \end{cases}$$

Observe that in every equation there is a single control bit, which means that it must be equal to the sum of the information bits in that equation:

$$\begin{cases} c_4 = i_5 \oplus i_6 \oplus i_7 \\ c_2 = i_3 \oplus i_6 \oplus i_7 \\ c_1 = i_3 \oplus i_5 \oplus i_7 \end{cases}$$

We can apply the same process for other Hamming codes.
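This process translates directly into an encoder. A Python sketch for Hamming(7,4), using the standard parity assignments where each control bit $c_p$ ($p$ a power of two) covers the positions whose binary index contains $p$ (the function name is ours):

```python
def hamming74_encode(i3, i5, i6, i7):
    """Compute the control bits of Hamming(7,4) from the parity equations."""
    c4 = i5 ^ i6 ^ i7      # c4 covers positions 4, 5, 6, 7
    c2 = i3 ^ i6 ^ i7      # c2 covers positions 2, 3, 6, 7
    c1 = i3 ^ i5 ^ i7      # c1 covers positions 1, 3, 5, 7
    return [c1, c2, i3, c4, i5, i6, i7]

# Encode the information word 1011 (i3=1, i5=0, i6=1, i7=1):
print(hamming74_encode(1, 0, 1, 1))   # -> [0, 1, 1, 0, 0, 1, 1]
```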

4.3Error detection and correction with Hamming codes

As for any linear block code, by analyzing the columns of $[H]$ we can deduce how many errors can be detected or corrected.

The detection and correction process is the same as for all linear block codes. Having received a word $\mathbf{r}$, we compute the syndrome:

$$\mathbf{z} = [H] \cdot \mathbf{r}^T$$

If $\mathbf{z}$ is non-zero, then there are errors. If there is a single error, $\mathbf{z}$ is equal to the column of $[H]$ corresponding to the error position, which is exactly the position of the error, written in binary. Therefore, keep in mind:

For error correction with Hamming codes, the position of the error is given by reading the binary value of the syndrome.

It is important to note that the code cannot do both detection and correction simultaneously: when we obtain a non-zero syndrome, we cannot tell whether it comes from a single error or from two errors, so there is no way to know whether we should attempt to fix the error (if it is a single error) or only detect (if there are two errors). This must be decided beforehand, at the design stage.

For example, the syndrome $\mathbf{z} = \begin{bmatrix}1\\0\\0\end{bmatrix}$ in the previous example could also have come from two errors located on positions 1 and 5, in which case fixing the error would have been incorrect. The system designer must decide at the design stage how to handle such cases:

The coding rate of a Hamming code is given by:

$$R = \frac{k}{n} = \frac{2^r - r - 1}{2^r - 1}$$
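For instance, evaluating this formula for the first few values of $r$ (a quick Python check):

```python
# Coding rate R = k/n of Hamming codes for r = 3, 4, 5 control bits.
for r in [3, 4, 5]:
    n = 2**r - 1              # codeword length
    k = n - r                 # information bits
    print(f"({n},{k}): R = {k/n:.3f}")
# -> (7,4): R = 0.571
# -> (15,11): R = 0.733
# -> (31,26): R = 0.839
```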

Since any Hamming code detects 2 errors and fixes 1 error per codeword, longer Hamming codes are progressively weaker:

The choice of which code to use is ultimately based on the expected bit error probabilities during the transmission.

4.4Encoding & decoding example for Hamming(7,4)

See whiteboard.

In this example, encoding is done without the generator matrix $[G]$, directly with the matrix $[H]$, by finding the values of the parity bits $c_1$, $c_2$, $c_4$ such that

$$\begin{bmatrix}0\\0\\0\end{bmatrix} = [H] \cdot \begin{bmatrix}c_1\\c_2\\i_3\\c_4\\i_5\\i_6\\i_7\end{bmatrix}$$

For a single error, the syndrome is the binary representation of the location of the error.
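A decoding sketch in Python, using the standard Hamming(7,4) control matrix whose columns are the numbers 1 to 7 written in binary (the function name is ours):

```python
# Single-error correction: the syndrome, read as a binary number,
# gives the (1-based) position of the error.

H = [
    [0, 0, 0, 1, 1, 1, 1],   # most significant bit of the position
    [0, 1, 1, 0, 0, 1, 1],
    [1, 0, 1, 0, 1, 0, 1],   # least significant bit of the position
]

def hamming74_correct(r):
    """Correct at most one error in a 7-bit received word."""
    z = [sum(h * b for h, b in zip(row, r)) % 2 for row in H]
    pos = z[0] * 4 + z[1] * 2 + z[2]   # read the syndrome as a number
    fixed = r[:]
    if pos:                            # non-zero syndrome: flip that bit
        fixed[pos - 1] ^= 1
    return fixed

c = [0, 1, 1, 0, 0, 1, 1]              # a valid codeword
r = c[:]
r[4] ^= 1                              # error on position 5
print(hamming74_correct(r) == c)       # -> True
```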

4.5Hardware circuits for Hamming(7,4)

Due to their simplicity, Hamming coding and decoding can easily be done with logic gates.

The Hamming encoder consists of:

The four information bits are loaded in the shift register, and the logic gates automatically compute the parity bits.

Hamming encoder circuit with logic gates

Hamming encoder circuit with logic gates

The Hamming decoder consists of:

Hamming decoder circuit with logic gates

Hamming decoder circuit with logic gates

4.6SECDED Hamming codes

A normal Hamming code can correct 1 error or it can detect 2 errors, but we cannot differentiate between the two cases at runtime.

A solution to this is to add an additional parity bit, which helps identify the correct situation.

For example, the (8,4) SECDED Hamming code is based on the normal (7,4) Hamming code, with an additional bit in front:

$$\mathbf{c_0} \; c_1 \; c_2 \; i_3 \; c_4 \; i_5 \; i_6 \; i_7$$

where $c_0$ is defined as:

$$c_0 = c_1 \oplus c_2 \oplus i_3 \oplus c_4 \oplus i_5 \oplus i_6 \oplus i_7$$

The parity-check matrix of a SECDED Hamming code, $\tilde{H}$, is based on the matrix of the normal code, extended by one row and one column:

$$\tilde{H} = \begin{bmatrix} 1 & 1 \cdots 1 \\ \mathbf{0} & [H] \end{bmatrix}$$

4.7 Encoding and decoding of SECDED Hamming codes

The encoding of SECDED Hamming codes can be done in two ways:

  1. Compute the codeword directly using H~\tilde{H}, as we did for normal Hamming codes,

    or

  2. Alternatively, compute the normal Hamming code first, and then prepend the additional bit c0\mathbf{c_0} = sum of all other bits
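Method 2 is easy to sketch in Python (a hypothetical helper, using the bit order from the text):

```python
def extend_secded(hamming_codeword):
    """Prepend the overall parity bit c0 to a Hamming(7,4) codeword."""
    c0 = 0
    for bit in hamming_codeword:
        c0 ^= bit            # c0 = XOR (mod-2 sum) of all other bits
    return [c0] + hamming_codeword

# c0 = 0 here, since this codeword contains an even number of ones
assert extend_secded([1, 0, 1, 0, 1, 0, 1]) == [0, 1, 0, 1, 0, 1, 0, 1]
```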

Decoding of SECDED Hamming codes makes use of the additional parity bit to decide whether there is a single error, which can be fixed, or there are two errors, in which case no fix is attempted.

The decoding process is as follows:

  1. Compute the syndrome of the received word using \tilde{H}:

    \tilde{\mathbf{z}} = \begin{bmatrix}z_0\\\mathbf{z}\end{bmatrix} = [\tilde{H}] \cdot \mathbf{r}^T

    The syndrome \tilde{\mathbf{z}} has one additional bit in front, z_0, which corresponds to the extra row added on top of [\tilde{H}].

    The new bit z_0 tells us whether the received c_0 matches the parity of the received word:

    • if z_0 = 0: the additional parity bit c_0 matches the parity of the received word, so there is an even number of errors (0 or 2)

    • if z_0 = 1: the additional parity bit c_0 does not match the parity of the received word, so there is an odd number of errors (1)

  2. Decide which of the following cases happened:

    1. If no error happened: z_1 = z_2 = z_3 = 0 and z_0 = 0 (the full syndrome is zero).

    2. If 1 error happened: z_0 = 1 (does not match), and the remaining part \mathbf{z} is non-zero.

      In this case there is 1 error at the position given by \mathbf{z}, and we can fix it.

    3. If 2 errors happened: z_0 = 0 (does match, because the two errors cancel each other out), and the remaining part \mathbf{z} is non-zero.

      In this case we can only report that errors are detected, but we cannot fix them, since there is more than 1 error.

    4. If 3 errors happened: same as case 2, so we cannot differentiate this from the case of a single error.

With the help of the additional parity bit, we can now simultaneously differentiate between 1 error (perform correction) and 2 errors (can detect, but do not perform correction).

Also, if the design decision is made that error correction is never attempted, the code can detect up to 3 errors, although this case cannot be differentiated from the case of 1 error. This can be verified from the columns of [\tilde{H}]: the sum of any three columns is non-zero (because of the top row of 1’s), so we know that the code can detect up to three errors.
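The decision logic above can be summarized in a short Python sketch (hypothetical function name; z is the remaining syndrome part \mathbf{z} taken as an integer, 0 meaning all-zero):

```python
def secded_decision(z0, z):
    """Decide between the SECDED decoding cases, given the split syndrome."""
    if z0 == 0 and z == 0:
        return "no error"
    if z0 == 1:
        # odd number of errors: assume 1, at the position given by z
        # (z == 0 would mean the error is on the extra bit c0 itself)
        return ("correct", z)
    # z0 == 0 but z != 0: even number of errors (2), detect only
    return "two errors detected"

assert secded_decision(0, 0) == "no error"
assert secded_decision(1, 5) == ("correct", 5)
assert secded_decision(0, 3) == "two errors detected"
```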

5 Cyclic codes

A “cyclic shift” means a circular (cyclic) rotation of a sequence of bits, in any direction, as illustrated below. Note that the ending bits are rotated to the start positions.

A cyclic shift of two positions to the right

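A cyclic shift is easy to express in Python, for example:

```python
def cyclic_shift_right(bits, k):
    """Rotate a bit sequence k positions to the right
    (the ending bits wrap around to the start positions)."""
    k %= len(bits)
    return bits[-k:] + bits[:-k]

assert cyclic_shift_right([1, 0, 1, 1, 0, 0], 2) == [0, 0, 1, 0, 1, 1]
```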

Cyclic codes are a particular class of linear block codes, so all the theory up to now still applies: they have a generator matrix, parity check matrix etc. However, they can be implemented more efficiently than general linear block codes (e.g. Hamming), using their special properties.

Cyclic codes are widely used in practice under the name CRC (Cyclic Redundancy Check), for example in network communications (Ethernet protocol), data storage (Flash memory) etc.

As an example, Ethernet frames (i.e., data packets on normal Internet connections) use CRC codes to check data integrity.

Structure of an Ethernet frame, including a CRC field for error detection (“Frame check sequence (32-bit CRC)”)

5.1 Binary polynomials

Working with cyclic codes relies heavily on binary polynomials.

Every binary sequence \mathbf{a} corresponds to a polynomial a(x) with binary coefficients:

a_0 \; a_1 \; ... \; a_{n-1} \rightarrow \mathbf{a(x)} = a_0 \oplus a_1 x \oplus ... \oplus a_{n-1} x^{n-1}

for example:

10010111 \rightarrow 1 \oplus x^3 \oplus x^5 \oplus x^6 \oplus x^7

From now on, when we refer to a “word” or “codeword” we also mean the corresponding polynomial.

We can perform all mathematical operations with these polynomials (addition, multiplication, division etc.), just like we do with normal polynomials, but considering the fact that we are in a binary world where 1 \oplus 1 = 0. Cyclic coding relies on these operations with polynomials, and there exist efficient circuits for performing such operations (multiplications and divisions).
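For instance, if we store a binary polynomial as a Python integer (bit i holding the coefficient of x^i), addition of two polynomials is simply the XOR of the two integers:

```python
# Binary polynomial as an int: bit i = coefficient of x^i
a = (1 << 0) | (1 << 3) | (1 << 5)   # 1 + x^3 + x^5
b = (1 << 3) | (1 << 4)              # x^3 + x^4

s = a ^ b   # addition with 1 + 1 = 0: the two x^3 terms cancel
assert s == (1 << 0) | (1 << 4) | (1 << 5)   # 1 + x^4 + x^5
```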

5.2 Generator polynomial

Every cyclic code has a generator polynomial g(x), and every codeword of the code is a multiple of g(x). No proof given.

This is the fundamental property of the codewords, which is checked when performing error detection.

The generator polynomial g(x) must satisfy the following properties:

  • g(x) must be a factor (divisor) of x^n \oplus 1;

  • the degree of g(x) must be n-k.

Since n is the size of the codeword, and k is the size of the information word, it follows that n-k is the number of control bits added by the code:

The degree of g(x) is the number of control bits added by the code.

To give some examples of generator polynomials matching these properties, consider the following relation. We can write 1 \oplus x^7 as a product of three polynomials (do the calculations and check this):

1 \oplus x^7 = (1 \oplus x)(1 \oplus x \oplus x^3)(1 \oplus x^2 \oplus x^3)

1 \oplus x^7 is the x^n \oplus 1 mentioned in the properties, so we have n=7. Each of the three factors has its first and last coefficients equal to 1. Therefore each factor can generate a different cyclic code, the degree of each factor being the number of control bits added by the code.
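The factorization can be verified with a short carry-less multiplication routine (a sketch using the integer representation, bit i = coefficient of x^i):

```python
def poly_mul(a, b):
    """Multiply two binary polynomials (coefficients mod 2)."""
    result = 0
    while b:
        if b & 1:
            result ^= a   # add (XOR) the current shifted copy of a
        a <<= 1
        b >>= 1
    return result

f1 = 0b11        # 1 + x
f2 = 0b1011      # 1 + x + x^3
f3 = 0b1101      # 1 + x^2 + x^3
assert poly_mul(poly_mul(f1, f2), f3) == 0b10000001   # 1 + x^7
```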

Several polynomials are standardized in various protocols:

Popular generator polynomials (image from http://www.ross.net/crc/download/crc_v3.txt)

Your turn: write the polynomials in mathematical form.

5.3 Proving the cyclic property

Note that the proof relied on two properties mentioned before:

5.4 Coding and decoding of cyclic codes

Cyclic codes can be used for detection or correction. In practice, they are used mostly for detection only (e.g. in Ethernet protocol), because there are other codes with better performance for correction.

Cyclic codes can be designed to be either systematic or non-systematic, but in practice the systematic variant is much preferred.

We will show the coding and decoding from 3 perspectives, all equivalent to each other:

The mathematical way

First, a reminder on polynomial multiplication and division.

Two polynomials a(x) and b(x) can be multiplied, and the result has degree equal to the degree of a(x) plus the degree of b(x).

A polynomial a(x) can be divided by another polynomial b(x), producing a quotient and a remainder (which may be zero):

a(x) = b(x) q(x) \oplus r(x)

where:

Coding. We want to encode an information word with k bits:

i_0 \; i_1 \; i_2 \; ... \; i_{k-1} \rightarrow i(x) = i_0 \oplus i_1 x \oplus ... \oplus i_{k-1} x^{k-1}

The non-systematic codeword generation relies simply on polynomial multiplication of i(x) with g(x):

c(x) = i(x) \cdot g(x)

Note that the degrees in the equation match:

However, this is non-systematic because we have no guarantee that the information bits are preserved in the codeword.
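As an illustration (with the hypothetical small generator g(x) = 1 \oplus x \oplus x^3 from the factorization above), non-systematic encoding is a single carry-less multiplication:

```python
def poly_mul(a, b):
    """Carry-less multiplication of binary polynomials
    (ints, bit i = coefficient of x^i)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result

g = 0b1011                   # g(x) = 1 + x + x^3
info = 0b110                 # i(x) = x + x^2
codeword = poly_mul(info, g)
assert codeword == 0b111010  # c(x) = x + x^3 + x^4 + x^5
```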

The systematic codeword generation is achieved using:

c(x) = x^{n-k} \cdot i(x) \oplus b(x)

where b(x) is the remainder of dividing x^{n-k} i(x) by g(x):

x^{n-k} i(x) = g(x) a(x) \oplus b(x)

In normal parlance, the remainder b(x) is also known as “the CRC value”.

Let us consider more closely the systematic codeword generation formula. First, is the resulting c(x) really a multiple of g(x)? It is, because we have:

\begin{aligned} c(x) &= x^{n-k} \cdot i(x) \oplus b(x) \\ &= a(x) g(x) \oplus b(x) \oplus b(x) \\ &= a(x) g(x) \end{aligned}

Second, why is the code systematic? Let’s analyze the systematic codeword generation step by step. We start from the information word:

\mathbf{i} = [\underbrace{i_0 \; i_1 \; ... \; i_{k-1}}_k] \rightarrow i(x) = i_0 \oplus i_1 x \oplus ... \oplus i_{k-1} x^{k-1}.

Multiplying it by x^{n-k} to obtain x^{n-k} \cdot i(x) just shifts all bits to the right by (n-k) positions:

[\underbrace{0 \; 0 \; ... \; 0}_{n-k} \; \underbrace{i_0 \; i_1 \; ... \; i_{k-1}}_k] \rightarrow x^{n-k} i(x) = i_0 x^{n-k} \oplus i_1 x^{n-k+1} \oplus ... \oplus i_{k-1} x^{n-1}

Now, the remainder b(x) has degree strictly less than the degree of g(x), which is n-k, so it has at most n-k bits.

Therefore adding b(x) to x^{n-k} \cdot i(x) to construct the codeword

c(x) = x^{n-k} \cdot i(x) \oplus b(x)

means the two terms do not overlap, because x^{n-k} \cdot i(x) has powers starting from x^{n-k} upwards, whereas b(x) has powers strictly less than x^{n-k}.

Alternatively, x^{n-k} \cdot i(x) has n-k zeros in front, whereas b(x) has at most n-k bits, so they will not overlap when adding:

\begin{aligned} &\mathbf{c} = [\underbrace{b_0 b_1 ... b_{n-k-1}}_{n-k} \; \underbrace{i_0 i_1 ... i_{k-1}}_{k}] \rightarrow \\ &\rightarrow c(x) = b_0 \oplus b_1 x \oplus ... \oplus b_{n-k-1} x^{n-k-1} \oplus i_0 x^{n-k} \oplus i_1 x^{n-k+1} \oplus ... \oplus i_{k-1} x^{n-1} \end{aligned}

Therefore the code is systematic: the information bits are in the codeword, unchanged, in the positions corresponding to the higher powers of x (towards the MSB), whereas the CRC value (i.e., b(x)) occupies the first n-k positions (towards the LSB).

We see now that a systematic cyclic codeword can be interpreted as computing a CRC value and appending it to the data. The CRC value b(x) represents the control bits added by the code.

Even though in our analysis and equations we always write it at the start of the codeword, in practice there exist different conventions for writing the codeword:

Decoding. Assume we receive a word

\mathbf{r} = r_0 \; r_1 \; r_2 \; ... \; r_{n-1} \rightarrow r(x) = r_0 \oplus r_1 x \oplus ... \oplus r_{n-1} x^{n-1}

For error detection, we just want to see if r(x) is a codeword or not. For this, we check if the received r(x) is still a multiple of g(x), by dividing r(x) by g(x) and checking the remainder: if the remainder is zero, no errors are detected; if the remainder is non-zero, errors are detected.

Note that computing the remainder is the same as computing the CRC of the received data (including the CRC bits in the received word).
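A minimal detection check in Python (a sketch, reusing the small generator g(x) = 1 \oplus x \oplus x^3 as an example):

```python
def poly_mod(dividend, divisor):
    """Remainder of binary polynomial division, via successive XOR
    (ints, bit i = coefficient of x^i)."""
    while dividend.bit_length() >= divisor.bit_length():
        shift = dividend.bit_length() - divisor.bit_length()
        dividend ^= divisor << shift   # cancel the current leading term
    return dividend

g = 0b1011                     # g(x) = 1 + x + x^3
codeword = 0b1101001           # a multiple of g(x)
assert poly_mod(codeword, g) == 0             # no errors detected
assert poly_mod(codeword ^ (1 << 2), g) != 0  # a flipped bit is detected
```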

For error correction, we need to use a syndrome table, just like with any linear block code, using the same general procedure:

Example: at blackboard

The programming way

We show this method only for systematic codes, which is the most widely used variant.

A good reference for the procedure is found here: “A Painless Guide to CRC Error Detection Algorithms”, Ross N. Williams

The key fact is that the mathematical polynomial division algorithm is identical to XOR-ing the binary sequence successively with g(x):

See example at blackboard / lab.

In other words, we don’t have to think of binary polynomial division as a complicated mathematical operation. It is a simple, repeated bitwise XOR operation, easy to implement algorithmically in code. Polynomial division is the foundation of both (systematic) coding and decoding, so we just replace the mathematical computations with the equivalent binary algorithm.

Polynomial division = XORing successively with g(x)
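As a sketch of this binary algorithm, the division-by-successive-XOR and the systematic encoding steps look as follows in Python (assuming the small generator g(x) = 1 \oplus x \oplus x^3, so n-k = 3):

```python
def poly_mod(dividend, divisor):
    """Remainder of binary polynomial division, by XOR-ing the sequence
    successively with the divisor (ints, bit i = coefficient of x^i)."""
    while dividend.bit_length() >= divisor.bit_length():
        shift = dividend.bit_length() - divisor.bit_length()
        dividend ^= divisor << shift   # one step of the long division
    return dividend

g = 0b1011                    # g(x) = 1 + x + x^3, degree = n - k = 3
info = 0b1101                 # i(x), k = 4 information bits
shifted = info << 3           # x^{n-k} * i(x)
crc = poly_mod(shifted, g)    # b(x), the CRC value
codeword = shifted ^ crc      # c(x) = x^{n-k} i(x) + b(x)

assert poly_mod(codeword, g) == 0   # codeword is a multiple of g(x)
```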

Coding. We have to perform the two steps:

  1. Compute the CRC value b(x), which is the remainder of x^{n-k} i(x) divided by g(x), using the binary algorithm above;

  2. Put the CRC in front of the information word (mirrored, from LSB to MSB).

Decoding. We receive \mathbf{r} = r_0 \; r_1 \; r_2 \; ... \; r_{n-1} \rightarrow r(x) = r_0 \oplus r_1 x \oplus ... \oplus r_{n-1} x^{n-1}

Steps:

  1. Mirror the sequence r\mathbf{r}, so it is aligned MSB to LSB (CRC must be at the end)

  2. For error detection:

    • compute the CRC of the whole sequence \mathbf{r}

    • if the remainder is 0 => no errors

    • if the remainder is non-zero => errors detected

  3. Error correction:

    • use a syndrome table

    • build a table for all possible error words (same as with matrix codes)

    • for each error word, compute the CRC

    • when the resulting remainder is identical to the remainder obtained with \mathbf{r}, we have found the error word => correct the errors
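These correction steps can be sketched in Python, continuing the small g(x) = 1 \oplus x \oplus x^3 example (feasible here only because the code is short; in practice CRCs are used mostly for detection):

```python
def poly_mod(dividend, divisor):
    """Remainder of binary polynomial division (successive XOR)."""
    while dividend.bit_length() >= divisor.bit_length():
        dividend ^= divisor << (dividend.bit_length() - divisor.bit_length())
    return dividend

g, n = 0b1011, 7
# Syndrome table: the remainder of every single-bit error word
table = {poly_mod(1 << i, g): 1 << i for i in range(n)}

received = 0b1101001 ^ (1 << 4)        # a valid codeword with one flipped bit
syndrome = poly_mod(received, g)
corrected = received ^ table[syndrome] if syndrome else received
assert poly_mod(corrected, g) == 0     # the error has been corrected
```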

The hardware way

There exist efficient circuits for polynomial multiplication and division, which can be used for systematic or non-systematic codeword generation.

Two schematics for polynomial multiplication are presented below.

Circuits for polynomial multiplication


Operation of the multiplication circuits:

Two schematics for division of binary polynomials are presented below.

Circuits for polynomial division


Operation of polynomial division circuits:

A non-systematic cyclic encoder circuit relies simply on polynomial multiplication, without any other modifications, where the input is i(x) and the resulting output is the codeword c(x).

Circuits for polynomial multiplication


A systematic cyclic encoder circuit relies instead on a division circuit, which computes the CRC value which is appended:

Systematic cyclic encoder circuit


This schematic contains a division circuit (upper right part), and operates as follows:

Why is the output c(x) the desired codeword? Because:

  1. has the information bits in the first part (systematic)

  2. is a multiple of g(x)

Why is it a multiple of g(x)? Because:

Side note: we haven’t really explained why the output c(x) is a codeword; we have just shown that it is so, but this is enough.