Getting started

Building

To build the library run:

python waf configure
python waf build

Usage

See the API of the encoder and decoder in src/kodo_slide/encoder.hpp and src/kodo_slide/decoder.hpp.

An example is available in test/src/test_code.cpp.

Definitions

To understand the API of kodo-slide’s ECC implementation a bit of terminology is needed:

  • A source symbol: Unit of data used for the encoding. Typically a symbol corresponds to a data packet to be sent over the network.

  • A symbol: The output of the encoder. Typically a linear combination of a set of source symbols.

  • The symbol size: This is the size in bytes of a symbol. Currently the algorithm expects that all symbols have the same size - for most applications this is not a problem. In special applications with varying symbol sizes we currently need to perform padding before passing the symbol data to the encoding / decoding algorithms.

  • The finite field and coding coefficients: The finite field determines the size of the coding coefficients used to produces a coded symbol. As a rough rule of thumb:

    • A larger finite field yields a “stronger” code i.e. better decoding probability. However, using a large finite field is also more computational complex.

    We have one coding coefficient for each symbol used to produce a coded symbol.

  • The coding rate: Most erasure coding algorithms allow you to select a coding rate. The rate is typically specified by two numbers n and k where the coding rate is k/n (n divided by k). What they specify is essentially that after sending n symbols k of those will be original source symbols. Often r is defined as r = n - k, which denotes the number of repair symbols we are sending. See Parameter selection.

  • The stream: This is the set of symbols currently available at the encoder for encoding or at the decoder for decoding. Using the API it is possible to either push new symbols to the front of the stream or pop symbols from the back of the stream.

  • The window: This is the set of symbols included in a specific encoding. The window is a subset of the stream. The API allows full control over both the stream and window.

The following diagram gives an overview of how these three concepts relate:

Data Symbols:
 +---+  +---+  +---+  +---+  +---+  +---+  +---+  +---+
 | 0 |  | 1 |  | 2 |  | 3 |  | 4 |  | 5 |  | 6 |  | 7 |      ...
 +---+  +---+  +---+  +---+  +---+  +---+  +---+  +---+
        |            stream             |
        +-------------------------------+
                |    window      |
                +----------------+                           ...

In the above example the stream includes five symbols 1 through 5. The window which is a subset of the stream includes symbol 2 through 4. To recap:

  • Symbols 1 through to 5 are currently available in memory (in the stream).
  • Symbols 2 through to 4 are in the window i.e. they will be included in a coded symbol produced by the encoder. At the decoder the next coded symbol read is expected to be a linear combination of the symbols in the window.

Coefficients memory layout

When encoding or decoding data the API expects to receive the coding coefficients. You can generate these coding coefficients yourself - or use the kodo-slide in-built coefficient generator.

The algorithms expect the coding coefficients to have a specific memory layout which we will explain here.

The reason we choose this memory layout was to achieve two goals:

  1. The bit position of a coefficient corresponding to a specific symbol index in the window should be constant within memory.
  2. It should be possible to resize the coding window without remapping coefficients.

We will use LSB 0 bit numbering (see Bit numbering (bit endianness)).

Example: Binary field

In the binary field each coefficient is a single bit.

Initial example, if we have three symbols in the window with the first symbol being index zero (we call this the window_lower_bound).

In this case we would have three coefficients at bit index 0, 1 and 2 below represented by capital C:

  7   6   5   4   3   2   1   0
+-------------------------------+
| 0   0   0   0   0   C   C   C |
+-------------------------------+
                              ^
                              |
     least significant +------+
     bit

As the window moves to encode a different set of symbols, so will the coefficients offset within the byte e.g. lets see the coefficients for encoding symbol 2,3 and 4:

  7   6   5   4   3   2   1   0
+-------------------------------+
| 0   0   0   C   C   C   0   0 |
+-------------------------------+
                              ^
                              |
     least significant +------+
     bit

Notice how the coefficient for symbol 2 remains at the same position with in the byte.

Once the window_lower_bound reaches 6, coefficients will “spill” and we need to use an additional byte:

  7   6   5   4   3   2   1   0   7   6   5   4   3   2   1   0
+-------------------------------+-------------------------------+
| C   C   0   0   0   0   0   0 | 0   0   0   0   0   0   0   C |
+-------------------------------+-------------------------------+
                              ^                               ^
                              |                               |
     least significant +------+      least significant +------+
     bit in first byte               in second byte

Effectively bit index 0 in the second byte corresponds to the coefficient encoding the symbol with index 8.

Once the coding window leaves the first byte i.e. window_lower_bound >= 8 we can omit the fist byte since it only contains leading zeros.

It is important that all bits not in the window are zero’ed.

Example: Binary8 field

In the binary8 field each coefficient is eight bits (or one byte).

As in the binary example, lets imagine we have three symbols in the window. Since we are in binary8 we simply need set three bytes:

                  0                 8                 16
  +-----------------+-----------------+-----------------+
  | C C C C C C C C | C C C C C C C C | C C C C C C C C |
  +-----------------+-----------------+-----------------+
           ^                 ^                 ^
           |                 |                 |
first byte +     second byte +      third byte +

increasing addresses ------->

The coding coefficient corresponding to symbol zero is the first byte, the second byte corresponds to symbol with index one and so forth.

As the window moves we interpret the first byte as the one corresponding to the symbol at the window_lower_bound.

Bit numbering (bit endianness)

In the following we will use LSB 0 bit numbering. LSB 0 is where the bit numbering starts at zero for the least significant bit (LSB). As an example say we have a single byte (8 bits):

   least significant +--------+
   bit                        |
                              v
+-------------------------------+
| 0   1   0   1   1   1   0   0 |
+-------------------------------+
  ^
  |             most significant
  +-----------+ bit

Lets number the bits inside byte given earlier according to the LSB 0 bit numbering:

  7   6   5   4   3   2   1   0
+-------------------------------+
| 0   1   0   1   1   1   0   0 |
+-------------------------------+

In the following we will omit all but the LSB index:

                              0
+-------------------------------+
| 0   1   0   1   1   1   0   0 |
+-------------------------------+

For a stream of bytes we number assume little endian byte order least significant byte first:

                              0                               8
+-------------------------------+-------------------------------+
| 0   0   0   0   0   0   0   0 | 0   0   0   0   0   0   0   0 |
+-------------------------------+-------------------------------+
                              ^                               ^
                              |                               |
     least significant +------+      least significant +------+
     bit in first byte               in second byte

                  increasing addresses ------->

Symbol protection

Every time a source symbol is included in a coded symbol we can say that the coded symbol “protects” the source symbol. I.e. that coded symbol can be used to repair a potential packet loss of the source symbol.