The Basics

In the following, we will walk through the process of creating an encoder and a decoder for a single buffer of data.

Kodo implements a number of different erasure correcting codes. In this example, we have chosen to use a particular version of a RLNC (Random Linear Network Code).

The Full RLNC is one of the most common RLNC variants, and provides several of the advantages that RLNCs have over traditional erasure correcting codes. However, for the time being we will just use it as a standard erasure correcting code, namely to encode and decode some data.

In the following, we will go through three of the key parameters to choose when configuring an erasure correcting code:

  • The number of symbols
  • The symbol size
  • The finite field, or more specifically the field size used.

In general, if a block of data is to be encoded, it is split into a number of symbols of a specific size. Multiplying the number of symbols by the symbol size gives the total amount of data, in bytes, that will be either encoded or decoded per generation.

Note

Sizes in Kodo are always measured in bytes. So if you see a variable or function name that includes the word “size”, bytes is the unit used.

Note

In network applications, a single symbol typically corresponds to a single packet (for example, a UDP datagram).

Let us briefly outline the impact of changing the three parameters.

Number of Symbols

Denotes the number of symbols in a block/generation. Increasing this number will have the following effects:

  • The computational complexity increases, which can slow down encoding/decoding.
  • For some variants of RLNC, the per-packet overhead will increase due to the added coding coefficients.
  • The per-symbol decoding delay becomes larger, because the decoder has to receive more symbols before it can decode the generation.
  • The protocol complexity can decrease. If the number of symbols is increased so that all the data to be encoded fits in a single generation, the protocol only has to handle that one generation. If multiple generations are needed, the receivers have to signal which generations the server should send data from, which increases the complexity of the protocol.
  • The need for a large field size decreases (an advantage since, in short, a larger field size leads to higher complexity). The reason is that when the decoder is missing only a few symbols, its chance of receiving a useful encoded symbol drops, and the size of this drop depends on the field size (larger is better). This price is paid once per generation, but if a generation contains many symbols the issue becomes relatively smaller. Furthermore, with many symbols the generations are bigger, so fewer generations are needed.

Symbol Size

Denotes the size of each symbol in bytes. The choice of symbol size typically depends on the application. For network applications we may choose the symbol size according to the network MTU (Maximum Transmission Unit), so that datagrams do not get fragmented as they traverse the network. In those cases symbol sizes are typically around 1300-1400 bytes. For storage applications, on the other hand, the symbol size is typically much larger, e.g., on the order of several megabytes.

Field Size

The field size determines the core mathematics. Most erasure correcting codes are based on finite fields.

  • Increasing the field size will increase the probability of successful decoding.
  • However, it will typically also lead to increased computational complexity, which results in slower applications.

We are now ready to look at our first example:

// Copyright Steinwurf ApS 2016.
// Distributed under the "STEINWURF RESEARCH LICENSE 1.0".
// See accompanying file LICENSE.rst or
// http://www.steinwurf.com/licensing

#include <cstdint>
#include <algorithm>
#include <iostream>
#include <vector>

#include <kodocpp/kodocpp.hpp>

int main()
{
    //! [0]
    // Set the number of symbols (i.e. the generation size in RLNC
    // terminology) and the size of a symbol in bytes
    uint32_t max_symbols = 16;
    uint32_t max_symbol_size = 1400;
    //! [1]
    // In the following we will make an encoder/decoder factory.
    // The factories are used to build actual encoders/decoders
    kodocpp::encoder_factory encoder_factory(
        kodocpp::codec::full_vector,
        kodocpp::field::binary8,
        max_symbols,
        max_symbol_size);

    kodocpp::encoder encoder = encoder_factory.build();

    kodocpp::decoder_factory decoder_factory(
        kodocpp::codec::full_vector,
        kodocpp::field::binary8,
        max_symbols,
        max_symbol_size);

    kodocpp::decoder decoder = decoder_factory.build();
    //! [2]
    std::vector<uint8_t> payload(encoder.payload_size());
    std::vector<uint8_t> data_in(encoder.block_size());

    // Just for fun - fill the data with random data
    std::generate(data_in.begin(), data_in.end(), rand);
    //! [3]
    // Assign the data buffer to the encoder so that we may start
    // to produce encoded symbols from it
    encoder.set_const_symbols(data_in.data(), encoder.block_size());
    //! [4]
    // Create a buffer which will contain the decoded data, and we assign
    // that buffer to the decoder
    std::vector<uint8_t> data_out(decoder.block_size());
    decoder.set_mutable_symbols(data_out.data(), decoder.block_size());
    //! [5]
    uint32_t encoded_count = 0;

    while (!decoder.is_complete())
    {
        // Encode a packet into the payload buffer
        uint32_t bytes_used = encoder.write_payload(payload.data());
        std::cout << "Bytes used = " << bytes_used << std::endl;

        ++encoded_count;

        // Pass that packet to the decoder
        decoder.read_payload(payload.data());
    }

    std::cout << "Encoded count = " << encoded_count << std::endl;
    //! [6]
    // Check if we properly decoded the data
    if (data_in == data_out)
    {
        std::cout << "Data decoded correctly" << std::endl;
    }
    //! [7]
    return 0;
}

Initially we define the two parameters: the number of symbols and the symbol size.

    // Set the number of symbols (i.e. the generation size in RLNC
    // terminology) and the size of a symbol in bytes
    uint32_t max_symbols = 16;
    uint32_t max_symbol_size = 1400;

In the given example, the codec and the finite field type are specified by the first two parameters of the encoder and decoder factories. Since fast finite field computations are useful beyond erasure correcting codes, this part of the functionality has been split into a separate library called Fifi. The Fifi library defines a number of different finite fields such as binary, binary4, and binary8. To switch to a different field, you can simply replace field::binary8 with one of the other field types, e.g. field::binary.

Once the key parameters have been selected, we are ready to create an encoder and a decoder to perform the actual coding.

    // In the following we will make an encoder/decoder factory.
    // The factories are used to build actual encoders/decoders
    kodocpp::encoder_factory encoder_factory(
        kodocpp::codec::full_vector,
        kodocpp::field::binary8,
        max_symbols,
        max_symbol_size);

    kodocpp::encoder encoder = encoder_factory.build();

    kodocpp::decoder_factory decoder_factory(
        kodocpp::codec::full_vector,
        kodocpp::field::binary8,
        max_symbols,
        max_symbol_size);

    kodocpp::decoder decoder = decoder_factory.build();

We use the encoder_factory and decoder_factory objects to configure and build encoders and decoders. Invoking the build() function returns a new encoder or decoder object. These objects actually contain C++ smart pointers under the hood, so you can pass them around to other functions, and the underlying objects will be deleted when there are no more references to them.

Before the encoding and decoding of data can begin, we define two buffers.

    std::vector<uint8_t> payload(encoder.payload_size());
    std::vector<uint8_t> data_in(encoder.block_size());

    // Just for fun - fill the data with random data
    std::generate(data_in.begin(), data_in.end(), rand);

The first buffer is the payload buffer. Once we start coding, this buffer will contain a single encoded symbol which we can “transmit” to the decoder. Besides the encoded symbol data, the payload buffer will also contain a header that describes how the symbol was encoded. The format and size of this data depends on the chosen erasure correcting code. Fortunately we don’t have to worry about that as long as we provide a large enough buffer. The needed size of the buffer is returned by the payload_size function.

The second buffer, data_in, contains the data we wish to encode. As mentioned earlier the size of this buffer is the number of symbols multiplied by the symbol size. For convenience we can use the block_size function to get this value. In this case we are not encoding real data, so we just fill the data_in buffer with some randomly generated data.

Once the buffers have been created we can call the set_const_symbols function on the encoder to specify which buffer it should encode.

    // Assign the data buffer to the encoder so that we may start
    // to produce encoded symbols from it
    encoder.set_const_symbols(data_in.data(), encoder.block_size());

We also create a third buffer, data_out, where the decoder will store the decoded data. The size of this buffer is the same as the size of data_in. Again, we call the block_size function to get the proper size. We do not have to initialize this buffer; we just pass it as a parameter to the set_mutable_symbols function, and the decoder will take control of the buffer.

    // Create a buffer which will contain the decoded data, and we assign
    // that buffer to the decoder
    std::vector<uint8_t> data_out(decoder.block_size());
    decoder.set_mutable_symbols(data_out.data(), decoder.block_size());

Finally we have everything ready to start the coding. This is done in a loop until the decoding is completed.

    uint32_t encoded_count = 0;

    while (!decoder.is_complete())
    {
        // Encode a packet into the payload buffer
        uint32_t bytes_used = encoder.write_payload(payload.data());
        std::cout << "Bytes used = " << bytes_used << std::endl;

        ++encoded_count;

        // Pass that packet to the decoder
        decoder.read_payload(payload.data());
    }

    std::cout << "Encoded count = " << encoded_count << std::endl;

We use the variable encoded_count to keep track of the number of symbols we have encoded. When we finish, this number should match the number of symbols, since all data is passed safely to the decoder; we shall later see examples where this is not necessarily the case. The loop stops when the decoder's is_complete function returns true, which happens when all symbols have been decoded. The encoder encodes into the payload buffer and returns the number of bytes used during the encoding, i.e. the number of bytes we would have to transmit over the network. The payload is passed to the decoder, which decodes the encoded data and thereby increases its rank.

When the decoding process is completed, the data will be available in the data_out buffer, so we check that its contents match the original data in data_in.

    // Check if we properly decoded the data
    if (data_in == data_out)
    {
        std::cout << "Data decoded correctly" << std::endl;
    }