The Basics

In the following, we will walk through the process of creating an encoder and a decoder for a single buffer of data.

Kodo implements a number of different erasure correcting codes. In this example, we have chosen to use a particular variant of RLNC (Random Linear Network Code).

The Full RLNC is one of the most common RLNC variants, and provides several of the advantages that RLNCs have over traditional erasure correcting codes. However, for the time being we will just use it as a standard erasure correcting code, namely to encode and decode some data.

In the following, we will go through three of the key parameters to choose when configuring an erasure correcting code:

  • The number of symbols
  • The symbol size
  • The finite field, or more specifically the field size used.

In general, when a block of data is to be encoded, it is split into a number of symbols of a specific size. Multiplying the number of symbols by the symbol size gives the total amount of data, in bytes, that will be encoded or decoded per generation.
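
For example, with the 16 symbols of 1400 bytes each used in the code below, one generation holds 16 * 1400 = 22400 bytes.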

Note

Sizes in Kodo are always measured in bytes. So if you see a variable or function name that includes the word “size”, bytes is the unit used.

Note

In network applications, a single symbol typically corresponds to a single packet (for example, a UDP datagram).

Let us briefly outline the impact of changing the three parameters.

Number of Symbols

Denotes the number of symbols in a block/generation. Increasing this number will have the following effects:

  • The computational complexity will increase, which can slow down encoding and decoding.
  • For some variants of RLNC, the per-packet overhead will increase, since more coding coefficients are included in each packet.
  • The per-symbol decoding delay will become larger: when more symbols are encoded together, the decoder has to receive more symbols before it can decode the generation.
  • The protocol complexity can be decreased. If the number of symbols is increased so that all the data to be encoded fits into a single generation, the protocol only has to handle that one generation. If multiple generations are needed, the receivers have to signal which generations the sender should transmit data from, which increases the complexity of the protocol.
  • The need for a large field size decreases (which is an advantage since, in short, a larger field size leads to higher computational complexity). The reason is that when the decoder is missing only a few symbols, the probability of receiving a useful encoded symbol drops, and how much it drops depends on the field size (larger is better); see the worked numbers after this list. This price is paid towards the end of every generation, but if a generation contains many symbols the relative cost becomes smaller. Furthermore, with many symbols the generations become bigger, and hence fewer generations are needed.
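
As a rough rule of thumb (a general RLNC property, not specific to Kodo): if the decoder is missing only one symbol, a randomly generated coded symbol is linearly dependent, and therefore useless, with probability 1/q, where q is the field size. For the binary field (q = 2) this is 50%, whereas for binary8 (q = 256) it is below 0.4%, which is why small generations benefit more from a large field than large generations do.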

Symbol Size

Denotes the size of each symbol in bytes. The choice of symbol size typically depends on the application. For network applications we may choose the symbol size according to the network MTU (Maximum Transmission Unit) so that datagrams do not get fragmented as they traverse the network. In those cases symbol sizes are typically around 1300-1400 bytes. On the other hand, for storage applications the symbol size is typically much larger, e.g. on the order of several megabytes.
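
As an illustration (not part of the example code in this section), a symbol size could be derived from the MTU; the header sizes below assume a plain IPv4/UDP setup, and the allowance for coding overhead is just a placeholder value:

    // Illustrative sketch: choose a symbol size so that one coded symbol
    // fits in a single UDP datagram on a 1500-byte Ethernet MTU.
    uint32_t mtu = 1500;
    uint32_t ip_udp_headers = 20 + 8;  // assumed IPv4 + UDP header sizes
    uint32_t coding_overhead = 72;     // placeholder allowance for coding meta-data
    uint32_t symbol_size = mtu - ip_udp_headers - coding_overhead;  // 1400 bytes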

Field Size

Most erasure correcting codes are based on finite fields, and the field size determines the size of the finite field over which the coding arithmetic is performed.

  • Increasing the field size will increase the probability of successful decoding.
  • However, it will typically also lead to increased computational complexity, which results in slower applications.

We are now ready to look at the next example. Building on the previous, very limited example, we extend it step by step until we end up with something that resembles the following:

// Copyright Steinwurf ApS 2011.
// Distributed under the "STEINWURF RESEARCH LICENSE 1.0".
// See accompanying file LICENSE.rst or
// http://www.steinwurf.com/licensing

#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <iostream>
#include <vector>

#include <kodo_rlnc/full_vector_codes.hpp>

#include <storage/storage.hpp>


int main()
{
    //! [0]
    // Set the number of symbols (i.e. the generation size in RLNC
    // terminology) and the size of a symbol in bytes
    uint32_t max_symbols = 16;
    uint32_t max_symbol_size = 1400;
    fifi::api::field field = fifi::api::field::binary8;
    //! [1]
    using rlnc_encoder = kodo_rlnc::full_vector_encoder;
    using rlnc_decoder = kodo_rlnc::full_vector_decoder;
    //! [2]
    // In the following we will make an encoder/decoder factory.
    // The factories are used to build actual encoders/decoders
    rlnc_encoder::factory encoder_factory(field, max_symbols, max_symbol_size);
    auto encoder = encoder_factory.build();

    rlnc_decoder::factory decoder_factory(field, max_symbols, max_symbol_size);
    auto decoder = decoder_factory.build();
    //! [3]
    std::vector<uint8_t> payload(encoder->payload_size());
    std::vector<uint8_t> block_in(encoder->block_size());

    // Just for fun - fill the data with random data
    std::generate(block_in.begin(), block_in.end(), rand);
    //! [4]
    // Assign the data buffer to the encoder so that we may start
    // to produce encoded symbols from it
    encoder->set_const_symbols(storage::storage(block_in));

    // Define a data buffer where the symbols should be decoded
    std::vector<uint8_t> block_out(decoder->block_size());
    decoder->set_mutable_symbols(storage::storage(block_out));
    //! [5]
    uint32_t encoded_count = 0;

    while (!decoder->is_complete())
    {
        // Encode a packet into the payload buffer
        uint32_t bytes_used = encoder->write_payload(payload.data());
        std::cout << "Bytes used = " << bytes_used << std::endl;

        ++encoded_count;

        // Pass that packet to the decoder
        decoder->read_payload(payload.data());
    }

    std::cout << "Encoded count = " << encoded_count << std::endl;
    //! [6]

    return 0;
}

Initially we define the three key parameters: the number of symbols, the symbol size, and the finite field.

    // Set the number of symbols (i.e. the generation size in RLNC
    // terminology) and the size of a symbol in bytes
    uint32_t max_symbols = 16;
    uint32_t max_symbol_size = 1400;
    fifi::api::field field = fifi::api::field::binary8;

In the given example, the following two lines select the encoder and decoder types to use:

    using rlnc_encoder = kodo_rlnc::full_vector_encoder;
    using rlnc_decoder = kodo_rlnc::full_vector_decoder;

The finite field itself is selected by the field variable defined earlier, which is passed as the first argument to the encoder and decoder factories shown below. Since fast finite field computations are useful beyond erasure correcting codes, this part of the functionality has been split into a second library called Fifi. The Fifi library defines a number of different finite fields such as binary, binary4, binary8, and binary16. To switch between the different fields you can simply replace fifi::api::field::binary8 with one of the other field types, e.g. fifi::api::field::binary.
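
For instance, to run the same example over the binary field, only the field variable needs to change (a minimal sketch; the rest of the example stays as shown):

    // Select the binary field instead of binary8
    fifi::api::field field = fifi::api::field::binary;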

Once the key parameters have been selected we are ready to create an encoder and a decoder to perform the actual coding.

    // In the following we will make an encoder/decoder factory.
    // The factories are used to build actual encoders/decoders
    rlnc_encoder::factory encoder_factory(field, max_symbols, max_symbol_size);
    auto encoder = encoder_factory.build();

    rlnc_decoder::factory decoder_factory(field, max_symbols, max_symbol_size);
    auto decoder = decoder_factory.build();

The encoder and decoder types define a nested type called factory. Using the factory we can configure and build encoders and decoders. We instantiate the factory with the chosen finite field, number of symbols, and symbol size. Invoking the build() function returns a smart pointer to a new encoder or decoder. In C++ a smart pointer behaves just like a normal pointer, but deletes the object it points to once no more references to it exist. Typically the factory used is a pooled factory, which means that when an encoder or decoder is about to be deleted, it is instead returned to a memory pool for reuse. A subsequent call to build() will then return one of the reused encoders/decoders. This type of memory management increases performance by reducing the number of memory allocations.
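
As a small illustration of this (reusing the decoder_factory created above), further coders with the same configuration can be built from the same factory:

    // The same factory can be reused; each build() call returns a smart
    // pointer to a new (or recycled) decoder with the same configuration
    auto second_decoder = decoder_factory.build();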

Before the encoding and decoding of data can begin, two buffers are needed.

    std::vector<uint8_t> payload(encoder->payload_size());
    std::vector<uint8_t> block_in(encoder->block_size());

    // Just for fun - fill the data with random data
    std::generate(block_in.begin(), block_in.end(), rand);

The first buffer is the payload buffer. Once we start coding this buffer will contain a single encoded symbol which we can “transmit” to the decoder. Besides the encoded symbol data, the payload buffer will also contain internal meta-data describing how the symbol was encoded. The format and size of this data depends on the chosen erasure correcting code. Fortunately we don’t have to worry about that, as long as we provide a buffer large enough. The needed size of the buffer is returned by the payload_size call.
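
To see how much overhead the chosen code adds per packet, one can compare payload_size() with the symbol size (an illustrative addition, reusing the encoder built above):

    // The payload holds one coded symbol plus coding meta-data, so
    // payload_size() is somewhat larger than max_symbol_size
    std::cout << "Payload size = " << encoder->payload_size()
              << ", symbol size = " << max_symbol_size << std::endl;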

The second buffer, block_in, contains the data we wish to encode. As mentioned earlier, the size of this buffer is the number of symbols multiplied by the symbol size. For convenience we can use the block_size function to get this value. In this case we are not encoding real data, so we just fill the block_in buffer with some randomly generated data.
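
If reproducible input is preferred over rand(), the buffer could just as well be filled with a deterministic pattern (an illustrative alternative, not part of the example):

    // Deterministic alternative to rand(): fill the block with a
    // repeating byte pattern so the input is the same on every run
    for (uint32_t i = 0; i < block_in.size(); ++i)
        block_in[i] = static_cast<uint8_t>(i & 0xff);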

Once the buffers have been created we can call the set_const_symbols function on the encoder to specify which buffer it should encode.

    // Assign the data buffer to the encoder so that we may start
    // to produce encoded symbols from it
    encoder->set_const_symbols(storage::storage(block_in));

    // Define a data buffer where the symbols should be decoded
    std::vector<uint8_t> block_out(decoder->block_size());
    decoder->set_mutable_symbols(storage::storage(block_out));

Finally we have everything ready to start the coding. This is done in a loop until the decoding has successfully completed.

    uint32_t encoded_count = 0;

    while (!decoder->is_complete())
    {
        // Encode a packet into the payload buffer
        uint32_t bytes_used = encoder->write_payload(payload.data());
        std::cout << "Bytes used = " << bytes_used << std::endl;

        ++encoded_count;

        // Pass that packet to the decoder
        decoder->read_payload(payload.data());
    }

    std::cout << "Encoded count = " << encoded_count << std::endl;

We use the variable encoded_count to keep track of the number of symbols we have encoded. When the loop finishes, this number should match the number of symbols, since every payload is passed safely on to the decoder; we shall later see examples where this is not necessarily the case (a sketch of such a scenario follows below). The loop stops when the decoder's is_complete() function returns true, which happens when all symbols have been decoded. In each iteration the encoder writes an encoded symbol into the payload buffer and returns the number of bytes used, i.e. the number of bytes we would in theory have to transmit over the network. The payload is then passed to the decoder, which reads the encoded data and thereby increases its rank.
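
As a sketch of such a case (assumed, not part of the example above), the same loop could randomly drop payloads to emulate a lossy channel; the decoder then needs more than max_symbols payloads before is_complete() returns true, so encoded_count ends up larger than the number of symbols:

    // Illustrative variation: emulate packet loss by discarding roughly
    // half of the generated payloads before they reach the decoder
    while (!decoder->is_complete())
    {
        encoder->write_payload(payload.data());
        ++encoded_count;

        if (rand() % 2 == 0)
            continue; // payload "lost" on the channel

        decoder->read_payload(payload.data());
    }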

When the decoding process is completed, the decoded data is available in the block_out buffer, which we attached to the decoder earlier with set_mutable_symbols.

    return 0;
}

The decoder has written the decoded symbols directly into block_out, so no explicit copy out of the decoder is needed and the program can simply return.
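
As a simple sanity check (an illustrative addition, not part of the original example), the decoded buffer can be compared with the input buffer before returning:

    // Illustrative check: block_out should now be identical to block_in
    if (std::equal(block_in.begin(), block_in.end(), block_out.begin()))
        std::cout << "Data decoded correctly" << std::endl;
    else
        std::cout << "Unexpected decoding failure" << std::endl;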