Encoding and decoding large objects

In practice we often have to encode and decode data that does not fit into a single encoder or decoder. To support this use case, Kodo provides the kodo_core::object::storage_encoder and kodo_core::object::storage_decoder classes.

It is recommended that you first familiarize yourself with using a single encoder/decoder pair. You will notice that extending to several encoders and decoders requires only a few changes to the code. We will not explain all parameters in detail in this example, only those relevant to using the kodo_core::object::storage_encoder and kodo_core::object::storage_decoder classes. If you find some information missing, please check The Basics example, as you will likely find it there.

The complete example

// Copyright Steinwurf ApS 2011.
// Distributed under the "STEINWURF RESEARCH LICENSE 1.0".
// See accompanying file LICENSE.rst or
// http://www.steinwurf.com/licensing

#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <utility>
#include <iostream>
#include <vector>

//! [0]
#include <kodo_core/object/storage_decoder.hpp>
#include <kodo_core/object/storage_encoder.hpp>

#include <kodo_rlnc/full_vector_codes.hpp>
//! [1]

/// @example encode_decode_storage.cpp
///
/// Often we want to encode / decode data that exceeds a single
/// encoding/decoding block. In this case we need to divide the data
/// into manageable blocks and then encode and decode each block
/// separately. This example shows how to use the storage encoder and
/// decoder in Kodo.
///
/// A single block consists of symbols and each symbol has a size in
/// bytes. So the total size of a block is the number of symbols
/// multiplied by the symbol size. We use the word "block" here; in the
/// literature the same thing is also called a generation or a chunk.
///
/// One important thing to note here is that data encoded from one block
/// cannot be mixed with data encoded from a different block. So if we
/// have two encoders we also need two decoders, and we must pass data
/// between the corresponding encoders and decoders.
///
/// +---------+      +---------+      +---------+      +---------+
/// |         |      |         |      |         |      |         |
/// |         |      | encoder |      | decoder |      |         |
/// |         | +--> |   one   | +--> |   one   | +--> |         |
/// |  input  |      |         |      |         |      |  output |
/// |  data   |      +---------+      +---------+      |  data   |
/// |         |      |         |      |         |      |         |
/// |         | +--> | encoder | +--> | decoder | +--> |         |
/// |         |      |   two   |      |   two   |      |         |
/// |         |      |         |      |         |      |         |
/// +---------+      +---------+      +---------+      +---------+
///
/// If the encoded data is to be transmitted over a network or stored,
/// we have to make sure we can identify which block the data comes
/// from. This can be done by adding a "block id" to each encoded
/// packet, e.g. an integer value which is written on the encoder side
/// and which can later be read by the receiving process to determine
/// which decoder it needs to pass the data to.
///
/// As an example a header could look like:
///
///     +--------+----------------------------+
///     | block  |          payload           |
///     |   id   |           data             |
///     +--------+----------------------------+
///
/// Most likely other application-specific headers would need to be
/// added; this just illustrates, in its simplest form, what a header
/// could look like. The example below does not write a "block id" to
/// each encoded payload, but one could be added with minimal effort.
///
/// Both the encoder and decoder use shallow storage, which means
/// that they operate directly on the memory provided.

int main()
{
    //! [2]
    // Set the number of symbols (i.e. the generation size in RLNC
    // terminology) and the size of a symbol in bytes
    uint32_t symbols = 42;
    uint32_t symbol_size = 64;
    fifi::api::field field = fifi::api::field::binary;

    uint32_t object_size = 23456;
    //! [3]

    //! [4]
    using storage_encoder = kodo_core::object::storage_encoder<
                            kodo_rlnc::full_vector_encoder>;

    using storage_decoder = kodo_core::object::storage_decoder<
                            kodo_rlnc::full_vector_decoder>;
    //! [5]

    //! [6]
    storage_encoder::factory encoder_factory(
        field, symbols, symbol_size);
    storage_decoder::factory decoder_factory(
        field, symbols, symbol_size);

    auto encoder = encoder_factory.build();
    auto decoder = decoder_factory.build();

    std::vector<uint8_t> data_in(object_size, 'x');
    std::vector<uint8_t> data_out(object_size, '\0');

    encoder->set_const_storage(storage::storage(data_in));
    decoder->set_mutable_storage(storage::storage(data_out));

    assert(encoder->object_size() == object_size);
    assert(decoder->object_size() == object_size);

    std::cout << "object_size = " << object_size << std::endl;
    std::cout << "encoder blocks = " << encoder->blocks() << std::endl;
    std::cout << "decoder blocks = " << decoder->blocks() << std::endl;
    //! [7]

    //! [8]
    for (uint32_t i = 0; i < encoder->blocks(); ++i)
    {
        storage_encoder::stack_pointer e = encoder->build(i);
        storage_decoder::stack_pointer d = decoder->build(i);

        std::vector<uint8_t> payload(e->payload_size());

        while (!d->is_complete())
        {
            e->write_payload(payload.data());

            // Here we would send and receive the payload over a
            // network. Let's throw away some packets to simulate loss.
            if (rand() % 2)
            {
                continue;
            }

            d->read_payload(payload.data());
        }
    }

    // Check we properly decoded the data
    if (data_in == data_out)
    {
        std::cout << "Data decoded correctly" << std::endl;
    }
    else
    {
        std::cout << "Unexpected failure to decode "
                  << "please file a bug report :)" << std::endl;
    }
    //! [9]
}
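
The comment block in the example describes prefixing each encoded payload with a "block id" so the receiver can pick the matching decoder. The following is a minimal sketch of such a header, assuming a 4-byte big-endian id placed in front of the payload. The layout and the names `header_size`, `make_packet`, `read_block_id` and `payload_begin` are hypothetical and are not part of Kodo.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical packet layout (not defined by Kodo):
// a 4-byte big-endian block id followed by the coded payload.
constexpr std::size_t header_size = 4;

// Builds a packet by writing the block id and then copying the payload.
inline std::vector<uint8_t> make_packet(
    uint32_t block_id, const std::vector<uint8_t>& payload)
{
    std::vector<uint8_t> packet(header_size + payload.size());

    packet[0] = static_cast<uint8_t>(block_id >> 24);
    packet[1] = static_cast<uint8_t>(block_id >> 16);
    packet[2] = static_cast<uint8_t>(block_id >> 8);
    packet[3] = static_cast<uint8_t>(block_id);

    std::copy(payload.begin(), payload.end(),
              packet.begin() + header_size);
    return packet;
}

// Reads the block id back from the first four bytes of a packet.
inline uint32_t read_block_id(const std::vector<uint8_t>& packet)
{
    assert(packet.size() >= header_size);

    return (static_cast<uint32_t>(packet[0]) << 24) |
           (static_cast<uint32_t>(packet[1]) << 16) |
           (static_cast<uint32_t>(packet[2]) << 8) |
            static_cast<uint32_t>(packet[3]);
}

// Returns a pointer to the first payload byte, i.e. the bytes that
// would be handed to the decoder selected by the block id.
inline const uint8_t* payload_begin(const std::vector<uint8_t>& packet)
{
    assert(packet.size() >= header_size);
    return packet.data() + header_size;
}
```

On the receiving side, the value returned by `read_block_id` would select which block decoder gets the bytes starting at `payload_begin`. A fixed byte order is used so that sender and receiver agree regardless of platform endianness.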

Adding the includes

First we have to provide the appropriate includes, which define the codec that we want to use and the kodo_core::object::storage_encoder and kodo_core::object::storage_decoder classes.

#include <kodo_core/object/storage_decoder.hpp>
#include <kodo_core/object/storage_encoder.hpp>

#include <kodo_rlnc/full_vector_codes.hpp>

Specifying the coding parameters

As in most other examples we have to specify the number of symbols and the size of each symbol which we would like to use for the individual encoders and decoders. One thing to notice here is that these values are maximum values (i.e. we will never exceed these). However, depending on the block partitioning scheme used we might not use exactly those values.

Note

When encoding/decoding large objects we have to assign different parts of the data to different encoders/decoders. The strategy for how this is done is called the block partitioning scheme.

For more information about the block partitioning scheme see the Customize Partitioning Scheme example.

In addition we will also specify the size of the object we want to code.

    // Set the number of symbols (i.e. the generation size in RLNC
    // terminology) and the size of a symbol in bytes
    uint32_t symbols = 42;
    uint32_t symbol_size = 64;
    fifi::api::field field = fifi::api::field::binary;

    uint32_t object_size = 23456;
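
To get a feel for how these parameters relate to the number of blocks, here is a rough sketch of a naive partitioning in which every block except possibly the last is completely full. This is an assumption made for illustration only: the scheme Kodo actually uses may distribute the data differently, and `partition_info` and `naive_partition` are hypothetical names, not Kodo APIs.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not part of Kodo: describes a naive partitioning
// where every block except possibly the last one is completely full.
struct partition_info
{
    uint64_t block_size;       // bytes in a full block
    uint64_t blocks;           // number of blocks needed for the object
    uint64_t last_block_bytes; // bytes of object data in the final block
};

inline partition_info naive_partition(
    uint32_t max_symbols, uint32_t max_symbol_size, uint32_t object_size)
{
    assert(max_symbols > 0);
    assert(max_symbol_size > 0);
    assert(object_size > 0);

    partition_info info;

    // A full block holds max_symbols symbols of max_symbol_size bytes.
    // 64-bit arithmetic avoids overflow for large parameter values.
    info.block_size = static_cast<uint64_t>(max_symbols) * max_symbol_size;

    // Round up so that a partially filled final block is still counted
    info.blocks = (object_size + info.block_size - 1) / info.block_size;

    // Whatever is left after the full blocks goes into the last block
    info.last_block_bytes =
        object_size - (info.blocks - 1) * info.block_size;

    return info;
}
```

With the values above (42 symbols of 64 bytes and an object of 23456 bytes), a full block is 2688 bytes, so this naive scheme needs 9 blocks, the last of which holds 1952 bytes. In real code, treat the value reported by the object encoder as authoritative rather than computing it by hand.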

Specifying the encoder and decoder types

The kodo_core::object::storage_encoder and kodo_core::object::storage_decoder classes take one template argument, which is the actual type of the erasure correcting code to use. In this case we are using kodo_rlnc::full_vector_encoder for encoding and kodo_rlnc::full_vector_decoder for decoding. These are standard RLNC (Random Linear Network Coding) codes.

Note

We use the shallow variant of the RLNC codes. This simply means that Kodo will not copy the data into the encoder/decoder, but operate directly on the user provided buffer (this is currently the only supported mode).

    using storage_encoder = kodo_core::object::storage_encoder<
                            kodo_rlnc::full_vector_encoder>;

    using storage_decoder = kodo_core::object::storage_decoder<
                            kodo_rlnc::full_vector_decoder>;

Using the object encoder and decoder

As with The Basics example, we can now create the input and output data buffers and use them to initialize the object encoder/decoder.

    storage_encoder::factory encoder_factory(
        field, symbols, symbol_size);
    storage_decoder::factory decoder_factory(
        field, symbols, symbol_size);

    auto encoder = encoder_factory.build();
    auto decoder = decoder_factory.build();

    std::vector<uint8_t> data_in(object_size, 'x');
    std::vector<uint8_t> data_out(object_size, '\0');

    encoder->set_const_storage(storage::storage(data_in));
    decoder->set_mutable_storage(storage::storage(data_out));

    assert(encoder->object_size() == object_size);
    assert(decoder->object_size() == object_size);

    std::cout << "object_size = " << object_size << std::endl;
    std::cout << "encoder blocks = " << encoder->blocks() << std::endl;
    std::cout << "decoder blocks = " << decoder->blocks() << std::endl;
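
Because the storage is shallow, each block works directly on a slice of the user-provided buffer instead of a private copy. The sketch below illustrates that idea with a plain non-owning view. `mutable_view` and `block_view` are hypothetical stand-ins for illustration and do not reflect how Kodo represents storage internally.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical non-owning view, standing in for a shallow storage
// object: it only remembers where the bytes live and how many there are.
struct mutable_view
{
    uint8_t* data;
    std::size_t size;
};

// Returns a view of block `index` when `buffer` is cut into pieces of
// at most `block_size` bytes. No bytes are copied.
inline mutable_view block_view(
    std::vector<uint8_t>& buffer, std::size_t block_size, std::size_t index)
{
    assert(block_size > 0);

    std::size_t offset = index * block_size;
    assert(offset < buffer.size());

    // The last block may be shorter than block_size
    std::size_t remaining = buffer.size() - offset;
    std::size_t size = remaining < block_size ? remaining : block_size;

    return {buffer.data() + offset, size};
}
```

Writing through a view changes the underlying buffer, which is why the decoded data ends up in `data_out` without an extra copy step. The flip side of any non-owning view is that the buffer must stay alive, and must not be reallocated (for example by resizing the vector), for as long as the view is in use.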

The encoding/decoding loop has changed a bit since we now have several encoders and decoders that need to finish before the entire object has been encoded and decoded. However, the general structure is very similar to using just a single encoder and decoder.

    for (uint32_t i = 0; i < encoder->blocks(); ++i)
    {
        storage_encoder::stack_pointer e = encoder->build(i);
        storage_decoder::stack_pointer d = decoder->build(i);

        std::vector<uint8_t> payload(e->payload_size());

        while (!d->is_complete())
        {
            e->write_payload(payload.data());

            // Here we would send and receive the payload over a
            // network. Let's throw away some packets to simulate loss.
            if (rand() % 2)
            {
                continue;
            }

            d->read_payload(payload.data());
        }
    }

    // Check we properly decoded the data
    if (data_in == data_out)
    {
        std::cout << "Data decoded correctly" << std::endl;
    }
    else
    {
        std::cout << "Unexpected failure to decode "
                  << "please file a bug report :)" << std::endl;
    }
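
The control flow of the loop above can also be studied without Kodo. The sketch below keeps the same shape (an outer loop over blocks, an inner loop that runs until the block is complete, and a channel that drops roughly half of the packets) but swaps the RLNC codec for a deliberately simple stand-in that resends uncoded symbols round-robin. `toy_block_receiver` and `transfer_all_blocks` are hypothetical names, and the stand-in is not how RLNC works: a real decoder can make progress from any linearly independent coded packet rather than waiting for specific symbols.

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <vector>

// Stand-in for one block's decoder (hypothetical, not a Kodo type):
// it only tracks which uncoded symbols have arrived.
struct toy_block_receiver
{
    explicit toy_block_receiver(uint32_t symbols)
        : m_seen(symbols, false), m_missing(symbols)
    { }

    void receive(uint32_t index)
    {
        assert(index < m_seen.size());
        if (!m_seen[index])
        {
            m_seen[index] = true;
            --m_missing;
        }
    }

    bool is_complete() const
    {
        return m_missing == 0;
    }

    std::vector<bool> m_seen;
    uint32_t m_missing;
};

// Transfers every block over a channel that drops each packet with
// probability 0.5 and returns the total number of packets sent.
inline uint64_t transfer_all_blocks(
    uint32_t blocks, uint32_t symbols_per_block, uint32_t seed)
{
    assert(symbols_per_block > 0);

    std::mt19937 generator(seed);
    std::bernoulli_distribution lost(0.5);
    uint64_t sent = 0;

    for (uint32_t b = 0; b < blocks; ++b)
    {
        toy_block_receiver receiver(symbols_per_block);
        uint32_t next = 0;

        while (!receiver.is_complete())
        {
            // "Send" the next symbol in round-robin order
            uint32_t index = next;
            next = (next + 1) % symbols_per_block;
            ++sent;

            // Simulated loss: the receiver never sees this packet
            if (lost(generator))
            {
                continue;
            }

            receiver.receive(index);
        }
    }

    return sent;
}
```

Each block is finished before the next one starts, so a lost packet only delays its own block. The returned count is at least `blocks * symbols_per_block` and grows with the loss rate.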