Customize Partitioning Scheme

In many applications we need to deal with data that does not fit into a single encoder/decoder. In such cases we need a strategy for partitioning the data and assigning chunks of it to separate encoders and decoders.

Note

Ideally we would create a single encoder/decoder pair for any object, simply by increasing the number of symbols or the size of the symbols until the data fits. However, in practice this does not work well.

When we increase the number of symbols in an encoder/decoder we also increase the computational complexity, and it is often not practical to have blocks with more than a few thousand symbols. Likewise, increasing the size of a symbol may not be feasible due to constraints in the underlying systems, e.g. the network MTU (Maximum Transmission Unit).

In Kodo we call the algorithm defining how to segment a large object into smaller ones the block partitioning scheme.

In the following we will show how to define new partitioning schemes and thereby customize the behavior of our storage/file encoders and decoders (see Encoding and decoding large objects and Encoding and decoding files).

Block partitioning API

In Kodo a valid block partitioning scheme must implement the following API:

/// @brief Type requirements for BlockPartitioning compatible types.
class block_partitioning
{
public:

    /// Default constructor
    block_partitioning();

    /// Constructor
    /// @param max_symbols the maximum number of symbols in a block
    /// @param max_symbol_size the size in bytes of a symbol
    /// @param object_size the size in bytes of the whole object
    block_partitioning(uint32_t max_symbols, uint32_t max_symbol_size,
                       uint32_t object_size);

    /// @ingroup block_partitioning_type
    /// @param index the block index
    /// @return the number of symbols in a specific block
    uint32_t symbols(uint32_t index) const;

    /// @ingroup block_partitioning_type
    /// @param index the block index
    /// @return the size of a symbol in a specific block
    uint32_t symbol_size(uint32_t index) const;

    /// @ingroup block_partitioning_type
    /// @param index the block index
    /// @return the size of a specific block in bytes
    uint32_t block_size(uint32_t index) const;

    /// @ingroup block_partitioning_type
    /// @param index the block index
    /// @return the offset in bytes to the start of a specific block
    uint32_t byte_offset(uint32_t index) const;

    /// @ingroup block_partitioning_type
    /// @param index the block index
    /// @return the number of bytes used in a specific block
    uint32_t bytes_used(uint32_t index) const;

    /// @ingroup block_partitioning_type
    /// @return the total number of blocks in the object
    uint32_t blocks() const;

    /// @ingroup block_partitioning_type
    /// @return the size of the object being partitioned
    uint32_t object_size() const;

    /// @ingroup block_partitioning_type
    /// @return the total number of symbols in the entire object
    uint32_t total_symbols() const;

    /// @ingroup block_partitioning_type
    /// @return The total number of bytes needed to cover all blocks
    uint32_t total_block_size() const;

};
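
As an illustration of how this API is consumed, the sketch below (not part of Kodo; the check_coverage helper and its invariants are our own) walks the blocks of a BlockPartitioning compatible object and verifies that they cover the object contiguously. The invariants assume the scheme splits the object into contiguous, non-overlapping blocks, which holds for the example scheme defined in the next section.

#include <cassert>
#include <cstdint>

/// Sketch: verify that a BlockPartitioning compatible object
/// covers the whole object with contiguous, non-overlapping blocks.
template<class Partitioning>
void check_coverage(const Partitioning& partitioning)
{
    uint32_t covered = 0;

    for (uint32_t i = 0; i < partitioning.blocks(); ++i)
    {
        // Each block starts where the previous one ended
        assert(partitioning.byte_offset(i) == covered);

        // A block never uses more bytes than its full size
        assert(partitioning.bytes_used(i) <= partitioning.block_size(i));

        covered += partitioning.bytes_used(i);
    }

    // Together the blocks account for the entire object
    assert(covered == partitioning.object_size());
}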

Defining a custom scheme

In this case we will implement a partitioning scheme which keeps the symbol size fixed and creates blocks with exactly the same number of symbols in each.

Note

We do not recommend using the partitioning scheme presented here in practice, since for certain inputs it will produce only one symbol per block. Furthermore, it has not been tested to work with all input values.

Let's look at the implementation of the example partitioning scheme.

#include <algorithm>
#include <cassert>
#include <cstdint>

/// In this case we will implement a partitioning scheme which
/// creates blocks with exactly the same number of symbols in
/// each.
///
/// In Kodo a block partitioning scheme must implement the
/// following functions:
class fixed_partitioning_scheme
{
public:

    /// Default constructor
    fixed_partitioning_scheme() :
        m_max_symbols(0),
        m_max_symbol_size(0),
        m_object_size(0),
        m_total_symbols(0),
        m_total_blocks(0)
    { }

    /// Constructor
    /// @param max_symbols the maximum number of symbols in a block
    /// @param max_symbol_size the size in bytes of a symbol
    /// @param object_size the size in bytes of the whole object
    fixed_partitioning_scheme(uint32_t max_symbols,
                              uint32_t max_symbol_size,
                              uint32_t object_size) :
        m_max_symbols(max_symbols),
        m_max_symbol_size(max_symbol_size),
        m_object_size(object_size)
    {
        assert(m_max_symbols > 0);
        assert(m_max_symbol_size > 0);
        assert(m_object_size > 0);

        // Lets calculate how many symbols we can make for an
        // object with this size
        //
        // Note the following: ceil(x/y) = ((x - 1) / y) + 1

        // How many symbols can we make?
        m_total_symbols = ((m_object_size - 1) / m_max_symbol_size) + 1;

        // Since we want the same number of symbols in each block
        // we find the greatest common divisor between the total
        // amount of symbols and the max_symbols per block.
        m_symbols_per_block = gcd(m_total_symbols, m_max_symbols);

        assert(m_symbols_per_block > 0);

        m_total_blocks = m_total_symbols / m_symbols_per_block;
        assert(m_total_blocks > 0);
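
        // Worked example (using the values from the end of this
        // section): object_size = 6400 and max_symbol_size = 64
        // give m_total_symbols = 100; with max_symbols = 40 we get
        // m_symbols_per_block = gcd(100, 40) = 20 and
        // m_total_blocks = 100 / 20 = 5.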
    }

    /// @copydoc block_partitioning::symbols(uint32_t) const
    uint32_t symbols(uint32_t index) const
    {
        assert(m_total_blocks > index);
        assert(m_symbols_per_block > 0);

        return m_symbols_per_block;
    }

    /// @copydoc block_partitioning::symbol_size(uint32_t) const
    uint32_t symbol_size(uint32_t index) const
    {
        (void) index;
        assert(m_max_symbol_size > 0);
        return m_max_symbol_size;
    }

    /// @copydoc block_partitioning::block_size(uint32_t) const
    uint32_t block_size(uint32_t index) const
    {
        assert(m_total_blocks > index);
        return symbols(index) * symbol_size(index);
    }

    /// @copydoc block_partitioning::byte_offset(uint32_t) const
    uint32_t byte_offset(uint32_t index) const
    {
        assert(m_total_blocks > index);

        // We have a constant block size and since block_id starts
        // from zero we can simply multiply the block size with
        // the block_id to get the offset
        return block_size(index) * index;
    }

    /// @copydoc block_partitioning::bytes_used(uint32_t) const
    uint32_t bytes_used(uint32_t index) const
    {
        assert(index < m_total_blocks);

        uint32_t offset = byte_offset(index);

        assert(offset < m_object_size);
        uint32_t remaining = m_object_size - offset;
        uint32_t the_block_size = block_size(index);

        return std::min(remaining, the_block_size);
    }

    /// @copydoc block_partitioning::blocks() const
    uint32_t blocks() const
    {
        assert(m_total_blocks > 0);
        return m_total_blocks;
    }

    /// @copydoc block_partitioning::object_size() const
    uint32_t object_size() const
    {
        assert(m_object_size > 0);
        return m_object_size;
    }

    /// @copydoc block_partitioning::total_symbols() const
    uint32_t total_symbols() const
    {
        assert(m_total_symbols > 0);
        return m_total_symbols;
    }

    /// @copydoc block_partitioning::total_block_size() const
    uint32_t total_block_size() const
    {
        assert(m_total_symbols > 0);
        assert(m_max_symbol_size > 0);
        return m_total_symbols * m_max_symbol_size;
    }

private:

    /// Calculate the greatest common divisor of two numbers
    uint32_t gcd(uint32_t u, uint32_t v)
    {
        assert(u > 0);
        assert(v > 0);

        while (v != 0)
        {
            uint32_t r = u % v;
            u = v;
            v = r;
        }

        return u;
    }

private:

    /// The maximum number of symbols per block
    uint32_t m_max_symbols;

    /// The maximum size of a symbol in bytes
    uint32_t m_max_symbol_size;

    /// The size of the object to transfer in bytes
    uint32_t m_object_size;

    /// The total number of symbols in the object
    uint32_t m_total_symbols;

    /// The number of symbols per block
    uint32_t m_symbols_per_block;

    /// The total number of blocks in the object
    uint32_t m_total_blocks;
};
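
To check that the scheme behaves as intended before plugging it into Kodo, we can exercise it on its own. The following is a minimal, self-contained sketch (this main function is not part of the Kodo examples) that constructs fixed_partitioning_scheme with the input values used further below and prints the resulting layout. It assumes the class above, with its includes, is available in the same translation unit.

#include <iostream>

int main()
{
    // The same input values as used in the example further below
    uint32_t max_symbols = 40;
    uint32_t max_symbol_size = 64;
    uint32_t object_size = 6400;

    fixed_partitioning_scheme scheme(
        max_symbols, max_symbol_size, object_size);

    // Print how the object was split into blocks
    for (uint32_t i = 0; i < scheme.blocks(); ++i)
    {
        std::cout << "Block = " << i << " symbols = "
                  << scheme.symbols(i) << " symbol size = "
                  << scheme.symbol_size(i) << std::endl;
    }

    return 0;
}

Compiled together with the class above, this prints the same layout as shown at the end of this section.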

In the following we use the Encoding and decoding large objects example as a starting point, but change the types of the storage_encoder and storage_decoder as follows:

    using storage_encoder = kodo_core::object::storage_encoder<
                            kodo_rlnc::shallow_full_vector_encoder<fifi::binary>,
                            kodo_rlnc::fixed_partitioning_scheme>;

    using storage_decoder = kodo_core::object::storage_decoder<
                            kodo_rlnc::shallow_full_vector_decoder<fifi::binary>,
                            kodo_rlnc::fixed_partitioning_scheme>;

It is possible to query the partitioning scheme about the way the object was split. The object encoder/decoder uses the kodo::object::partitioning layer (defined in src/kodo/object/partitioning.hpp). We can query that layer to get information about how the partitioning was done.

    for (uint32_t i = 0; i < object_encoder->blocks(); ++i)
    {
        std::cout << "Block = " << i << " symbols = "
                  << object_encoder->symbols(i) << " symbol size = "
                  << object_encoder->symbol_size(i) << std::endl;
    }

For the given input values:

    // Set the number of symbols (i.e. the generation size in RLNC
    // terminology) and the size of a symbol in bytes
    uint32_t max_symbols = 40;
    uint32_t max_symbol_size = 64;

    uint32_t object_size = 6400;

We would see the following partitioning:

Block = 0 symbols = 20 symbol size = 64
Block = 1 symbols = 20 symbol size = 64
Block = 2 symbols = 20 symbol size = 64
Block = 3 symbols = 20 symbol size = 64
Block = 4 symbols = 20 symbol size = 64

As expected, all blocks have the same number of symbols: the object contains 6400 / 64 = 100 symbols in total, gcd(100, 40) = 20, so the object is split into 5 blocks of 20 symbols each.