# Customize Partitioning Scheme¶

In many applications we need to deal with data which does not fit into a single encoder/decoder. In such cases we need a strategy for how to partition and assign chucks of the data to separate encoders and decoders.

Note

Ideally we would create a single encoder/decoder pair for any object. We could do this simply by increasing the number of symbols or the size of the symbols until the data would fit. However in practice this does not work well.

When we increase the number of symbols in an encoder/decoder we also increase the computational complexity and it often not practical to have blocks with more than a few thousand symbols. Likewise increasing the size of a symbol may not be feasible due to constraints in the underlying systems e.g. the network MTU (Maximum Transfer Unit) etc.

In Kodo we call the algorithm defining how to segment a large object into smaller ones the block partitioning scheme.

In the following we will show how to define new partitioning schemes and thereby customize the behavior of our storage/file encoders and decoders (Encoding and decoding large objects and Encoding and decoding files).

## Block partitioning API¶

In Kodo a valid block partitioning scheme must implement the following API:

  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 /// @brief Type requirements for BlockPartitioning compatible types. class block_partitioning { public: /// Default constructor block_partitioning(); /// Constructor /// @param field the chosen finite field /// @param max_symbols the maximum number of symbols in a block /// @param max_symbol_size the size in bytes of a symbol /// @param object_size the size in bytes of the whole object block_partitioning(fifi::api::field field, uint32_t max_symbols, uint32_t max_symbol_size, uint32_t object_size); /// @ingroup block_partitioning_type /// @param index the block index /// @return the number of symbols in a specific block uint32_t symbols(uint32_t index) const; /// @ingroup block_partitioning_type /// @param index the block index /// @return the size of a symbol in a specific block uint32_t symbol_size(uint32_t index) const; /// @ingroup block_partitioning_type /// @param index the block index /// @return the size of a specific block in bytes uint32_t block_size(uint32_t index) const; /// @ingroup block_partitioning_type /// @param index the block index /// @return the offset in bytes to the start of a specific block uint32_t byte_offset(uint32_t index) const; /// @ingroup block_partitioning_type /// @param index the block index /// @return the number of bytes used in a specific block uint32_t bytes_used(uint32_t index) const; /// @ingroup block_partitioning_type /// @return the total number of blocks in the object uint32_t blocks() const; /// @ingroup block_partitioning_type /// @return the size of the object being partitioned uint32_t object_size() const; /// @ingroup block_partitioning_type /// @return the total number of symbols in the entire object uint32_t total_symbols() const; /// @ingroup block_partitioning_type /// @return The total number of bytes needed to cover all blocks uint32_t total_block_size() const; }; 

## Defining a custom scheme¶

In this case we will implement a partitioning scheme which keeps the symbol size fixed and creates blocks with exactly the same number of symbols in each.

Note

We do not recommend using the partitioning scheme presented here in practice. Since for certain input we will only have one symbol per block. Furthermore it has not been tested to work will all input values.

Lets see the implementation of the example partitioning scheme.

  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 /// In this case we will implement a partitioning scheme which /// creates blocks with exactly the same number of symbols in /// each. /// /// In Kodo a block partitioning scheme must implement the /// following functions: class fixed_partitioning_scheme { public: /// Default constructor fixed_partitioning_scheme() : m_max_symbols(0), m_max_symbol_size(0), m_object_size(0), m_total_symbols(0), m_total_blocks(0) { } /// Constructor /// @param max_symbols the maximum number of symbols in a block /// @param max_symbol_size the size in bytes of a symbol /// @param object_size the size in bytes of the whole object fixed_partitioning_scheme(uint32_t max_symbols, uint32_t max_symbol_size, uint32_t object_size) : m_max_symbols(max_symbols), m_max_symbol_size(max_symbol_size), m_object_size(object_size) { assert(m_max_symbols > 0); assert(m_max_symbol_size > 0); assert(m_object_size > 0); // Lets calculate how many symbols we can make for an // object with this size // // Note the following: ceil(x/y) = ((x - 1) / y) + 1 // How many symbols can we make? m_total_symbols = ((m_object_size - 1) / m_max_symbol_size) + 1; // Since we want the same number of symbols in each block // we find the greatest common divisor between the total // amount of symbols and the max_symbols per block. m_symbols_per_block = gcd(m_total_symbols, m_max_symbols); assert(m_symbols_per_block > 0); m_total_blocks = m_total_symbols / m_symbols_per_block; assert(m_total_blocks > 0); } /// @copydoc block_partitioning::symbols(uint32_t) const uint32_t symbols(uint32_t index) const { assert(m_total_blocks > index); assert(m_symbols_per_block > 0); return m_symbols_per_block; } /// @copydoc block_partitioning::symbol_size(uint32_t) const uint32_t symbol_size(uint32_t index) const { (void) index; assert(m_max_symbol_size > 0); return m_max_symbol_size; } /// @copydoc block_partitioning::block_size(uint32_t) const uint32_t block_size(uint32_t index) const { assert(m_total_blocks > index); return symbols(index) * symbol_size(index); } /// @copydoc block_partitioning::byte_offset(uint32_t) const uint32_t byte_offset(uint32_t index) const { assert(m_total_blocks > index); // We have a constant block size and since block_id starts // from zero we can simply multiply the block size with // the block_id to get the offset return block_size(index) * index; } /// @copydoc block_partitioning::bytes_used(uint32_t) const uint32_t bytes_used(uint32_t index) const { assert(index < m_total_blocks); uint32_t offset = byte_offset(index); assert(offset < m_object_size); uint32_t remaining = m_object_size - offset; uint32_t the_block_size = block_size(index); return std::min(remaining, the_block_size); } /// @copydoc block_partitioning::blocks() const uint32_t blocks() const { assert(m_total_blocks > 0); return m_total_blocks; } /// @copydoc block_partitioning::object_size() const uint32_t object_size() const { assert(m_object_size > 0); return m_object_size; } /// @copydoc block_partitioning::total_symbols() const uint32_t total_symbols() const { assert(m_total_symbols > 0); return m_total_symbols; } /// @copydoc block_partitioning::total_block_size() const uint32_t total_block_size() const { assert(m_total_symbols > 0); assert(m_max_symbol_size > 0); return m_total_symbols * m_max_symbol_size; } private: /// Calculate the greatest common divisor of two numbers uint32_t gcd(uint32_t u, uint32_t v) { assert(u > 0); assert(v > 0); while (v != 0) { uint32_t r = u % v; u = v; v = r; } return u; } private: /// The maximum number of symbols per block uint32_t m_max_symbols; /// The maximum size of a symbol in bytes uint32_t m_max_symbol_size; /// The size of the object to transfer in bytes uint32_t m_object_size; /// The total number of symbols in the object uint32_t m_total_symbols; /// The number of symbols per block uint32_t m_symbols_per_block; /// The total number of blocks in the object uint32_t m_total_blocks; }; 

In the following we used the Encoding and decoding large objects example as a starting point but changed the types of the storage_encoder and storage_decoder as follows:

 1 2 3 4 5 6 7  using storage_encoder = kodo_core::object::storage_encoder< kodo_rlnc::full_vector_encoder, kodo_rlnc::fixed_partitioning_scheme>; using storage_decoder = kodo_core::object::storage_decoder< kodo_rlnc::full_vector_decoder, kodo_rlnc::fixed_partitioning_scheme>; 

It is possible to query the partitioning scheme about the way the object was split. The object encoder/decoder uses the kodo::object::partitioning layer (defined in SEC/kodo/object/partitioning.hop). We can query that layer to get information about the how partitioning was done.

 1 2 3 4 5 6  for (uint32_t i = 0; i < object_encoder->blocks(); ++i) { std::cout << "Block = " << i << " symbols = " << object_encoder->symbols(i) << " symbol size = " << object_encoder->symbol_size(i) << std::endl; } 

For the given input values:

 1 2 3 4 5 6 7  // Set the number of symbols (i.e. the generation size in RLNC // terminology) and the size of a symbol in bytes uint32_t max_symbols = 40; uint32_t max_symbol_size = 64; uint32_t object_size = 6400; fifi::api::field field = fifi::api::field::binary; 

We would see the following partitioning:

Block = 0 symbols = 20 symbol size = 64
Block = 1 symbols = 20 symbol size = 64
Block = 2 symbols = 20 symbol size = 64
Block = 3 symbols = 20 symbol size = 64
Block = 4 symbols = 20 symbol size = 64


As we expect all blocks have the same number of symbols.