| 1 | +# Hash Implementation {#hash_impl} |
| 2 | + |
| 3 | +Hash usage is usually split into three stages: |
| 4 | + |
| 5 | +1. Initialization. An implicit stage in which the accumulator to be used is created. |
| 6 | +2. Accumulation. Performed one or more times. Calling `update` several times is |
| 7 | +equivalent to calling it once with all of the arguments concatenated. |
| 8 | +3. Finalization. The accumulated hash data has to be padded, finalized and |
| 9 | +prepared for retrieval by the user. |
| 10 | + |
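The three stages above can be sketched with a toy accumulator. This is only an illustration under stated assumptions: `toy_accumulator`, `digest_in_two_steps` and `digest_at_once` are hypothetical names, and the non-cryptographic 64-bit FNV-1a function stands in for a real hash, so no padding is needed at finalization.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical toy accumulator (64-bit FNV-1a), not a type from the library.
class toy_accumulator {
public:
    // Stage 2: accumulation, may be called any number of times.
    void update(const std::string &chunk) {
        for (unsigned char byte : chunk) {
            state_ ^= byte;
            state_ *= 0x100000001b3ULL;    // FNV-1a 64-bit prime
        }
    }

    // Stage 3: finalization. FNV-1a needs no padding, so the state is returned as is.
    std::uint64_t finalize() const {
        return state_;
    }

private:
    // Stage 1: initialization happens implicitly when the accumulator is constructed.
    std::uint64_t state_ = 0xcbf29ce484222325ULL;    // FNV-1a 64-bit offset basis
};

// Feeds the message in two pieces.
inline std::uint64_t digest_in_two_steps(const std::string &head, const std::string &tail) {
    toy_accumulator acc;
    acc.update(head);
    acc.update(tail);
    return acc.finalize();
}

// Feeds the whole message at once.
inline std::uint64_t digest_at_once(const std::string &message) {
    toy_accumulator acc;
    acc.update(message);
    return acc.finalize();
}
```

Feeding `"abc"` and then `"def"` yields the same digest as feeding `"abcdef"` once, which is the equivalence the accumulation stage relies on.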
| 11 | +## Architecture Overview {#hash_arch} |
| 12 | + |
| 13 | +The hash library architecture consists of the parts listed below: |
| 14 | + |
| 15 | +1. Algorithms |
| 16 | +2. Stream Processors |
| 17 | +3. Constructions and Compressors |
| 18 | +4. Accumulators |
| 19 | +5. Value Processors |
| 20 | + |
| 21 | +@dot |
| 22 | +digraph hash_arch { |
| 23 | +bgcolor="#222222"; |
| 24 | +rankdir="TB" |
| 25 | +node [shape="box"] |
| 26 | + |
| 27 | + a [label="Algorithms" color="#F5F2F1" fontcolor="#F5F2F1" URL="@ref hash_algorithms"]; |
| 28 | + b [label="Stream Processors" color="#F5F2F1" fontcolor="#F5F2F1" URL="@ref hash_stream"]; |
| 29 | + c [label="Constructions and Compressors" color="#F5F2F1" fontcolor="#F5F2F1" URL="@ref hash_pol"]; |
| 30 | + d [label="Accumulators" color="#F5F2F1" fontcolor="#F5F2F1" URL="@ref hash_acc"]; |
| 31 | + e [label="Value Processors" color="#F5F2F1" fontcolor="#F5F2F1" URL="@ref hash_val"]; |
| 32 | + |
| 33 | + a -> b; |
| 34 | + b -> c; |
| 35 | + c -> d; |
| 36 | + d -> e; |
| 37 | +} |
| 38 | +@enddot |
| 39 | + |
| 40 | +## Algorithms {#hash_algorithms} |
| 41 | + |
| 42 | +The library implementation is intended to be highly |
| 43 | +compliant with the STL, so the crucial point is to make |
| 44 | +hashes usable in the same way as STL algorithms |
| 45 | +are. |
| 46 | + |
| 47 | +The STL algorithms library mostly consists of generic iterator-based and, since C++20, |
| 48 | +range-based algorithms over generic concept-compliant types. A good example is the |
| 49 | +`std::transform` algorithm: |
| 50 | + |
| 51 | +```cpp |
| 52 | +template<typename InputIterator, typename OutputIterator, typename UnaryOperation> |
| 53 | +OutputIterator transform(InputIterator first, InputIterator last, OutputIterator out, UnaryOperation unary_op); |
| 54 | +``` |
| 55 | +
|
| 56 | +Input of type `InputIterator` can traverse any iterable range, no matter |
| 57 | +which particular type is to be processed. |
| 58 | +`OutputIterator`, in turn, provides a type-independent place for the |
| 59 | +algorithm to put its results, no matter which particular range this `OutputIterator` |
| 60 | +represents. |
| 61 | + |
| 62 | +Since C++20 this algorithm has an analogue in the Ranges library, shown here in simplified form: |
| 63 | + |
| 64 | +```cpp |
| 65 | +template<typename InputRange, typename OutputRange, typename UnaryOperation> |
| 66 | +OutputRange transform(InputRange rng, OutputRange out, UnaryOperation unary_op); |
| 67 | +``` |
| 68 | + |
| 69 | +This variant makes no distinction between `InputRange` being a |
| 70 | +`Container` or something else. The algorithm is as generic as the data |
| 71 | +representation types are. |
| 72 | + |
| 73 | +Since such algorithms are implemented generically, hash algorithms |
| 74 | +should follow the same approach: |
| 75 | + |
| 76 | +```cpp |
| 77 | +template<typename Hash, typename InputIterator, typename OutputIterator> |
| 78 | +OutputIterator hash(InputIterator first, InputIterator last, OutputIterator out); |
| 79 | +``` |
| 80 | +
|
| 81 | +`Hash` is a policy type which represents the particular hash to be used. |
| 82 | +`InputIterator` represents the input data to be hashed. |
| 83 | +`OutputIterator` plays exactly the same role as in the `std::transform` algorithm: |
| 84 | +it handles all the output storage operations. |
| 85 | + |
| 86 | +The most obvious difference from `std::transform` is the representation of the |
| 87 | +policy defining the particular behaviour of the algorithm. `std::transform` |
| 88 | +takes it as a `Functor` argument, which is also possible for a |
| 89 | +`Hash` policy that is already pre-scheduled when passed to the function: |
| 90 | + |
| 91 | +```cpp |
| 92 | +template<typename Hash, typename InputIterator, typename OutputIterator> |
| 93 | +OutputIterator hash(InputIterator first, InputIterator last, OutputIterator out); |
| 94 | +``` |
| 95 | + |
| 96 | +Algorithms are no more than wrappers that initialize internal structures. In this |
| 97 | +particular case the algorithm initializes a stream processor fed with an accumulator |
| 98 | +set containing the [`hash` accumulator](@ref accumulators::hash), which is in turn initialized with `Hash`. |
| 99 | + |
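The `hash` signature above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `sketch` namespace, the `toy_fnv1a32` policy and its members `digest_type`, `initial_state` and `absorb` are hypothetical, and the non-cryptographic 32-bit FNV-1a function replaces the stream processor and accumulator set that the text describes.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <vector>

namespace sketch {
    // Hypothetical policy type: 32-bit FNV-1a, used only for illustration.
    struct toy_fnv1a32 {
        using digest_type = std::uint32_t;

        static digest_type initial_state() {
            return 0x811c9dc5U;    // FNV-1a 32-bit offset basis
        }

        static digest_type absorb(digest_type state, std::uint8_t byte) {
            return (state ^ byte) * 0x01000193U;    // FNV-1a 32-bit prime
        }
    };

    // Generic algorithm in the style of std::transform: the behaviour comes
    // from the Hash policy, the data from any iterator range.
    template<typename Hash, typename InputIterator, typename OutputIterator>
    OutputIterator hash(InputIterator first, InputIterator last, OutputIterator out) {
        typename Hash::digest_type state = Hash::initial_state();
        for (; first != last; ++first) {
            state = Hash::absorb(state, static_cast<std::uint8_t>(*first));
        }
        // Write the digest bytes, most significant byte first.
        for (int shift = static_cast<int>(sizeof(state)) * 8 - 8; shift >= 0; shift -= 8) {
            *out++ = static_cast<std::uint8_t>(state >> shift);
        }
        return out;
    }
}    // namespace sketch
```

A call such as `sketch::hash<sketch::toy_fnv1a32>(data.begin(), data.end(), std::back_inserter(digest))` then reads like any other STL algorithm call, with the policy chosen at compile time.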
| 100 | +## Stream Data Processing {#hash_stream} |
| 101 | + |
| 102 | +Hashes are usually defined over byte sequences of `Integral` value type |
| 103 | +packed into blocks of a specific size (e.g. [sha2](@ref hash::sha2) is defined for |
| 104 | +blocks of words, which are actually plain `n`-sized arrays of `uint32_t`). |
| 105 | +Input data in the proposed implementation is supposed to be a variable-length |
| 106 | +input stream, whose length is not necessarily a multiple of the block size. |
| 107 | + |
| 108 | +This requires introducing a stream processor, specified with a parameter |
| 109 | +set unique to each [`Hash`](@ref hash_concept) type, which takes the |
| 110 | +input data stream and splits it into blocks filled with integers converted to the |
| 111 | +appropriate size (words in the cryptographic sense, not machine words). |
| 112 | + |
| 113 | +Example. Let us assume the input data stream consists of 16 bytes as follows. |
| 114 | + |
| 115 | +@dot |
| 116 | +digraph bytes { |
| 117 | +bgcolor="#222222"; |
| 118 | +node [shape=record color="#F5F2F1" fontcolor="#F5F2F1"]; |
| 119 | + |
| 120 | +struct1 [label="0x00 | 0x01 | 0x02 | 0x03 | 0x04 | 0x05 | 0x06 | 0x07 | 0x08 | 0x09 | 0x10 | 0x11 | 0x12 | 0x13 |
| 121 | + | 0x14 | 0x15"]; |
| 122 | + |
| 123 | +} |
| 124 | +@enddot |
| 125 | + |
| 126 | +Let us assume the selected cipher is Rijndael with a 32 bit word size, a 128 bit block size and a 128 |
| 127 | + bit key size. This means the input data stream needs to be converted to 32 bit words and merged into 128 bit |
| 128 | + blocks as follows: |
| 129 | + |
| 130 | +@dot |
| 131 | +digraph bytes_to_words { |
| 132 | +bgcolor="#222222"; |
| 133 | +node [shape=record color="#F5F2F1" fontcolor="#F5F2F1"]; |
| 134 | + |
| 135 | +struct1 [label="<b0> 0x00 |<b1> 0x01 |<b2> 0x02 |<b3> 0x03 |<b4> 0x04 |<b5> 0x05 |<b6> 0x06 |<b7> 0x07 |<b8> 0x08 |<b9> 0x09 |<b10> 0x10 |<b11> 0x11 |<b12> 0x12 |<b13> 0x13 |<b14> 0x14 |<b15> 0x15"]; |
| 136 | + |
| 137 | +struct2 [label="<w0> 0x00 0x01 0x02 0x03 |<w1> 0x04 0x05 0x06 0x07 |<w2> 0x08 0x09 0x10 0x11 |<w3> 0x12 0x13 0x14 0x15"]; |
| 138 | + |
| 139 | +struct3 [label="<bl0> 0x00 0x01 0x02 0x03 0x04 0x05 0x06 0x07 0x08 0x09 0x10 0x11 0x12 0x13 0x14 |
| 140 | + 0x15"]; |
| 141 | + |
| 142 | +struct1:b0 -> struct2:w0 |
| 143 | +struct1:b1 -> struct2:w0 |
| 144 | +struct1:b2 -> struct2:w0 |
| 145 | +struct1:b3 -> struct2:w0 |
| 146 | + |
| 147 | +struct1:b4 -> struct2:w1 |
| 148 | +struct1:b5 -> struct2:w1 |
| 149 | +struct1:b6 -> struct2:w1 |
| 150 | +struct1:b7 -> struct2:w1 |
| 151 | + |
| 152 | +struct1:b8 -> struct2:w2 |
| 153 | +struct1:b9 -> struct2:w2 |
| 154 | +struct1:b10 -> struct2:w2 |
| 155 | +struct1:b11 -> struct2:w2 |
| 156 | + |
| 157 | +struct1:b12 -> struct2:w3 |
| 158 | +struct1:b13 -> struct2:w3 |
| 159 | +struct1:b14 -> struct2:w3 |
| 160 | +struct1:b15 -> struct2:w3 |
| 161 | + |
| 162 | +struct2:w0 -> struct3:bl0 |
| 163 | +struct2:w1 -> struct3:bl0 |
| 164 | +struct2:w2 -> struct3:bl0 |
| 165 | +struct2:w3 -> struct3:bl0 |
| 166 | +} |
| 167 | + |
| 168 | +@enddot |
| 169 | + |
| 170 | +With the data in this form, a [`Hash`](@ref hash_concept) instance such as [sha2](@ref hash::sha2) |
| 171 | +can be fed. |
| 172 | + |
| 173 | +This mechanism is handled by the `stream_processor` class template, specialized for |
| 174 | +each particular hash with the parameters it requires. Hashes need only one type |
| 175 | +of stream processor: the one which splits the data into blocks, converts |
| 176 | +them and passes them to an `AccumulatorSet` reference as hash input in the required format. |
| 177 | +The remaining data, which does not fill a whole block, is converted too and fed value by |
| 178 | +value to the same `AccumulatorSet` reference. |
| 179 | + |
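The splitting step described above can be sketched as follows. This is not the library's `stream_processor`: the `sketch` namespace, `split_result` and `split_to_blocks` are hypothetical names, the 32 bit word and 128 bit block sizes are taken from the example above, and big-endian word packing is an assumption. As a simplification, the leftover bytes are returned unconverted instead of being fed to an accumulator set.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

namespace sketch {
    constexpr std::size_t word_bytes = 4;     // 32 bit words
    constexpr std::size_t block_words = 4;    // 128 bit blocks
    using block_type = std::array<std::uint32_t, block_words>;

    struct split_result {
        std::vector<block_type> blocks;    // complete blocks, ready to be consumed
        std::vector<std::uint8_t> tail;    // bytes that did not fill a whole block
    };

    // Packs bytes into big-endian 32 bit words and groups the words into blocks.
    inline split_result split_to_blocks(const std::vector<std::uint8_t> &input) {
        constexpr std::size_t block_bytes = word_bytes * block_words;
        const std::size_t full_blocks = input.size() / block_bytes;

        split_result result;
        for (std::size_t b = 0; b < full_blocks; ++b) {
            block_type block {};
            for (std::size_t w = 0; w < block_words; ++w) {
                std::uint32_t word = 0;
                for (std::size_t i = 0; i < word_bytes; ++i) {
                    word = (word << 8) | input[b * block_bytes + w * word_bytes + i];
                }
                block[w] = word;
            }
            result.blocks.push_back(block);
        }

        // Whatever is left over is shorter than one block.
        const auto offset = static_cast<std::ptrdiff_t>(full_blocks * block_bytes);
        result.tail.assign(input.begin() + offset, input.end());
        return result;
    }
}    // namespace sketch
```

For an 18 byte input this produces one complete block of four words and a two byte tail; a real stream processor would pass both on to the accumulator set rather than return them.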
| 180 | +## Data Type Conversion {#hash_data} |
| 181 | + |
| 182 | +Since block cipher algorithms are usually defined over `Integral` types or |
| 183 | +byte sequences in a format unique to each cipher, the requirement that the encryption |
| 184 | +function be generic has to be met with a cipher-specific input data |
| 185 | +format converter. |
| 186 | + |
| 187 | +For example, the `Rijndael` cipher is defined over blocks of 32 bit words, which |
| 188 | +can be represented with `uint32_t`. This means all the input data has to be |
| 189 | +converted in some way to a 4 byte `Integral` type. If |
| 190 | +`InputIterator` is defined over a range of `Integral` value type, this is |
| 191 | +handled with a plain byte repack, as shown in the previous section. This is the case in which |
| 192 | +both the input stream and the required data format satisfy the same concept. |
| 193 | + |
| 194 | +The more general case, with input data presented as a sequence of an arbitrary type `T`, |
| 195 | +requires `T` to have a conversion operator `operator Integral()` to the |
| 196 | +type required by the particular `BlockCipher` policy. |
| 197 | + |
| 198 | +Example. Let us assume the following class is presented: |
| 199 | +```cpp |
| 200 | +class A { |
| 201 | +public: |
| 202 | + std::size_t vals; |
| 203 | + std::uint16_t val16; |
| 204 | + char valc; |
| 205 | +}; |
| 206 | +``` |
| 207 | +
|
| 208 | +Now let us assume there exists an initialized ```SequenceContainer``` of value type ```A```, |
| 209 | +filled with random values: |
| 210 | +
|
| 211 | +```cpp |
| 212 | +std::vector<A> a; |
| 213 | +``` |
| 214 | + |
| 215 | +To feed the ```BlockCipher``` with this data, ```A``` has to be converted to an ```Integral``` type, which |
| 216 | + is only possible if ```A``` has a conversion operator along the following lines: |
| 217 | + |
| 218 | +```cpp |
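| 219 | +class A { |
| 220 | +public: |
| 221 | + operator uint128_t() const {    // uint128_t: assumed 128 bit unsigned integer type defined elsewhere |
| 222 | + return (uint128_t(vals) << (3U * CHAR_BIT)) | (uint128_t(val16) << CHAR_BIT) | uint128_t(static_cast<unsigned char>(valc)); |
| 223 | + } |
| 224 | + |
| 225 | + std::size_t vals; |
| 226 | + std::uint16_t val16; |
| 227 | + char valc; |
| 228 | +}; |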
| 229 | +``` |
| 230 | +
|
| 231 | +This conversion is handled internally by the ```stream_processor``` configured for each particular cipher. |
| 232 | + |
| 233 | +## Hash Algorithms {#hash_pol} |
| 234 | +
|
| 235 | +## Accumulators {#hash_acc} |
| 236 | +
|
| 237 | +## Value Postprocessors {#hash_val} |