
Commit c9cbe41

Encoding buffering strategy .assembled added

1 parent e2dcb01 commit c9cbe41

File tree: 6 files changed, +69 -67 lines changed

README.md

Lines changed: 7 additions & 7 deletions

````diff
@@ -142,7 +142,7 @@ A `CSVReadder` parses CSV data from a given input (`String`, or `Data`, or file)
 
   CSV fields are separated within a row with _field delimiters_ (commonly a "comma"). CSV rows are separated through _row delimiters_ (commonly a "line feed"). You can specify any unicode scalar, `String` value, or `nil` for unknown delimiters.
 
-- `escapingStrategy` (default `.doubleQuote`) specify the Unicode scalar used to escape fields.
+- `escapingStrategy` (default `"`) specify the Unicode scalar used to escape fields.
 
   CSV fields can be escaped in case they contain priviledge characters, such as field/row delimiters. Commonly the escaping character is a double quote (i.e. `"`), by setting this configuration value you can change it (e.g. a single quote), or disable the escaping functionality.
 
@@ -315,11 +315,11 @@ let result = try decoder.decode(CustomType.self, from: data)
 `CSVDecoder` can decode CSVs represented as a `Data` blob, a `String`, or an actual file in the file system.
 
 ```swift
-let decoder = CSVDecoder { $0.bufferingStrategy = .fulfilled }
+let decoder = CSVDecoder { $0.bufferingStrategy = .assembled }
 let content: [Student] = try decoder([Student].self, from: URL("~/Desktop/Student.csv"))
 ```
 
-If you are dealing with a big CSV file, it is preferred to used direct file decoding, a `.sequential` or `.fulfilled` buffering strategy, and set *presampling* to false; since then memory usage is drastically reduced.
+If you are dealing with a big CSV file, it is preferred to used direct file decoding, a `.sequential` or `.assembled` buffering strategy, and set *presampling* to false; since then memory usage is drastically reduced.
 
 ### Decoder configuration
 
@@ -367,15 +367,15 @@ let data: Data = try encoder.encode(value)
 The `Encoder`'s `encode()` function creates a CSV file as a `Data` blob, a `String`, or an actual file in the file system.
 
 ```swift
-let encoder = CSVEncoder { $0.bufferingStrategy = .sequential }
+let encoder = CSVEncoder { $0.headers = ["name", "age", "hasPet"] }
 try encoder.encode(value, into: URL("~/Desktop/Students.csv"))
 ```
 
-If you are dealing with a big CSV content, it is preferred to use direct file encoding and a `.sequential` or `.fulfilled` buffering strategy, since then memory usage is drastically reduced.
+If you are dealing with a big CSV content, it is preferred to use direct file encoding and a `.sequential` or `.assembled` buffering strategy, since then memory usage is drastically reduced.
 
 ### Encoder configuration
 
-The encoding process can be tweaked by specifying configuration values. `CSVEncoder` accepts the [same configuration values as `CSVWRiter`](#Writer-configuration) plus the following ones:
+The encoding process can be tweaked by specifying configuration values. `CSVEncoder` accepts the [same configuration values as `CSVWriter`](#Writer-configuration) plus the following ones:
 
 - `floatStrategy` (default `.throw`) defines how to deal with non-conforming floating-point numbers (e.g. `NaN`).
 
@@ -419,7 +419,7 @@ encoder.dataStrategy = .custom { (data, encoder) in
 <ul>
 <details><summary>Basic adoption.</summary><p>
 
-When a custom type conforms to `Codable`, the type is stating that it has the ability to decode itself from and encode itself to a external representation. Which representation depends on the decoder or encoder chosen. Foundation provides support for [JSON and Property Lists](https://developer.apple.com/documentation/foundation/archives_and_serialization), but the community provide many other formats, such as: [YAML](https://github.com/jpsim/Yams), [XML](https://github.com/MaxDesiatov/XMLCoder), [BSON](https://github.com/OpenKitten/BSON), and CSV (through this library).
+When a custom type conforms to `Codable`, the type is stating that it has the ability to decode itself from and encode itself to a external representation. Which representation depends on the decoder or encoder chosen. Foundation provides support for [JSON and Property Lists](https://developer.apple.com/documentation/foundation/archives_and_serialization) and the community provide many other formats, such as: [YAML](https://github.com/jpsim/Yams), [XML](https://github.com/MaxDesiatov/XMLCoder), [BSON](https://github.com/OpenKitten/BSON), and CSV (through this library).
 
 Lets see a regular CSV encoding/decoding usage through `Codable`'s interface. Let's suppose we have a list of students formatted in a CSV file:
````

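Taken together, the README's decoding advice (direct file decoding, a `.sequential` or `.assembled` buffering strategy, presampling off) could be sketched as below. This is a hypothetical usage sketch, not confirmed by this commit: the `Student` type, the `presample` flag name, and the `decode(_:from:)` spelling are all assumptions.

```swift
import Foundation
import CodableCSV

// Hypothetical row type mirroring the README's example.
struct Student: Codable {
    let name: String
    let age: Int
}

// Low-memory setup for large inputs: decode straight from a file,
// keep only rows being assembled in the buffer, and skip presampling.
// NOTE: `presample` and `decode(_:from:)` are assumed API names.
let decoder = CSVDecoder {
    $0.bufferingStrategy = .assembled
    $0.presample = false
}
let url = URL(fileURLWithPath: "/tmp/Students.csv")
let students = try decoder.decode([Student].self, from: url)
```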
sources/Codable/Decodable/DecoderConfiguration.swift

Lines changed: 1 addition & 1 deletion

```diff
@@ -91,7 +91,7 @@ extension Strategy {
     /// Forward/Backwards decoding jumps are allowed. However, previously requested rows cannot be requested again or an error will be thrown.
     ///
     /// This strategy will massively reduce the memory usage, but it will throw an error if a CSV row that was previously decoded is requested from a keyed container.
-    case fulfilled
+    case assembled
     /// No rows are kept in memory (except for the CSV row being decoded at the moment)
     /// Forward jumps are allowed, but the rows in-between the jump cannot be decoded.
    case sequential
```

sources/Codable/Encodable/EncoderConfiguration.swift

Lines changed: 6 additions & 5 deletions

```diff
@@ -100,20 +100,21 @@ extension Strategy {
     /// All encoded rows/fields are cached and the *writing* only occurs at the end of the encodable process.
     ///
     /// *Keyed containers* can be used to encode rows/fields unordered. That means, a row at position 5 may be encoded before the row at position 3. Similar behavior is supported for fields within a row.
-    /// - attention: This strategy consumes the largest amount of memory from all the supported options.
+    /// - remark: This strategy consumes the largest amount of memory from all the supported options.
     case keepAll
     /// Encoded rows may be cached, but the encoder will keep the buffer as small as possible by writing completed ordered rows.
     ///
     /// *Keyed containers* can be used to encode rows/fields unordered. The writer will however consume rows in order.
     ///
-    /// For example, an encoder starts encoding row 1 and it gets all its fields. The row will get written and no cache for the row is kept. Same situation occurs when the row 2 is encoded.
+    /// For example, an encoder starts encoding row 1 and gets all its fields. The row will get written and no cache for the row is kept anymore. Same situation occurs when the row 2 is encoded.
     /// However, the user may decide to jump to row 5 and encode it. This row will be kept in the cache till row 3 and 4 are encoded, at which time row 3, 4, 5, and any subsequent rows will be writen.
-    /// - attention: This strategy tries to keep the cache to a minimum, but memory usage may be big if there are holes while encoding rows. Those holes are filled with empty rows at the end of the encoding process.
-    case fulfilled
+    /// - attention: If no headers are passed during configuration the encoder has no way to know when a row is completed. That is why, the `.keepAll` buffering strategy will be used instead for such a case.
+    /// - remark: This strategy tries to keep the cache to a minimum, but memory usage may be big if there are holes while encoding rows/fields. Those holes are filled with empty rows/fields at the end of the encoding process.
+    case assembled
     /// Only the last row (the one being written) is kept in memory. Writes are performed sequentially.
     ///
     /// *Keyed containers* can be used, but at file-level any forward jump will imply writing empty-rows. At field-level *keyed containers* may still be used for random-order writing.
-    /// - attention: This strategy provides the smallest usage of memory from all.
+    /// - remark: This strategy provides the smallest usage of memory from all.
     case sequential
 }
 }
```

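The new `- attention:` note (`.assembled` needs headers to know how many fields complete a row) corresponds to a concrete fallback in `Sink.init` below. A minimal self-contained model of that selection logic, using stand-in types rather than the library's own:

```swift
// Stand-in for `Strategy.EncodingBuffer` (illustration only).
enum EncodingBuffer { case keepAll, assembled, sequential }

/// Without headers the encoder cannot tell when a row is complete,
/// so a requested `.assembled` strategy degrades to `.keepAll`.
func effectiveStrategy(requested: EncodingBuffer, headers: [String]) -> EncodingBuffer {
    switch requested {
    case .assembled where headers.isEmpty: return .keepAll
    case let other: return other
    }
}

assert(effectiveStrategy(requested: .assembled, headers: []) == .keepAll)
assert(effectiveStrategy(requested: .assembled, headers: ["name", "age"]) == .assembled)
assert(effectiveStrategy(requested: .sequential, headers: []) == .sequential)
```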
sources/Codable/Encodable/Internal/Sink.swift

Lines changed: 39 additions & 39 deletions

```diff
@@ -17,22 +17,55 @@ extension ShadowEncoder {
     /// Creates the unique data sink for the encoding process.
     init(writer: CSVWriter, configuration: CSVEncoder.Configuration, userInfo: [CodingUserInfoKey:Any]) throws {
         self.writer = writer
-        self.buffer = Buffer(strategy: configuration.bufferingStrategy)
+
+        let strategy: Strategy.EncodingBuffer
+        switch configuration.bufferingStrategy {
+        case .assembled where configuration.headers.isEmpty: strategy = .keepAll
+        case let others: strategy = others
+        }
+
+        self.buffer = Buffer(strategy: strategy, expectedFields: self.writer.expectedFields)
         self.configuration = configuration
         self.userInfo = userInfo
         self.headerLookup = .init()
 
-        switch configuration.bufferingStrategy {
+        switch strategy {
         case .keepAll:
             self.fieldValue = { [unowned buffer = self.buffer] in
                 // A.1. Just store the field in the buffer and forget till completion.
                 buffer.store(value: $0, at: $1, $2)
             }
 
-        case .fulfilled:
-            self.fieldValue = { [unowned buffer = self.buffer, unowned writer = self.writer] (v, r, f) in
-                // B.1.
-                fatalError()
+        case .assembled:
+            self.fieldValue = { [unowned buffer = self.buffer, unowned writer = self.writer] in
+                // B.1. Is the requested row the same as the writer's row focus?
+                guard writer.rowIndex == $1 else {
+                    // B.1.1. If not, the row must not have been written yet (otherwise an error is thrown).
+                    guard $1 > writer.rowIndex else { throw CSVEncoder.Error.writingSurpassed(rowIndex: $1, fieldIndex: $2, value: $0) }
+                    // B.1.2. If the row hasn't been writen yet, store it in the buffer.
+                    return buffer.store(value: $0, at: $1, $2)
+                }
+                // B.2. Is the requested field the same as the writer's field focus?
+                guard writer.fieldIndex == $2 else {
+                    // B.2.1 If not, the field must not have been written yet (otherwise an error is thrown).
+                    guard $2 > writer.fieldIndex else { throw CSVEncoder.Error.writingSurpassed(rowIndex: $1, fieldIndex: $2, value: $0) }
+                    // B.2.2 If the field hasn't been writen yet, store it in the buffer.
+                    return buffer.store(value: $0, at: $1, $2)
+                }
+                // B.3. Write the provided field since it is the same as the writer's row/field.
+                try writer.write(field: $0)
+
+                assert(writer.expectedFields > 0)
+                // B.4. Are there subsequent fields in the buffer?
+                while true {
+                    // B.5. If is not the end of the row, check the buffer and see whether the following fields are already cached.
+                    while writer.fieldIndex < writer.expectedFields {
+                        guard let field = buffer.retrieveField(at: writer.rowIndex, writer.fieldIndex) else { return }
+                        try writer.write(field: field)
+                    }
+                    // B.6. If it is the end of the row, write the row delimiter and continue with the next row.
+                    try writer.endRow()
+                }
             }
 
         case .sequential:
@@ -76,39 +109,6 @@ extension ShadowEncoder {
         }
     }
 
-    // #warning("How to deal with intended field gaps?")
-    // // When the next row is writen, check the previous row.
-    // // What happens when there are several empty rows?
-    //
-    // // 1. Is the requested row the same as the writer's row focus?
-    // guard self.writer.rowIndex == rowIndex else {
-    //     // 1.1. If not, the row must not have been written yet (otherwise an error is thrown).
-    //     guard self.writer.rowIndex > rowIndex else { throw CSVEncoder.Error.writingSurpassed(rowIndex: rowIndex, fieldIndex: fieldIndex, value: value) }
-    //     // 1.2. If the row hasn't been writen yet, store it in the buffer.
-    //     return self.buffer.store(value: value, at: rowIndex, fieldIndex)
-    // }
-    // // 2. Is the requested field the same as the writer's field focus?
-    // guard self.writer.fieldIndex == fieldIndex else {
-    //     // 2.1 If not, the field must not have been written yet (otherwise an error is thrown).
-    //     guard self.writer.fieldIndex > fieldIndex else { throw CSVEncoder.Error.writingSurpassed(rowIndex: rowIndex, fieldIndex: fieldIndex, value: value) }
-    //     // 2.2 If the field hasn't been writen yet, store it in the buffer.
-    //     return self.buffer.store(value: value, at: rowIndex, fieldIndex)
-    // }
-    // // 3. Write the provided field since it is the same as the writer's row/field.
-    // try self.writer.write(field: value)
-    // // 4. How many fields per row there are? If unknown, stop.
-    // guard self.writer.expectedFields > 0 else { return }
-    // #warning("How to deal with the first ever row when no headers are given?")
-    // while true {
-    //     // 5. If is not the end of the row, check the buffer and see whether the following fields are already cached.
-    //     while self.writer.fieldIndex < self.writer.expectedFields {
-    //         guard let field = self.buffer.retrieveField(at: self.writer.rowIndex, self.writer.fieldIndex) else { return }
-    //         try self.writer.write(field: field)
-    //     }
-    //     // 6. If it is the end of the row, write the row delimiter and pass to the next row.
-    //     try self.writer.endRow()
-    // }
-
 extension ShadowEncoder.Sink {
     /// The number of fields expected per row.
     ///
```

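The B.1–B.6 steps above implement one invariant: a field is written through immediately only when it matches the writer's focus `(rowIndex, fieldIndex)`; fields ahead of the focus are buffered, fields behind it are an error, and each direct write drains whatever buffered fields have become contiguous. A self-contained sketch of that flow, using simplified hypothetical types rather than the library's actual `Sink`/`CSVWriter`:

```swift
enum ModelError: Error { case writingSurpassed }

/// Simplified model of the `.assembled` strategy (steps B.1–B.6 above).
final class AssembledSink {
    let expectedFields: Int          // plays the role of `writer.expectedFields`
    private(set) var rowIndex = 0    // the writer's row focus
    private(set) var fieldIndex = 0  // the writer's field focus
    private(set) var written: [[String]] = []       // completed rows
    private var current: [String] = []              // row being written
    private var buffer: [Int: [Int: String]] = [:]  // out-of-order fields

    init(expectedFields: Int) { self.expectedFields = expectedFields }

    func encode(_ value: String, row: Int, field: Int) throws {
        // B.1/B.2: anything that is not the writer's focus is buffered,
        // or rejected if it lies behind what has already been written.
        guard row == rowIndex && field == fieldIndex else {
            guard (row, field) > (rowIndex, fieldIndex) else { throw ModelError.writingSurpassed }
            buffer[row, default: [:]][field] = value
            return
        }
        // B.3: the value matches the focus, so write it straight through.
        write(value)
        // B.4–B.6: drain buffered fields that have become contiguous.
        drain()
    }

    private func write(_ value: String) {
        current.append(value)
        fieldIndex += 1
    }

    private func drain() {
        while true {
            while fieldIndex < expectedFields {
                // Stop as soon as the next field is not cached yet.
                guard let next = buffer[rowIndex]?.removeValue(forKey: fieldIndex) else { return }
                write(next)
            }
            // End of row: emit it and move the focus to the next row.
            written.append(current)
            current = []
            rowIndex += 1
            fieldIndex = 0
        }
    }
}

// Usage: out-of-order fields sit in the buffer until the gap before them fills.
let sink = AssembledSink(expectedFields: 2)
try sink.encode("r1f2", row: 1, field: 1)   // ahead of focus: buffered
try sink.encode("r0f1", row: 0, field: 0)   // matches focus: written
try sink.encode("r0f2", row: 0, field: 1)   // completes row 0
try sink.encode("r1f1", row: 1, field: 0)   // fills the gap, row 1 drains fully
assert(sink.written == [["r0f1", "r0f2"], ["r1f1", "r1f2"]])
```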
sources/Codable/Encodable/Internal/SinkBuffer.swift

Lines changed: 8 additions & 7 deletions

```diff
@@ -3,23 +3,24 @@ extension ShadowEncoder.Sink {
     internal final class Buffer {
         /// The buffering strategy.
         let strategy: Strategy.EncodingBuffer
+        /// The number of expectedFields
+        private let expectedFields: Int
         /// The underlying storage.
         private var storage: [Int: [Int:String]]
 
         /// Designated initializer.
-        init(strategy: Strategy.EncodingBuffer) {
+        init(strategy: Strategy.EncodingBuffer, expectedFields: Int) {
             self.strategy = strategy
+            self.expectedFields = (expectedFields > 0) ? expectedFields : 8
 
             let capacity: Int
             switch strategy {
             case .keepAll: capacity = 256
-            case .fulfilled: capacity = 16
+            case .assembled: capacity = 16
             case .sequential: capacity = 2
             }
             self.storage = .init(minimumCapacity: capacity)
         }
-
-        //#warning("Optimize field storage passing writer's expected fields")
     }
 }
 
@@ -42,9 +43,9 @@ extension ShadowEncoder.Sink.Buffer {
     /// - parameter rowIndex: The position for the row being targeted.
     /// - parameter fieldIndex: The position for the field being targeted.
     func store(value: String, at rowIndex: Int, _ fieldIndex: Int) {
-        var row = self.storage[rowIndex] ?? .init()
-        row[fieldIndex] = value
-        self.storage[rowIndex] = row
+        var fields = self.storage[rowIndex] ?? Dictionary(minimumCapacity: self.expectedFields)
+        fields[fieldIndex] = value
+        self.storage[rowIndex] = fields
     }
 
     /// Retrieves and removes from the buffer the indicated value.
```

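`Buffer` now pre-sizes each row's field dictionary with the writer's expected field count (falling back to 8 when that count is unknown), which avoids rehashing as fields arrive. A self-contained sketch of that storage shape, under a hypothetical `FieldBuffer` name rather than the library's nested type:

```swift
/// Simplified model of `Sink.Buffer`: rows keyed by index, fields keyed by index.
final class FieldBuffer {
    private let expectedFields: Int
    private var storage: [Int: [Int: String]]

    init(expectedFields: Int) {
        // Mirror of the new initializer: fall back to 8 when the count is unknown.
        self.expectedFields = (expectedFields > 0) ? expectedFields : 8
        self.storage = Dictionary(minimumCapacity: 16)
    }

    func store(value: String, at rowIndex: Int, _ fieldIndex: Int) {
        // Pre-size each row's field dictionary with the expected field count.
        var fields = storage[rowIndex] ?? Dictionary(minimumCapacity: expectedFields)
        fields[fieldIndex] = value
        storage[rowIndex] = fields
    }

    /// Retrieves and removes the indicated value, as the real buffer does.
    func retrieveField(at rowIndex: Int, _ fieldIndex: Int) -> String? {
        storage[rowIndex]?.removeValue(forKey: fieldIndex)
    }
}

let buffer = FieldBuffer(expectedFields: 0)  // unknown count: capacity defaults to 8
buffer.store(value: "x", at: 3, 1)
assert(buffer.retrieveField(at: 3, 1) == "x")
assert(buffer.retrieveField(at: 3, 1) == nil)  // retrieval removes the value
```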
tests/CodableTests/EncodingRegularUsageTests.swift

Lines changed: 8 additions & 8 deletions

```diff
@@ -81,7 +81,7 @@ extension EncodingRegularUsageTests {
     let encoding: String.Encoding = .utf8
     let bomStrategy: Strategy.BOM = .never
     let delimiters: Delimiter.Pair = (",", "\n")
-    let bufferStrategies: [Strategy.EncodingBuffer] = [.keepAll, /*.fulfilled, */.sequential]
+    let bufferStrategies: [Strategy.EncodingBuffer] = [.keepAll, .assembled, .sequential]
     // The data used for testing.
     let value: [[String]] = []
 
@@ -104,7 +104,7 @@ extension EncodingRegularUsageTests {
     let encoding: String.Encoding = .utf8
     let bomStrategy: Strategy.BOM = .never
     let delimiters: Delimiter.Pair = (",", "\n")
-    let bufferStrategies: [Strategy.EncodingBuffer] = [.keepAll, /*.fulfilled, */.sequential]
+    let bufferStrategies: [Strategy.EncodingBuffer] = [.keepAll, .assembled, .sequential]
     // The data used for testing.
     let value: [[String]] = [[.init()]]
 
@@ -128,7 +128,7 @@ extension EncodingRegularUsageTests {
     let encoding: String.Encoding = .utf8
     let bomStrategy: Strategy.BOM = .never
     let delimiters: Delimiter.Pair = (",", "\n")
-    let bufferStrategies: [Strategy.EncodingBuffer] = [.keepAll, /*.fulfilled, */.sequential]
+    let bufferStrategies: [Strategy.EncodingBuffer] = [.keepAll, .assembled, .sequential]
     // The data used for testing.
     let school = TestData.School<TestData.KeyedStudent>(students: [])
 
@@ -151,7 +151,7 @@ extension EncodingRegularUsageTests {
     let encoding: String.Encoding = .utf8
     let bomStrategy: Strategy.BOM = .never
     let delimiters: Delimiter.Pair = (",", "\n")
-    let bufferStrategies: [Strategy.EncodingBuffer] = [.keepAll, /*.fulfilled, */.sequential]
+    let bufferStrategies: [Strategy.EncodingBuffer] = [.keepAll, .assembled, .sequential]
     let headers = ["name", "age", "country", "hasPet"]
     // The data used for testing.
     let student = TestData.KeyedStudent(name: "Marcos", age: 111, country: "Spain", hasPet: true)
@@ -186,7 +186,7 @@ extension EncodingRegularUsageTests {
     let encoding: String.Encoding = .utf8
     let bomStrategy: Strategy.BOM = .never
     let delimiters: Delimiter.Pair = (",", "\n")
-    let bufferStrategies: [Strategy.EncodingBuffer] = [.keepAll, /*.fulfilled, */.sequential]
+    let bufferStrategies: [Strategy.EncodingBuffer] = [.keepAll, .assembled, .sequential]
     let headers: [String] = []
     // The data used for testing.
     let student = TestData.UnkeyedStudent(name: "Marcos", age: 111, country: "Spain", hasPet: true)
@@ -218,7 +218,7 @@ extension EncodingRegularUsageTests {
     let encoding: String.Encoding = .utf8
     let bomStrategy: Strategy.BOM = .never
     let delimiters: Delimiter.Pair = (",", "\n")
-    let bufferStrategies: [Strategy.EncodingBuffer] = [.keepAll, /*.fulfilled, */.sequential]
+    let bufferStrategies: [Strategy.EncodingBuffer] = [.keepAll, .assembled, .sequential]
     let headers = ["name", "age", "country", "hasPet"]
     // The data used for testing.
     let student: [TestData.KeyedStudent] = [
@@ -252,7 +252,7 @@ extension EncodingRegularUsageTests {
     let encoding: String.Encoding = .utf8
     let bomStrategy: Strategy.BOM = .never
     let delimiters: Delimiter.Pair = (",", "\n")
-    let bufferStrategies: [Strategy.EncodingBuffer] = [.keepAll, /*.fulfilled, */.sequential]
+    let bufferStrategies: [Strategy.EncodingBuffer] = [.keepAll, .assembled, .sequential]
     let headers = [[], ["name", "age", "country", "hasPet"]]
     // The data used for testing.
     let student: [TestData.UnkeyedStudent] = [
@@ -294,7 +294,7 @@ extension EncodingRegularUsageTests {
     let encoding: String.Encoding = .utf8
     let bomStrategy: Strategy.BOM = .never
     let delimiters: Delimiter.Pair = (",", "\n")
-    let bufferStrategies: [Strategy.EncodingBuffer] = [.keepAll, /*.fulfilled, */.sequential]
+    let bufferStrategies: [Strategy.EncodingBuffer] = [.keepAll, .assembled, .sequential]
     let headers = [[], ["name", "age", "country", "hasPet"]]
     // The data used for testing.
     let school = TestData.GapSchool(students: [
```

0 commit comments
