
Commit c9cbe41

Encoding buffering strategy .assembled added

1 parent e2dcb01 commit c9cbe41

File tree: 6 files changed, +69 -67 lines changed

README.md

Lines changed: 7 additions & 7 deletions

````diff
@@ -142,7 +142,7 @@ A `CSVReadder` parses CSV data from a given input (`String`, or `Data`, or file)
 
   CSV fields are separated within a row with _field delimiters_ (commonly a "comma"). CSV rows are separated through _row delimiters_ (commonly a "line feed"). You can specify any unicode scalar, `String` value, or `nil` for unknown delimiters.
 
-- `escapingStrategy` (default `.doubleQuote`) specify the Unicode scalar used to escape fields.
+- `escapingStrategy` (default `"`) specify the Unicode scalar used to escape fields.
 
   CSV fields can be escaped in case they contain priviledge characters, such as field/row delimiters. Commonly the escaping character is a double quote (i.e. `"`), by setting this configuration value you can change it (e.g. a single quote), or disable the escaping functionality.
 
@@ -315,11 +315,11 @@ let result = try decoder.decode(CustomType.self, from: data)
 `CSVDecoder` can decode CSVs represented as a `Data` blob, a `String`, or an actual file in the file system.
 
 ```swift
-let decoder = CSVDecoder { $0.bufferingStrategy = .fulfilled }
+let decoder = CSVDecoder { $0.bufferingStrategy = .assembled }
 let content: [Student] = try decoder([Student].self, from: URL("~/Desktop/Student.csv"))
 ```
 
-If you are dealing with a big CSV file, it is preferred to used direct file decoding, a `.sequential` or `.fulfilled` buffering strategy, and set *presampling* to false; since then memory usage is drastically reduced.
+If you are dealing with a big CSV file, it is preferred to used direct file decoding, a `.sequential` or `.assembled` buffering strategy, and set *presampling* to false; since then memory usage is drastically reduced.
 
 ### Decoder configuration
 
@@ -367,15 +367,15 @@ let data: Data = try encoder.encode(value)
 The `Encoder`'s `encode()` function creates a CSV file as a `Data` blob, a `String`, or an actual file in the file system.
 
 ```swift
-let encoder = CSVEncoder { $0.bufferingStrategy = .sequential }
+let encoder = CSVEncoder { $0.headers = ["name", "age", "hasPet"] }
 try encoder.encode(value, into: URL("~/Desktop/Students.csv"))
 ```
 
-If you are dealing with a big CSV content, it is preferred to use direct file encoding and a `.sequential` or `.fulfilled` buffering strategy, since then memory usage is drastically reduced.
+If you are dealing with a big CSV content, it is preferred to use direct file encoding and a `.sequential` or `.assembled` buffering strategy, since then memory usage is drastically reduced.
 
 ### Encoder configuration
 
-The encoding process can be tweaked by specifying configuration values. `CSVEncoder` accepts the [same configuration values as `CSVWRiter`](#Writer-configuration) plus the following ones:
+The encoding process can be tweaked by specifying configuration values. `CSVEncoder` accepts the [same configuration values as `CSVWriter`](#Writer-configuration) plus the following ones:
 
 - `floatStrategy` (default `.throw`) defines how to deal with non-conforming floating-point numbers (e.g. `NaN`).
 
@@ -419,7 +419,7 @@ encoder.dataStrategy = .custom { (data, encoder) in
 <ul>
 <details><summary>Basic adoption.</summary><p>
 
-When a custom type conforms to `Codable`, the type is stating that it has the ability to decode itself from and encode itself to a external representation. Which representation depends on the decoder or encoder chosen. Foundation provides support for [JSON and Property Lists](https://developer.apple.com/documentation/foundation/archives_and_serialization), but the community provide many other formats, such as: [YAML](https://github.com/jpsim/Yams), [XML](https://github.com/MaxDesiatov/XMLCoder), [BSON](https://github.com/OpenKitten/BSON), and CSV (through this library).
+When a custom type conforms to `Codable`, the type is stating that it has the ability to decode itself from and encode itself to a external representation. Which representation depends on the decoder or encoder chosen. Foundation provides support for [JSON and Property Lists](https://developer.apple.com/documentation/foundation/archives_and_serialization) and the community provide many other formats, such as: [YAML](https://github.com/jpsim/Yams), [XML](https://github.com/MaxDesiatov/XMLCoder), [BSON](https://github.com/OpenKitten/BSON), and CSV (through this library).
 
 Lets see a regular CSV encoding/decoding usage through `Codable`'s interface. Let's suppose we have a list of students formatted in a CSV file:
````

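Taken together, the README's decoding advice (direct file decoding, a `.sequential` or `.assembled` buffering strategy, presampling off) could be sketched as below. This is a hypothetical usage sketch, not confirmed by this commit: the `Student` type, the `presample` flag name, and the `decode(_:from:)` spelling are all assumptions.

```swift
import Foundation
import CodableCSV

// Hypothetical row type mirroring the README's example.
struct Student: Codable {
    let name: String
    let age: Int
}

// Low-memory setup for large inputs: decode straight from a file,
// keep only rows being assembled in the buffer, and skip presampling.
// NOTE: `presample` and `decode(_:from:)` are assumed API names.
let decoder = CSVDecoder {
    $0.bufferingStrategy = .assembled
    $0.presample = false
}
let url = URL(fileURLWithPath: "/tmp/Students.csv")
let students = try decoder.decode([Student].self, from: url)
```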
sources/Codable/Decodable/DecoderConfiguration.swift

Lines changed: 1 addition & 1 deletion

```diff
@@ -91,7 +91,7 @@ extension Strategy {
     /// Forward/Backwards decoding jumps are allowed. However, previously requested rows cannot be requested again or an error will be thrown.
     ///
     /// This strategy will massively reduce the memory usage, but it will throw an error if a CSV row that was previously decoded is requested from a keyed container.
-    case fulfilled
+    case assembled
     /// No rows are kept in memory (except for the CSV row being decoded at the moment)
     /// Forward jumps are allowed, but the rows in-between the jump cannot be decoded.
    case sequential
```

sources/Codable/Encodable/EncoderConfiguration.swift

Lines changed: 6 additions & 5 deletions

```diff
@@ -100,20 +100,21 @@ extension Strategy {
     /// All encoded rows/fields are cached and the *writing* only occurs at the end of the encodable process.
     ///
     /// *Keyed containers* can be used to encode rows/fields unordered. That means, a row at position 5 may be encoded before the row at position 3. Similar behavior is supported for fields within a row.
-    /// - attention: This strategy consumes the largest amount of memory from all the supported options.
+    /// - remark: This strategy consumes the largest amount of memory from all the supported options.
     case keepAll
     /// Encoded rows may be cached, but the encoder will keep the buffer as small as possible by writing completed ordered rows.
     ///
     /// *Keyed containers* can be used to encode rows/fields unordered. The writer will however consume rows in order.
     ///
-    /// For example, an encoder starts encoding row 1 and it gets all its fields. The row will get written and no cache for the row is kept. Same situation occurs when the row 2 is encoded.
+    /// For example, an encoder starts encoding row 1 and gets all its fields. The row will get written and no cache for the row is kept anymore. Same situation occurs when the row 2 is encoded.
     /// However, the user may decide to jump to row 5 and encode it. This row will be kept in the cache till row 3 and 4 are encoded, at which time row 3, 4, 5, and any subsequent rows will be writen.
-    /// - attention: This strategy tries to keep the cache to a minimum, but memory usage may be big if there are holes while encoding rows. Those holes are filled with empty rows at the end of the encoding process.
-    case fulfilled
+    /// - attention: If no headers are passed during configuration the encoder has no way to know when a row is completed. That is why, the `.keepAll` buffering strategy will be used instead for such a case.
+    /// - remark: This strategy tries to keep the cache to a minimum, but memory usage may be big if there are holes while encoding rows/fields. Those holes are filled with empty rows/fields at the end of the encoding process.
+    case assembled
     /// Only the last row (the one being written) is kept in memory. Writes are performed sequentially.
     ///
     /// *Keyed containers* can be used, but at file-level any forward jump will imply writing empty-rows. At field-level *keyed containers* may still be used for random-order writing.
-    /// - attention: This strategy provides the smallest usage of memory from all.
+    /// - remark: This strategy provides the smallest usage of memory from all.
     case sequential
 }
 }
```

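The new `- attention:` note (`.assembled` needs headers to know how many fields complete a row) corresponds to a concrete fallback in `Sink.init` below. A minimal self-contained model of that selection logic, using stand-in types rather than the library's own:

```swift
// Stand-in for `Strategy.EncodingBuffer` (illustration only).
enum EncodingBuffer { case keepAll, assembled, sequential }

/// Without headers the encoder cannot tell when a row is complete,
/// so a requested `.assembled` strategy degrades to `.keepAll`.
func effectiveStrategy(requested: EncodingBuffer, headers: [String]) -> EncodingBuffer {
    switch requested {
    case .assembled where headers.isEmpty: return .keepAll
    case let other: return other
    }
}

assert(effectiveStrategy(requested: .assembled, headers: []) == .keepAll)
assert(effectiveStrategy(requested: .assembled, headers: ["name", "age"]) == .assembled)
assert(effectiveStrategy(requested: .sequential, headers: []) == .sequential)
```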
sources/Codable/Encodable/Internal/Sink.swift

Lines changed: 39 additions & 39 deletions

```diff
@@ -17,22 +17,55 @@ extension ShadowEncoder {
     /// Creates the unique data sink for the encoding process.
     init(writer: CSVWriter, configuration: CSVEncoder.Configuration, userInfo: [CodingUserInfoKey:Any]) throws {
         self.writer = writer
-        self.buffer = Buffer(strategy: configuration.bufferingStrategy)
+
+        let strategy: Strategy.EncodingBuffer
+        switch configuration.bufferingStrategy {
+        case .assembled where configuration.headers.isEmpty: strategy = .keepAll
+        case let others: strategy = others
+        }
+
+        self.buffer = Buffer(strategy: strategy, expectedFields: self.writer.expectedFields)
         self.configuration = configuration
         self.userInfo = userInfo
         self.headerLookup = .init()
 
-        switch configuration.bufferingStrategy {
+        switch strategy {
         case .keepAll:
             self.fieldValue = { [unowned buffer = self.buffer] in
                 // A.1. Just store the field in the buffer and forget till completion.
                 buffer.store(value: $0, at: $1, $2)
             }
 
-        case .fulfilled:
-            self.fieldValue = { [unowned buffer = self.buffer, unowned writer = self.writer] (v, r, f) in
-                // B.1.
-                fatalError()
+        case .assembled:
+            self.fieldValue = { [unowned buffer = self.buffer, unowned writer = self.writer] in
+                // B.1. Is the requested row the same as the writer's row focus?
+                guard writer.rowIndex == $1 else {
+                    // B.1.1. If not, the row must not have been written yet (otherwise an error is thrown).
+                    guard $1 > writer.rowIndex else { throw CSVEncoder.Error.writingSurpassed(rowIndex: $1, fieldIndex: $2, value: $0) }
+                    // B.1.2. If the row hasn't been writen yet, store it in the buffer.
+                    return buffer.store(value: $0, at: $1, $2)
+                }
+                // B.2. Is the requested field the same as the writer's field focus?
+                guard writer.fieldIndex == $2 else {
+                    // B.2.1 If not, the field must not have been written yet (otherwise an error is thrown).
+                    guard $2 > writer.fieldIndex else { throw CSVEncoder.Error.writingSurpassed(rowIndex: $1, fieldIndex: $2, value: $0) }
+                    // B.2.2 If the field hasn't been writen yet, store it in the buffer.
+                    return buffer.store(value: $0, at: $1, $2)
+                }
+                // B.3. Write the provided field since it is the same as the writer's row/field.
+                try writer.write(field: $0)
+
+                assert(writer.expectedFields > 0)
+                // B.4. Are there subsequent fields in the buffer?
+                while true {
+                    // B.5. If is not the end of the row, check the buffer and see whether the following fields are already cached.
+                    while writer.fieldIndex < writer.expectedFields {
+                        guard let field = buffer.retrieveField(at: writer.rowIndex, writer.fieldIndex) else { return }
+                        try writer.write(field: field)
+                    }
+                    // B.6. If it is the end of the row, write the row delimiter and continue with the next row.
+                    try writer.endRow()
+                }
             }
 
         case .sequential:
@@ -76,39 +109,6 @@ extension ShadowEncoder {
         }
     }
 
-    // #warning("How to deal with intended field gaps?")
-    // // When the next row is writen, check the previous row.
-    // // What happens when there are several empty rows?
-    //
-    // // 1. Is the requested row the same as the writer's row focus?
-    // guard self.writer.rowIndex == rowIndex else {
-    //     // 1.1. If not, the row must not have been written yet (otherwise an error is thrown).
-    //     guard self.writer.rowIndex > rowIndex else { throw CSVEncoder.Error.writingSurpassed(rowIndex: rowIndex, fieldIndex: fieldIndex, value: value) }
-    //     // 1.2. If the row hasn't been writen yet, store it in the buffer.
-    //     return self.buffer.store(value: value, at: rowIndex, fieldIndex)
-    // }
-    // // 2. Is the requested field the same as the writer's field focus?
-    // guard self.writer.fieldIndex == fieldIndex else {
-    //     // 2.1 If not, the field must not have been written yet (otherwise an error is thrown).
-    //     guard self.writer.fieldIndex > fieldIndex else { throw CSVEncoder.Error.writingSurpassed(rowIndex: rowIndex, fieldIndex: fieldIndex, value: value) }
-    //     // 2.2 If the field hasn't been writen yet, store it in the buffer.
-    //     return self.buffer.store(value: value, at: rowIndex, fieldIndex)
-    // }
-    // // 3. Write the provided field since it is the same as the writer's row/field.
-    // try self.writer.write(field: value)
-    // // 4. How many fields per row there are? If unknown, stop.
-    // guard self.writer.expectedFields > 0 else { return }
-    // #warning("How to deal with the first ever row when no headers are given?")
-    // while true {
-    //     // 5. If is not the end of the row, check the buffer and see whether the following fields are already cached.
-    //     while self.writer.fieldIndex < self.writer.expectedFields {
-    //         guard let field = self.buffer.retrieveField(at: self.writer.rowIndex, self.writer.fieldIndex) else { return }
-    //         try self.writer.write(field: field)
-    //     }
-    //     // 6. If it is the end of the row, write the row delimiter and pass to the next row.
-    //     try self.writer.endRow()
-    // }
-
 extension ShadowEncoder.Sink {
     /// The number of fields expected per row.
     ///
```

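The B.1–B.6 steps above implement one invariant: a field is written through immediately only when it matches the writer's focus `(rowIndex, fieldIndex)`; fields ahead of the focus are buffered, fields behind it are an error, and each direct write drains whatever buffered fields have become contiguous. A self-contained sketch of that flow, using simplified hypothetical types rather than the library's actual `Sink`/`CSVWriter`:

```swift
enum ModelError: Error { case writingSurpassed }

/// Simplified model of the `.assembled` strategy (steps B.1–B.6 above).
final class AssembledSink {
    let expectedFields: Int          // plays the role of `writer.expectedFields`
    private(set) var rowIndex = 0    // the writer's row focus
    private(set) var fieldIndex = 0  // the writer's field focus
    private(set) var written: [[String]] = []       // completed rows
    private var current: [String] = []              // row being written
    private var buffer: [Int: [Int: String]] = [:]  // out-of-order fields

    init(expectedFields: Int) { self.expectedFields = expectedFields }

    func encode(_ value: String, row: Int, field: Int) throws {
        // B.1/B.2: anything that is not the writer's focus is buffered,
        // or rejected if it lies behind what has already been written.
        guard row == rowIndex && field == fieldIndex else {
            guard (row, field) > (rowIndex, fieldIndex) else { throw ModelError.writingSurpassed }
            buffer[row, default: [:]][field] = value
            return
        }
        // B.3: the value matches the focus, so write it straight through.
        write(value)
        // B.4–B.6: drain buffered fields that have become contiguous.
        drain()
    }

    private func write(_ value: String) {
        current.append(value)
        fieldIndex += 1
    }

    private func drain() {
        while true {
            while fieldIndex < expectedFields {
                // Stop as soon as the next field is not cached yet.
                guard let next = buffer[rowIndex]?.removeValue(forKey: fieldIndex) else { return }
                write(next)
            }
            // End of row: emit it and move the focus to the next row.
            written.append(current)
            current = []
            rowIndex += 1
            fieldIndex = 0
        }
    }
}

// Usage: out-of-order fields sit in the buffer until the gap before them fills.
let sink = AssembledSink(expectedFields: 2)
try sink.encode("r1f2", row: 1, field: 1)   // ahead of focus: buffered
try sink.encode("r0f1", row: 0, field: 0)   // matches focus: written
try sink.encode("r0f2", row: 0, field: 1)   // completes row 0
try sink.encode("r1f1", row: 1, field: 0)   // fills the gap, row 1 drains fully
assert(sink.written == [["r0f1", "r0f2"], ["r1f1", "r1f2"]])
```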
sources/Codable/Encodable/Internal/SinkBuffer.swift

Lines changed: 8 additions & 7 deletions

```diff
@@ -3,23 +3,24 @@ extension ShadowEncoder.Sink {
     internal final class Buffer {
         /// The buffering strategy.
         let strategy: Strategy.EncodingBuffer
+        /// The number of expectedFields
+        private let expectedFields: Int
         /// The underlying storage.
         private var storage: [Int: [Int:String]]
 
         /// Designated initializer.
-        init(strategy: Strategy.EncodingBuffer) {
+        init(strategy: Strategy.EncodingBuffer, expectedFields: Int) {
             self.strategy = strategy
+            self.expectedFields = (expectedFields > 0) ? expectedFields : 8
 
             let capacity: Int
             switch strategy {
             case .keepAll: capacity = 256
-            case .fulfilled: capacity = 16
+            case .assembled: capacity = 16
             case .sequential: capacity = 2
             }
             self.storage = .init(minimumCapacity: capacity)
         }
-
-        //#warning("Optimize field storage passing writer's expected fields")
     }
 }
 
@@ -42,9 +43,9 @@ extension ShadowEncoder.Sink.Buffer {
     /// - parameter rowIndex: The position for the row being targeted.
     /// - parameter fieldIndex: The position for the field being targeted.
     func store(value: String, at rowIndex: Int, _ fieldIndex: Int) {
-        var row = self.storage[rowIndex] ?? .init()
-        row[fieldIndex] = value
-        self.storage[rowIndex] = row
+        var fields = self.storage[rowIndex] ?? Dictionary(minimumCapacity: self.expectedFields)
+        fields[fieldIndex] = value
+        self.storage[rowIndex] = fields
     }
 
     /// Retrieves and removes from the buffer the indicated value.
```

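`Buffer` now pre-sizes each row's field dictionary with the writer's expected field count (falling back to 8 when that count is unknown), which avoids rehashing as fields arrive. A self-contained sketch of that storage shape, under a hypothetical `FieldBuffer` name rather than the library's nested type:

```swift
/// Simplified model of `Sink.Buffer`: rows keyed by index, fields keyed by index.
final class FieldBuffer {
    private let expectedFields: Int
    private var storage: [Int: [Int: String]]

    init(expectedFields: Int) {
        // Mirror of the new initializer: fall back to 8 when the count is unknown.
        self.expectedFields = (expectedFields > 0) ? expectedFields : 8
        self.storage = Dictionary(minimumCapacity: 16)
    }

    func store(value: String, at rowIndex: Int, _ fieldIndex: Int) {
        // Pre-size each row's field dictionary with the expected field count.
        var fields = storage[rowIndex] ?? Dictionary(minimumCapacity: expectedFields)
        fields[fieldIndex] = value
        storage[rowIndex] = fields
    }

    /// Retrieves and removes the indicated value, as the real buffer does.
    func retrieveField(at rowIndex: Int, _ fieldIndex: Int) -> String? {
        storage[rowIndex]?.removeValue(forKey: fieldIndex)
    }
}

let buffer = FieldBuffer(expectedFields: 0)  // unknown count: capacity defaults to 8
buffer.store(value: "x", at: 3, 1)
assert(buffer.retrieveField(at: 3, 1) == "x")
assert(buffer.retrieveField(at: 3, 1) == nil)  // retrieval removes the value
```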
tests/CodableTests/EncodingRegularUsageTests.swift

Lines changed: 8 additions & 8 deletions

```diff
@@ -81,7 +81,7 @@ extension EncodingRegularUsageTests {
     let encoding: String.Encoding = .utf8
     let bomStrategy: Strategy.BOM = .never
     let delimiters: Delimiter.Pair = (",", "\n")
-    let bufferStrategies: [Strategy.EncodingBuffer] = [.keepAll, /*.fulfilled, */.sequential]
+    let bufferStrategies: [Strategy.EncodingBuffer] = [.keepAll, .assembled, .sequential]
     // The data used for testing.
     let value: [[String]] = []
 
@@ -104,7 +104,7 @@ extension EncodingRegularUsageTests {
     let encoding: String.Encoding = .utf8
     let bomStrategy: Strategy.BOM = .never
     let delimiters: Delimiter.Pair = (",", "\n")
-    let bufferStrategies: [Strategy.EncodingBuffer] = [.keepAll, /*.fulfilled, */.sequential]
+    let bufferStrategies: [Strategy.EncodingBuffer] = [.keepAll, .assembled, .sequential]
     // The data used for testing.
     let value: [[String]] = [[.init()]]
 
@@ -128,7 +128,7 @@ extension EncodingRegularUsageTests {
     let encoding: String.Encoding = .utf8
     let bomStrategy: Strategy.BOM = .never
     let delimiters: Delimiter.Pair = (",", "\n")
-    let bufferStrategies: [Strategy.EncodingBuffer] = [.keepAll, /*.fulfilled, */.sequential]
+    let bufferStrategies: [Strategy.EncodingBuffer] = [.keepAll, .assembled, .sequential]
     // The data used for testing.
     let school = TestData.School<TestData.KeyedStudent>(students: [])
 
@@ -151,7 +151,7 @@ extension EncodingRegularUsageTests {
     let encoding: String.Encoding = .utf8
     let bomStrategy: Strategy.BOM = .never
     let delimiters: Delimiter.Pair = (",", "\n")
-    let bufferStrategies: [Strategy.EncodingBuffer] = [.keepAll, /*.fulfilled, */.sequential]
+    let bufferStrategies: [Strategy.EncodingBuffer] = [.keepAll, .assembled, .sequential]
     let headers = ["name", "age", "country", "hasPet"]
     // The data used for testing.
     let student = TestData.KeyedStudent(name: "Marcos", age: 111, country: "Spain", hasPet: true)
@@ -186,7 +186,7 @@ extension EncodingRegularUsageTests {
     let encoding: String.Encoding = .utf8
     let bomStrategy: Strategy.BOM = .never
     let delimiters: Delimiter.Pair = (",", "\n")
-    let bufferStrategies: [Strategy.EncodingBuffer] = [.keepAll, /*.fulfilled, */.sequential]
+    let bufferStrategies: [Strategy.EncodingBuffer] = [.keepAll, .assembled, .sequential]
     let headers: [String] = []
     // The data used for testing.
     let student = TestData.UnkeyedStudent(name: "Marcos", age: 111, country: "Spain", hasPet: true)
@@ -218,7 +218,7 @@ extension EncodingRegularUsageTests {
     let encoding: String.Encoding = .utf8
     let bomStrategy: Strategy.BOM = .never
     let delimiters: Delimiter.Pair = (",", "\n")
-    let bufferStrategies: [Strategy.EncodingBuffer] = [.keepAll, /*.fulfilled, */.sequential]
+    let bufferStrategies: [Strategy.EncodingBuffer] = [.keepAll, .assembled, .sequential]
     let headers = ["name", "age", "country", "hasPet"]
     // The data used for testing.
     let student: [TestData.KeyedStudent] = [
@@ -252,7 +252,7 @@ extension EncodingRegularUsageTests {
     let encoding: String.Encoding = .utf8
     let bomStrategy: Strategy.BOM = .never
     let delimiters: Delimiter.Pair = (",", "\n")
-    let bufferStrategies: [Strategy.EncodingBuffer] = [.keepAll, /*.fulfilled, */.sequential]
+    let bufferStrategies: [Strategy.EncodingBuffer] = [.keepAll, .assembled, .sequential]
     let headers = [[], ["name", "age", "country", "hasPet"]]
     // The data used for testing.
     let student: [TestData.UnkeyedStudent] = [
@@ -294,7 +294,7 @@ extension EncodingRegularUsageTests {
     let encoding: String.Encoding = .utf8
     let bomStrategy: Strategy.BOM = .never
     let delimiters: Delimiter.Pair = (",", "\n")
-    let bufferStrategies: [Strategy.EncodingBuffer] = [.keepAll, /*.fulfilled, */.sequential]
+    let bufferStrategies: [Strategy.EncodingBuffer] = [.keepAll, .assembled, .sequential]
     let headers = [[], ["name", "age", "country", "hasPet"]]
     // The data used for testing.
     let school = TestData.GapSchool(students: [
```

0 commit comments
