* feat: add `.batch()` to `IterableDataset` and introduce new `BatchedExamplesIterable`
* style: formatting...
* refactor: implement feedback to use .map()
* test: add tests for new `batch()` method
* style: formatting...
* fix: remove type hints in `batch_fn()` to fix failing CI
* docs: add section "Batching data in IterableDataset" to "Differences between Dataset and IterableDataset"
* refactor: apply feedback
* docs nit
---------
Co-authored-by: Quentin Lhoest <[email protected]>
docs/source/about_mapstyle_vs_iterable.mdx (-4)
@@ -205,10 +205,6 @@ for epoch in range(n_epochs):
 pass
 ```
 
-## Checkpoint and resuming differences
-
-If you training loop stops, you may want to restart the training from where it was. To do so you can save a checkpoint of your model and optimizers, as well as your data loader.
-
 To restart the iteration of a map-style dataset, you can simply skip the first examples:
docs/source/stream.mdx (+38)
@@ -318,6 +318,44 @@ You can filter rows in the dataset based on a predicate function using [`Dataset
 {'id': 4, 'text': 'Are you looking for Number the Stars (Essential Modern Classics)? Normally, ...'}]
 ```
 
+## Batch
+
+The `batch` method transforms your `IterableDataset` into an iterable of batches. This is particularly useful when you want to work with batches in your training loop or when using frameworks that expect batched inputs.
+
+<Tip>
+
+There is also a "Batch Processing" option when using the `map` function to apply a function to batches of data, which is discussed in the [Map section](#map) above. The `batch` method described here is different and provides a more direct way to create batches from your dataset.
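
Reviewer note: the added `## Batch` text describes `batch` as turning an `IterableDataset` into an iterable of batches. A minimal pure-Python sketch of that idea follows. It is not the implementation in this commit (which, per the commit message, builds on `.map()`). The helper name `batch_examples`, the `drop_last_batch` flag, and the dict-of-lists batch layout are assumptions made for illustration only.

```python
from itertools import islice
from typing import Any, Dict, Iterable, Iterator, List


def batch_examples(
    examples: Iterable[Dict[str, Any]],
    batch_size: int,
    drop_last_batch: bool = False,  # assumed flag, not taken from the diff
) -> Iterator[Dict[str, List[Any]]]:
    """Group a stream of dict examples into column-wise batches (hypothetical helper)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    iterator = iter(examples)
    while True:
        # Pull up to batch_size rows without materializing the whole stream.
        chunk = list(islice(iterator, batch_size))
        if not chunk:
            return
        if drop_last_batch and len(chunk) < batch_size:
            return
        # Turn a list of rows into a dict of columns.
        yield {key: [row[key] for row in chunk] for key in chunk[0]}


# A small in-memory stream standing in for a streamed dataset.
stream = ({"id": i, "text": f"example {i}"} for i in range(5))
batches = list(batch_examples(stream, batch_size=2))
```

Here the five-example stream yields three batches, the last holding a single example; with `drop_last_batch=True` that short batch would be skipped.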