* add split argument to Generator, from_generator, AbstractDatasetInputStream, GeneratorDatasetInputStream
* address split generator review feedback
* import Split
* tag added version in iterable_dataset, roll back change in _concatenate_iterable_datasets
* rm useless Generator __init__
* docstring formatting
Co-authored-by: Albert Villanova del Moral <[email protected]>
* format docstring
Co-authored-by: Albert Villanova del Moral <[email protected]>
* fix test_dataset_from_generator_split[None]
---------
Co-authored-by: Albert Villanova del Moral <[email protected]>
src/datasets/arrow_dataset.py (+6)
```diff
@@ -1068,6 +1068,7 @@ def from_generator(
         keep_in_memory: bool = False,
         gen_kwargs: Optional[dict] = None,
         num_proc: Optional[int] = None,
+        split: NamedSplit = Split.TRAIN,
         **kwargs,
     ):
         """Create a Dataset from a generator.
@@ -1090,6 +1091,10 @@ def from_generator(
                 If `num_proc` is greater than one, then all list values in `gen_kwargs` must be the same length. These values will be split between calls to the generator. The number of shards will be the minimum of the shortest list in `gen_kwargs` and `num_proc`.

                 <Added version="2.7.0"/>
+            split ([`NamedSplit`], defaults to `Split.TRAIN`):
+                Split name to be assigned to the dataset.
+
+                <Added version="2.21.0"/>
             **kwargs (additional keyword arguments):
                 Keyword arguments to be passed to :[`GeneratorConfig`].
```
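The `num_proc` docstring in this hunk describes how list-valued `gen_kwargs` are divided among generator calls. The following is a minimal sketch of that rule in plain Python. It is not the library's internal implementation. `split_gen_kwargs` is a hypothetical helper, and the contiguous, nearly equal distribution of list items across shards is an assumption.

```python
from typing import Any


def split_gen_kwargs(gen_kwargs: dict[str, Any], num_proc: int) -> list[dict[str, Any]]:
    """Hypothetical helper: divide list-valued gen_kwargs across generator calls.

    Assumed rule (from the docstring): all list values must share one length,
    and the number of shards is min(that length, num_proc). Non-list values
    are passed unchanged to every shard.
    """
    list_lengths = {len(v) for v in gen_kwargs.values() if isinstance(v, list)}
    if len(list_lengths) > 1:
        raise ValueError(
            f"All list values in gen_kwargs must have the same length, got {sorted(list_lengths)}"
        )
    list_len = list_lengths.pop() if list_lengths else 1
    num_shards = max(1, min(list_len, num_proc))

    # Contiguous, nearly equal slices; the first `extra` shards get one more item.
    base, extra = divmod(list_len, num_shards)
    shards = []
    start = 0
    for shard_idx in range(num_shards):
        stop = start + base + (1 if shard_idx < extra else 0)
        shards.append(
            {k: v[start:stop] if isinstance(v, list) else v for k, v in gen_kwargs.items()}
        )
        start = stop
    return shards


files = ["a.txt", "b.txt", "c.txt", "d.txt", "e.txt"]
for shard in split_gen_kwargs({"files": files, "encoding": "utf-8"}, num_proc=2):
    print(shard)
# {'files': ['a.txt', 'b.txt', 'c.txt'], 'encoding': 'utf-8'}
# {'files': ['d.txt', 'e.txt'], 'encoding': 'utf-8'}
```

With five files and `num_proc=2`, the sketch produces two shards. If `num_proc` exceeded the list length, the shard count would be capped at the list length, matching the "minimum of" rule in the docstring.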