Load a cached dataset as iterable #5481

Open
@lhoestq

The idea would be to allow something like

ds = load_dataset("c4", "en", as_iterable=True)

This would be used to train models: it would load an IterableDataset directly from the cached Arrow files.

Cc @stas00

Edit: following the discussion, we may instead load from the cache when streaming=True

Labels: enhancement (New feature or request), good second issue (Issues a bit more difficult than "Good First" issues)