The idea would be to allow something like ```python ds = load_dataset("c4", "en", as_iterable=True) ``` To be used to train models. It would load an IterableDataset from the cached Arrow files. Cc @stas00 Edit : from the discussions we may load from cache when streaming=True