Easier CustomDataset Creation #1936

Open
0 of 5 issues completed
@lancechua

Description

IO read/write functions typically follow this signature:

from pathlib import Path
from typing import Any, Union

def load(path: Union[Path, str], **kwargs) -> Any:
    ...

def save(obj: Any, path: Union[Path, str], **kwargs) -> None:
    ...

Creating custom datasets should ideally be as easy as supplying load / save function(s) that follow this function signature.
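As a concrete illustration of that signature, here is a minimal sketch of a load / save pair for JSON files using only the standard library. The names `load_json` and `save_json` are hypothetical and not part of kedro; they only show the shape of functions this request has in mind.

```python
import json
from pathlib import Path
from typing import Any, Union


def load_json(path: Union[Path, str], **kwargs) -> Any:
    """Read a JSON file; extra keyword arguments are forwarded to json.load."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f, **kwargs)


def save_json(obj: Any, path: Union[Path, str], **kwargs) -> None:
    """Write obj as JSON; extra keyword arguments are forwarded to json.dump."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(obj, f, **kwargs)
```

Any pair of callables with this shape, including functions that libraries such as xarray already provide, would be a candidate for wrapping.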

Context

Kedro supports an extensive range of datasets, but the list is not exhaustive. Popular libraries used by relatively niche communities, like xarray and arviz, aren't currently supported.

Beyond these examples, a simpler mechanism would also make it easier for users to add unofficial support for more obscure dataset formats.

Initially, I was looking to implement something like this and asked in the Discord chat if this pattern made sense.
Then, @datajoely suggested I open a feature request.

Possible Implementation

We could consider a dataset class factory, for example a GenericDataset with a class method .create_custom_dataset(name: str, load: Callable, save: Callable).

Usage would look something like xarrayNetCDF = GenericDataset.create_custom_dataset("xarrayNetCDF", xarray.open_dataset, lambda x, path, **kwargs: x.to_netcdf(path, **kwargs)).
Entries can be added to the data catalog yaml just as with any other custom dataset implementation.
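To make the idea more concrete, below is a rough, self-contained sketch of what such a factory could look like. It is not kedro code: `GenericDataset` here is a hypothetical plain class standing in for a real subclass of kedro's abstract dataset, and the constructor arguments `filepath`, `load_args`, and `save_args` are assumptions modelled on how existing datasets are commonly configured.

```python
from pathlib import Path
from typing import Any, Callable, Dict, Optional, Type, Union

PathLike = Union[Path, str]


class GenericDataset:
    """Hypothetical stand-in; a real version would presumably extend kedro's abstract dataset."""

    def __init__(
        self,
        filepath: PathLike,
        load_args: Optional[Dict[str, Any]] = None,
        save_args: Optional[Dict[str, Any]] = None,
    ) -> None:
        self._filepath = filepath
        self._load_args = dict(load_args or {})
        self._save_args = dict(save_args or {})

    def load(self) -> Any:
        # Delegates to the load function attached by the factory.
        return self._load_func(self._filepath, **self._load_args)

    def save(self, data: Any) -> None:
        # Delegates to the save function attached by the factory.
        self._save_func(data, self._filepath, **self._save_args)

    @classmethod
    def create_custom_dataset(
        cls,
        name: str,
        load: Callable[..., Any],
        save: Callable[..., None],
    ) -> Type["GenericDataset"]:
        """Build a new subclass called `name` that wraps the given functions."""
        # staticmethod stops Python from binding `self` as the first argument.
        attrs = {"_load_func": staticmethod(load), "_save_func": staticmethod(save)}
        return type(name, (cls,), attrs)


# Example with plain text files, so the sketch runs without extra dependencies.
def load_text(path: PathLike, **kwargs) -> str:
    return Path(path).read_text(**kwargs)


def save_text(obj: str, path: PathLike, **kwargs) -> None:
    Path(path).write_text(obj, **kwargs)


TextDataset = GenericDataset.create_custom_dataset("TextDataset", load_text, save_text)
```

An instance such as `TextDataset("notes.txt", load_args={"encoding": "utf-8"})` would then forward `encoding` to `load_text` on `load()`. The generated class would still need to live at an importable location for a catalog entry to refer to it.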

Possible Alternatives

  • LambdaDataset is very similar, but the load and save functions are hard-coded in the implementation and cannot be parameterized in the data catalog, as far as I'm aware
  • Subclassing AbstractDataset is an option, but this feature request seeks to reduce boilerplate when defining new datasets
  • Adding official xarray support (#1346) requires implementing nuances like cloud file storage, partitioned datasets, lazy loading, etc.
