
Apply map_batches and forward in lazy mode with sequence and numeric inputs #249

Open
@linjing-lab

Description


The `map_batches` function needs LazyFrame query optimization and streaming computation in the forward module.

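```python
import polars
# NeuralNetwork is the model from the forward module; forward() takes and returns a numpy.ndarray,
# and the result is explicitly wrapped back into a Series for map_batches
df.with_columns([
    polars.col("features").map_batches(lambda seq: polars.Series(NeuralNetwork.forward(seq.to_numpy()))).alias("activations")
])
```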

The pattern above runs in eager mode. If the same expression ran in lazy mode through `df.lazy()`, the whole program would get both query optimization and the forward computation in a single query:

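```python
import polars
# Lazy variant: the expression is only planned here; forward() runs when collect() executes the plan
df.lazy().with_columns([
    polars.col("features").map_batches(lambda seq: polars.Series(NeuralNetwork.forward(seq.to_numpy()))).alias("activations")
]).collect()
```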

I'd prefer that NeuralNetwork operate at the same numeric level as `numpy.ndarray`, so that the numerical forward pass stays extensible and compatible with lazy mode in streaming computation. This matters because the lambda only receives its (delayed) buffer after query optimization has completed. The recommended solution works like this: `col` and `map_batches` are expressions of a lazy query, so nothing executes until `collect` is called. At that point the query plan is built and the data is streamed through the expressions (ideally without occupying new memory).

NeuralNetwork can be a perceptron or any other sequence-friendly model that maps high-dimensional input sequences to activations, which are then used to predict a downstream task.
