I am trying to achieve a remote lazy DF with Ibis and I am super close! This is what I have so far:
import uuid
import ibis

resource_registry: dict[str, ibis.Table] = {}
@app.get("/data")
def get_data():
    """
    Create a filtered dataset and register it lazily.
    """
    table = con.table("house-price")
    # Some server-side pre-processing
    df = table.filter(table.mainroad == "yes")
    # Generate a random ID for the client
    dataset_id = str(uuid.uuid4())
    # Register the lazy DF so we can fetch it later
    resource_registry[dataset_id] = df
    # Return schema + ID to client
    return {
        "dataset_id": dataset_id,
        "schema": {col: str(dtype) for col, dtype in df.schema().items()},
    }

So the client only knows the random ID + schema - enough to create an unbound table on the client. The client-side SQL looks something like

SELECT * FROM "4c9c66a6-6223-48b6-9b4c-e1b0c4c1f3ce" AS "t0" WHERE "t0"."price" > 11000000

Lastly, I "just" need to execute the SQL in the context of the table registry I built above. This is what I have so far:

# Parse the SQL plan to unbound expression
client_expr_unbound = ibis.parse_sql(
    sql_from_client,
    catalog={
        dataset_id: expr.schema() for dataset_id, expr in resource_registry.items()
    },
)
# This fails:
con.execute(client_expr_unbound)

The So I need to Or, stepping back from this implementation, how else can I achieve my goal? Bear in mind that I want clients to be able to join multiple pipelines together - anything in
Replies: 1 comment 2 replies
One way to solve this would be to use views, I guess. Somehow this all feels hacky: using SQL to communicate plans, adding views (I have no idea how databases implement views; does creating one already trigger some internal optimizations?). I have now found a doc that explains how to execute Substrait server-side, but the API does not let me define my own catalog with whitelisted view IDs. I feel like I'm missing a piece somewhere.
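To make the view idea concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for the real Ibis backend, so it says nothing about how Ibis or any particular engine behaves. The helper `register_view`, the sample rows, and the choice to store the defining SQL in `resource_registry` are all assumptions made for illustration. In SQLite, at least, a plain view is just a stored query that is expanded when it is read, which the last few lines demonstrate.

```python
import sqlite3
import uuid

# Stand-in backend (assumption: sqlite3 instead of the real Ibis connection).
con = sqlite3.connect(":memory:")
con.execute('CREATE TABLE "house-price" (price INTEGER, mainroad TEXT)')
con.executemany(
    'INSERT INTO "house-price" VALUES (?, ?)',
    [(13_300_000, "yes"), (9_000_000, "yes"), (12_000_000, "no")],
)

# Hypothetical registry: dataset ID -> SQL that defines the server-side pipeline.
resource_registry: dict[str, str] = {}


def register_view(select_sql: str) -> str:
    """Create a view under a random ID and remember it (hypothetical helper)."""
    dataset_id = str(uuid.uuid4())
    # The ID is generated on the server, so interpolating it as a quoted
    # identifier is safe here; never do this with client-supplied names.
    con.execute(f'CREATE VIEW "{dataset_id}" AS {select_sql}')
    resource_registry[dataset_id] = select_sql
    return dataset_id


# Server-side pre-processing, registered lazily as a view.
dataset_id = register_view(
    """SELECT * FROM "house-price" WHERE mainroad = 'yes'"""
)

# SQL as the client would send it: it only knows the random ID and the schema.
client_sql = f'SELECT * FROM "{dataset_id}" AS "t0" WHERE "t0"."price" > 11000000'
rows = con.execute(client_sql).fetchall()

# The view is not materialized: a row inserted afterwards shows up on re-query.
con.execute('INSERT INTO "house-price" VALUES (?, ?)', (14_000_000, "yes"))
rows_after = con.execute(client_sql).fetchall()
```

Note that this sketch does not stop a client from querying "house-price" directly; restricting client SQL to the registered IDs would need a separate check before execution.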
We can use the alias property to achieve this.