Replies: 1 comment
-
One way to solve this would be to use views, I guess. Somehow this all feels hacky - having SQL to communicate plans, adding views (no idea how databases implement views - does that already start some internal optimizations?). I have now found a doc that explains how to execute Substrait server-side, but the API does not allow me to define my own catalog with whitelisted view IDs. I feel like I'm missing a part somewhere.
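A minimal sketch of the view idea, with Python's built-in `sqlite3` standing in for the server-side DuckDB connection (an assumption made so the snippet runs without extra packages). The `orders` table and the filter are made up for illustration. It only shows that a view named after the random ID lets the client SQL resolve:

```python
import sqlite3
import uuid

# sqlite3 stands in for the server-side DuckDB connection (assumption).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (id INTEGER, customer TEXT, amount REAL)")
con.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "alice", 10.0), (2, "bob", 25.0), (3, "alice", 40.0)],
)

# Server-side pre-processing that the client never sees (hypothetical filter).
server_query = "SELECT id, amount FROM orders WHERE customer = 'alice'"

# Expose the pre-processed result under a random ID by creating a view.
resource_id = str(uuid.uuid4())
con.execute(f'CREATE TEMP VIEW "{resource_id}" AS {server_query}')

# SQL as the client might send it: it references only the random ID.
client_sql = f'SELECT SUM(amount) AS total FROM "{resource_id}"'
total = con.execute(client_sql).fetchone()[0]
print(total)
```

Note that a view on the same connection fixes name resolution only. Client SQL that names `orders` directly would still run, so this alone does not whitelist anything.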
-
I am trying to achieve a remote lazy DF with Ibis and I am super close!
This is what I have so far.
The clients request data. The server already does some pre-processing, such as filtering and joins. This bit is important: I don't want clients to have full access to the table! So I keep track of the resources that were created and create a UUID for this specific resource.
So the client only knows the random ID + schema - enough to create an unbound table on the client.
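The registry described above could look roughly like the following sketch. It is not the original code (which is missing here): the `Resource` class and `register` helper are hypothetical, and a plain SQL string stands in for the bound Ibis expression. Only `resource_registry` is a name from the post.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    query: str    # server-side query (stand-in for a bound expression), never sent out
    schema: dict  # column name -> type name, safe to share with the client


# Maps random IDs to sanctioned server-side resources.
resource_registry: dict[str, Resource] = {}


def register(query: str, schema: dict) -> dict:
    """Store the server-side query and return only what the client may see."""
    resource_id = str(uuid.uuid4())
    resource_registry[resource_id] = Resource(query, schema)
    return {"id": resource_id, "schema": schema}


# Hypothetical resource: a pre-filtered slice of an "orders" table.
handle = register(
    "SELECT id, amount FROM orders WHERE customer = 'alice'",
    {"id": "int64", "amount": "float64"},
)
print(handle)
```

The returned handle carries just the ID and the schema, which is what the client needs to build an unbound table.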
Imagine clients do further processing there, maybe even join multiple resources together (all fetched in the same way).
I then serialize their client-side pipeline to SQL and send it back to the server. (I would have loved to use Substrait but didn't find a way to deserialize it server-side with a catalog, so SQL will do for now.)
The client-side SQL looks something like
Lastly, I "just" need to execute the SQL in the context of the table registry I built above.
This is what I have so far:
The `con.execute` call raises a catalog error: `duckdb.duckdb.CatalogException: Catalog Error: Table with name 4c9c66a6-6223-48b6-9b4c-e1b0c4c1f3ce does not exist!`
Which makes sense, the frontend SQL is built using the UUID as a table name, which of course is not mapped back to the table.
So I need to `execute` with a select catalog of bound expressions (I have them in `resource_registry`).
Can I create a temporary catalog that only contains the bound expressions from above?
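One way to approximate such a temporary catalog is a throwaway database that holds only the sanctioned resources, each stored under its ID. The sketch below again uses `sqlite3` as a stand-in for DuckDB (an assumption). The tables, the `res-a`/`res-b` IDs, and the `run_sandboxed` helper are hypothetical.

```python
import sqlite3

# Full server database (sqlite3 stands in for DuckDB; tables are made up).
server = sqlite3.connect(":memory:")
server.executescript(
    """
    CREATE TABLE orders (id INTEGER, customer TEXT, amount REAL);
    CREATE TABLE customers (name TEXT, country TEXT, secret TEXT);
    INSERT INTO orders VALUES (1, 'alice', 10.0), (2, 'bob', 25.0);
    INSERT INTO customers VALUES ('alice', 'DE', 'x'), ('bob', 'FR', 'y');
    """
)

# Hypothetical registry: resource ID -> sanctioned server-side query.
resource_registry = {
    "res-a": "SELECT id, customer, amount FROM orders",
    "res-b": "SELECT name, country FROM customers",
}


def run_sandboxed(client_sql: str) -> list:
    """Run client SQL in a fresh database that contains only sanctioned data."""
    sandbox = sqlite3.connect(":memory:")
    for resource_id, query in resource_registry.items():
        cur = server.execute(query)
        cols = ", ".join(f'"{d[0]}"' for d in cur.description)
        marks = ", ".join("?" for _ in cur.description)
        sandbox.execute(f'CREATE TABLE "{resource_id}" ({cols})')
        sandbox.executemany(
            f'INSERT INTO "{resource_id}" VALUES ({marks})', cur.fetchall()
        )
    return sandbox.execute(client_sql).fetchall()


# The client joins two sanctioned resources by their IDs.
rows = run_sandboxed(
    'SELECT a.id, b.country FROM "res-a" AS a '
    'JOIN "res-b" AS b ON a.customer = b.name ORDER BY a.id'
)
print(rows)
```

The trade-off is that every resource is materialized eagerly, so the pipeline is no longer lazy on the server. In return, client SQL that names anything outside the registry fails with a "no such table" error.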
Or, stepping back from this implementation, how else can I achieve my goal?
Bear in mind that I want clients to be able to join multiple pipelines together - anything in `resource_registry` is "sanctioned" for use. So it wouldn't be enough to just apply the SQL to one of the entries of the registry.
I also can't send the actual server-side query to the client for security reasons - or else I would give them full access to the DB!