-
Notifications
You must be signed in to change notification settings - Fork 81
fix(ibis): refactor the available function list #1112
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughThis pull request updates several test files in the ibis-server by modifying function test assertions, expected values, and removing CSV‐based test functions. The constant Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant PySessionContext
participant Runtime
participant Database
Client->>PySessionContext: new(mdl_base64, remote_functions_path)
PySessionContext->>Runtime: Initialize asynchronous runtime
PySessionContext->>Runtime: Register remote functions asynchronously
Client->>PySessionContext: transform_sql(sql)
PySessionContext->>Runtime: Execute SQL transformation asynchronously
Runtime->>Database: Query registered functions (if needed)
Database-->>Runtime: Return query results
Runtime-->>PySessionContext: Return transformed SQL/result
PySessionContext-->>Client: Return transformed SQL/result
Possibly related PRs
Suggested labels
Suggested reviewers
Poem
📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
⏰ Context from checks skipped due to timeout of 90000ms (4)
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
1a52a99
to
6c6b94d
Compare
We still have the BigQuery secret key issue for the CI. I tested BigQuery on my laptop, and it worked well.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (16)
wren-core/core/src/lib.rs (1)
4-4
:❓ Verification inconclusive
Expanded public API with broader datafusion exports
You've changed from selectively exporting
SessionContext
to re-exporting all items fromdatafusion::arrow::*
anddatafusion::prelude::*
. While this makes more functionality available, it could lead to:
- Potential naming conflicts with other modules
- Tighter coupling to datafusion's API (changes in datafusion could directly impact users)
- Less clear API boundaries
Consider a more selective approach or documenting the expanded public API:
- pub use datafusion::arrow::*; + // Export specific types from datafusion::arrow that are needed + pub use datafusion::arrow::{array, compute, datatypes, record_batch}; - pub use datafusion::prelude::*; + // Export commonly used types from datafusion::prelude + pub use datafusion::prelude::{SessionContext, col, lit, when, SessionConfig};Let's check if any of the exported symbols might conflict with other common names:
Also applies to: 7-7
🏁 Script executed:
#!/bin/bash # Check for potential naming conflicts in exported symbols echo "Checking for common names that might conflict..." # Get symbols from datafusion::prelude cargo doc --no-deps --open &> /dev/null echo "Common names from datafusion::prelude to watch for conflicts:" cargo rustdoc -- --document-private-items 2>/dev/null find target/doc -name "*.html" | grep -i "datafusion.*prelude" | xargs grep -l "pub fn \|pub struct \|pub enum" | xargs grep -o "pub [a-z]* [A-Za-z0-9_]*" | grep -E "pub (fn|struct|enum) (error|result|option|string|vec|map|set|path|file|io|read|write|context|config|session)"Length of output: 636
Attention: Verify Exported Symbols and API Boundaries
The recent change re-exports all symbols from both
datafusion::arrow
anddatafusion::prelude
, which may expose the API to potential naming conflicts, tighter coupling with DataFusion’s internals, and a less clear public API surface. We attempted an automated check for common naming conflicts using documentation generated from DataFusion's prelude, but the script did not yield any output (thetarget/doc
directory was not found). This suggests that automated validation was inconclusive, so please verify manually if any of the re-exported symbols conflict with common names used elsewhere in the codebase.Recommendations:
- Review Naming Conflicts: Manually inspect the exported symbols to ensure they don’t conflict with common names (e.g.,
context
,config
,session
).- Consider a Selective Export: To minimize risks, export only those components that are actually required. For example:
- pub use datafusion::arrow::*; + // Export specific items needed from datafusion::arrow: + pub use datafusion::arrow::{array, compute, datatypes, record_batch}; - pub use datafusion::prelude::*; + // Export selected functionality from datafusion::prelude: + pub use datafusion::prelude::{SessionContext, col, lit, when, SessionConfig};- Documentation: Clearly document the expanded public API and note the rationale behind the selective exports.
Please manually verify the exported symbols for potential conflicts and evaluate whether a more selective export approach would reduce coupling with DataFusion’s API.
ibis-server/tools/remote_function_check.py (3)
1-43
: New utility script for duplicate function detection looks good, with minor improvement opportunitiesThis script efficiently identifies and removes duplicate functions from CSV files by comparing against functions available in wren_core. It supports the PR's goal of eliminating duplicate remote functions.
Consider these improvements:
- Add error handling for file operations (file not found, permission errors)
- Replace print statements with proper logging
- Make the CSV structure parsing more robust (currently assumes function name is in second column)
-import argparse -import os -import wren_core +import argparse +import logging +import os +import wren_core +from pathlib import Path + +logging.basicConfig(level=logging.INFO) +logger = logging.getLogger(__name__) # Later in code: -print("Default Function count: ", len(function_names)) -print("Function is already in the function list:") +logger.info(f"Default Function count: {len(function_names)}") +logger.info("Function is already in the function list:") # Add error handling: +try: with open(args.path, "r") as file: # ... +except FileNotFoundError: + logger.error(f"File not found: {args.path}") + exit(1) +except PermissionError: + logger.error(f"Permission denied: {args.path}") + exit(1)🧰 Tools
🪛 Ruff (0.8.2)
1-1: File
ibis-server/tools/remote_function_check.py
is part of an implicit namespace package. Add an__init__.py
.(INP001)
23-23:
(T201)
24-24:
(T201)
27-27: Unnecessary open mode parameters
Remove open mode parameters
(UP015)
42-42:
(T201)
27-27
: Remove unnecessary file mode parameterThe default mode for
open()
is already 'r' (read), so specifying it is redundant.-with open(args.path, "r") as file: +with open(args.path) as file:🧰 Tools
🪛 Ruff (0.8.2)
27-27: Unnecessary open mode parameters
Remove open mode parameters
(UP015)
1-6
: Add__init__.py
to avoid implicit namespace package warningStatic analysis detected that this file is part of an implicit namespace package.
Create an empty
__init__.py
file in thetools
directory to resolve this warning.🧰 Tools
🪛 Ruff (0.8.2)
1-1: File
ibis-server/tools/remote_function_check.py
is part of an implicit namespace package. Add an__init__.py
.(INP001)
ibis-server/tests/routers/v3/connector/snowflake/test_functions.py (1)
56-65
: Database-specific function testing looks good, but consider checking result lengthThe test now properly checks for Snowflake-specific function "is_null_value" instead of generic "abs", which aligns with the PR goal to improve DataSource-specific function handling.
Consider re-enabling the count assertion to verify the total number of functions:
- # assert len(result) == DATAFUSION_FUNCTION_COUNT + 60 + assert len(result) == DATAFUSION_FUNCTION_COUNT + 60 # Adjust number if neededwren-core-py/src/remote_functions.rs (4)
66-75
: Semantic clarity on variable naming.
Within.and_then(|types| ...)
, you're processing parameter names but calling the variabletypes
. For clarity, consider updating the variable name. Otherwise, the logic to flatten and filter out empty strings is sound..map(|names| { names .into_iter() .flatten() .collect::<Vec<String>>() .join(",") }) -.and_then(|types| if types.is_empty() { None } else { Some(types) }); +.and_then(|joined_names| if joined_names.is_empty() { None } else { Some(joined_names) });
86-90
: Empty string handling for return_type.
Converting the empty string intoNone
is a neat solution. If whitespaces are possible in the CSV, you may want to trim them first, but that’s optional.
109-116
: Potential whitespace trimming.
When buildingVec<Option<String>>
for param names, consider trimming each piece to avoid storing accidental whitespace..map(|name| { let trimmed = name.trim(); if trimmed.is_empty() { None } else { Some(trimmed.to_string()) } })
121-128
: Param type parsing consistency.
This is the same approach used for param names. Trimming may also be beneficial here for cleanliness.wren-core-py/src/context.rs (2)
62-62
: Unwrap risk in default.
UsingRuntime::new().unwrap()
can panic on failure. Consider.expect(...)
or error handling if the runtime fails to initialize.
238-305
: Async function “get_regietered_functions” with potential typo.
The method’s name is misspelled (“regietered”). Consider renaming toget_registered_functions
for clarity. Otherwise, the batch parsing logic is solid.-async fn get_regietered_functions( +async fn get_registered_functions(wren-core/core/src/mdl/function.rs (5)
29-55
: Handling fallback signatures on unrecognized parameter types.The new
get_signature
method andtransform_param_type
function effectively parse and apply exact signatures whenparam_types
is fully known. However, note that if any parameter type parsing fails, the entire signature falls back to[Nullary, VariadicAny]
. Consider logging or otherwise flagging partial mismatches, so users can diagnose unexpected fallback signatures.
89-151
: Introducing theReturnType
enum and related logic.This addition elegantly handles specialized return types, skipping complex nested array logic for now. For future extensibility, consider whether deeper nesting or advanced type resolution might be needed (e.g., arrays of arrays). Otherwise, this approach is concise and covers common cases.
178-190
: Conversion with potential panic on invalid return type.
impl From<RemoteFunction> for ByPassScalarUDF
panics ifReturnType::from_str
fails, aligning with the chosen "panic on invalid input" strategy. If more graceful handling is desired, consider returning aResult<Self>
instead to propagate errors.
261-269
: Consistent panic on invalid return type for Aggregate UDF.As with the scalar variant, the logic is clear but might be improved by propagating errors if dynamic input is expected.
466-640
: Extensive coverage of remote-to-bypass function testing.These tests thoroughly validate various parameter name/type scenarios and return-type enum variants. To further enhance coverage, consider adding negative tests that confirm how errors/panics are handled upon invalid input (e.g., unrecognized data types).
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (11)
ibis-server/resources/function_list/bigquery.csv
is excluded by!**/*.csv
ibis-server/resources/function_list/canner.csv
is excluded by!**/*.csv
ibis-server/resources/function_list/clickhouse.csv
is excluded by!**/*.csv
ibis-server/resources/function_list/mssql.csv
is excluded by!**/*.csv
ibis-server/resources/function_list/mysql.csv
is excluded by!**/*.csv
ibis-server/resources/function_list/postgres.csv
is excluded by!**/*.csv
ibis-server/resources/function_list/snowflake.csv
is excluded by!**/*.csv
ibis-server/resources/function_list/trino.csv
is excluded by!**/*.csv
wren-core-py/Cargo.lock
is excluded by!**/*.lock
wren-core-py/poetry.lock
is excluded by!**/*.lock
wren-core-py/tests/functions.csv
is excluded by!**/*.csv
📒 Files selected for processing (20)
ibis-server/tests/conftest.py
(1 hunks)ibis-server/tests/routers/v3/connector/bigquery/test_functions.py
(1 hunks)ibis-server/tests/routers/v3/connector/canner/test_functions.py
(1 hunks)ibis-server/tests/routers/v3/connector/clickhouse/test_functions.py
(1 hunks)ibis-server/tests/routers/v3/connector/mssql/test_functions.py
(1 hunks)ibis-server/tests/routers/v3/connector/mysql/test_functions.py
(1 hunks)ibis-server/tests/routers/v3/connector/postgres/test_functions.py
(1 hunks)ibis-server/tests/routers/v3/connector/snowflake/test_functions.py
(1 hunks)ibis-server/tests/routers/v3/connector/trino/test_functions.py
(1 hunks)ibis-server/tools/remote_function_check.py
(1 hunks)wren-core-py/src/context.rs
(7 hunks)wren-core-py/src/errors.rs
(1 hunks)wren-core-py/src/remote_functions.rs
(3 hunks)wren-core-py/tests/test_modeling_core.py
(2 hunks)wren-core/core/src/lib.rs
(1 hunks)wren-core/core/src/logical_plan/analyze/plan.rs
(5 hunks)wren-core/core/src/logical_plan/utils.rs
(9 hunks)wren-core/core/src/mdl/function.rs
(7 hunks)wren-core/core/src/mdl/mod.rs
(4 hunks)wren-core/core/src/mdl/utils.rs
(3 hunks)
🧰 Additional context used
🧬 Code Definitions (6)
ibis-server/tests/routers/v3/connector/snowflake/test_functions.py (9)
ibis-server/tests/routers/v3/connector/canner/test_functions.py (1)
set_remote_function_list_path
(36-40)ibis-server/tests/routers/v3/connector/mssql/test_functions.py (1)
set_remote_function_list_path
(36-40)ibis-server/tests/routers/v3/connector/mysql/test_functions.py (1)
set_remote_function_list_path
(37-41)ibis-server/tests/routers/v3/connector/postgres/test_functions.py (1)
set_remote_function_list_path
(38-42)ibis-server/tests/routers/v3/connector/clickhouse/test_functions.py (1)
set_remote_function_list_path
(36-40)ibis-server/tests/routers/v3/connector/trino/test_functions.py (1)
set_remote_function_list_path
(36-40)ibis-server/tests/routers/v3/connector/bigquery/conftest.py (1)
set_remote_function_list_path
(33-37)ibis-server/app/config.py (1)
set_remote_function_list_path
(70-71)ibis-server/tests/conftest.py (1)
client
(18-23)
wren-core/core/src/logical_plan/analyze/plan.rs (1)
wren-core/core/src/logical_plan/utils.rs (1)
try_map_data_type
(97-102)
wren-core-py/src/context.rs (3)
wren-core/core/src/mdl/function.rs (14)
name
(216-218)name
(279-281)name
(342-344)new
(165-175)new
(248-258)new
(311-321)from
(179-188)from
(262-271)from
(325-334)param_types
(46-49)return_type
(224-226)return_type
(287-289)from_str
(79-86)from_str
(119-129)wren-core/core/src/mdl/mod.rs (3)
new
(117-172)new
(421-423)register_remote_function
(383-408)wren-core-py/src/remote_functions.rs (2)
from
(65-99)from
(103-139)
wren-core-py/tests/test_modeling_core.py (2)
wren-core-py/src/context.rs (3)
transform_sql
(139-148)functions
(93-96)get_available_functions
(151-160)ibis-server/app/routers/v3/connector.py (1)
functions
(104-115)
wren-core/core/src/mdl/mod.rs (5)
wren-core/core/src/logical_plan/utils.rs (1)
try_map_data_type
(97-102)wren-core/core/src/mdl/context.rs (1)
new
(228-231)wren-core/core/src/mdl/function.rs (3)
new
(165-175)new
(248-258)new
(311-321)wren-core/core/src/logical_plan/analyze/model_anlayze.rs (2)
new
(67-75)new
(788-796)wren-core/core/src/mdl/lineage.rs (1)
new
(26-37)
wren-core/core/src/mdl/function.rs (1)
wren-core/core/src/logical_plan/utils.rs (1)
map_data_type
(104-171)
🪛 Ruff (0.8.2)
ibis-server/tools/remote_function_check.py
1-1: File ibis-server/tools/remote_function_check.py
is part of an implicit namespace package. Add an __init__.py
.
(INP001)
23-23: print
found
(T201)
24-24: print
found
(T201)
27-27: Unnecessary open mode parameters
Remove open mode parameters
(UP015)
42-42: print
found
(T201)
🔇 Additional comments (67)
ibis-server/tests/conftest.py (1)
14-14
:❓ Verification inconclusive
Significant increase in DATAFUSION_FUNCTION_COUNT constant
The constant has been increased from 273 to 25941, which is a substantial change. This appears to align with retrieving function information from INFORMATION_SCHEMA as mentioned in the PR description, rather than manually defining functions.
Please verify that this dramatic increase is expected and won't negatively impact test performance:
🏁 Script executed:
#!/bin/bash # Check if test performance is affected by the large function count time=$(python -c " import time start = time.time() import sys sys.path.append('ibis-server') from tests.conftest import DATAFUSION_FUNCTION_COUNT print(f'Loading {DATAFUSION_FUNCTION_COUNT} functions took {time.time() - start:.2f} seconds') ") echo "$time"Length of output: 521
Review Verification: Manual Check Required for Dependency Issue
The test performance verification did not complete as expected because the environment is missing the
asgi_lifespan
module (error: ModuleNotFoundError: No module named 'asgi_lifespan'). The dramatic increase inDATAFUSION_FUNCTION_COUNT
from 273 to 25941 appears intentional—aligning with the new approach of retrieving function data from INFORMATION_SCHEMA—but its impact on test performance remains unverified.
- Action Items:
- Ensure that the required
asgi_lifespan
module is installed or properly configured for the test environment.- Manually verify that loading 25,941 functions does not degrade test performance when the dependencies are correctly set up.
wren-core-py/src/errors.rs (1)
4-4
: Reintroducing ParseIntError importThe import for
ParseIntError
has been added back, which is correct since the code already has an implementation forFrom<ParseIntError>
at lines 64-68. This ensures the error handling for integer parsing works properly.ibis-server/tests/routers/v3/connector/canner/test_functions.py (1)
56-65
:✅ Verification successful
Updated function testing and reduced count
The test now:
- Expects 28 additional functions (down from 30)
- Tests the "to_base64" function instead of "abs"
- Has updated the expected function description and return type
These changes align with the PR's goal of eliminating duplicate remote functions for each DataSource.
Let's verify if other test files have been consistently updated with the new function pattern:
🏁 Script executed:
#!/bin/bash # Check if all connector test files have been updated to test "to_base64" instead of "abs" echo "Checking for consistent function test patterns across connector tests..." # Look for test files that might still be using "abs" function grep -r "x\[\"name\"\] == \"abs\"" --include="test_*.py" ibis-server/tests/routers/v3/connector/ # Check other connectors for consistent handling of to_base64 echo -e "\nChecking to_base64 implementation across connector test files:" grep -r "x\[\"name\"\] == \"to_base64\"" --include="test_*.py" ibis-server/tests/routers/v3/connector/Length of output: 762
Consistent test updates verified across connectors
The updated test in
ibis-server/tests/routers/v3/connector/canner/test_functions.py
now correctly expects 28 additional functions, tests the "to_base64" function (with the updated description and return type), and is consistent with similar changes in other connectors (e.g., in the Trino tests). Shell script results confirmed that there are no remaining tests still using the "abs" function and that the new "to_base64" pattern is applied uniformly.ibis-server/tests/routers/v3/connector/mssql/test_functions.py (1)
57-65
: Function test now uses database-specific 'sysdatetime' instead of generic 'floor'The test now validates a MSSQL-specific function with appropriate type information, aligning with the PR's goal to improve function information retrieval and type signatures for each DataSource.
ibis-server/tests/routers/v3/connector/mysql/test_functions.py (1)
58-66
: Database-specific function test updated correctlyThe test now validates MySQL-specific function "lcase" with appropriate type information (Utf8 instead of int), which aligns with the PR goal of improving function registration and type signatures for each DataSource.
wren-core/core/src/mdl/mod.rs (4)
1-1
: Import of try_map_data_type is consistent with the implementation changes.The import of
try_map_data_type
fromlogical_plan::utils
aligns with the function's usage throughout the file. This change is part of the refactoring effort to improve error handling for data type mapping.
222-222
: Enhanced error handling with try_map_data_type.Replacing
map_data_type
withtry_map_data_type
improves robustness by providing a fallback to Utf8 type when parsing fails, as shown in the implementation from the relevant code snippets.
391-392
: Consistent application of try_map_data_type across all function types.The change consistently applies
try_map_data_type
for all three function types (Scalar, Aggregate, Window), ensuring uniform error handling behavior across different function registrations.Also applies to: 397-398, 403-404
1310-1311
: Improved error message specificity for struct field access.The error message has been updated to provide more detailed information about type mismatches when accessing struct fields. The new message explicitly mentions that the operation is only valid for
Struct
,Map
, orNull
types, helping developers identify the issue more quickly.ibis-server/tests/routers/v3/connector/trino/test_functions.py (1)
57-65
: Test updated to verify to_base64 function instead of array_distinct.The test has been updated to verify the "to_base64" function instead of "array_distinct", with updated type signatures that match the correct specification. This aligns with the PR objective to improve type signatures and ensure consistency in function definitions.
The parameter type changed from "array" to "Binary" and return type from "array" to "Utf8", which is the correct signature for a base64 encoding function.
ibis-server/tests/routers/v3/connector/postgres/test_functions.py (1)
65-67
: Updated extract function type signatures to use precise Arrow types.The parameter and return types for the "extract" function have been updated to use more precise Arrow type definitions:
- Parameter types changed from "text,timestamp" to "Utf8,Timestamp(Nanosecond, None)"
- Return type changed from "numeric" to "Decimal128(38, 10)"
These changes reflect a move towards more accurate type representations that match the Arrow data model, which is essential for proper type handling across systems.
ibis-server/tests/routers/v3/connector/clickhouse/test_functions.py (1)
57-65
: Test updated to verify uniq aggregate function instead of abs scalar function.The test case has been modified to check for the "uniq" aggregate function instead of the "abs" scalar function, with appropriate updates to the function properties:
- Function type changed from "scalar" to "aggregate"
- Description updated to correctly describe the HyperLogLog-based cardinality estimation
- Parameter types and return types set to None, which is appropriate for this function in ClickHouse
This change better represents the ClickHouse-specific functions in the test suite and aligns with the PR objective to improve function definitions across different data sources.
ibis-server/tests/routers/v3/connector/bigquery/test_functions.py (2)
48-55
: Enhanced test precision for string_agg functionThe test now properly filters for the specific
string_agg
function withLargeUtf8,LargeUtf8
parameter types, which makes the test more precise and less prone to errors if multiple variants of this function exist.
57-63
: Updated string_agg function attributes to match actual implementationThe function metadata has been updated with a more descriptive explanation, correct parameter names, and accurate type information that aligns with the actual BigQuery implementation.
wren-core/core/src/logical_plan/utils.rs (5)
95-102
: Improved error handling with fallback for unsupported data typesThe new
try_map_data_type
function gracefully handles type mapping failures by providing a fallback toDataType::Utf8
when a data type can't be parsed. This prevents errors from propagating and makes the system more robust when dealing with unknown or unsupported types.
106-107
: Better variable naming for improved clarityIntroducing
lower_data_type
for the lowercase version of the data type string improves code readability by making it clear which value is being used in subsequent operations.
147-148
: Added support for XML and JSONB data typesSupport for additional data types enhances compatibility with various database systems. The appropriate fallback types are chosen for data types not natively supported by Arrow.
165-168
: Fixed case sensitivity in the DataType parsing fallbackThe code now correctly handles case sensitivity by using the original data type string with
DataType::from_str()
rather than the lowercase version, which may lead to parsing errors for case-sensitive type names.
437-457
: Updated test cases to use try_map_data_typeTest cases have been appropriately updated to use the new error-handling function while maintaining the same assertions, ensuring that the existing functionality continues to work correctly.
wren-core/core/src/mdl/utils.rs (2)
215-216
: Consistent adoption of error-handled type mappingReplacing
map_data_type
withtry_map_data_type
in the field creation function improves error handling for column type mapping, making the system more resilient when processing columns with unsupported data types.
233-240
: Enhanced robustness in remote field handlingThe remote field creation now uses the error-handled type mapping function, reducing the risk of failures when processing data types from remote data sources that might have type systems incompatible with Arrow.
wren-core/core/src/logical_plan/analyze/plan.rs (6)
23-23
: Updated import to use the error-handling version of map_data_typeThe import has been updated to use
try_map_data_type
instead ofmap_data_type
, aligning with the refactoring effort to improve error handling throughout the codebase.
230-233
: Improved error handling in model plan node buildingUsing
try_map_data_type
in the field creation process for the model plan node ensures that the system can handle unexpected data types gracefully without failing the entire plan construction.
767-770
: Enhanced robustness in model source node field creationThe model source node now uses the error-handled type mapping function when creating fields, ensuring that wildcard selections can handle columns with unsupported data types.
807-810
: Consistent error handling across field creationUsing
try_map_data_type
consistently in all model source node field creation paths ensures uniform behavior when handling data types in both wildcard and explicit column scenarios.
901-903
: Improved calculation plan node field creationThe calculation plan node now uses error-handled type mapping for the calculation column's field creation, preventing failures when dealing with complex calculated expressions that might result in unsupported data types.
906-908
: Consistent error handling for primary key fieldsUsing
try_map_data_type
for primary key field creation in calculation plan nodes ensures that the join operations will work correctly even when primary keys have unusual or complex data types.wren-core-py/src/remote_functions.rs (5)
22-22
: Non-issue import addition.
Addingstd::fmt::Display
is appropriate for implementing theDisplay
trait below.
27-27
: Extended trait derivation.
DerivingEq
,PartialEq
,Hash
, andDebug
onPyRemoteFunction
is useful for hashing, set membership checks, and debugging.
77-85
: Consistent parameter type flattening.
This mirrors the parameter names logic and is equally valid. No concerns.
94-94
: Properly using the transformed return_type.
Aligns with the new handling for empty versus non-empty strings.
141-155
: UsefulDisplay
implementation.
This user-friendly string representation will help with logging and debugging. No issues.wren-core-py/tests/test_modeling_core.py (10)
97-101
: Updated function call for two parameters.
Using bothc_custkey
arguments confirms the function signature. The rewritten SQL assertion is correct.
109-109
: Large function list size.
Raising the count to25948
might affect performance. Verify this aligns with new function entries and ensure no unintentional duplicates.
112-116
: Consistent transformation verification.
Ensures the generated SQL handles both parameters correctly. LGTM.
121-121
: Additional function count check.
25941
closely matches the previous large count—just confirm correctness so no duplication or missing entries exist.
131-131
: Specifying return type as Int32.
Reflects refined data type handling. The test matches recent changes.
133-133
: Correct parameter types.
“Int32,Int32
” is consistent with the new approach.
141-148
: Added scalar function “add_custom”.
Parameters are listed in CSV butparam_names
is intentionallyNone
. This is fine if it’s expected.
149-156
: Validating array-based function.
A function returningNone
for both param and return types suggests it’s untyped or determined at runtime. No immediate concerns.
157-164
: Testing function with no param names and no return type.
This aligns with the possibility of dynamic type resolution in the system.
165-175
: Function missing param type in CSV.
The test clarifies how a missing param type yieldsNone
. Implementation is logically consistent with your data flow.wren-core-py/src/context.rs (12)
25-25
: Added import for FromStr.
Enables parsing of string-based function types. Straightforward addition.
27-28
: New imports for vec and Runtime.
These support your asynchronous patterns. No concerns.
31-31
: Importing GenericStringType.
Likely needed for reading string columns from DataFusion. Looks good.
38-41
: Expanded imports from wren_core.
Brings inmdl
,AggregateUDF
, etc. to manage user-defined functions. Safe changes.
48-48
: Storing the runtime in the session context.
This design allows async operations. Straightforward.
86-87
: Information Schema enabled.
Allows queries againstinformation_schema
for function metadata, which matches your function retrieval logic.
88-88
: Runtime creation with error handling.
map_err(CoreError::from)?
is good for safer initialization.
90-99
: Asynchronous retrieval of registered functions.
Good approach to check for duplicates before re-registering. Minimizes overhead.
115-115
: Consistent runtime usage.
Keeps the runtime in sync when an MDL is provided. No issues.
140-146
: Async transform flow.
block_on(mdl::transform_sql_with_ctx(...))
is consistent with the new runtime-based context.
152-159
: Async function list retrieval.
Modeling functions asPyRemoteFunction
after awaiting the DataFusion queries is orderly.
203-216
: Registering remote functions by function type.
Good approach to unify the code path for scalar, aggregate, and window UDFs.wren-core/core/src/mdl/function.rs (13)
3-3
: Consolidate and standardize new imports.These import statements for
internal_err
,not_impl_err
, and items fromdatafusion::logical_expr
as well as themap_data_type
utility appear consistent and purposeful. They unify error handling and provide necessary functions for the new return-type logic.Also applies to: 8-10, 17-18
24-25
: Param fields allow for partial undefined parameters.Expanding
param_names
andparam_types
toOption<Vec<Option<String>>>
provides flexibility for scenarios where certain parameter names or types may remain unspecified. This design choice is coherent, but ensure that downstream logic (like doc generation and signature construction) gracefully handles these nestedNone
values.
57-57
: AddedPartialEq
to derive macros.Adding
PartialEq
for the enum or struct is beneficial for test assertions and comparisons. No concerns here.
66-66
: FunctionType display implementation unchanged.The
impl Display for FunctionType
remains straightforward, mirroring the enum variants. No issues noted.
80-80
: String matching infrom_str
aligns with the existing pattern.The
match s.to_lowercase().as_str()
approach is consistent across similar code. Approved.
159-159
: EmbeddingReturnType
anddoc
:Storing
ReturnType
anddoc
directly inByPassScalarUDF
is consistent with the broader refactor design. The approach is clear and maintainable.Also applies to: 161-161
191-209
: Building documentation from partially known parameters.Using
param_types
strings as argument descriptions and defaulting to an empty string for missing elements ensures consistent doc generation. This looks good, and the logic handlesNone
gracefully.
224-225
: ByPassScalarUDF return type forwarding.The
return_type
method simply defers toself.return_type.to_data_type(arg_types)
, which is coherent with the newReturnType
abstraction.
232-234
: Documentation forwarding.Returning
self.doc.as_ref()
is straightforward. No concerns here.
242-242
: EmbeddingReturnType
anddoc
in ByPassAggregateUDF.Adopting the same design pattern for aggregates ensures consistency across UDF implementations.
Also applies to: 244-244
287-288
: Aggregate UDF return type logic is straightforward.The code properly delegates to
ReturnType::to_data_type(arg_types)
.
295-297
: Aggregate UDF documentation retrieval.Again, a simple reference to
self.doc
, consistent with previous patterns.
357-360
: Window function field logic.The
field
method usesto_data_type(field_args.input_types())
to derive the return type, which is aligned with the rest of the code.
Description
ibis-server/tools/remote_function_check.py
Summary by CodeRabbit
New Features
Improvements