Skip to content

feat(ibis): pushdown the limit of the query request into SQL #1082

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Mar 13, 2025

Conversation

goldmedal
Copy link
Contributor

@goldmedal goldmedal commented Mar 7, 2025

Description

This PR introduces the way to push down the limit parameter into the SQL. If the SQL has the limit clause, the smaller one of them will be used in the SQL.

Summary by CodeRabbit

  • New Features

    • Enhanced SQL query processing now uniformly enforces user-specified limits, ensuring that result sets are consistently and accurately trimmed.
    • Introduced a new function for handling SQL limit pushdown, improving robustness in SQL processing.
  • Bug Fixes

    • Improved error handling for SQL limit pushdown failures, retaining the original SQL query when necessary.
  • Tests

    • Added comprehensive test coverage to validate the improved query limit functionality across multiple scenarios.

Copy link

coderabbitai bot commented Mar 7, 2025

Walkthrough

This update introduces enhanced error handling around the pushdown_limit function in the query endpoints of both v2 and v3 routers. The new implementation ensures that if the limit pushdown fails, the original SQL query is retained. Additionally, a new utility function pushdown_limit is added to the utility module, and a corresponding method is introduced in the Rust-based session context. New tests have been implemented across both Python and Rust layers to verify the correct behavior of limit pushdown, along with additional error conversions and public exports.

Changes

File(s) Change Summary
ibis-server/app/routers/v2/connector.py
ibis-server/app/routers/v3/connector.py
Introduced error handling for pushdown_limit in the query function; modified SQL assignment to retain original SQL on failure. Updated imports in v3 to include pushdown_limit.
ibis-server/app/util.py Added a new utility function pushdown_limit that wraps a call to SessionContext.pushdown_limit with OpenTelemetry tracing.
ibis-server/tests/routers/v2/connector/test_postgres.py
ibis-server/tests/routers/v3/connector/postgres/test_query.py
Updated assertions in test_limit_pushdown to reflect expected outcomes; added a new asynchronous test function in v3 to validate SQL query limits.
wren-core-py/src/context.rs Added a public method pushdown_limit in PySessionContext to adjust SQL limits based on provided parameters.
wren-core-py/src/errors.rs Implemented From trait for CoreError to handle conversions from parser and integer parsing errors.
wren-core-py/tests/test_modeling_core.py Added a test function test_limit_pushdown to verify various scenarios for SQL limit and offset handling using the new method.
wren-core/core/src/lib.rs Added a public export for datafusion::sql::sqlparser::*, exposing SQL parser functionalities across modules.
ibis-server/app/mdl/rewriter.py Modified logging level from error to warning in handle_extract_exception method.

Suggested reviewers

  • wwwy3y3

Poem

In the code garden where changes bloom,
I hop with joy through each new room.
Limits pushed down with a gentle leap,
SQL transformed while tests do keep.
With carrots of code and bounces of cheer,
This rabbit sings a merry ode so dear.
🥕🐇 Happy coding all around!

✨ Finishing Touches
  • 📝 Generate Docstrings

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@github-actions github-actions bot added core ibis rust Pull requests that update Rust code python Pull requests that update Python code labels Mar 7, 2025
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
wren-core-py/src/context.rs (3)

194-229: Improve error handling in the new pushdown_limit method

The implementation looks solid overall for enforcing a maximum limit on queries. However, there's an .unwrap() call on line 214 that could panic if the parsed number isn't a valid usize. Consider replacing it with proper error handling.

-                        if n.parse::<usize>().unwrap() > pushdown {
+                        if let Ok(limit_value) = n.parse::<usize>() {
+                            if limit_value > pushdown {
+                                q.limit = Some(Expr::Value(Value::Number(
+                                    pushdown.to_string(),
+                                    is.clone(),
+                                )));
+                            }
+                        } else {
+                            return Err(CoreError::new("Failed to parse limit as usize").into());
+                        }

210-227: Consider clarifying behavioral edge cases with comments

The implementation correctly maintains existing limits that are smaller than the pushdown limit, which is good. Consider adding a comment to clarify this behavior for future maintainers.

 visit_statements_mut(&mut statements, |stmt| {
     if let Statement::Query(q) = stmt {
         if let Some(limit) = &q.limit {
+            // If statement already has a limit, only replace it if it's larger than our pushdown limit
             if let Expr::Value(Value::Number(n, is)) = limit {
                 if n.parse::<usize>().unwrap() > pushdown {
                     q.limit = Some(Expr::Value(Value::Number(
                         pushdown.to_string(),
                         is.clone(),
                     )));
                 }
+                // If existing limit is smaller, preserve it
             }
         } else {
+            // If no limit exists, add one with the pushdown value
             q.limit =
                 Some(Expr::Value(Value::Number(pushdown.to_string(), false)));
         }
     }
     ControlFlow::<()>::Continue(())
 });

198-199: Enhance the method signature with more descriptive parameter names

The method signature could be clearer about what the limit parameter actually represents.

-#[pyo3(signature = (sql, limit=None))]
-pub fn pushdown_limit(&self, sql: &str, limit: Option<usize>) -> PyResult<String> {
+#[pyo3(signature = (sql, max_limit=None))]
+pub fn pushdown_limit(&self, sql: &str, max_limit: Option<usize>) -> PyResult<String> {
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2ea6996 and fc4fe84.

📒 Files selected for processing (9)
  • ibis-server/app/routers/v2/connector.py (2 hunks)
  • ibis-server/app/routers/v3/connector.py (2 hunks)
  • ibis-server/app/util.py (2 hunks)
  • ibis-server/tests/routers/v2/connector/test_postgres.py (1 hunks)
  • ibis-server/tests/routers/v3/connector/postgres/test_query.py (1 hunks)
  • wren-core-py/src/context.rs (2 hunks)
  • wren-core-py/src/errors.rs (2 hunks)
  • wren-core-py/tests/test_modeling_core.py (1 hunks)
  • wren-core/core/src/lib.rs (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (5)
  • GitHub Check: cargo test (win64)
  • GitHub Check: cargo test (macos)
  • GitHub Check: cargo test (macos-aarch64)
  • GitHub Check: test
  • GitHub Check: ci
🔇 Additional comments (14)
wren-core/core/src/lib.rs (1)

7-7: Looks good - this export will support the SQL limit pushdown feature

The addition of pub use datafusion::sql::sqlparser::*; makes the SQL parsing capabilities available to other modules, which is necessary for implementing the limit pushdown functionality.

ibis-server/tests/routers/v3/connector/postgres/test_query.py (1)

343-355: Good test coverage for limit pushdown functionality

This test effectively verifies that the limit parameter (10) correctly overrides the SQL query's own limit (100). The test confirms the response contains exactly 10 rows of data, validating the pushdown_limit functionality works as expected.

wren-core-py/src/errors.rs (3)

1-1: Properly added import for ParseIntError

Good addition of the import needed for the new error conversion implementation.


58-62: Appropriate error handling for parser errors

This implementation provides proper conversion from parser errors to CoreError, maintaining consistency with the existing error handling pattern in the codebase.


64-68: Good addition of ParseIntError conversion

This error conversion is likely needed for properly handling integer parsing errors that might occur during limit value processing, following the established error handling pattern.

ibis-server/app/routers/v2/connector.py (2)

22-22: Import added for the new pushdown_limit function

The import has been properly updated to include the new utility function.


47-52:

✅ Verification successful

Correctly implemented limit pushdown in query function

The SQL query is now processed through the pushdown_limit function before being rewritten, which ensures that the limit parameter is properly applied to the SQL query. This implementation aligns with the PR objective.

I recommend adding a similar test for the v2 router as you did for v3, to ensure consistent behavior across both API versions.


🏁 Script executed:

#!/bin/bash
# Check if there's a test for limit pushdown in the v2 router tests
fd -t f "test.*\.py" ibis-server/tests/routers/v2 | xargs grep -l "test_limit_pushdown\|pushdown_limit"

Length of output: 161


Approved: Limit Pushdown Verified in v2 Router
The SQL query is correctly processed through the pushdown_limit function before being rewritten. Additionally, tests for the v2 router already include coverage for this functionality (see ibis-server/tests/routers/v2/connector/test_postgres.py), ensuring consistent behavior with the v3 endpoint.

ibis-server/app/util.py (2)

6-7: Good addition of the necessary import.

The addition of the wren_core import is appropriate for the new functionality being introduced.


104-107: Clean implementation of the limit pushdown functionality.

This function is well-structured, follows the established pattern of using the OpenTelemetry tracer, and properly delegates the actual work to the SessionContext class. The type annotations clearly indicate that it takes a SQL string and an optional limit integer, returning a modified SQL string.

ibis-server/app/routers/v3/connector.py (2)

21-21: Good import update to include the new functionality.

The import has been correctly updated to include the pushdown_limit function along with the existing imports.


41-44: Effective integration of the limit pushdown feature.

The implementation correctly applies the pushdown_limit function to the SQL query before passing it to the Rewriter, ensuring the limit is properly enforced at the SQL level. This approach is more efficient as it potentially reduces the amount of data processed by the database.

ibis-server/tests/routers/v2/connector/test_postgres.py (1)

288-301: Good test case for the limit pushdown functionality.

This test effectively verifies that the limit parameter (10) overrides the limit in the SQL query (100), confirming that the pushdown_limit function is working correctly. The test is well-structured and makes clear assertions.

wren-core-py/tests/test_modeling_core.py (1)

210-240: Comprehensive test coverage for the limit pushdown feature.

Excellent test coverage for the pushdown_limit method, covering multiple scenarios:

  1. Adding a limit to a query without one
  2. Overriding a higher limit with a lower one
  3. Preserving a lower limit when a higher one is requested
  4. Maintaining the offset when applying a limit

These tests thoroughly validate the behavior of the limit pushdown functionality across various query patterns.

wren-core-py/src/context.rs (1)

206-209:

❓ Verification inconclusive

Consider handling multiple statements more flexibly

Currently, an error is returned if there are multiple statements. Consider if a more flexible approach might be beneficial, such as applying the limit to all query statements or just the last one.


🏁 Script executed:

#!/bin/bash
# Check if this limitation is appropriate by examining how this function is used
rg "pushdown_limit" --type rust --type python -A 3 -B 3

Length of output: 86


Below is a new verification script to help us better understand how the “Only one statement is allowed” limitation is used across the repository. This search should reveal whether similar error messages (or patterns) are being used elsewhere, which can indicate if the current strict limitation is intentional or if a more flexible approach might indeed be beneficial:

Action Required: Verify Statement Limitation Handling
The previous command with the --type python flag did not produce the expected output due to an unrecognized file type. Please run the above script (or confirm its output) so we can determine whether the “one statement only” rule is consistently enforced across the codebase. Once we have that context, we can confidently rewrite the review comment.


🏁 Script executed:

#!/bin/bash
# Search for occurrences of the error message related to the limitation.
rg "Only one statement is allowed" -A 3 -B 3

Length of output: 710


Action: Reconsider SQL statement processing flexibility

The current implementation in wren-core-py/src/context.rs (lines 206–209) enforces that exactly one statement is present by returning an error with the message "Only one statement is allowed." There’s no indication elsewhere in the repository that this behavior is adjusted based on context. If future use cases require supporting multiple statements, you might consider one of the following approaches:

  • Process only the last query statement if multiple valid statements are provided.
  • Apply the statement limit specifically to query statements while allowing other valid non-query statements.

This suggestion is to evaluate whether a more flexible handling meets current or anticipated requirements.

@goldmedal goldmedal marked this pull request as draft March 7, 2025 09:30
@goldmedal goldmedal force-pushed the feat/ibis-limit-pushdown branch from fc4fe84 to 78eb956 Compare March 10, 2025 03:10
@github-actions github-actions bot added the dependencies Pull requests that update a dependency file label Mar 10, 2025
@goldmedal goldmedal force-pushed the feat/ibis-limit-pushdown branch from 78eb956 to c4a8ef7 Compare March 11, 2025 07:36
@goldmedal goldmedal marked this pull request as ready for review March 11, 2025 09:07
@goldmedal goldmedal requested a review from douenergy March 11, 2025 09:07
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fc4fe84 and c4a8ef7.

📒 Files selected for processing (11)
  • wren-core-py/src/context.rs (2 hunks)
  • wren-core-py/src/errors.rs (2 hunks)
  • wren-core-py/tests/test_modeling_core.py (1 hunks)
  • wren-core/core/src/lib.rs (1 hunks)
  • ibis-server/app/routers/v2/connector.py (2 hunks)
  • ibis-server/app/routers/v3/connector.py (2 hunks)
  • ibis-server/app/util.py (2 hunks)
  • ibis-server/tests/routers/v2/connector/test_postgres.py (1 hunks)
  • ibis-server/tests/routers/v3/connector/postgres/test_query.py (1 hunks)
  • ibis-server/app/util.py (1 hunks)
  • ibis-server/app/util.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (9)
  • wren-core/core/src/lib.rs
  • ibis-server/app/routers/v2/connector.py
  • ibis-server/app/util.py
  • ibis-server/tests/routers/v3/connector/postgres/test_query.py
  • ibis-server/tests/routers/v2/connector/test_postgres.py
  • wren-core-py/src/errors.rs
  • ibis-server/app/util.py
  • ibis-server/app/util.py
  • ibis-server/app/routers/v3/connector.py
🔇 Additional comments (2)
wren-core-py/tests/test_modeling_core.py (1)

210-240: Comprehensive test coverage for the new pushdown_limit feature!

The test covers all important scenarios:

  1. Adding a limit to a query without one
  2. Reducing an existing limit when it's higher than requested
  3. Maintaining a lower existing limit when requested limit is higher
  4. Preserving offset when a limit exists
  5. Maintaining offset when reducing a limit

One minor observation: you're instantiating SessionContext without parameters on line 211, while other test functions provide a manifest string. This appears intentional since pushdown_limit doesn't depend on the manifest, but it might be worth adding a comment to clarify this if the behavior differs from other methods.

wren-core-py/src/context.rs (1)

26-26: Good additions to the import list

These imports are necessary for the new pushdown_limit functionality.

Also applies to: 28-29

@goldmedal goldmedal marked this pull request as draft March 11, 2025 09:52
@goldmedal goldmedal marked this pull request as ready for review March 11, 2025 10:30
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
ibis-server/app/routers/v2/connector.py (1)

48-52: Good defensive programming with graceful fallback.

This try-except block provides a robust error handling mechanism for the limit pushdown operation. If the operation fails, the system properly logs a warning and gracefully falls back to using the original SQL query, ensuring the service remains functional even when the enhancement cannot be applied.

I'd recommend including more specific error details in the log message to assist in future debugging:

- logger.warning("Failed to pushdown limit. Using original SQL: {}", e)
+ logger.warning("Failed to pushdown limit for query. Using original SQL. Error: {}", e)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c4a8ef7 and d3af6ef.

📒 Files selected for processing (8)
  • ibis-server/app/routers/v2/connector.py (2 hunks)
  • ibis-server/app/routers/v3/connector.py (2 hunks)
  • ibis-server/tests/routers/v2/connector/test_postgres.py (1 hunks)
  • ibis-server/app/mdl/rewriter.py (1 hunks)
  • ibis-server/app/routers/v2/connector.py (2 hunks)
  • ibis-server/app/routers/v3/connector.py (2 hunks)
  • ibis-server/tests/routers/v2/connector/test_postgres.py (1 hunks)
  • ibis-server/tests/routers/v2/connector/test_postgres.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (6)
  • ibis-server/tests/routers/v2/connector/test_postgres.py
  • ibis-server/app/routers/v2/connector.py
  • ibis-server/tests/routers/v2/connector/test_postgres.py
  • ibis-server/app/routers/v3/connector.py
  • ibis-server/tests/routers/v2/connector/test_postgres.py
  • ibis-server/app/routers/v3/connector.py
🔇 Additional comments (2)
ibis-server/app/routers/v2/connector.py (1)

5-5: Added logger for improved error handling.

The addition of the logger import is necessary for the new error handling mechanism in the query function.

ibis-server/app/mdl/rewriter.py (1)

108-108:

❓ Verification inconclusive

Changed log level from error to warning.

The logging level has been downgraded from error to warning for manifest extraction issues. This is appropriate if these exceptions are expected in normal operation and don't necessarily represent system failures.

However, ensure this aligns with your team's logging policies - if manifest extraction failures should trigger alerts or are considered critical issues, maintaining the error level might be more appropriate.

Can you verify that this change in log level is intentional and consistent with how other non-critical exceptions are handled in the codebase?


🏁 Script executed:

#!/bin/bash
# Check how other similar exceptions are logged in the codebase
rg "logger\.(error|warning|info).*exception" --type py

Length of output: 56


Log Level Adjustment for Manifest Extraction

It appears that the downgrade from error to warning for manifest extraction issues is intended for non-critical exceptions. However, our automated search for similar logger calls (using the pattern with "exception") didn’t yield any results. This means we couldn’t conclusively verify that the changed log level is consistent with how other non-critical exceptions are handled across the codebase.

Next Steps:

  • Manually verify that similar non-critical exception scenarios (if any) are logged at the warning level.
  • Confirm that manifest extraction failures are indeed expected and not indicative of critical issues per your team’s logging policies.

Please ensure that this handling aligns with your overall logging strategy.

Copy link
Contributor

@douenergy douenergy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @goldmedal ! LGTM

@douenergy douenergy merged commit e7b3343 into Canner:main Mar 13, 2025
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core dependencies Pull requests that update a dependency file ibis python Pull requests that update Python code rust Pull requests that update Rust code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants