Skip to content

SQL: Stronger read-only mode #11

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
May 16, 2025
Merged

SQL: Stronger read-only mode #11

merged 2 commits into from
May 16, 2025

Conversation

amotl
Copy link
Member

@amotl amotl commented May 10, 2025

About

CodeRabbit had a remark about the previous variant of filtering non-SELECT statements and suggested to use sqlparse.

Review

This patch aims to implement a stronger approach using sqlparse, as suggested.
If you think you can come up with finding another leak, please let us know.

Note

That's a stacked PR. GH-9 will need to go in first.

Copy link

coderabbitai bot commented May 10, 2025

Walkthrough

This update introduces a robust SQL read-only enforcement mechanism using a new sql_is_permitted function, leveraging both sqlparse and a custom SQL classification dataclass. Configuration is added for permitting all statements via an environment variable. Documentation, changelog, and tests are updated to reflect and validate these changes.

Changes

File(s) Change Summary
cratedb_mcp/util/sql.py New module implementing SQL filtering logic with sql_is_permitted and SqlStatementClassifier dataclass, enforcing SELECT-only queries using parsing libraries.
cratedb_mcp/__main__.py Replaces substring-based SQL validation in query_sql with the new sql_is_permitted function for robust query checking.
cratedb_mcp/settings.py Adds PERMIT_ALL_STATEMENTS boolean config, sourced from environment variable, with a runtime warning if enabled.
pyproject.toml Adds attrs and sqlparse<0.6 as dependencies.
README.md, CHANGES.md Updates documentation to clarify and document the new read-only enforcement and configuration.
docs/backlog.md Adds a backlog document with iteration plans and records the completion of the SQL read-only enhancement.
tests/test_util.py New test module with comprehensive cases for the sql_is_permitted function, covering allowed and disallowed SQL statements, edge cases, and configuration overrides.
tests/test_mcp.py Refactors and expands tests for query_sql, separating permitted and forbidden query cases and adding bypass attempt tests.
tests/test_knowledge.py Adds trailing blank lines (whitespace only).

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant query_sql (main)
    participant sql_is_permitted (util.sql)
    participant Database

    User->>query_sql: Submit SQL query
    query_sql->>sql_is_permitted: Validate SQL statement
    sql_is_permitted-->>query_sql: Permit or deny (True/False)
    alt Statement permitted
        query_sql->>Database: Execute SQL
        Database-->>query_sql: Return result
        query_sql-->>User: Return result
    else Statement denied
        query_sql-->>User: Raise ValueError (SELECT only)
    end
Loading

Possibly related issues

Poem

🐇
A rabbit hopped through SQL’s gate,
Guarding SELECT from INSERT’s fate.
With parsing keen and filters tight,
Only read-only made it right.
If you wish to write, beware the sign—
Set the flag, but heed the warning line!
Safe and sound, the queries run fine.


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f1aa610 and 01ddc74.

📒 Files selected for processing (10)
  • CHANGES.md (1 hunks)
  • README.md (1 hunks)
  • cratedb_mcp/__main__.py (2 hunks)
  • cratedb_mcp/settings.py (2 hunks)
  • cratedb_mcp/util/sql.py (1 hunks)
  • docs/backlog.md (1 hunks)
  • pyproject.toml (1 hunks)
  • tests/test_knowledge.py (1 hunks)
  • tests/test_mcp.py (1 hunks)
  • tests/test_util.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (10)
  • CHANGES.md
  • tests/test_knowledge.py
  • pyproject.toml
  • cratedb_mcp/main.py
  • README.md
  • docs/backlog.md
  • cratedb_mcp/settings.py
  • tests/test_mcp.py
  • tests/test_util.py
  • cratedb_mcp/util/sql.py
✨ Finishing Touches
  • 📝 Generate Docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

‼️ IMPORTANT
Auto-reply has been disabled for this repository in the CodeRabbit settings. The CodeRabbit bot will not respond to your replies unless it is explicitly tagged.

  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (5)
cratedb_mcp/settings.py (1)

7-8: Well-documented configuration parameter with sensible default.

The implementation correctly sets a secure default (only SELECT statements permitted) with the ability to override via environment variable.

However, consider adding a warning log message when PERMIT_ALL_STATEMENTS is enabled, as this could pose a security risk if accidentally enabled in production.

README.md (2)

43-49: Clear documentation of the read-only access policy.

The documentation effectively communicates the default behavior and override option to users.

Consider adjusting line 47 for better clarity:

-All other operations will raise a `ValueError` exception, unless the
+All other operations will raise a `ValueError` exception, unless
 `CRATEDB_MCP_PERMIT_ALL_STATEMENTS` environment variable will be set
 to a truthy value.

Additionally, it would be valuable to add a security warning about the risks of enabling all statements in production environments.

🧰 Tools
🪛 LanguageTool

[uncategorized] ~47-~47: This verb may not be in the correct tense. Consider changing the tense to fit the context better.
Context: ...IT_ALL_STATEMENTS` environment variable will be set to a truthy value. # Install ```sh...

(AI_EN_LECTOR_REPLACEMENT_VERB_TENSE)


47-47: Minor grammar issue in environment variable description.

The phrase "will be set" suggests a future action but should describe a current state.

-`CRATEDB_MCP_PERMIT_ALL_STATEMENTS` environment variable will be set
+`CRATEDB_MCP_PERMIT_ALL_STATEMENTS` environment variable is set
🧰 Tools
🪛 LanguageTool

[uncategorized] ~47-~47: This verb may not be in the correct tense. Consider changing the tense to fit the context better.
Context: ...IT_ALL_STATEMENTS` environment variable will be set to a truthy value. # Install ```sh...

(AI_EN_LECTOR_REPLACEMENT_VERB_TENSE)

cratedb_mcp/knowledge.py (1)

109-135: Well-implemented SQL validation with potential for enhancement

The implementation provides a solid foundation for enforcing read-only mode:

  • Handles edge cases like empty or None inputs
  • Properly parses SQL using a dedicated library
  • Checks for both statement type and prohibited clauses
  • Includes a configuration flag for flexibility

Consider these potential improvements:

  1. The current implementation only examines the first statement if multiple statements are provided, potentially allowing SQL injection via statement chaining:
    # Parse the SQL statement.
    parsed = sqlparse.parse(expression.strip())
    if not parsed:
        return False

+   # Check for multiple statements (potential SQL injection)
+   if len(parsed) > 1:
+       return False
+
    # Check if the expression is valid and if it's a SELECT statement,
    # also trying to consider `SELECT ... INTO ...` statements.
  1. Consider adding additional checks for potentially dangerous constructs within SELECT statements, like certain function calls or subqueries that might have side effects.

  2. The FIXME comment correctly acknowledges limitations - exploring SQLAlchemy's read-only mode could be valuable for a future implementation.

tests/test_mcp.py (1)

31-40: Consider adding more complex SQL bypass tests

While the current tests cover basic scenarios well, consider adding more sophisticated test cases to verify that the validation can't be bypassed with more complex SQL constructs.

Add tests for these additional edge cases:

  1. Statements with comments that might hide true intent
  2. Multiple statements separated by semicolons
  3. Complex subqueries within SELECT that might attempt data modification
  4. Use of custom functions that could have side effects
def test_query_sql_forbidden_multiple_statements():
    with pytest.raises(ValueError) as ex:
        query_sql("SELECT 42; INSERT INTO foo VALUES (1)")
    assert ex.match("Only queries that have a SELECT statement are allowed")

def test_query_sql_forbidden_with_comments():
    with pytest.raises(ValueError) as ex:
        query_sql("/* Sneaky comment */ INSERT /* another comment */ INTO foo VALUES (1)")
    assert ex.match("Only queries that have a SELECT statement are allowed")
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5727a27 and c48f822.

📒 Files selected for processing (8)
  • CHANGES.md (1 hunks)
  • README.md (1 hunks)
  • cratedb_mcp/__main__.py (2 hunks)
  • cratedb_mcp/knowledge.py (2 hunks)
  • cratedb_mcp/settings.py (1 hunks)
  • pyproject.toml (1 hunks)
  • tests/test_knowledge.py (2 hunks)
  • tests/test_mcp.py (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (2)
cratedb_mcp/__main__.py (1)
cratedb_mcp/knowledge.py (1)
  • sql_expression_permitted (109-135)
tests/test_mcp.py (1)
cratedb_mcp/__main__.py (1)
  • query_sql (16-19)
🪛 LanguageTool
README.md

[uncategorized] ~47-~47: This verb may not be in the correct tense. Consider changing the tense to fit the context better.
Context: ...IT_ALL_STATEMENTS` environment variable will be set to a truthy value. # Install ```sh...

(AI_EN_LECTOR_REPLACEMENT_VERB_TENSE)

🔇 Additional comments (11)
pyproject.toml (1)

23-23: Dependencies added look appropriate for the new functionality.

The addition of attrs will support the boolean conversion utility in settings, and sqlparse<0.6 will enable the stronger SQL validation mechanism described in the PR.

Also applies to: 25-25

CHANGES.md (1)

14-14: Changelog entry is clear and concise.

The addition appropriately documents the enhancement to the read-only mode using sqlparse.

cratedb_mcp/settings.py (1)

3-4: Import looks good and is appropriately specific.

Importing only the required to_bool converter from attr.converters is a good practice.

cratedb_mcp/__main__.py (2)

4-4: LGTM - Good addition of sql_expression_permitted import

This import change correctly integrates the newly introduced SQL validation function, which will provide a more robust way to enforce read-only mode.


17-17: Excellent improvement over the previous substring check!

Replacing the naive substring check with sql_expression_permitted significantly strengthens the read-only enforcement by properly parsing SQL statements. This prevents bypasses where a malicious statement might include the string 'select' but actually perform data modifications.

tests/test_knowledge.py (2)

1-2: Appropriate imports for the new functionality

The imports have been correctly updated to include both the sql_expression_permitted function and the entire cratedb_mcp module needed for mocking.


22-49: Good test coverage for SQL validation edge cases

The test suite thoroughly covers different SQL validation scenarios including:

  • CREATE TABLE AS SELECT statements
  • INSERT statements
  • SELECT INTO statements
  • Empty inputs
  • Whitespace-only inputs
  • None inputs
  • Configuration override behavior

This provides excellent verification of the SQL validation logic.

cratedb_mcp/knowledge.py (1)

2-4: Appropriate dependencies for SQL parsing

Good choice to use the sqlparse library for robust SQL parsing rather than implementing a custom parser. The import of PERMIT_ALL_STATEMENTS also enables the configuration override.

tests/test_mcp.py (3)

27-29: Good basic test for permitted queries

This test case verifies that a simple SELECT query works as expected. The test is clear and focused.


31-35: Appropriate test for basic forbidden queries

This test case correctly verifies that a simple INSERT statement is blocked with the appropriate error message.


37-40: Excellent edge case test for potential bypass

This test case specifically checks that an INSERT statement containing the string 'select' as a value is correctly blocked. This is exactly the type of edge case that would bypass a naive substring check but is caught by the new implementation.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
tests/test_knowledge.py (1)

22-75: Consider adding tests for other SQL operation types

The test suite is comprehensive, covering many edge cases and potential vulnerabilities. To ensure complete coverage, consider adding explicit tests for other SQL statement types like UPDATE, DELETE, TRUNCATE, DROP, and ALTER to verify they're also rejected.

+ def test_sql_expression_update_rejected():
+     """UPDATE statements are rejected"""
+     assert sql_expression_permitted("UPDATE foobar SET column = 'value'") is False
+
+
+ def test_sql_expression_delete_rejected():
+     """DELETE statements are rejected"""
+     assert sql_expression_permitted("DELETE FROM foobar") is False
+
+
+ def test_sql_expression_truncate_rejected():
+     """TRUNCATE statements are rejected"""
+     assert sql_expression_permitted("TRUNCATE TABLE foobar") is False
+
+
+ def test_sql_expression_drop_rejected():
+     """DROP statements are rejected"""
+     assert sql_expression_permitted("DROP TABLE foobar") is False
+
+
+ def test_sql_expression_alter_rejected():
+     """ALTER statements are rejected"""
+     assert sql_expression_permitted("ALTER TABLE foobar ADD COLUMN newcol INTEGER") is False
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c48f822 and e8529ff.

📒 Files selected for processing (9)
  • CHANGES.md (1 hunks)
  • README.md (1 hunks)
  • cratedb_mcp/__main__.py (2 hunks)
  • cratedb_mcp/knowledge.py (2 hunks)
  • cratedb_mcp/settings.py (1 hunks)
  • docs/backlog.md (1 hunks)
  • pyproject.toml (1 hunks)
  • tests/test_knowledge.py (2 hunks)
  • tests/test_mcp.py (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • docs/backlog.md
🚧 Files skipped from review as they are similar to previous changes (7)
  • CHANGES.md
  • README.md
  • pyproject.toml
  • tests/test_mcp.py
  • cratedb_mcp/settings.py
  • cratedb_mcp/main.py
  • cratedb_mcp/knowledge.py
🔇 Additional comments (11)
tests/test_knowledge.py (11)

1-2: Import changes look good

The import statements have been appropriately updated to include the entire cratedb_mcp module and the new sql_expression_permitted function, which aligns with the PR objective of implementing a stronger read-only mode validation.


22-25: Positive test case looks good

Good baseline test that verifies simple SELECT statements are permitted as expected.


27-30: Security test for multiple statements looks good

This test appropriately verifies that multiple SQL statements, even if they're all SELECT statements, are rejected. This is an important security measure to prevent statement chaining.


32-35: DDL statement rejection test looks good

This test verifies that Data Definition Language statements like CREATE TABLE are rejected, even when they include a legitimate SELECT statement. This is crucial for maintaining the read-only constraint.


37-40: DML statement rejection test looks good

This test ensures Data Manipulation Language statements like INSERT are properly rejected, which is a core requirement for the read-only mode.


42-45: SELECT INTO rejection test looks good

This test verifies that SELECT INTO statements, which can modify the database, are rejected. This is an important edge case to cover as it combines a permitted operation (SELECT) with an operation that modifies data.


47-55: Empty and whitespace statement tests look good

These tests verify that empty or whitespace-only statements are rejected, which is a good practice to avoid potential edge cases or unexpected behavior.


57-60: None input test looks good

This test verifies that None values are properly handled and rejected, which prevents potential issues with null inputs.


62-66: Overriding flag test is well implemented

This test correctly verifies that when the PERMIT_ALL_STATEMENTS flag is set to True, non-SELECT statements like INSERT are permitted. Using the mocker to patch the global flag is an appropriate approach.


68-70: Mixed statement test looks good

This test verifies that mixing legitimate SELECT statements with non-SELECT statements is properly rejected, which is an important security check.


72-74: SQL comment test is important and well implemented

This test ensures that SQL comments containing disallowed commands are properly rejected, which is an important security measure to prevent SQL injection attacks that try to hide malicious commands in comments.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
tests/test_knowledge.py (1)

22-99: Consider adding more edge case tests.

Your test suite is already quite thorough, but you might consider adding tests for:

  1. Case-insensitive statements (e.g., "select" vs "SELECT")
  2. Nested queries or complex SELECT statements with JOIN, GROUP BY, etc.
  3. SQL injection patterns like string concatenation or comment termination
  4. Statements with whitespace before the command (e.g., " SELECT")
  5. Statements with Unicode characters or escapes
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e8529ff and c9ece5c.

📒 Files selected for processing (9)
  • CHANGES.md (1 hunks)
  • README.md (1 hunks)
  • cratedb_mcp/__main__.py (2 hunks)
  • cratedb_mcp/knowledge.py (2 hunks)
  • cratedb_mcp/settings.py (1 hunks)
  • docs/backlog.md (1 hunks)
  • pyproject.toml (1 hunks)
  • tests/test_knowledge.py (2 hunks)
  • tests/test_mcp.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (8)
  • CHANGES.md
  • docs/backlog.md
  • pyproject.toml
  • README.md
  • cratedb_mcp/main.py
  • cratedb_mcp/knowledge.py
  • cratedb_mcp/settings.py
  • tests/test_mcp.py
🔇 Additional comments (11)
tests/test_knowledge.py (11)

1-2: Good use of module import for mocking purposes.

The import of the entire cratedb_mcp module alongside the explicit function import is necessary for the patching used in test_sql_expression_insert_allowed. This approach follows best practices for proper test isolation.


22-25: LGTM! Basic SELECT test case.

This test correctly verifies the core functionality of allowing simple SELECT statements.


27-31: Good test for environment variable override.

This test effectively verifies the override functionality when PERMIT_ALL_STATEMENTS is set to True. The use of mocker.patch.object is appropriate for temporarily modifying the module-level constant during testing.


33-36: Multiple statements test looks good.

Properly verifies that multiple statements separated by semicolons are rejected, which is an important security consideration.


38-41: DDL rejection test is appropriate.

This test correctly verifies that CREATE TABLE statements are rejected, preventing schema modifications.


43-46: DML rejection test works well.

Properly verifies that INSERT statements are rejected in the default configuration.


48-51: Good test for SELECT INTO.

This test catches a potential bypass where SELECT INTO could be used to write data despite the read-only intention.


53-56: Edge cases are well covered.

The tests for empty, whitespace-only, and None inputs ensure robustness against unexpected input formats.

Also applies to: 58-61, 63-66


68-70: Mixed statements test is important.

Good coverage for mixed statements that could potentially bypass restrictions.


72-75: SQL comment injection test is valuable.

This test verifies protection against attempts to hide disallowed commands within SQL comments.


77-80: Comprehensive coverage of SQL command types.

These tests thoroughly verify that all types of potentially harmful SQL commands (UPDATE, DELETE, TRUNCATE, DROP, ALTER) are rejected.

Also applies to: 82-85, 87-90, 92-95, 97-99

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
tests/test_knowledge.py (1)

75-82: Strong protection against obfuscation techniques.

These tests verify that common SQL obfuscation techniques (using multiple statements or embedding comments) are properly detected and rejected, which is critical for security.

Consider adding a few more tests for advanced SQL injection techniques such as:

def test_sql_expression_case_manipulation_rejected():
    """Statements with case manipulation to hide intent are rejected"""
    assert sql_expression_permitted("SeLeCt * FrOm users; DrOp TaBlE users") is False

def test_sql_expression_unicode_evasion_rejected():
    """Statements with unicode characters to evade filters are rejected"""
    assert sql_expression_permitted("SELECT * FROM users; \uFF1B DROP TABLE users") is False
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c9ece5c and ba46eb2.

📒 Files selected for processing (9)
  • CHANGES.md (1 hunks)
  • README.md (1 hunks)
  • cratedb_mcp/__main__.py (2 hunks)
  • cratedb_mcp/knowledge.py (2 hunks)
  • cratedb_mcp/settings.py (1 hunks)
  • docs/backlog.md (1 hunks)
  • pyproject.toml (1 hunks)
  • tests/test_knowledge.py (2 hunks)
  • tests/test_mcp.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (8)
  • CHANGES.md
  • docs/backlog.md
  • pyproject.toml
  • cratedb_mcp/main.py
  • README.md
  • cratedb_mcp/settings.py
  • cratedb_mcp/knowledge.py
  • tests/test_mcp.py
🧰 Additional context used
🧬 Code Graph Analysis (1)
tests/test_knowledge.py (1)
cratedb_mcp/knowledge.py (2)
  • Queries (11-91)
  • sql_expression_permitted (113-129)
🔇 Additional comments (9)
tests/test_knowledge.py (9)

1-2: Appropriate imports for the new functionality.

The updated imports correctly bring in the entire cratedb_mcp module and specifically import the new sql_expression_permitted function from cratedb_mcp.knowledge, which will be used in the new tests.


22-27: Good test coverage for valid SELECT statements.

These tests properly verify that standard SELECT statements are permitted, including variations with leading whitespace and different case formats. This ensures the core functionality of allowing read-only queries works as expected.


29-32: Good security check for SQL injection technique.

This test verifies that the system rejects a common SQL injection technique (using comments with escaped semicolons). This is essential for preventing bypass attempts.


34-38: Well-implemented override test.

The test correctly uses mocking to verify the behavior when PERMIT_ALL_STATEMENTS is enabled, showing that non-SELECT statements like INSERT can be allowed when explicitly configured. This flexibility is important for special cases while maintaining security by default.


40-43: Important test for multi-statement protection.

Testing that multiple statements (even if they're all SELECT) are rejected is crucial for preventing batch attacks where a malicious user might try to append a harmful statement after a legitimate one.


45-48: Comprehensive DDL protection tests.

These tests verify that all major DDL (Data Definition Language) statements are rejected, which is essential for preventing schema modifications in read-only mode.

Also applies to: 99-102, 104-107


50-53: Comprehensive DML protection tests.

These tests thoroughly check that all major DML (Data Manipulation Language) statements (INSERT, UPDATE, DELETE, TRUNCATE) are rejected, which is critical for ensuring data cannot be modified in read-only mode.

Also applies to: 84-92, 94-97


55-58: Good catch for SELECT INTO statements.

Testing rejection of SELECT INTO statements is important as these can create new tables or modify existing data, despite starting with SELECT. This shows attention to detail in identifying potential bypass methods.


60-73: Good edge case handling for empty inputs.

Testing empty strings, whitespace-only strings, and None values ensures the validation function handles these edge cases properly without errors, improving robustness.

@amotl amotl force-pushed the stronger-read-only branch from ba46eb2 to d80df8d Compare May 10, 2025 12:17
@amotl amotl linked an issue May 10, 2025 that may be closed by this pull request
@amotl amotl requested review from kneth and surister May 10, 2025 12:21
@amotl amotl marked this pull request as ready for review May 10, 2025 12:21
return True

# Parse the SQL statement.
parsed = sqlparse.parse(expression.strip())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not use cratedb-sqlparse?

Copy link
Member Author

@amotl amotl May 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. Let's explore it together with a subsequent patch? After giving it a quick try, I haven't been able to remove the dependency to classic sqlparse just yet, but didn't want to block this patch.

Base automatically changed from boilerplate to main May 10, 2025 17:39
@amotl amotl force-pushed the stronger-read-only branch from d80df8d to ee32821 Compare May 10, 2025 19:05
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
tests/test_knowledge.py (2)

40-43: Consider consolidating redundant test cases

There appears to be redundancy between test_sql_expression_multiple_statements_rejected and test_sql_expression_select_multiple_rejected. Both verify that multiple statements are rejected, just with different SQL content.

You could consolidate these into a single test case with multiple assertions or make their purposes more distinct:

 def test_sql_expression_select_multiple_rejected():
-    """Multiple SQL statements are rejected"""
+    """Multiple SELECT statements are rejected"""
     assert sql_expression_permitted("SELECT 42; SELECT 42;") is False


 def test_sql_expression_multiple_statements_rejected():
+    """Mixed statement types (SELECT + DML) are rejected"""
     assert sql_expression_permitted("SELECT 42; INSERT INTO foo VALUES (1)") is False

Also applies to: 76-78


22-118: Consider additional test cases for complex SQL constructs

The test suite is already comprehensive, but to further strengthen it against potential filtering bypasses, consider adding tests for more complex SQL constructs like:

  1. Common Table Expressions (CTEs)
  2. Nested subqueries
  3. UNION/INTERSECT/EXCEPT operations
  4. Window functions
  5. Vendor-specific syntax extensions

These could potentially be used in SQL injection attempts to bypass filtering.

def test_sql_expression_with_cte_permitted():
    """SELECT with CTE is permitted"""
    assert sql_expression_permitted(
        "WITH nums AS (SELECT 1) SELECT * FROM nums"
    ) is True

def test_sql_expression_with_union_permitted():
    """SELECT with UNION is permitted"""
    assert sql_expression_permitted(
        "SELECT 1 UNION SELECT 2"
    ) is True

def test_sql_expression_with_cte_dml_rejected():
    """CTE with DML is rejected"""
    assert sql_expression_permitted(
        "WITH nums AS (SELECT 1) INSERT INTO foo SELECT * FROM nums"
    ) is False
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ba46eb2 and ee32821.

📒 Files selected for processing (9)
  • CHANGES.md (1 hunks)
  • README.md (1 hunks)
  • cratedb_mcp/__main__.py (2 hunks)
  • cratedb_mcp/knowledge.py (2 hunks)
  • cratedb_mcp/settings.py (2 hunks)
  • docs/backlog.md (1 hunks)
  • pyproject.toml (1 hunks)
  • tests/test_knowledge.py (2 hunks)
  • tests/test_mcp.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (8)
  • CHANGES.md
  • docs/backlog.md
  • README.md
  • cratedb_mcp/main.py
  • pyproject.toml
  • cratedb_mcp/settings.py
  • tests/test_mcp.py
  • cratedb_mcp/knowledge.py
🧰 Additional context used
🧬 Code Graph Analysis (1)
tests/test_knowledge.py (1)
cratedb_mcp/knowledge.py (2)
  • Queries (12-92)
  • sql_expression_permitted (114-130)
🔇 Additional comments (6)
tests/test_knowledge.py (6)

1-2: Appropriate import structure for test mocking

Good approach importing both the full module and specific functions, as this allows for proper mocking in your tests while maintaining clean function calls.


22-27: Thorough basic SELECT statement testing

Good coverage of different variations of valid SELECT statements, including whitespace and case variations.


34-38: Effective environment variable override testing

Good test demonstrating that the PERMIT_ALL_STATEMENTS flag correctly bypasses the filtering when enabled. This confirms the override mechanism works as intended.


80-83: Good security testing for comment-based evasion

Testing for SQL injection attempts via comments is critical for security. This test ensures that attackers can't use comments to disguise malicious statements.


110-118: Strong security tests for bypass attempts

Excellent tests for case manipulation and Unicode character evasion. These are sophisticated SQL injection techniques that could potentially bypass naive filtering implementations.


1-118: Overall excellent test coverage for SQL filtering

The test suite is thorough and well-structured, covering a wide range of scenarios including valid queries, invalid operations, edge cases, and security bypass attempts. The tests will help ensure that only read-only operations are permitted while properly blocking all attempts at modification.

@amotl amotl requested a review from surister May 10, 2025 19:17
@amotl amotl force-pushed the stronger-read-only branch from ee32821 to 0a12741 Compare May 10, 2025 20:04
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
tests/test_knowledge.py (3)

29-31: Good validation of malformed SELECT statements.

This test correctly verifies that commented-out or malformed SELECT statements are rejected. Consider adding a few more examples of malformed statements for more comprehensive coverage.

def test_sql_expression_select_rejected():
    """Bogus SQL SELECT statements are rejected"""
    assert sql_expression_permitted(r"--\; select 42") is False
+   assert sql_expression_permitted(r"SEL ECT 42") is False
+   assert sql_expression_permitted(r"SELECT42") is False

22-117: Consider adding parameterized tests to reduce duplication.

While the test coverage is comprehensive, consider using pytest's parameterized testing to reduce code duplication and make it easier to add more test cases in the future.

import pytest

@pytest.mark.parametrize(
    "statement,expected",
    [
        ("SELECT 42", True),
        (" SELECT 42", True),
        ("select 42", True),
        (r"--\; select 42", False),
        ("SELECT 42; SELECT 42;", False),
        ("CREATE TABLE foobar AS SELECT 42", False),
        # Add more cases here
    ],
)
def test_sql_expression_permitted_cases(statement, expected):
    """Test various SQL statements against permission rules"""
    assert sql_expression_permitted(statement) is expected

This approach would maintain the same test coverage while making the code more maintainable.


22-117: Consider testing additional SQL injection techniques.

While the current tests cover many scenarios, consider adding tests for these additional SQL injection techniques:

  • Inline comments within SQL keywords (SEL/**/ECT)
  • Stacked queries with alternate delimiters
  • UNION-based injection attempts
def test_sql_expression_inline_comments_rejected():
    """Statements with inline comments to hide intent are rejected"""
    assert sql_expression_permitted("SEL/**/ECT * FROM users") is False
    
def test_sql_expression_union_injection_rejected():
    """UNION-based injection attempts are rejected"""
    assert sql_expression_permitted("SELECT 1 UNION SELECT * FROM users") is False
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ee32821 and 0a12741.

📒 Files selected for processing (3)
  • cratedb_mcp/knowledge.py (2 hunks)
  • pyproject.toml (1 hunks)
  • tests/test_knowledge.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • pyproject.toml
  • cratedb_mcp/knowledge.py
🔇 Additional comments (8)
tests/test_knowledge.py (8)

1-2: Sensible import approach for testing.

The import of the entire cratedb_mcp module alongside specific imports enables the proper mocking of the PERMIT_ALL_STATEMENTS flag in the test. This is a good practice for testing.


22-27: Well-structured test for permitted SELECT statements.

The test function properly verifies that standard SELECT statements are permitted in different formats (with leading whitespace and case variations). The docstring clearly explains the purpose of the test.


34-37: Effective testing of permission override.

The test correctly verifies that when PERMIT_ALL_STATEMENTS is set to True, non-SELECT statements like INSERT are permitted. This is an important test for the configuration flag mentioned in the PR.


40-42: Appropriate rejection of multiple statements.

This test correctly verifies that multiple SQL statements, even if they're all SELECTs, are rejected. This is an important security measure to prevent statement chaining attacks.


45-107: Comprehensive testing of rejected statement types.

The tests effectively verify that various SQL statement types are properly rejected:

  • DDL statements (CREATE, DROP, ALTER, TRUNCATE)
  • DML statements (INSERT, UPDATE, DELETE)
  • Combination statements (SELECT INTO)

This thorough coverage ensures that the read-only enforcement works as expected for all common SQL operations.


61-73: Good validation of edge cases for input.

The tests properly validate edge cases:

  • Empty strings
  • Whitespace-only strings
  • None values

This ensures robustness in handling various input scenarios.


76-82: Strong defense against SQL injection techniques.

These tests verify protection against common SQL injection techniques:

  • Multiple statements separated by semicolons
  • Hidden statements within comments

This is crucial for maintaining security in the SQL validation.


110-117: Excellent protection against evasion techniques.

The tests verify that the system can detect and reject sophisticated evasion attempts:

  • Case manipulation to hide intentions
  • Unicode character substitution to bypass filters

These tests demonstrate a strong security mindset in the implementation.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🔭 Outside diff range comments (1)
cratedb_mcp/util/sql.py (1)

1-141: 💡 Verification agent

🧩 Analysis chain

Consider adding tests for SQL statements with embedded Unicode control characters.

While you check for Unicode semicolons, there are other Unicode characters that could be used to evade filters, such as zero-width spaces, bidirectional overrides, etc.

Run this script to check for additional evasion techniques:


🏁 Script executed:

#!/bin/bash
# Description: Test SQL filtering against additional Unicode evasion techniques

# Get current Unicode evasion test case
echo "Current test:"
grep -n "unicode_evasion" tests/test_util.py -A 3

# Test script with additional Unicode evasion test cases
cat << 'EOF' > /tmp/test_more_evasions.py
from cratedb_mcp.util.sql import sql_is_permitted

# Bidirectional override characters
print("Testing RIGHT-TO-LEFT OVERRIDE (U+202E):")
print(sql_is_permitted("SELECT * FROM users \u202E ELBAT PORD -- "))

# Zero-width spaces
print("Testing ZERO WIDTH SPACE (U+200B):")
print(sql_is_permitted("SELECT * FROM users;\u200BDROP TABLE users"))

# Line separator
print("Testing LINE SEPARATOR (U+2028):")
print(sql_is_permitted("SELECT * FROM users\u2028DROP TABLE users"))

# Paragraph separator
print("Testing PARAGRAPH SEPARATOR (U+2029):")
print(sql_is_permitted("SELECT * FROM users\u2029DROP TABLE users"))
EOF

# Run test script
python /tmp/test_more_evasions.py

Length of output: 547


🏁 Script executed:

#!/bin/bash
# Re-run Unicode evasion tests with the repository on PYTHONPATH
export PYTHONPATH="$(pwd)"

python - << 'EOF'
from cratedb_mcp.util.sql import sql_is_permitted

cases = {
    "RIGHT-TO-LEFT OVERRIDE (U+202E)":      "SELECT * FROM users \u202E ELBAT PORD -- ",
    "ZERO WIDTH SPACE (U+200B)":            "SELECT * FROM users;\u200BDROP TABLE users",
    "LINE SEPARATOR (U+2028)":              "SELECT * FROM users\u2028DROP TABLE users",
    "PARAGRAPH SEPARATOR (U+2029)":         "SELECT * FROM users\u2029DROP TABLE users",
}

for name, sql in cases.items():
    result = sql_is_permitted(sql)
    print(f"{name}: permitted? {result}")
EOF

Length of output: 318


Add tests for additional Unicode control‐character evasion cases

The current suite only covers the fullwidth semicolon (\uFF1B), but SQL filters can be bypassed with other invisible or bidirectional characters. Please extend tests/test_util.py with cases for:

  • RIGHT-TO-LEFT OVERRIDE (U+202E)
  • ZERO WIDTH SPACE (U+200B)
  • LINE SEPARATOR (U+2028)
  • PARAGRAPH SEPARATOR (U+2029)

For example:

import pytest
from cratedb_mcp.util.sql import sql_is_permitted

@pytest.mark.parametrize("label,expr", [
    ("U+202E RTL OVERRIDE",      "SELECT * FROM users \u202E ELBAT PORD -- "),
    ("U+200B ZWSP",              "SELECT * FROM users;\u200BDROP TABLE users"),
    ("U+2028 LINE SEP",          "SELECT * FROM users\u2028DROP TABLE users"),
    ("U+2029 PARAGRAPH SEP",     "SELECT * FROM users\u2029DROP TABLE users"),
])
def test_sql_expression_unicode_evasion_rejected(label, expr):
    """Ensure statements with {} are rejected""".format(label)
    assert sql_is_permitted(expr) is False

This will help ensure the filter catches all known Unicode‐based evasion techniques.

🧹 Nitpick comments (5)
cratedb_mcp/util/sql.py (5)

13-30: Add type validation for the expression parameter.

The function is well-documented, but it doesn't validate the type of the expression parameter before using it, which could lead to unexpected behavior with non-string inputs.

def sql_is_permitted(expression: str) -> bool:
    """
    Validate the SQL expression, only permit read queries by default.

    When the `CRATEDB_MCP_PERMIT_ALL_STATEMENTS` environment variable is set,
    allow all types of statements.

    FIXME: Revisit implementation, it might be too naive or weak.
           Issue:    https://github.com/crate/cratedb-mcp/issues/10
           Question: Does SQLAlchemy provide a solid read-only mode, or any other library?
    """
+   # Ensure expression is a string
+   if expression is None or not isinstance(expression, str):
+       logger.warning(f"Denied non-string SQL expression: {expression}")
+       return False
    is_dql = SqlFilter(expression=expression, permit_all=PERMIT_ALL_STATEMENTS).is_dql
    if is_dql is True:
        logger.info(f"Permitted SQL expression: {expression}")
    else:
        logger.warning(f"Denied SQL expression: {expression}")
    return is_dql

32-47: Add error handling in the post-initialization method.

The __post_init__ method assumes self.expression is a string if it's truthy, which could lead to an AttributeError if a non-string object is passed.

def __post_init__(self) -> None:
-   if self.expression:
+   if self.expression and isinstance(self.expression, str):
        self.expression = self.expression.strip()
+   elif self.expression and not isinstance(self.expression, str):
+       # Handle non-string inputs by converting to string
+       self.expression = str(self.expression).strip()

48-63: Add error handling for SQL parsing failures.

The parsing methods don't include error handling for failures, which could occur if the expression is not valid SQL or if the libraries raise exceptions.

def parse_cratedb(self):
    """
    Parse expression using `cratedb-sqlparse` library.
    """
    if self._parsed_cratedb is None:
-       self._parsed_cratedb = cratedb_sqlparse.sqlparse(self.expression)
+       try:
+           self._parsed_cratedb = cratedb_sqlparse.sqlparse(self.expression)
+       except Exception as e:
+           # If parsing fails, return an empty list to indicate invalid SQL
+           logger.warning(f"Failed to parse SQL with cratedb_sqlparse: {e}")
+           self._parsed_cratedb = []
    return self._parsed_cratedb

def parse_sqlparse(self):
    """
    Parse expression using traditional `sqlparse` library.
    """
    if self._parsed_sqlparse is None:
-       self._parsed_sqlparse = sqlparse.parse(self.expression)
+       try:
+           self._parsed_sqlparse = sqlparse.parse(self.expression)
+       except Exception as e:
+           # If parsing fails, return an empty list to indicate invalid SQL
+           logger.warning(f"Failed to parse SQL with sqlparse: {e}")
+           self._parsed_sqlparse = []
    return self._parsed_sqlparse

88-101: Add defensive check before accessing parsed SQL data.

The operation property assumes parsing succeeded and returned at least one statement, which could lead to an IndexError if parsing failed or returned an empty list.

@property
def operation(self) -> str:
    """
    The SQL operation: SELECT, INSERT, UPDATE, DELETE, CREATE, etc.
    """
-   return self._parsed_cratedb[0].type.upper()
+   parsed = self.parse_cratedb()
+   if not parsed:
+       return ""  # Return empty string for invalid SQL
+   return parsed[0].type.upper()

109-141: Add defensive checks in is_select_into and is_evasive properties.

Both properties make assumptions about the structure of the parsed SQL, which could lead to IndexErrors if parsing fails or returns unexpected results. Also, the approach for detecting SELECT INTO statements could be improved.

@property
def is_select_into(self) -> bool:
    """
    Use traditional `sqlparse` for catching `SELECT ... INTO ...` statements.

    Note: With `cratedb-sqlparse`, we can't find the `INTO` token at all?

    Examples:

        SELECT * INTO foobar FROM bazqux
        SELECT * FROM bazqux INTO foobar
    """
    parsed = self.parse_sqlparse()
+   if not parsed:
+       return False
    tokens = [str(item).upper() for item in parsed[0]]
    return "INTO" in tokens

@property
def is_evasive(self) -> bool:
    """
    Use traditional `sqlparse` for catching evasive SQL statements.

    A practice picked up from CodeRabbit was to reject multiple statements
    to prevent potential SQL injections. Is it a viable suggestion?

    Note: This currently does not work with `cratedb-sqlparse`?

    Examples:

        SELECT * FROM users; \uff1b DROP TABLE users
    """
    parsed = self.parse_sqlparse()
    return len(parsed) > 1
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0a12741 and f99eca6.

📒 Files selected for processing (6)
  • cratedb_mcp/__main__.py (2 hunks)
  • cratedb_mcp/util/sql.py (1 hunks)
  • docs/backlog.md (1 hunks)
  • pyproject.toml (1 hunks)
  • tests/test_knowledge.py (1 hunks)
  • tests/test_util.py (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • tests/test_knowledge.py
🚧 Files skipped from review as they are similar to previous changes (3)
  • pyproject.toml
  • docs/backlog.md
  • cratedb_mcp/main.py
🧰 Additional context used
🧬 Code Graph Analysis (1)
tests/test_util.py (1)
cratedb_mcp/util/sql.py (1)
  • sql_is_permitted (13-29)
🔇 Additional comments (13)
tests/test_util.py (10)

1-3: Imports look good.

The imports are properly organized, importing the necessary modules and functions for testing the sql_is_permitted function.


5-10: Well-structured test for basic SELECT statements.

Good comprehensive test coverage for basic SELECT statements, including variations with whitespace and case sensitivity.


12-15: Good security test for bogus SELECT statements.

This test properly checks that a statement with a comment and an escaped semicolon (which could be an attempt to bypass filtering) is rejected.


17-21: Good test for override behavior.

Effectively tests that when PERMIT_ALL_STATEMENTS is set to True, non-SELECT statements are allowed. Using the mocker to patch the variable is the right approach.


23-26: Strong security check for multiple statements.

Properly verifies that multiple SQL statements separated by semicolons are rejected, even if they are all SELECT statements - this is crucial for preventing SQL injection.


28-42: Comprehensive tests for DDL, DML, and SELECT INTO statements.

These tests thoroughly verify that:

  1. DDL statements like CREATE TABLE are rejected
  2. DML statements like INSERT are rejected
  3. SELECT INTO statements (both syntax variations), which are a form of DML disguised as SELECT, are rejected

This provides good coverage for ensuring read-only mode and preventing data modification.


44-57: Good edge case handling.

These tests properly handle edge cases including:

  • Empty statements
  • Whitespace-only statements
  • None input

This ensures the filtering is robust against various invalid inputs.


59-66: Strong security checks for evasive techniques.

These tests effectively verify that potentially malicious SQL injection techniques are caught:

  • Multiple statements with different types (SELECT and INSERT)
  • Statements with comments attempting to hide malicious SQL

This helps protect against SQL injection attempts.


68-91: Comprehensive DML and DDL rejection tests.

Good coverage of various SQL command types that should be rejected:

  • UPDATE statements
  • DELETE statements
  • TRUNCATE statements
  • DROP statements
  • ALTER statements

This ensures that all types of data modification and schema changes are blocked in read-only mode.


93-101: Excellent security checks for evasion techniques.

These tests verify protection against sophisticated SQL injection attempts:

  1. Case manipulation to hide intent (e.g., "SeLeCt * FrOm users; DrOp TaBlE users")
  2. Unicode character evasion (using Unicode semicolon "\uFF1B" instead of regular semicolon)

These are advanced security checks that show attention to detail in preventing SQL injection.

cratedb_mcp/util/sql.py (3)

1-11: Proper imports and setup.

The imports are organized well, including standard library modules, third-party libraries, and project-specific imports. Setting up logging is also a good practice.


64-87: The is_dql property looks good.

The property has strong logic for determining if an SQL expression is a read-only operation. It properly handles empty inputs, the override flag, and multiple statements, while checking for camouflaged operations.


102-108: The is_camouflage property is well defined.

Good combination of checks to detect innocent-looking SELECT statements that might be trying to evade filters.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
cratedb_mcp/util/sql.py (3)

56-78: Possible redundancy in is_dql property

The is_dql property has solid logic, but there's redundancy in checking for multiple statements.

Line 72-73 checks if len(parsed) > 1, but this same check is also implemented in the is_evasive property. Consider refactoring to use self.is_evasive instead of duplicating the logic:

-        # Reject multiple statements to prevent potential SQL injections.
-        if len(parsed) > 1:
-            return False
+        # Reject multiple statements to prevent potential SQL injections.
+        if self.is_evasive:
+            return False

Also, note that line 68 references cratedb-sqlparse in the comment but actually uses sqlparse. Consider updating the comment for clarity.


101-115: Consider more robust INTO detection

The is_select_into property could be susceptible to false positives if "INTO" appears in string literals.

Consider a more syntax-aware approach for detecting "INTO" clauses:

-        tokens = [str(token).upper() for token in parsed[0].tokens]
-        return "INTO" in tokens
+        # More robust approach to detect INTO not within string literals
+        for token in parsed[0].tokens:
+            if token.ttype in (sqlparse.tokens.Keyword, sqlparse.tokens.Keyword.DML) and str(token).upper() == "INTO":
+                return True
+        return False

This would check that the token is actually a keyword, not just a string containing "INTO".


117-132: Consider removing redundancy with is_dql

The is_evasive property duplicates logic that's already in the is_dql method.

To avoid maintenance issues if one implementation changes but not the other, consider refactoring to make one method call the other:

-        parsed = self.parse_sqlparse()
-        return len(parsed) > 1
+        return len(self.parse_sqlparse()) > 1

Also, since this check is already performed in is_dql, consider restructuring to avoid duplication.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f99eca6 and 7e68755.

📒 Files selected for processing (10)
  • CHANGES.md (1 hunks)
  • README.md (1 hunks)
  • cratedb_mcp/__main__.py (2 hunks)
  • cratedb_mcp/settings.py (2 hunks)
  • cratedb_mcp/util/sql.py (1 hunks)
  • docs/backlog.md (1 hunks)
  • pyproject.toml (1 hunks)
  • tests/test_knowledge.py (1 hunks)
  • tests/test_mcp.py (1 hunks)
  • tests/test_util.py (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • tests/test_knowledge.py
🚧 Files skipped from review as they are similar to previous changes (8)
  • CHANGES.md
  • pyproject.toml
  • README.md
  • docs/backlog.md
  • cratedb_mcp/main.py
  • cratedb_mcp/settings.py
  • tests/test_util.py
  • tests/test_mcp.py
🔇 Additional comments (5)
cratedb_mcp/util/sql.py (5)

31-47: Well-structured dataclass with clear purpose

The SqlStatementClassifier is well-designed using dataclasses, with good documentation explaining its purpose. The __post_init__ method appropriately handles whitespace.


48-54: Good implementation of caching for parsed SQL

The parse_sqlparse method efficiently caches parsing results to avoid redundant work.


80-92: Clear and concise property implementations

The is_select and operation properties are well-implemented and clearly named.


94-99: Good combination of security checks

The is_camouflage property effectively combines multiple security checks into a single property.


12-28:

✅ Verification successful

Great enforcement with logging! Consider adding test cases for edge cases.

The sql_is_permitted function implements a clean validation approach with appropriate logging. The FIXME comment acknowledges potential limitations, which is good for future improvements.

Consider adding test cases for edge cases such as:


🏁 Script executed:

#!/bin/bash
# Check if the codebase already has test cases for these edge cases
rg -A 2 -B 2 "test.*sql" --type py

Length of output: 5312


Edge case test coverage confirmed

The tests/test_util.py suite already includes comprehensive tests for all the edge cases listed (empty/None statements, multiple statements, comments, unicode evasion, DDL/DML types, case manipulation, etc.). No additional tests are needed.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (6)
cratedb_mcp/util/sql.py (6)

12-29: Review logging of full SQL expressions for potential security issues

While the function is well-structured, logging the complete SQL expression might inadvertently expose sensitive data (e.g., data in WHERE clauses).

Consider redacting or truncating the SQL in the logs:

-        logger.info(f"Permitted SQL expression: {expression}")
+        logger.info(f"Permitted SQL expression: {expression[:50]}...")
-        logger.warning(f"Denied SQL expression: {expression}")
+        logger.warning(f"Denied SQL expression: {expression[:50]}...")

Also, the is_dql is True check is redundant:

-    if is_dql is True:
+    if is_dql:

31-46: Consider adding null check for expression in class constructor

The __post_init__ method strips whitespace if expression is truthy, but there's no validation that expression isn't None.

    def __post_init__(self) -> None:
+        if self.expression is None:
+            self.expression = ""
        if self.expression:
            self.expression = self.expression.strip()

Also, consider using a more specific type hint for _parsed_sqlparse instead of t.Any.


48-54: Add return type annotation and error handling to parse_sqlparse method

The method lacks a return type annotation and could fail if self.expression is None.

-def parse_sqlparse(self):
+def parse_sqlparse(self) -> t.List[sqlparse.sql.Statement]:
    """
    Parse expression using traditional `sqlparse` library.
    """
    if self._parsed_sqlparse is None:
+        if not self.expression:
+            return []
        self._parsed_sqlparse = sqlparse.parse(self.expression)
    return self._parsed_sqlparse

56-79: Incorrect comment in is_dql about cratedb-sqlparse

The comment "Parse the SQL statement using cratedb-sqlparse" is inaccurate since the code is using the standard sqlparse library.

-    # Parse the SQL statement using `cratedb-sqlparse`.
+    # Parse the SQL statement using the sqlparse library.

Also, there's a duplicate check for multiple statements here and in is_evasive, which could be confusing for maintainers.


117-133: Duplicate check for multiple statements

The is_evasive property duplicates the check already performed in is_dql.

To avoid redundancy and improve maintainability, consider refactoring to use is_evasive in is_dql without repeating the logic:

@property
def is_dql(self) -> bool:
    """
    Whether the statement is a DQL statement, which effectively invokes read-only operations only.
    """

    if not self.expression:
        return False

    if self.permit_all:
        return True

    # Parse the SQL statement using the sqlparse library.
    parsed = self.parse_sqlparse()

-    # Reject multiple statements to prevent potential SQL injections.
-    if len(parsed) > 1:
-        return False

    # Check if the expression is valid and if it's a DQL/SELECT statement,
    # also trying to consider `SELECT ... INTO ...` and evasive
    # `SELECT * FROM users; \uff1b DROP TABLE users` statements.
    return self.is_select and not self.is_camouflage

Additionally, the docstring mentions cratedb-sqlparse but the code uses standard sqlparse.


1-133: Enhance security-critical module with additional safeguards

This is an excellent start for a security-critical filter, but consider these additional protections:

  1. Handle malicious SQL encodings (hex, comments, etc.)
  2. Add specific checks for common SQL injection patterns
  3. Consider implementing a deny list of dangerous keywords
  4. Add unit tests with comprehensive edge cases

The current implementation mainly focuses on DQL vs. non-DQL distinction. For enhanced security, consider a more comprehensive approach with a proper SQL parser or query executor that runs in a sandbox with appropriate permissions.

For maximum security, you might want to explore ORM-based solutions or database view-based approaches instead of SQL filtering.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7e68755 and be09e1e.

📒 Files selected for processing (10)
  • CHANGES.md (1 hunks)
  • README.md (1 hunks)
  • cratedb_mcp/__main__.py (2 hunks)
  • cratedb_mcp/settings.py (2 hunks)
  • cratedb_mcp/util/sql.py (1 hunks)
  • docs/backlog.md (1 hunks)
  • pyproject.toml (1 hunks)
  • tests/test_knowledge.py (1 hunks)
  • tests/test_mcp.py (1 hunks)
  • tests/test_util.py (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • tests/test_knowledge.py
🚧 Files skipped from review as they are similar to previous changes (8)
  • CHANGES.md
  • docs/backlog.md
  • pyproject.toml
  • cratedb_mcp/main.py
  • README.md
  • cratedb_mcp/settings.py
  • tests/test_mcp.py
  • tests/test_util.py
🔇 Additional comments (1)
cratedb_mcp/util/sql.py (1)

1-10: Strong module for SQL validation, clean imports

The module has well-organized imports and proper logger setup, which establishes a good foundation for this security-focused feature.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (1)
cratedb_mcp/util/sql.py (1)

83-89: ⚠️ Potential issue

Add error handling to the operation property.

The operation property accesses parsed[0] without checking if the parsed result contains any elements, which could cause an IndexError.

@property
def operation(self) -> str:
    """
    The SQL operation: SELECT, INSERT, UPDATE, DELETE, CREATE, etc.
    """
    parsed = self.parse_sqlparse()
+    if not parsed:
+        return ""
    return parsed[0].get_type().upper()
🧹 Nitpick comments (3)
cratedb_mcp/util/sql.py (3)

1-11: Import structure looks good but consider organizing them alphabetically.

The imports are well-organized by category (standard library, third-party, and local imports), but within each category, they could be sorted alphabetically for better maintainability.


32-50: Consider validating permit_all flag in post_init

The class handles None for expression but doesn't validate the permit_all flag. While it's unlikely to be None since it has a default value, being thorough with input validation is good practice.

def __post_init__(self) -> None:
    if self.expression is None:
        self.expression = ""
    if self.expression:
        self.expression = self.expression.strip()
+    # Ensure permit_all is a boolean
+    self.permit_all = bool(self.permit_all)

59-74: Consider adding explicit validation of SQL statement structure

The is_dql property checks if the statement is SELECT and not camouflage, but it doesn't validate that the SQL structure is correct. Consider adding a check that the parsed statement is valid SQL.

@property
def is_dql(self) -> bool:
    """
    Whether the statement is a DQL statement, which effectively invokes read-only operations only.
    """

    if not self.expression:
        return False

    if self.permit_all:
        return True

+    # Ensure we have a valid SQL statement
+    parsed = self.parse_sqlparse()
+    if not parsed:
+        return False
+
    # Check if the expression is valid and if it's a DQL/SELECT statement,
    # also trying to consider `SELECT ... INTO ...` and evasive
    # `SELECT * FROM users; \uff1b DROP TABLE users` statements.
    return self.is_select and not self.is_camouflage
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between be09e1e and 4dadcbc.

📒 Files selected for processing (10)
  • CHANGES.md (1 hunks)
  • README.md (1 hunks)
  • cratedb_mcp/__main__.py (2 hunks)
  • cratedb_mcp/settings.py (2 hunks)
  • cratedb_mcp/util/sql.py (1 hunks)
  • docs/backlog.md (1 hunks)
  • pyproject.toml (1 hunks)
  • tests/test_knowledge.py (1 hunks)
  • tests/test_mcp.py (1 hunks)
  • tests/test_util.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (9)
  • CHANGES.md
  • pyproject.toml
  • cratedb_mcp/main.py
  • README.md
  • docs/backlog.md
  • cratedb_mcp/settings.py
  • tests/test_knowledge.py
  • tests/test_mcp.py
  • tests/test_util.py
🔇 Additional comments (2)
cratedb_mcp/util/sql.py (2)

13-29: The sql_is_permitted function looks solid.

The function correctly applies the SQL classification logic and has appropriate logging. The FIXME comment acknowledges areas for improvement, which is good practice.


113-126: The is_evasive implementation is correct and efficient.

The implementation correctly identifies multiple statements by checking the length of the parsed result, which is an effective way to detect SQL injection attempts.

Validate the SQL expression, only permit read queries by default.

When the `CRATEDB_MCP_PERMIT_ALL_STATEMENTS` environment variable is set,
allow all types of statements.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
allow all types of statements.
allow all types of statements. This is **not** recommended.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I've added f1aa610.

@proddata
Copy link
Member

proddata commented May 15, 2025

Just a thought, is this the best approach? Wouldn’t it be more robust to restrict the CrateDB user’s permissions directly used for the mcp server?
It feels like we’re partly re-implementing CrateDB’s built-in permission system externally here. Curious to hear the reasoning behind this approach!
Also Could the MCP server instead check the user’s permissions directly on CrateDB?

@amotl
Copy link
Member Author

amotl commented May 15, 2025

@proddata: Thanks for bringing up the topic, let's divert it to a dedicated issue, to discuss it for future improvements?

@proddata
Copy link
Member

@proddata: Thanks for bringing up the topic, let's divert it to a dedicated issue, to discuss it for future improvements?

Happy to discuss this in more detail, but I still want to challenge the overall direction of this PR and its relevance from a user perspective. It seems the main motivation is to enforce that only SELECT queries are allowed via the MCP endpoint — but that raises the question: what’s the actual user need or use case behind this constraint of restricting that endpoint?

@amotl
Copy link
Member Author

amotl commented May 15, 2025

It seems the main motivation is to enforce that only SELECT queries are allowed via the MCP endpoint — but that raises the question: what’s the actual user need or use case behind this constraint of restricting that endpoint?

Yes, I think the acute need is to prevent accidental wrongdoings of yet untrusted agents to not modify database content inadvertently. I think it's a good idea to track your talking points and objections per GH-10, so further assistance can be looped in easily.

@proddata
Copy link
Member

Yes, I think the acute need is to prevent accidental wrongdoings of yet untrusted agents to not modify database content inadvertently.

Just limit the mcp servers used database users privileges.

@amotl
Copy link
Member Author

amotl commented May 15, 2025

Just limit the mcp servers used database users privileges.

Why not. Can you provide a full example so we can add it to the README? 🙏

@amotl
Copy link
Member Author

amotl commented May 16, 2025

Adding a full example to the README...

CodeRabbit can possibly provide assistance here, but it can easily be wrong on relevant topics, because CrateDB knowledge might still be niche?

Asking about the overall direction of this PR and its relevance from a user perspective [...]

Please note the code companion 🐇 also refers to a defense in depth strategy, which would use both approaches:

That said, the current PR does improve security over the naive string check, so it's not without value. In fact, a defense-in-depth strategy would suggest implementing both:

  • Database-level privileges as the primary security mechanism
  • Application-level validation for better user experience (providing clear error messages)

@amotl amotl force-pushed the stronger-read-only branch from f1aa610 to 01ddc74 Compare May 16, 2025 11:17
@amotl amotl merged commit acd3f85 into main May 16, 2025
6 checks passed
@amotl amotl deleted the stronger-read-only branch May 16, 2025 11:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Security: Strengthen SQL validation to prevent non-SELECT operations
4 participants