-
Notifications
You must be signed in to change notification settings - Fork 2
SQL: Stronger read-only mode #11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughThis update introduces a robust SQL read-only enforcement mechanism using a new Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant query_sql (main)
participant sql_is_permitted (util.sql)
participant Database
User->>query_sql: Submit SQL query
query_sql->>sql_is_permitted: Validate SQL statement
sql_is_permitted-->>query_sql: Permit or deny (True/False)
alt Statement permitted
query_sql->>Database: Execute SQL
Database-->>query_sql: Return result
query_sql-->>User: Return result
else Statement denied
query_sql-->>User: Raise ValueError (SELECT only)
end
Possibly related issues
Poem
📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (10)
🚧 Files skipped from review as they are similar to previous changes (10)
✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (5)
cratedb_mcp/settings.py (1)
7-8
: Well-documented configuration parameter with sensible default.The implementation correctly sets a secure default (only SELECT statements permitted) with the ability to override via environment variable.
However, consider adding a warning log message when
PERMIT_ALL_STATEMENTS
is enabled, as this could pose a security risk if accidentally enabled in production.README.md (2)
43-49
: Clear documentation of the read-only access policy.The documentation effectively communicates the default behavior and override option to users.
Consider adjusting line 47 for better clarity:
-All other operations will raise a `ValueError` exception, unless the +All other operations will raise a `ValueError` exception, unless `CRATEDB_MCP_PERMIT_ALL_STATEMENTS` environment variable will be set to a truthy value.Additionally, it would be valuable to add a security warning about the risks of enabling all statements in production environments.
🧰 Tools
🪛 LanguageTool
[uncategorized] ~47-~47: This verb may not be in the correct tense. Consider changing the tense to fit the context better.
Context: ...IT_ALL_STATEMENTS` environment variable will be set to a truthy value. # Install ```sh...(AI_EN_LECTOR_REPLACEMENT_VERB_TENSE)
47-47
: Minor grammar issue in environment variable description.The phrase "will be set" suggests a future action but should describe a current state.
-`CRATEDB_MCP_PERMIT_ALL_STATEMENTS` environment variable will be set +`CRATEDB_MCP_PERMIT_ALL_STATEMENTS` environment variable is set🧰 Tools
🪛 LanguageTool
[uncategorized] ~47-~47: This verb may not be in the correct tense. Consider changing the tense to fit the context better.
Context: ...IT_ALL_STATEMENTS` environment variable will be set to a truthy value. # Install ```sh...(AI_EN_LECTOR_REPLACEMENT_VERB_TENSE)
cratedb_mcp/knowledge.py (1)
109-135
: Well-implemented SQL validation with potential for enhancementThe implementation provides a solid foundation for enforcing read-only mode:
- Handles edge cases like empty or None inputs
- Properly parses SQL using a dedicated library
- Checks for both statement type and prohibited clauses
- Includes a configuration flag for flexibility
Consider these potential improvements:
- The current implementation only examines the first statement if multiple statements are provided, potentially allowing SQL injection via statement chaining:
# Parse the SQL statement. parsed = sqlparse.parse(expression.strip()) if not parsed: return False + # Check for multiple statements (potential SQL injection) + if len(parsed) > 1: + return False + # Check if the expression is valid and if it's a SELECT statement, # also trying to consider `SELECT ... INTO ...` statements.
Consider adding additional checks for potentially dangerous constructs within SELECT statements, like certain function calls or subqueries that might have side effects.
The FIXME comment correctly acknowledges limitations - exploring SQLAlchemy's read-only mode could be valuable for a future implementation.
tests/test_mcp.py (1)
31-40
: Consider adding more complex SQL bypass testsWhile the current tests cover basic scenarios well, consider adding more sophisticated test cases to verify that the validation can't be bypassed with more complex SQL constructs.
Add tests for these additional edge cases:
- Statements with comments that might hide true intent
- Multiple statements separated by semicolons
- Complex subqueries within SELECT that might attempt data modification
- Use of custom functions that could have side effects
def test_query_sql_forbidden_multiple_statements(): with pytest.raises(ValueError) as ex: query_sql("SELECT 42; INSERT INTO foo VALUES (1)") assert ex.match("Only queries that have a SELECT statement are allowed") def test_query_sql_forbidden_with_comments(): with pytest.raises(ValueError) as ex: query_sql("/* Sneaky comment */ INSERT /* another comment */ INTO foo VALUES (1)") assert ex.match("Only queries that have a SELECT statement are allowed")
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
CHANGES.md
(1 hunks)README.md
(1 hunks)cratedb_mcp/__main__.py
(2 hunks)cratedb_mcp/knowledge.py
(2 hunks)cratedb_mcp/settings.py
(1 hunks)pyproject.toml
(1 hunks)tests/test_knowledge.py
(2 hunks)tests/test_mcp.py
(1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (2)
cratedb_mcp/__main__.py (1)
cratedb_mcp/knowledge.py (1)
sql_expression_permitted
(109-135)
tests/test_mcp.py (1)
cratedb_mcp/__main__.py (1)
query_sql
(16-19)
🪛 LanguageTool
README.md
[uncategorized] ~47-~47: This verb may not be in the correct tense. Consider changing the tense to fit the context better.
Context: ...IT_ALL_STATEMENTS` environment variable will be set to a truthy value. # Install ```sh...
(AI_EN_LECTOR_REPLACEMENT_VERB_TENSE)
🔇 Additional comments (11)
pyproject.toml (1)
23-23
: Dependencies added look appropriate for the new functionality.The addition of
attrs
will support the boolean conversion utility in settings, andsqlparse<0.6
will enable the stronger SQL validation mechanism described in the PR.Also applies to: 25-25
CHANGES.md (1)
14-14
: Changelog entry is clear and concise.The addition appropriately documents the enhancement to the read-only mode using
sqlparse
.cratedb_mcp/settings.py (1)
3-4
: Import looks good and is appropriately specific.Importing only the required
to_bool
converter fromattr.converters
is a good practice.cratedb_mcp/__main__.py (2)
4-4
: LGTM - Good addition of sql_expression_permitted importThis import change correctly integrates the newly introduced SQL validation function, which will provide a more robust way to enforce read-only mode.
17-17
: Excellent improvement over the previous substring check!Replacing the naive substring check with
sql_expression_permitted
significantly strengthens the read-only enforcement by properly parsing SQL statements. This prevents bypasses where a malicious statement might include the string 'select' but actually perform data modifications.tests/test_knowledge.py (2)
1-2
: Appropriate imports for the new functionalityThe imports have been correctly updated to include both the
sql_expression_permitted
function and the entirecratedb_mcp
module needed for mocking.
22-49
: Good test coverage for SQL validation edge casesThe test suite thoroughly covers different SQL validation scenarios including:
- CREATE TABLE AS SELECT statements
- INSERT statements
- SELECT INTO statements
- Empty inputs
- Whitespace-only inputs
- None inputs
- Configuration override behavior
This provides excellent verification of the SQL validation logic.
cratedb_mcp/knowledge.py (1)
2-4
: Appropriate dependencies for SQL parsingGood choice to use the
sqlparse
library for robust SQL parsing rather than implementing a custom parser. The import ofPERMIT_ALL_STATEMENTS
also enables the configuration override.tests/test_mcp.py (3)
27-29
: Good basic test for permitted queriesThis test case verifies that a simple SELECT query works as expected. The test is clear and focused.
31-35
: Appropriate test for basic forbidden queriesThis test case correctly verifies that a simple INSERT statement is blocked with the appropriate error message.
37-40
: Excellent edge case test for potential bypassThis test case specifically checks that an INSERT statement containing the string 'select' as a value is correctly blocked. This is exactly the type of edge case that would bypass a naive substring check but is caught by the new implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
tests/test_knowledge.py (1)
22-75
: Consider adding tests for other SQL operation typesThe test suite is comprehensive, covering many edge cases and potential vulnerabilities. To ensure complete coverage, consider adding explicit tests for other SQL statement types like UPDATE, DELETE, TRUNCATE, DROP, and ALTER to verify they're also rejected.
+ def test_sql_expression_update_rejected(): + """UPDATE statements are rejected""" + assert sql_expression_permitted("UPDATE foobar SET column = 'value'") is False + + + def test_sql_expression_delete_rejected(): + """DELETE statements are rejected""" + assert sql_expression_permitted("DELETE FROM foobar") is False + + + def test_sql_expression_truncate_rejected(): + """TRUNCATE statements are rejected""" + assert sql_expression_permitted("TRUNCATE TABLE foobar") is False + + + def test_sql_expression_drop_rejected(): + """DROP statements are rejected""" + assert sql_expression_permitted("DROP TABLE foobar") is False + + + def test_sql_expression_alter_rejected(): + """ALTER statements are rejected""" + assert sql_expression_permitted("ALTER TABLE foobar ADD COLUMN newcol INTEGER") is False
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
CHANGES.md
(1 hunks)README.md
(1 hunks)cratedb_mcp/__main__.py
(2 hunks)cratedb_mcp/knowledge.py
(2 hunks)cratedb_mcp/settings.py
(1 hunks)docs/backlog.md
(1 hunks)pyproject.toml
(1 hunks)tests/test_knowledge.py
(2 hunks)tests/test_mcp.py
(1 hunks)
✅ Files skipped from review due to trivial changes (1)
- docs/backlog.md
🚧 Files skipped from review as they are similar to previous changes (7)
- CHANGES.md
- README.md
- pyproject.toml
- tests/test_mcp.py
- cratedb_mcp/settings.py
- cratedb_mcp/main.py
- cratedb_mcp/knowledge.py
🔇 Additional comments (11)
tests/test_knowledge.py (11)
1-2
: Import changes look goodThe import statements have been appropriately updated to include the entire
cratedb_mcp
module and the newsql_expression_permitted
function, which aligns with the PR objective of implementing a stronger read-only mode validation.
22-25
: Positive test case looks goodGood baseline test that verifies simple SELECT statements are permitted as expected.
27-30
: Security test for multiple statements looks goodThis test appropriately verifies that multiple SQL statements, even if they're all SELECT statements, are rejected. This is an important security measure to prevent statement chaining.
32-35
: DDL statement rejection test looks goodThis test verifies that Data Definition Language statements like CREATE TABLE are rejected, even when they include a legitimate SELECT statement. This is crucial for maintaining the read-only constraint.
37-40
: DML statement rejection test looks goodThis test ensures Data Manipulation Language statements like INSERT are properly rejected, which is a core requirement for the read-only mode.
42-45
: SELECT INTO rejection test looks goodThis test verifies that SELECT INTO statements, which can modify the database, are rejected. This is an important edge case to cover as it combines a permitted operation (SELECT) with an operation that modifies data.
47-55
: Empty and whitespace statement tests look goodThese tests verify that empty or whitespace-only statements are rejected, which is a good practice to avoid potential edge cases or unexpected behavior.
57-60
: None input test looks goodThis test verifies that None values are properly handled and rejected, which prevents potential issues with null inputs.
62-66
: Overriding flag test is well implementedThis test correctly verifies that when the
PERMIT_ALL_STATEMENTS
flag is set to True, non-SELECT statements like INSERT are permitted. Using the mocker to patch the global flag is an appropriate approach.
68-70
: Mixed statement test looks goodThis test verifies that mixing legitimate SELECT statements with non-SELECT statements is properly rejected, which is an important security check.
72-74
: SQL comment test is important and well implementedThis test ensures that SQL comments containing disallowed commands are properly rejected, which is an important security measure to prevent SQL injection attacks that try to hide malicious commands in comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
tests/test_knowledge.py (1)
22-99
: Consider adding more edge case tests.Your test suite is already quite thorough, but you might consider adding tests for:
- Case-insensitive statements (e.g., "select" vs "SELECT")
- Nested queries or complex SELECT statements with JOIN, GROUP BY, etc.
- SQL injection patterns like string concatenation or comment termination
- Statements with whitespace before the command (e.g., " SELECT")
- Statements with Unicode characters or escapes
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
CHANGES.md
(1 hunks)README.md
(1 hunks)cratedb_mcp/__main__.py
(2 hunks)cratedb_mcp/knowledge.py
(2 hunks)cratedb_mcp/settings.py
(1 hunks)docs/backlog.md
(1 hunks)pyproject.toml
(1 hunks)tests/test_knowledge.py
(2 hunks)tests/test_mcp.py
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (8)
- CHANGES.md
- docs/backlog.md
- pyproject.toml
- README.md
- cratedb_mcp/main.py
- cratedb_mcp/knowledge.py
- cratedb_mcp/settings.py
- tests/test_mcp.py
🔇 Additional comments (11)
tests/test_knowledge.py (11)
1-2
: Good use of module import for mocking purposes.The import of the entire
cratedb_mcp
module alongside the explicit function import is necessary for the patching used in test_sql_expression_insert_allowed. This approach follows best practices for proper test isolation.
22-25
: LGTM! Basic SELECT test case.This test correctly verifies the core functionality of allowing simple SELECT statements.
27-31
: Good test for environment variable override.This test effectively verifies the override functionality when
PERMIT_ALL_STATEMENTS
is set to True. The use of mocker.patch.object is appropriate for temporarily modifying the module-level constant during testing.
33-36
: Multiple statements test looks good.Properly verifies that multiple statements separated by semicolons are rejected, which is an important security consideration.
38-41
: DDL rejection test is appropriate.This test correctly verifies that CREATE TABLE statements are rejected, preventing schema modifications.
43-46
: DML rejection test works well.Properly verifies that INSERT statements are rejected in the default configuration.
48-51
: Good test for SELECT INTO.This test catches a potential bypass where SELECT INTO could be used to write data despite the read-only intention.
53-56
: Edge cases are well covered.The tests for empty, whitespace-only, and None inputs ensure robustness against unexpected input formats.
Also applies to: 58-61, 63-66
68-70
: Mixed statements test is important.Good coverage for mixed statements that could potentially bypass restrictions.
72-75
: SQL comment injection test is valuable.This test verifies protection against attempts to hide disallowed commands within SQL comments.
77-80
: Comprehensive coverage of SQL command types.These tests thoroughly verify that all types of potentially harmful SQL commands (UPDATE, DELETE, TRUNCATE, DROP, ALTER) are rejected.
Also applies to: 82-85, 87-90, 92-95, 97-99
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
tests/test_knowledge.py (1)
75-82
: Strong protection against obfuscation techniques.These tests verify that common SQL obfuscation techniques (using multiple statements or embedding comments) are properly detected and rejected, which is critical for security.
Consider adding a few more tests for advanced SQL injection techniques such as:
def test_sql_expression_case_manipulation_rejected(): """Statements with case manipulation to hide intent are rejected""" assert sql_expression_permitted("SeLeCt * FrOm users; DrOp TaBlE users") is False def test_sql_expression_unicode_evasion_rejected(): """Statements with unicode characters to evade filters are rejected""" assert sql_expression_permitted("SELECT * FROM users; \uFF1B DROP TABLE users") is False
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
CHANGES.md
(1 hunks)README.md
(1 hunks)cratedb_mcp/__main__.py
(2 hunks)cratedb_mcp/knowledge.py
(2 hunks)cratedb_mcp/settings.py
(1 hunks)docs/backlog.md
(1 hunks)pyproject.toml
(1 hunks)tests/test_knowledge.py
(2 hunks)tests/test_mcp.py
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (8)
- CHANGES.md
- docs/backlog.md
- pyproject.toml
- cratedb_mcp/main.py
- README.md
- cratedb_mcp/settings.py
- cratedb_mcp/knowledge.py
- tests/test_mcp.py
🧰 Additional context used
🧬 Code Graph Analysis (1)
tests/test_knowledge.py (1)
cratedb_mcp/knowledge.py (2)
Queries
(11-91)sql_expression_permitted
(113-129)
🔇 Additional comments (9)
tests/test_knowledge.py (9)
1-2
: Appropriate imports for the new functionality.The updated imports correctly bring in the entire
cratedb_mcp
module and specifically import the newsql_expression_permitted
function fromcratedb_mcp.knowledge
, which will be used in the new tests.
22-27
: Good test coverage for valid SELECT statements.These tests properly verify that standard SELECT statements are permitted, including variations with leading whitespace and different case formats. This ensures the core functionality of allowing read-only queries works as expected.
29-32
: Good security check for SQL injection technique.This test verifies that the system rejects a common SQL injection technique (using comments with escaped semicolons). This is essential for preventing bypass attempts.
34-38
: Well-implemented override test.The test correctly uses mocking to verify the behavior when
PERMIT_ALL_STATEMENTS
is enabled, showing that non-SELECT statements like INSERT can be allowed when explicitly configured. This flexibility is important for special cases while maintaining security by default.
40-43
: Important test for multi-statement protection.Testing that multiple statements (even if they're all SELECT) are rejected is crucial for preventing batch attacks where a malicious user might try to append a harmful statement after a legitimate one.
45-48
: Comprehensive DDL protection tests.These tests verify that all major DDL (Data Definition Language) statements are rejected, which is essential for preventing schema modifications in read-only mode.
Also applies to: 99-102, 104-107
50-53
: Comprehensive DML protection tests.These tests thoroughly check that all major DML (Data Manipulation Language) statements (INSERT, UPDATE, DELETE, TRUNCATE) are rejected, which is critical for ensuring data cannot be modified in read-only mode.
Also applies to: 84-92, 94-97
55-58
: Good catch for SELECT INTO statements.Testing rejection of SELECT INTO statements is important as these can create new tables or modify existing data, despite starting with SELECT. This shows attention to detail in identifying potential bypass methods.
60-73
: Good edge case handling for empty inputs.Testing empty strings, whitespace-only strings, and None values ensures the validation function handles these edge cases properly without errors, improving robustness.
cratedb_mcp/knowledge.py
Outdated
return True | ||
|
||
# Parse the SQL statement. | ||
parsed = sqlparse.parse(expression.strip()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not use cratedb-sqlparse
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea. Let's explore it together with a subsequent patch? After giving it a quick try, I haven't been able to remove the dependency to classic sqlparse
just yet, but didn't want to block this patch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
tests/test_knowledge.py (2)
40-43
: Consider consolidating redundant test casesThere appears to be redundancy between
test_sql_expression_multiple_statements_rejected
andtest_sql_expression_select_multiple_rejected
. Both verify that multiple statements are rejected, just with different SQL content.You could consolidate these into a single test case with multiple assertions or make their purposes more distinct:
def test_sql_expression_select_multiple_rejected(): - """Multiple SQL statements are rejected""" + """Multiple SELECT statements are rejected""" assert sql_expression_permitted("SELECT 42; SELECT 42;") is False def test_sql_expression_multiple_statements_rejected(): + """Mixed statement types (SELECT + DML) are rejected""" assert sql_expression_permitted("SELECT 42; INSERT INTO foo VALUES (1)") is FalseAlso applies to: 76-78
22-118
: Consider additional test cases for complex SQL constructsThe test suite is already comprehensive, but to further strengthen it against potential filtering bypasses, consider adding tests for more complex SQL constructs like:
- Common Table Expressions (CTEs)
- Nested subqueries
- UNION/INTERSECT/EXCEPT operations
- Window functions
- Vendor-specific syntax extensions
These could potentially be used in SQL injection attempts to bypass filtering.
def test_sql_expression_with_cte_permitted(): """SELECT with CTE is permitted""" assert sql_expression_permitted( "WITH nums AS (SELECT 1) SELECT * FROM nums" ) is True def test_sql_expression_with_union_permitted(): """SELECT with UNION is permitted""" assert sql_expression_permitted( "SELECT 1 UNION SELECT 2" ) is True def test_sql_expression_with_cte_dml_rejected(): """CTE with DML is rejected""" assert sql_expression_permitted( "WITH nums AS (SELECT 1) INSERT INTO foo SELECT * FROM nums" ) is False
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
CHANGES.md
(1 hunks)README.md
(1 hunks)cratedb_mcp/__main__.py
(2 hunks)cratedb_mcp/knowledge.py
(2 hunks)cratedb_mcp/settings.py
(2 hunks)docs/backlog.md
(1 hunks)pyproject.toml
(1 hunks)tests/test_knowledge.py
(2 hunks)tests/test_mcp.py
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (8)
- CHANGES.md
- docs/backlog.md
- README.md
- cratedb_mcp/main.py
- pyproject.toml
- cratedb_mcp/settings.py
- tests/test_mcp.py
- cratedb_mcp/knowledge.py
🧰 Additional context used
🧬 Code Graph Analysis (1)
tests/test_knowledge.py (1)
cratedb_mcp/knowledge.py (2)
Queries
(12-92)sql_expression_permitted
(114-130)
🔇 Additional comments (6)
tests/test_knowledge.py (6)
1-2
: Appropriate import structure for test mockingGood approach importing both the full module and specific functions, as this allows for proper mocking in your tests while maintaining clean function calls.
22-27
: Thorough basic SELECT statement testingGood coverage of different variations of valid SELECT statements, including whitespace and case variations.
34-38
: Effective environment variable override testingGood test demonstrating that the PERMIT_ALL_STATEMENTS flag correctly bypasses the filtering when enabled. This confirms the override mechanism works as intended.
80-83
: Good security testing for comment-based evasionTesting for SQL injection attempts via comments is critical for security. This test ensures that attackers can't use comments to disguise malicious statements.
110-118
: Strong security tests for bypass attemptsExcellent tests for case manipulation and Unicode character evasion. These are sophisticated SQL injection techniques that could potentially bypass naive filtering implementations.
1-118
: Overall excellent test coverage for SQL filteringThe test suite is thorough and well-structured, covering a wide range of scenarios including valid queries, invalid operations, edge cases, and security bypass attempts. The tests will help ensure that only read-only operations are permitted while properly blocking all attempts at modification.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
tests/test_knowledge.py (3)
29-31
: Good validation of malformed SELECT statements.This test correctly verifies that commented-out or malformed SELECT statements are rejected. Consider adding a few more examples of malformed statements for more comprehensive coverage.
def test_sql_expression_select_rejected(): """Bogus SQL SELECT statements are rejected""" assert sql_expression_permitted(r"--\; select 42") is False + assert sql_expression_permitted(r"SEL ECT 42") is False + assert sql_expression_permitted(r"SELECT42") is False
22-117
: Consider adding parameterized tests to reduce duplication.While the test coverage is comprehensive, consider using pytest's parameterized testing to reduce code duplication and make it easier to add more test cases in the future.
import pytest @pytest.mark.parametrize( "statement,expected", [ ("SELECT 42", True), (" SELECT 42", True), ("select 42", True), (r"--\; select 42", False), ("SELECT 42; SELECT 42;", False), ("CREATE TABLE foobar AS SELECT 42", False), # Add more cases here ], ) def test_sql_expression_permitted_cases(statement, expected): """Test various SQL statements against permission rules""" assert sql_expression_permitted(statement) is expectedThis approach would maintain the same test coverage while making the code more maintainable.
22-117
: Consider testing additional SQL injection techniques.While the current tests cover many scenarios, consider adding tests for these additional SQL injection techniques:
- Inline comments within SQL keywords (
SEL/**/ECT
)- Stacked queries with alternate delimiters
- UNION-based injection attempts
def test_sql_expression_inline_comments_rejected(): """Statements with inline comments to hide intent are rejected""" assert sql_expression_permitted("SEL/**/ECT * FROM users") is False def test_sql_expression_union_injection_rejected(): """UNION-based injection attempts are rejected""" assert sql_expression_permitted("SELECT 1 UNION SELECT * FROM users") is False
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
cratedb_mcp/knowledge.py
(2 hunks)pyproject.toml
(1 hunks)tests/test_knowledge.py
(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- pyproject.toml
- cratedb_mcp/knowledge.py
🔇 Additional comments (8)
tests/test_knowledge.py (8)
1-2
: Sensible import approach for testing.The import of the entire
cratedb_mcp
module alongside specific imports enables the proper mocking of thePERMIT_ALL_STATEMENTS
flag in the test. This is a good practice for testing.
22-27
: Well-structured test for permitted SELECT statements.The test function properly verifies that standard SELECT statements are permitted in different formats (with leading whitespace and case variations). The docstring clearly explains the purpose of the test.
34-37
: Effective testing of permission override.The test correctly verifies that when
PERMIT_ALL_STATEMENTS
is set toTrue
, non-SELECT statements like INSERT are permitted. This is an important test for the configuration flag mentioned in the PR.
40-42
: Appropriate rejection of multiple statements.This test correctly verifies that multiple SQL statements, even if they're all SELECTs, are rejected. This is an important security measure to prevent statement chaining attacks.
45-107
: Comprehensive testing of rejected statement types.The tests effectively verify that various SQL statement types are properly rejected:
- DDL statements (CREATE, DROP, ALTER, TRUNCATE)
- DML statements (INSERT, UPDATE, DELETE)
- Combination statements (SELECT INTO)
This thorough coverage ensures that the read-only enforcement works as expected for all common SQL operations.
61-73
: Good validation of edge cases for input.The tests properly validate edge cases:
- Empty strings
- Whitespace-only strings
- None values
This ensures robustness in handling various input scenarios.
76-82
: Strong defense against SQL injection techniques.These tests verify protection against common SQL injection techniques:
- Multiple statements separated by semicolons
- Hidden statements within comments
This is crucial for maintaining security in the SQL validation.
110-117
: Excellent protection against evasion techniques.The tests verify that the system can detect and reject sophisticated evasion attempts:
- Case manipulation to hide intentions
- Unicode character substitution to bypass filters
These tests demonstrate a strong security mindset in the implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🔭 Outside diff range comments (1)
cratedb_mcp/util/sql.py (1)
1-141
: 💡 Verification agent🧩 Analysis chain
Consider adding tests for SQL statements with embedded Unicode control characters.
While you check for Unicode semicolons, there are other Unicode characters that could be used to evade filters, such as zero-width spaces, bidirectional overrides, etc.
Run this script to check for additional evasion techniques:
🏁 Script executed:
#!/bin/bash # Description: Test SQL filtering against additional Unicode evasion techniques # Get current Unicode evasion test case echo "Current test:" grep -n "unicode_evasion" tests/test_util.py -A 3 # Test script with additional Unicode evasion test cases cat << 'EOF' > /tmp/test_more_evasions.py from cratedb_mcp.util.sql import sql_is_permitted # Bidirectional override characters print("Testing RIGHT-TO-LEFT OVERRIDE (U+202E):") print(sql_is_permitted("SELECT * FROM users \u202E ELBAT PORD -- ")) # Zero-width spaces print("Testing ZERO WIDTH SPACE (U+200B):") print(sql_is_permitted("SELECT * FROM users;\u200BDROP TABLE users")) # Line separator print("Testing LINE SEPARATOR (U+2028):") print(sql_is_permitted("SELECT * FROM users\u2028DROP TABLE users")) # Paragraph separator print("Testing PARAGRAPH SEPARATOR (U+2029):") print(sql_is_permitted("SELECT * FROM users\u2029DROP TABLE users")) EOF # Run test script python /tmp/test_more_evasions.pyLength of output: 547
🏁 Script executed:
#!/bin/bash # Re-run Unicode evasion tests with the repository on PYTHONPATH export PYTHONPATH="$(pwd)" python - << 'EOF' from cratedb_mcp.util.sql import sql_is_permitted cases = { "RIGHT-TO-LEFT OVERRIDE (U+202E)": "SELECT * FROM users \u202E ELBAT PORD -- ", "ZERO WIDTH SPACE (U+200B)": "SELECT * FROM users;\u200BDROP TABLE users", "LINE SEPARATOR (U+2028)": "SELECT * FROM users\u2028DROP TABLE users", "PARAGRAPH SEPARATOR (U+2029)": "SELECT * FROM users\u2029DROP TABLE users", } for name, sql in cases.items(): result = sql_is_permitted(sql) print(f"{name}: permitted? {result}") EOFLength of output: 318
Add tests for additional Unicode control‐character evasion cases
The current suite only covers the fullwidth semicolon (
\uFF1B
), but SQL filters can be bypassed with other invisible or bidirectional characters. Please extendtests/test_util.py
with cases for:
- RIGHT-TO-LEFT OVERRIDE (U+202E)
- ZERO WIDTH SPACE (U+200B)
- LINE SEPARATOR (U+2028)
- PARAGRAPH SEPARATOR (U+2029)
For example:
import pytest from cratedb_mcp.util.sql import sql_is_permitted @pytest.mark.parametrize("label,expr", [ ("U+202E RTL OVERRIDE", "SELECT * FROM users \u202E ELBAT PORD -- "), ("U+200B ZWSP", "SELECT * FROM users;\u200BDROP TABLE users"), ("U+2028 LINE SEP", "SELECT * FROM users\u2028DROP TABLE users"), ("U+2029 PARAGRAPH SEP", "SELECT * FROM users\u2029DROP TABLE users"), ]) def test_sql_expression_unicode_evasion_rejected(label, expr): """Ensure statements with {} are rejected""".format(label) assert sql_is_permitted(expr) is FalseThis will help ensure the filter catches all known Unicode‐based evasion techniques.
🧹 Nitpick comments (5)
cratedb_mcp/util/sql.py (5)
13-30
: Add type validation for the expression parameter.The function is well-documented, but it doesn't validate the type of the
expression
parameter before using it, which could lead to unexpected behavior with non-string inputs.def sql_is_permitted(expression: str) -> bool: """ Validate the SQL expression, only permit read queries by default. When the `CRATEDB_MCP_PERMIT_ALL_STATEMENTS` environment variable is set, allow all types of statements. FIXME: Revisit implementation, it might be too naive or weak. Issue: https://github.com/crate/cratedb-mcp/issues/10 Question: Does SQLAlchemy provide a solid read-only mode, or any other library? """ + # Ensure expression is a string + if expression is None or not isinstance(expression, str): + logger.warning(f"Denied non-string SQL expression: {expression}") + return False is_dql = SqlFilter(expression=expression, permit_all=PERMIT_ALL_STATEMENTS).is_dql if is_dql is True: logger.info(f"Permitted SQL expression: {expression}") else: logger.warning(f"Denied SQL expression: {expression}") return is_dql
32-47
: Add error handling in the post-initialization method.The
__post_init__
method assumesself.expression
is a string if it's truthy, which could lead to an AttributeError if a non-string object is passed.def __post_init__(self) -> None: - if self.expression: + if self.expression and isinstance(self.expression, str): self.expression = self.expression.strip() + elif self.expression and not isinstance(self.expression, str): + # Handle non-string inputs by converting to string + self.expression = str(self.expression).strip()
48-63
: Add error handling for SQL parsing failures.The parsing methods don't include error handling for failures, which could occur if the expression is not valid SQL or if the libraries raise exceptions.
def parse_cratedb(self): """ Parse expression using `cratedb-sqlparse` library. """ if self._parsed_cratedb is None: - self._parsed_cratedb = cratedb_sqlparse.sqlparse(self.expression) + try: + self._parsed_cratedb = cratedb_sqlparse.sqlparse(self.expression) + except Exception as e: + # If parsing fails, return an empty list to indicate invalid SQL + logger.warning(f"Failed to parse SQL with cratedb_sqlparse: {e}") + self._parsed_cratedb = [] return self._parsed_cratedb def parse_sqlparse(self): """ Parse expression using traditional `sqlparse` library. """ if self._parsed_sqlparse is None: - self._parsed_sqlparse = sqlparse.parse(self.expression) + try: + self._parsed_sqlparse = sqlparse.parse(self.expression) + except Exception as e: + # If parsing fails, return an empty list to indicate invalid SQL + logger.warning(f"Failed to parse SQL with sqlparse: {e}") + self._parsed_sqlparse = [] return self._parsed_sqlparse
88-101
: Add defensive check before accessing parsed SQL data.The
operation
property assumes parsing succeeded and returned at least one statement, which could lead to an IndexError if parsing failed or returned an empty list.@property def operation(self) -> str: """ The SQL operation: SELECT, INSERT, UPDATE, DELETE, CREATE, etc. """ - return self._parsed_cratedb[0].type.upper() + parsed = self.parse_cratedb() + if not parsed: + return "" # Return empty string for invalid SQL + return parsed[0].type.upper()
109-141
: Add defensive checks in is_select_into and is_evasive properties.Both properties make assumptions about the structure of the parsed SQL, which could lead to IndexErrors if parsing fails or returns unexpected results. Also, the approach for detecting SELECT INTO statements could be improved.
@property def is_select_into(self) -> bool: """ Use traditional `sqlparse` for catching `SELECT ... INTO ...` statements. Note: With `cratedb-sqlparse`, we can't find the `INTO` token at all? Examples: SELECT * INTO foobar FROM bazqux SELECT * FROM bazqux INTO foobar """ parsed = self.parse_sqlparse() + if not parsed: + return False tokens = [str(item).upper() for item in parsed[0]] return "INTO" in tokens @property def is_evasive(self) -> bool: """ Use traditional `sqlparse` for catching evasive SQL statements. A practice picked up from CodeRabbit was to reject multiple statements to prevent potential SQL injections. Is it a viable suggestion? Note: This currently does not work with `cratedb-sqlparse`? Examples: SELECT * FROM users; \uff1b DROP TABLE users """ parsed = self.parse_sqlparse() return len(parsed) > 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
cratedb_mcp/__main__.py
(2 hunks)cratedb_mcp/util/sql.py
(1 hunks)docs/backlog.md
(1 hunks)pyproject.toml
(1 hunks)tests/test_knowledge.py
(1 hunks)tests/test_util.py
(1 hunks)
✅ Files skipped from review due to trivial changes (1)
- tests/test_knowledge.py
🚧 Files skipped from review as they are similar to previous changes (3)
- pyproject.toml
- docs/backlog.md
- cratedb_mcp/main.py
🧰 Additional context used
🧬 Code Graph Analysis (1)
tests/test_util.py (1)
cratedb_mcp/util/sql.py (1)
sql_is_permitted
(13-29)
🔇 Additional comments (13)
tests/test_util.py (10)
1-3
: Imports look good.The imports are properly organized, importing the necessary modules and functions for testing the
sql_is_permitted
function.
5-10
: Well-structured test for basic SELECT statements.Good comprehensive test coverage for basic SELECT statements, including variations with whitespace and case sensitivity.
12-15
: Good security test for bogus SELECT statements.This test properly checks that a statement with a comment and an escaped semicolon (which could be an attempt to bypass filtering) is rejected.
17-21
: Good test for override behavior.Effectively tests that when
PERMIT_ALL_STATEMENTS
is set toTrue
, non-SELECT statements are allowed. Using the mocker to patch the variable is the right approach.
23-26
: Strong security check for multiple statements.Properly verifies that multiple SQL statements separated by semicolons are rejected, even if they are all SELECT statements - this is crucial for preventing SQL injection.
28-42
: Comprehensive tests for DDL, DML, and SELECT INTO statements.These tests thoroughly verify that:
- DDL statements like CREATE TABLE are rejected
- DML statements like INSERT are rejected
- SELECT INTO statements (both syntax variations), which are a form of DML disguised as SELECT, are rejected
This provides good coverage for ensuring read-only mode and preventing data modification.
44-57
: Good edge case handling.These tests properly handle edge cases including:
- Empty statements
- Whitespace-only statements
- None input
This ensures the filtering is robust against various invalid inputs.
59-66
: Strong security checks for evasive techniques.These tests effectively verify that potentially malicious SQL injection techniques are caught:
- Multiple statements with different types (SELECT and INSERT)
- Statements with comments attempting to hide malicious SQL
This helps protect against SQL injection attempts.
68-91
: Comprehensive DML and DDL rejection tests.Good coverage of various SQL command types that should be rejected:
- UPDATE statements
- DELETE statements
- TRUNCATE statements
- DROP statements
- ALTER statements
This ensures that all types of data modification and schema changes are blocked in read-only mode.
93-101
: Excellent security checks for evasion techniques.These tests verify protection against sophisticated SQL injection attempts:
- Case manipulation to hide intent (e.g., "SeLeCt * FrOm users; DrOp TaBlE users")
- Unicode character evasion (using Unicode semicolon "\uFF1B" instead of regular semicolon)
These are advanced security checks that show attention to detail in preventing SQL injection.
cratedb_mcp/util/sql.py (3)
1-11
: Proper imports and setup.The imports are organized well, including standard library modules, third-party libraries, and project-specific imports. Setting up logging is also a good practice.
64-87
: The is_dql property looks good.The property has strong logic for determining if an SQL expression is a read-only operation. It properly handles empty inputs, the override flag, and multiple statements, while checking for camouflaged operations.
102-108
: The is_camouflage property is well defined.Good combination of checks to detect innocent-looking SELECT statements that might be trying to evade filters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
cratedb_mcp/util/sql.py (3)
56-78
: Possible redundancy inis_dql
propertyThe
is_dql
property has solid logic, but there's redundancy in checking for multiple statements.Line 72-73 checks if
len(parsed) > 1
, but this same check is also implemented in theis_evasive
property. Consider refactoring to useself.is_evasive
instead of duplicating the logic:- # Reject multiple statements to prevent potential SQL injections. - if len(parsed) > 1: - return False + # Reject multiple statements to prevent potential SQL injections. + if self.is_evasive: + return FalseAlso, note that line 68 references
cratedb-sqlparse
in the comment but actually usessqlparse
. Consider updating the comment for clarity.
101-115
: Consider more robustINTO
detectionThe
is_select_into
property could be susceptible to false positives if "INTO" appears in string literals.Consider a more syntax-aware approach for detecting "INTO" clauses:
- tokens = [str(token).upper() for token in parsed[0].tokens] - return "INTO" in tokens + # More robust approach to detect INTO not within string literals + for token in parsed[0].tokens: + if token.ttype in (sqlparse.tokens.Keyword, sqlparse.tokens.Keyword.DML) and str(token).upper() == "INTO": + return True + return FalseThis would check that the token is actually a keyword, not just a string containing "INTO".
117-132
: Consider removing redundancy withis_dql
The
is_evasive
property duplicates logic that's already in theis_dql
method.To avoid maintenance issues if one implementation changes but not the other, consider refactoring to make one method call the other:
- parsed = self.parse_sqlparse() - return len(parsed) > 1 + return len(self.parse_sqlparse()) > 1Also, since this check is already performed in
is_dql
, consider restructuring to avoid duplication.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
CHANGES.md
(1 hunks)README.md
(1 hunks)cratedb_mcp/__main__.py
(2 hunks)cratedb_mcp/settings.py
(2 hunks)cratedb_mcp/util/sql.py
(1 hunks)docs/backlog.md
(1 hunks)pyproject.toml
(1 hunks)tests/test_knowledge.py
(1 hunks)tests/test_mcp.py
(1 hunks)tests/test_util.py
(1 hunks)
✅ Files skipped from review due to trivial changes (1)
- tests/test_knowledge.py
🚧 Files skipped from review as they are similar to previous changes (8)
- CHANGES.md
- pyproject.toml
- README.md
- docs/backlog.md
- cratedb_mcp/main.py
- cratedb_mcp/settings.py
- tests/test_util.py
- tests/test_mcp.py
🔇 Additional comments (5)
cratedb_mcp/util/sql.py (5)
31-47
: Well-structured dataclass with clear purposeThe
SqlStatementClassifier
is well-designed using dataclasses, with good documentation explaining its purpose. The__post_init__
method appropriately handles whitespace.
48-54
: Good implementation of caching for parsed SQLThe
parse_sqlparse
method efficiently caches parsing results to avoid redundant work.
80-92
: Clear and concise property implementationsThe
is_select
andoperation
properties are well-implemented and clearly named.
94-99
: Good combination of security checksThe
is_camouflage
property effectively combines multiple security checks into a single property.
12-28
:✅ Verification successful
Great enforcement with logging! Consider adding test cases for edge cases.
The
sql_is_permitted
function implements a clean validation approach with appropriate logging. The FIXME comment acknowledges potential limitations, which is good for future improvements.Consider adding test cases for edge cases such as:
🏁 Script executed:
#!/bin/bash # Check if the codebase already has test cases for these edge cases rg -A 2 -B 2 "test.*sql" --type pyLength of output: 5312
Edge case test coverage confirmed
The
tests/test_util.py
suite already includes comprehensive tests for all the edge cases listed (empty/None statements, multiple statements, comments, unicode evasion, DDL/DML types, case manipulation, etc.). No additional tests are needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (6)
cratedb_mcp/util/sql.py (6)
12-29
: Review logging of full SQL expressions for potential security issuesWhile the function is well-structured, logging the complete SQL expression might inadvertently expose sensitive data (e.g., data in WHERE clauses).
Consider redacting or truncating the SQL in the logs:
- logger.info(f"Permitted SQL expression: {expression}") + logger.info(f"Permitted SQL expression: {expression[:50]}...") - logger.warning(f"Denied SQL expression: {expression}") + logger.warning(f"Denied SQL expression: {expression[:50]}...")Also, the
is_dql is True
check is redundant:- if is_dql is True: + if is_dql:
31-46
: Consider adding null check for expression in class constructorThe
__post_init__
method strips whitespace ifexpression
is truthy, but there's no validation thatexpression
isn't None.def __post_init__(self) -> None: + if self.expression is None: + self.expression = "" if self.expression: self.expression = self.expression.strip()Also, consider using a more specific type hint for
_parsed_sqlparse
instead oft.Any
.
48-54
: Add return type annotation and error handling to parse_sqlparse methodThe method lacks a return type annotation and could fail if
self.expression
is None.-def parse_sqlparse(self): +def parse_sqlparse(self) -> t.List[sqlparse.sql.Statement]: """ Parse expression using traditional `sqlparse` library. """ if self._parsed_sqlparse is None: + if not self.expression: + return [] self._parsed_sqlparse = sqlparse.parse(self.expression) return self._parsed_sqlparse
56-79
: Incorrect comment in is_dql about cratedb-sqlparseThe comment "Parse the SQL statement using
cratedb-sqlparse
" is inaccurate since the code is using the standardsqlparse
library.- # Parse the SQL statement using `cratedb-sqlparse`. + # Parse the SQL statement using the sqlparse library.Also, there's a duplicate check for multiple statements here and in
is_evasive
, which could be confusing for maintainers.
117-133
: Duplicate check for multiple statementsThe
is_evasive
property duplicates the check already performed inis_dql
.To avoid redundancy and improve maintainability, consider refactoring to use
is_evasive
inis_dql
without repeating the logic:@property def is_dql(self) -> bool: """ Whether the statement is a DQL statement, which effectively invokes read-only operations only. """ if not self.expression: return False if self.permit_all: return True # Parse the SQL statement using the sqlparse library. parsed = self.parse_sqlparse() - # Reject multiple statements to prevent potential SQL injections. - if len(parsed) > 1: - return False # Check if the expression is valid and if it's a DQL/SELECT statement, # also trying to consider `SELECT ... INTO ...` and evasive # `SELECT * FROM users; \uff1b DROP TABLE users` statements. return self.is_select and not self.is_camouflageAdditionally, the docstring mentions
cratedb-sqlparse
but the code uses standardsqlparse
.
1-133
: Enhance security-critical module with additional safeguardsThis is an excellent start for a security-critical filter, but consider these additional protections:
- Handle malicious SQL encodings (hex, comments, etc.)
- Add specific checks for common SQL injection patterns
- Consider implementing a deny list of dangerous keywords
- Add unit tests with comprehensive edge cases
The current implementation mainly focuses on DQL vs. non-DQL distinction. For enhanced security, consider a more comprehensive approach with a proper SQL parser or query executor that runs in a sandbox with appropriate permissions.
For maximum security, you might want to explore ORM-based solutions or database view-based approaches instead of SQL filtering.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
CHANGES.md
(1 hunks)README.md
(1 hunks)cratedb_mcp/__main__.py
(2 hunks)cratedb_mcp/settings.py
(2 hunks)cratedb_mcp/util/sql.py
(1 hunks)docs/backlog.md
(1 hunks)pyproject.toml
(1 hunks)tests/test_knowledge.py
(1 hunks)tests/test_mcp.py
(1 hunks)tests/test_util.py
(1 hunks)
✅ Files skipped from review due to trivial changes (1)
- tests/test_knowledge.py
🚧 Files skipped from review as they are similar to previous changes (8)
- CHANGES.md
- docs/backlog.md
- pyproject.toml
- cratedb_mcp/main.py
- README.md
- cratedb_mcp/settings.py
- tests/test_mcp.py
- tests/test_util.py
🔇 Additional comments (1)
cratedb_mcp/util/sql.py (1)
1-10
: Strong module for SQL validation, clean importsThe module has well-organized imports and proper logger setup, which establishes a good foundation for this security-focused feature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
♻️ Duplicate comments (1)
cratedb_mcp/util/sql.py (1)
83-89
:⚠️ Potential issueAdd error handling to the operation property.
The
operation
property accessesparsed[0]
without checking if the parsed result contains any elements, which could cause an IndexError.@property def operation(self) -> str: """ The SQL operation: SELECT, INSERT, UPDATE, DELETE, CREATE, etc. """ parsed = self.parse_sqlparse() + if not parsed: + return "" return parsed[0].get_type().upper()
🧹 Nitpick comments (3)
cratedb_mcp/util/sql.py (3)
1-11
: Import structure looks good but consider organizing them alphabetically.The imports are well-organized by category (standard library, third-party, and local imports), but within each category, they could be sorted alphabetically for better maintainability.
32-50
: Consider validating permit_all flag in post_initThe class handles None for expression but doesn't validate the permit_all flag. While it's unlikely to be None since it has a default value, being thorough with input validation is good practice.
def __post_init__(self) -> None: if self.expression is None: self.expression = "" if self.expression: self.expression = self.expression.strip() + # Ensure permit_all is a boolean + self.permit_all = bool(self.permit_all)
59-74
: Consider adding explicit validation of SQL statement structureThe
is_dql
property checks if the statement is SELECT and not camouflage, but it doesn't validate that the SQL structure is correct. Consider adding a check that the parsed statement is valid SQL.@property def is_dql(self) -> bool: """ Whether the statement is a DQL statement, which effectively invokes read-only operations only. """ if not self.expression: return False if self.permit_all: return True + # Ensure we have a valid SQL statement + parsed = self.parse_sqlparse() + if not parsed: + return False + # Check if the expression is valid and if it's a DQL/SELECT statement, # also trying to consider `SELECT ... INTO ...` and evasive # `SELECT * FROM users; \uff1b DROP TABLE users` statements. return self.is_select and not self.is_camouflage
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
CHANGES.md
(1 hunks)README.md
(1 hunks)cratedb_mcp/__main__.py
(2 hunks)cratedb_mcp/settings.py
(2 hunks)cratedb_mcp/util/sql.py
(1 hunks)docs/backlog.md
(1 hunks)pyproject.toml
(1 hunks)tests/test_knowledge.py
(1 hunks)tests/test_mcp.py
(1 hunks)tests/test_util.py
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (9)
- CHANGES.md
- pyproject.toml
- cratedb_mcp/main.py
- README.md
- docs/backlog.md
- cratedb_mcp/settings.py
- tests/test_knowledge.py
- tests/test_mcp.py
- tests/test_util.py
🔇 Additional comments (2)
cratedb_mcp/util/sql.py (2)
13-29
: The sql_is_permitted function looks solid.The function correctly applies the SQL classification logic and has appropriate logging. The FIXME comment acknowledges areas for improvement, which is good practice.
113-126
: The is_evasive implementation is correct and efficient.The implementation correctly identifies multiple statements by checking the length of the parsed result, which is an effective way to detect SQL injection attempts.
cratedb_mcp/util/sql.py
Outdated
Validate the SQL expression, only permit read queries by default. | ||
|
||
When the `CRATEDB_MCP_PERMIT_ALL_STATEMENTS` environment variable is set, | ||
allow all types of statements. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
allow all types of statements. | |
allow all types of statements. This is **not** recommended. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I've added f1aa610.
Just a thought, is this the best approach? Wouldn’t it be more robust to restrict the CrateDB user’s permissions directly used for the mcp server? |
@proddata: Thanks for bringing up the topic, let's divert it to a dedicated issue, to discuss it for future improvements? |
Happy to discuss this in more detail, but I still want to challenge the overall direction of this PR and its relevance from a user perspective. It seems the main motivation is to enforce that only SELECT queries are allowed via the MCP endpoint — but that raises the question: what’s the actual user need or use case behind this constraint of restricting that endpoint? |
Yes, I think the acute need is to prevent accidental wrongdoings of yet untrusted agents to not modify database content inadvertently. I think it's a good idea to track your talking points and objections per GH-10, so further assistance can be looped in easily. |
Just limit the mcp servers used database users privileges. |
Why not. Can you provide a full example so we can add it to the README? 🙏 |
CodeRabbit can possibly provide assistance here, but it can easily be wrong on relevant topics, because CrateDB knowledge might still be niche?
Please note the code companion 🐇 also refers to a defense in depth strategy, which would use both approaches:
|
About
CodeRabbit had a remark about the previous variant of filtering non-SELECT statements and suggested to use
sqlparse
.Review
This patch aims to implement a stronger approach using
sqlparse
, as suggested.If you think you can come up with finding another leak, please let us know.
Note
That's a stacked PR. GH-9 will need to go in first.