-
Notifications
You must be signed in to change notification settings - Fork 602
Add CREATE FUNCTION
support for SQL Server
#1808
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
src/parser/mod.rs
Outdated
|
||
self.expect_keyword_is(Keyword::AS)?; | ||
self.expect_keyword_is(Keyword::BEGIN)?; | ||
let function_body = Some(CreateFunctionBody::MultiStatement(self.parse_statements()?)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, this will suffer the same fate as other multi statement syntax currently in this parser, which is that semi-colons are required to parse.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah if the desired behavior is to have optional semicolon delimited statements for mssql, what we could do would be to make the behavior configurable when parsing the statements list. We could do something like this?
pub fn parse_statements(&mut self) -> Result<Vec<Statement>, ParserError> {
self.parse_statements_inner(true)
}
fn parse_statements_inner(&mut self, require_semicolon_delimiter) -> Result<Vec<Statement>, ParserError> {
// ... same as the body of `parse_statements()` on main
// ... I _think_ except for this part to make the behavior configurable
if require_semicolon_delimiter && expecting_statement_delimiter {
return self.expected("end of statement", self.peek_token());
}
// ...
}
then the mssql code can call the inner function directly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I understand what you're getting at with regards to configuration, but I'm not yet sure how to actually implement that statement parsing logic.
Do you see the semicolon-less statement parsing question as a blocker for the PR? If not I'd prefer to get this in (per your other suggestions) and then tackle this separately
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but I'm not yet sure how to actually implement that statement parsing logic
hmm not sure I followed what we mean here? parsing a semicolon-less statement I imagine would follow from the above code snippet by calling self.parse_statements_inner(false)
It would indeed be good to resolve in this PR, currently in the PR we seem to have a workaround for this behavior here and here and another in the Go statement PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aharpervc just in case, note that this part of the PR is unresolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the reminder.
It would indeed be good to resolve in this PR, currently in the PR we seem to have a workaround for this behavior here and here and another in the Go statement PR.
I was able to refactor the latter, and the former we'll need regardless of the semi colon situation. Does the latest iteration resolve your concerns here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does something like in this suggestion work just as well? its currently unclear to me why the semicolon handling is a special case of an IfCondition
statement
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not deeply opposed to making an inner parse function, I guess, but it also doesn't seem required for this PR.
its currently unclear to me why the semicolon handling is a special case of an IfCondition statement
I reverted that change, I think either it was a problem while hacking on the branch or pollution from another branch 😬
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we seem to have some of the related changes left in this codepath?
datafusion-sqlparser-rs/src/parser/mod.rs
Lines 4458 to 4466 in 6d0ea64
let semi_colon_expected = match values.last() { | |
Some(Statement::If(if_statement)) => if_statement.end_token.is_some(), | |
Some(_) => true, | |
None => false, | |
}; | |
if semi_colon_expected { | |
self.expect_token(&Token::SemiColon)?; | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is to address a separate issue. The situation is addressed by the "create_function_with_conditional" test case example, which runs verified_stmt on this sql:
CREATE FUNCTION some_scalar_udf()
RETURNS INT
AS
BEGIN
IF 1 = 2
BEGIN
RETURN 1;
END;
RETURN 0;
END
The difficulty is regarding the IF statement. The pre-existing parse_if_stmt
logic advances the parse past semi-colons (if any) before looking for an else
block. If there's no else
block as in this example, the next token will be part of the next statement after END's semi-colon.
That conflicts with the pre-existing logic in parse_statement_list
, which unconditionally required a semi-colon after each parsed statement.
Therefore, this example would incorrectly be a parser error. To work around that limitation I added this conditional logic to parse_statement_list to inspect the previous statement for an IF statement end block. If found, no semi colon is expected.
Thinking about it some more, I suppose it would be better to keep that concern inside the parse_if_stmt function rather than bleeding out into parse_statement_list. I've done that now on the branch, thoughts?
src/parser/mod.rs
Outdated
|
||
self.expect_keyword_is(Keyword::AS)?; | ||
self.expect_keyword_is(Keyword::BEGIN)?; | ||
let function_body = Some(CreateFunctionBody::MultiStatement(self.parse_statements()?)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah if the desired behavior is to have optional semicolon delimited statements for mssql, what we could do would be to make the behavior configurable when parsing the statements list. We could do something like this?
pub fn parse_statements(&mut self) -> Result<Vec<Statement>, ParserError> {
self.parse_statements_inner(true)
}
fn parse_statements_inner(&mut self, require_semicolon_delimiter) -> Result<Vec<Statement>, ParserError> {
// ... same as the body of `parse_statements()` on main
// ... I _think_ except for this part to make the behavior configurable
if require_semicolon_delimiter && expecting_statement_delimiter {
return self.expected("end of statement", self.peek_token());
}
// ...
}
then the mssql code can call the inner function directly
f5857d9
to
618eb4d
Compare
src/parser/mod.rs
Outdated
@@ -15017,6 +15075,13 @@ impl<'a> Parser<'a> { | |||
} | |||
} | |||
|
|||
fn parse_return(&mut self) -> Result<Statement, ParserError> { | |||
let expr = self.parse_expr()?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need to be a maybe_parse, like how parse_raise_stmt does it for Exprs? The approach I picked for the discussion of RETURN variants above was to only do what's needed for this PR & these tests, which always have an Expr after the return.
So I think perhaps what we'd need in a followup PR is:
- test case for bare RETURN
- update this to be use "maybe" logic to find an expr (perhaps we'd like
maybe_expr
to wrap up that pattern?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can remove the return statement bits from this PR and tackle it separately as its own PR?
Otherwise indeed the expectation would be that this uses maybe_parse and we have the test cases in place as part of this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can remove the return statement bits from this PR and tackle it separately as its own PR?
I don't think this is possible, since SQL Server requires that functions have a return, so create function goes together with return. I'll take another pass on the PR though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah alright, in that case I think its only the test cases missing for the return statement we can add those to common.rs I imagine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added a test case for a bare RETURN
to sqlparser_common.rs, although I'm unsure how useful it is. At least for SQL Server, a top level return must be bare (eg, can't return anything). Therefore, the more useful tests are where return is inside a function body, trigger body, etc. And we have those tests in their respective branches.
src/parser/mod.rs
Outdated
@@ -15017,6 +15075,13 @@ impl<'a> Parser<'a> { | |||
} | |||
} | |||
|
|||
fn parse_return(&mut self) -> Result<Statement, ParserError> { | |||
let expr = self.parse_expr()?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can remove the return statement bits from this PR and tackle it separately as its own PR?
Otherwise indeed the expectation would be that this uses maybe_parse and we have the test cases in place as part of this PR
src/parser/mod.rs
Outdated
|
||
self.expect_keyword_is(Keyword::AS)?; | ||
self.expect_keyword_is(Keyword::BEGIN)?; | ||
let function_body = Some(CreateFunctionBody::MultiStatement(self.parse_statements()?)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but I'm not yet sure how to actually implement that statement parsing logic
hmm not sure I followed what we mean here? parsing a semicolon-less statement I imagine would follow from the above code snippet by calling self.parse_statements_inner(false)
It would indeed be good to resolve in this PR, currently in the PR we seem to have a workaround for this behavior here and here and another in the Go statement PR.
580114e
to
9954416
Compare
src/ast/spans.rs
Outdated
end_token: AttachedToken(end), | ||
} => union_spans([start.span, end.span].into_iter()), | ||
ConditionalStatements::BeginEnd(bes) => { | ||
union_spans([bes.begin_token.0.span, bes.end_token.0.span].into_iter()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we impl Spanned for BeginEndStatements
and then call bes.span()
here instead? That would make the span impl reusable by nother nodes that embed the BeginEndStatements
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I understand what you're asking for, I'll give the implementation a shot
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done 👍
src/ast/mod.rs
Outdated
} | ||
} | ||
} | ||
|
||
/// A shared representation of `BEGIN`, multiple statements, and `END` tokens. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// A shared representation of `BEGIN`, multiple statements, and `END` tokens. | |
/// Represents a list of statements enclosed within `BEGIN` and `END` keywords. | |
/// Example: | |
/// ```sql | |
/// BEGIN | |
/// SELECT 1; | |
/// SELECT 2; | |
/// END | |
/// ``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done 👍
src/parser/mod.rs
Outdated
@@ -15017,6 +15075,13 @@ impl<'a> Parser<'a> { | |||
} | |||
} | |||
|
|||
fn parse_return(&mut self) -> Result<Statement, ParserError> { | |||
let expr = self.parse_expr()?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah alright, in that case I think its only the test cases missing for the return statement we can add those to common.rs I imagine
008d821
to
1e51121
Compare
tests/sqlparser_common.rs
Outdated
|
||
#[test] | ||
fn parse_return() { | ||
all_dialects().verified_stmt("RETURN"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we have this check assert the AST? Also we can add another test case that includes a return statement having an expression
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we have this check assert the AST?
Done 👍
Also we can add another test case that includes a return statement having an expression
Done, although this is invalid for SQL Server. I think having that test here would also mean we can't have a test that asserts an error when return with value is found as a top level statement. Thoughts? 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm yeah given that its currently part of the parser behavior I think its reasonable to assert it. If the parser later stops accepting a top-level return value then we would change the test here to assert an error instead
src/parser/mod.rs
Outdated
|
||
self.expect_keyword_is(Keyword::AS)?; | ||
self.expect_keyword_is(Keyword::BEGIN)?; | ||
let function_body = Some(CreateFunctionBody::MultiStatement(self.parse_statements()?)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does something like in this suggestion work just as well? its currently unclear to me why the semicolon handling is a special case of an IfCondition
statement
49f746c
to
6d0ea64
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @aharpervc! Left a couple minor comments, I think beyond the if condition semicolon special handling the changes look reasonable to me!
src/parser/mod.rs
Outdated
let return_type = if self.parse_keyword(Keyword::RETURNS) { | ||
Some(self.parse_data_type()?) | ||
} else { | ||
return parser_err!("Expected RETURNS keyword", self.peek_token().span.start); | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let return_type = if self.parse_keyword(Keyword::RETURNS) { | |
Some(self.parse_data_type()?) | |
} else { | |
return parser_err!("Expected RETURNS keyword", self.peek_token().span.start); | |
}; | |
self.expect_keyword(Keyword::RETURNS)?; | |
let return_type = Some(self.parse_data_type()?); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done 👍
@@ -15063,6 +15127,15 @@ impl<'a> Parser<'a> { | |||
message: Box::new(self.parse_expr()?), | |||
})) | |||
} | |||
/// Parse [Statement::Return] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// Parse [Statement::Return] | |
/// Parse [Statement::Return] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done 👍
src/parser/mod.rs
Outdated
|
||
self.expect_keyword_is(Keyword::AS)?; | ||
self.expect_keyword_is(Keyword::BEGIN)?; | ||
let function_body = Some(CreateFunctionBody::MultiStatement(self.parse_statements()?)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we seem to have some of the related changes left in this codepath?
datafusion-sqlparser-rs/src/parser/mod.rs
Lines 4458 to 4466 in 6d0ea64
let semi_colon_expected = match values.last() { | |
Some(Statement::If(if_statement)) => if_statement.end_token.is_some(), | |
Some(_) => true, | |
None => false, | |
}; | |
if semi_colon_expected { | |
self.expect_token(&Token::SemiColon)?; | |
} |
tests/sqlparser_common.rs
Outdated
|
||
#[test] | ||
fn parse_return() { | ||
all_dialects().verified_stmt("RETURN"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm yeah given that its currently part of the parser behavior I think its reasonable to assert it. If the parser later stops accepting a top-level return value then we would change the test here to assert an error instead
- in this dialect, functions can have statement(s) bodies like stored procedures (including `BEGIN`..`END`) - functions must end with `RETURN`, so a corresponding statement type is also introduced
…nctions - this lets us disacard the former (unfortunate, bespoke) multi statement parsing because we can just use `parse_statement_list` - however, `parse_statement_list` also needed a small change to allow subsequent statements to come after the final `END`
bfd767e
to
b451b2d
Compare
- no need for special IF logic to bleed over to another function
e04f4b7
to
ca28413
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks @aharpervc!
cc @alamb
Partially related: #1800
For reference: https://learn.microsoft.com/en-us/sql/t-sql/statements/create-function-transact-sql?view=sql-server-ver16
In SQL Server, functions look a lot like procedures, with begin/end, multiple statements, and a return. To accommodate that here in the parser,
BeginEndStatement
was extracted from the conditional logic to be reused here, along with aRETURN
statement typeThen also while I was working on this, I noticed that only SQL Server (so far...?) supports
OR ALTER
, so I introduced a new struct field to track that. (Again, very similar toCREATE PROCEDURE
logic).I'm not aiming to enable parsing every function type here in this PR -- just trying to get something operational, and the additional capabilities like table expressions, etc, can be added in later.
To continue the recent discussion, I noticed that both these new multi statement functions & the existing procedure parsing fails unless the statements are semi-colon delimited. Semi-colons are optional in SQL Server, so for example, this code executes fine for real but fails to parse here in this library, unless that "SET" ends with a
;
...I think this is because the procedure body parsing logic is using
parse_statements()
, which requires it? I'm game to tackle the problem with some guidance on what's a good approach to make it parse 🤔