Skip to content

Add CREATE FUNCTION support for SQL Server #1808

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 16 commits into from
Apr 23, 2025

Conversation

aharpervc
Copy link
Contributor

@aharpervc aharpervc commented Apr 10, 2025

Partially related: #1800


For reference: https://learn.microsoft.com/en-us/sql/t-sql/statements/create-function-transact-sql?view=sql-server-ver16

In SQL Server, functions look a lot like procedures, with begin/end, multiple statements, and a return. To accommodate that here in the parser, BeginEndStatement was extracted from the conditional logic to be reused here, along with a RETURN statement type

Formerly, this was a new struct, but was later consolidated in with BeginEndStatement. I'm open to suggestions about how this could be improve (now or later). For example, since it's a BEGIN (statements) END construct, should this (and procedures) be using the statement block logic introduced here? https://github.com//pull/1791

Then also while I was working on this, I noticed that only SQL Server (so far...?) supports OR ALTER, so I introduced a new struct field to track that. (Again, very similar to CREATE PROCEDURE logic).

I'm not aiming to enable parsing every function type here in this PR -- just trying to get something operational, and the additional capabilities like table expressions, etc, can be added in later.


To continue the recent discussion, I noticed that both these new multi statement functions & the existing procedure parsing fails unless the statements are semi-colon delimited. Semi-colons are optional in SQL Server, so for example, this code executes fine for real but fails to parse here in this library, unless that "SET" ends with a ;...

CREATE OR ALTER PROCEDURE test (@foo INT, @bar VARCHAR(256))
AS
BEGIN
    SET @foo = @foo + 1
    SELECT @foo
END

I think this is because the procedure body parsing logic is using parse_statements(), which requires it? I'm game to tackle the problem with some guidance on what's a good approach to make it parse 🤔

@aharpervc aharpervc marked this pull request as ready for review April 10, 2025 20:06

self.expect_keyword_is(Keyword::AS)?;
self.expect_keyword_is(Keyword::BEGIN)?;
let function_body = Some(CreateFunctionBody::MultiStatement(self.parse_statements()?));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, this will suffer the same fate as other multi statement syntax currently in this parser, which is that semi-colons are required to parse.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah if the desired behavior is to have optional semicolon delimited statements for mssql, what we could do would be to make the behavior configurable when parsing the statements list. We could do something like this?

pub fn parse_statements(&mut self) -> Result<Vec<Statement>, ParserError> {
    self.parse_statements_inner(true)
}

fn parse_statements_inner(&mut self, require_semicolon_delimiter) -> Result<Vec<Statement>, ParserError> {
    // ... same as the body of `parse_statements()` on main
    // ... I _think_ except for this part to make the behavior configurable
    if require_semicolon_delimiter && expecting_statement_delimiter {
        return self.expected("end of statement", self.peek_token());
    }
    // ...
}

then the mssql code can call the inner function directly

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I understand what you're getting at with regards to configuration, but I'm not yet sure how to actually implement that statement parsing logic.

Do you see the semicolon-less statement parsing question as a blocker for the PR? If not I'd prefer to get this in (per your other suggestions) and then tackle this separately

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but I'm not yet sure how to actually implement that statement parsing logic

hmm not sure I followed what we mean here? parsing a semicolon-less statement I imagine would follow from the above code snippet by calling self.parse_statements_inner(false)

It would indeed be good to resolve in this PR, currently in the PR we seem to have a workaround for this behavior here and here and another in the Go statement PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aharpervc just in case, note that this part of the PR is unresolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the reminder.

It would indeed be good to resolve in this PR, currently in the PR we seem to have a workaround for this behavior here and here and another in the Go statement PR.

I was able to refactor the latter, and the former we'll need regardless of the semi colon situation. Does the latest iteration resolve your concerns here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does something like in this suggestion work just as well? its currently unclear to me why the semicolon handling is a special case of an IfCondition statement

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not deeply opposed to making an inner parse function, I guess, but it also doesn't seem required for this PR.

its currently unclear to me why the semicolon handling is a special case of an IfCondition statement

I reverted that change, I think either it was a problem while hacking on the branch or pollution from another branch 😬

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we seem to have some of the related changes left in this codepath?

let semi_colon_expected = match values.last() {
Some(Statement::If(if_statement)) => if_statement.end_token.is_some(),
Some(_) => true,
None => false,
};
if semi_colon_expected {
self.expect_token(&Token::SemiColon)?;
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to address a separate issue. The situation is addressed by the "create_function_with_conditional" test case example, which runs verified_stmt on this sql:

CREATE FUNCTION some_scalar_udf()
RETURNS INT
AS
BEGIN
    IF 1 = 2
    BEGIN
        RETURN 1;
    END;
    RETURN 0;
END

The difficulty is regarding the IF statement. The pre-existing parse_if_stmt logic advances the parse past semi-colons (if any) before looking for an else block. If there's no else block as in this example, the next token will be part of the next statement after END's semi-colon.

That conflicts with the pre-existing logic in parse_statement_list, which unconditionally required a semi-colon after each parsed statement.

Therefore, this example would incorrectly be a parser error. To work around that limitation I added this conditional logic to parse_statement_list to inspect the previous statement for an IF statement end block. If found, no semi colon is expected.


Thinking about it some more, I suppose it would be better to keep that concern inside the parse_if_stmt function rather than bleeding out into parse_statement_list. I've done that now on the branch, thoughts?


self.expect_keyword_is(Keyword::AS)?;
self.expect_keyword_is(Keyword::BEGIN)?;
let function_body = Some(CreateFunctionBody::MultiStatement(self.parse_statements()?));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah if the desired behavior is to have optional semicolon delimited statements for mssql, what we could do would be to make the behavior configurable when parsing the statements list. We could do something like this?

pub fn parse_statements(&mut self) -> Result<Vec<Statement>, ParserError> {
    self.parse_statements_inner(true)
}

fn parse_statements_inner(&mut self, require_semicolon_delimiter) -> Result<Vec<Statement>, ParserError> {
    // ... same as the body of `parse_statements()` on main
    // ... I _think_ except for this part to make the behavior configurable
    if require_semicolon_delimiter && expecting_statement_delimiter {
        return self.expected("end of statement", self.peek_token());
    }
    // ...
}

then the mssql code can call the inner function directly

@aharpervc aharpervc force-pushed the mssql-create-function branch 2 times, most recently from f5857d9 to 618eb4d Compare April 14, 2025 14:37
@aharpervc aharpervc requested a review from iffyio April 14, 2025 14:39
@@ -15017,6 +15075,13 @@ impl<'a> Parser<'a> {
}
}

fn parse_return(&mut self) -> Result<Statement, ParserError> {
let expr = self.parse_expr()?;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to be a maybe_parse, like how parse_raise_stmt does it for Exprs? The approach I picked for the discussion of RETURN variants above was to only do what's needed for this PR & these tests, which always have an Expr after the return.

So I think perhaps what we'd need in a followup PR is:

  1. test case for bare RETURN
  2. update this to be use "maybe" logic to find an expr (perhaps we'd like maybe_expr to wrap up that pattern?)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can remove the return statement bits from this PR and tackle it separately as its own PR?

Otherwise indeed the expectation would be that this uses maybe_parse and we have the test cases in place as part of this PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can remove the return statement bits from this PR and tackle it separately as its own PR?

I don't think this is possible, since SQL Server requires that functions have a return, so create function goes together with return. I'll take another pass on the PR though

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah alright, in that case I think its only the test cases missing for the return statement we can add those to common.rs I imagine

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a test case for a bare RETURN to sqlparser_common.rs, although I'm unsure how useful it is. At least for SQL Server, a top level return must be bare (eg, can't return anything). Therefore, the more useful tests are where return is inside a function body, trigger body, etc. And we have those tests in their respective branches.

@@ -15017,6 +15075,13 @@ impl<'a> Parser<'a> {
}
}

fn parse_return(&mut self) -> Result<Statement, ParserError> {
let expr = self.parse_expr()?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can remove the return statement bits from this PR and tackle it separately as its own PR?

Otherwise indeed the expectation would be that this uses maybe_parse and we have the test cases in place as part of this PR


self.expect_keyword_is(Keyword::AS)?;
self.expect_keyword_is(Keyword::BEGIN)?;
let function_body = Some(CreateFunctionBody::MultiStatement(self.parse_statements()?));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but I'm not yet sure how to actually implement that statement parsing logic

hmm not sure I followed what we mean here? parsing a semicolon-less statement I imagine would follow from the above code snippet by calling self.parse_statements_inner(false)

It would indeed be good to resolve in this PR, currently in the PR we seem to have a workaround for this behavior here and here and another in the Go statement PR.

@aharpervc aharpervc force-pushed the mssql-create-function branch 3 times, most recently from 580114e to 9954416 Compare April 15, 2025 21:34
@aharpervc aharpervc requested a review from iffyio April 15, 2025 22:01
src/ast/spans.rs Outdated
end_token: AttachedToken(end),
} => union_spans([start.span, end.span].into_iter()),
ConditionalStatements::BeginEnd(bes) => {
union_spans([bes.begin_token.0.span, bes.end_token.0.span].into_iter())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we impl Spanned for BeginEndStatements and then call bes.span() here instead? That would make the span impl reusable by nother nodes that embed the BeginEndStatements

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I understand what you're asking for, I'll give the implementation a shot

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done 👍

src/ast/mod.rs Outdated
}
}
}

/// A shared representation of `BEGIN`, multiple statements, and `END` tokens.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// A shared representation of `BEGIN`, multiple statements, and `END` tokens.
/// Represents a list of statements enclosed within `BEGIN` and `END` keywords.
/// Example:
/// ```sql
/// BEGIN
/// SELECT 1;
/// SELECT 2;
/// END
/// ```

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done 👍

@@ -15017,6 +15075,13 @@ impl<'a> Parser<'a> {
}
}

fn parse_return(&mut self) -> Result<Statement, ParserError> {
let expr = self.parse_expr()?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah alright, in that case I think its only the test cases missing for the return statement we can add those to common.rs I imagine

@aharpervc aharpervc force-pushed the mssql-create-function branch 2 times, most recently from 008d821 to 1e51121 Compare April 16, 2025 21:22

#[test]
fn parse_return() {
all_dialects().verified_stmt("RETURN");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have this check assert the AST? Also we can add another test case that includes a return statement having an expression

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have this check assert the AST?

Done 👍

Also we can add another test case that includes a return statement having an expression

Done, although this is invalid for SQL Server. I think having that test here would also mean we can't have a test that asserts an error when return with value is found as a top level statement. Thoughts? 🤔

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm yeah given that its currently part of the parser behavior I think its reasonable to assert it. If the parser later stops accepting a top-level return value then we would change the test here to assert an error instead


self.expect_keyword_is(Keyword::AS)?;
self.expect_keyword_is(Keyword::BEGIN)?;
let function_body = Some(CreateFunctionBody::MultiStatement(self.parse_statements()?));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does something like in this suggestion work just as well? its currently unclear to me why the semicolon handling is a special case of an IfCondition statement

@aharpervc aharpervc force-pushed the mssql-create-function branch 3 times, most recently from 49f746c to 6d0ea64 Compare April 18, 2025 19:53
@aharpervc aharpervc requested a review from iffyio April 18, 2025 20:19
Copy link
Contributor

@iffyio iffyio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @aharpervc! Left a couple minor comments, I think beyond the if condition semicolon special handling the changes look reasonable to me!

Comment on lines 5142 to 5137
let return_type = if self.parse_keyword(Keyword::RETURNS) {
Some(self.parse_data_type()?)
} else {
return parser_err!("Expected RETURNS keyword", self.peek_token().span.start);
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let return_type = if self.parse_keyword(Keyword::RETURNS) {
Some(self.parse_data_type()?)
} else {
return parser_err!("Expected RETURNS keyword", self.peek_token().span.start);
};
self.expect_keyword(Keyword::RETURNS)?;
let return_type = Some(self.parse_data_type()?);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done 👍

@@ -15063,6 +15127,15 @@ impl<'a> Parser<'a> {
message: Box::new(self.parse_expr()?),
}))
}
/// Parse [Statement::Return]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// Parse [Statement::Return]
/// Parse [Statement::Return]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done 👍


self.expect_keyword_is(Keyword::AS)?;
self.expect_keyword_is(Keyword::BEGIN)?;
let function_body = Some(CreateFunctionBody::MultiStatement(self.parse_statements()?));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we seem to have some of the related changes left in this codepath?

let semi_colon_expected = match values.last() {
Some(Statement::If(if_statement)) => if_statement.end_token.is_some(),
Some(_) => true,
None => false,
};
if semi_colon_expected {
self.expect_token(&Token::SemiColon)?;
}


#[test]
fn parse_return() {
all_dialects().verified_stmt("RETURN");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm yeah given that its currently part of the parser behavior I think its reasonable to assert it. If the parser later stops accepting a top-level return value then we would change the test here to assert an error instead

- in this dialect, functions can have statement(s) bodies like stored procedures (including `BEGIN`..`END`)
- functions must end with `RETURN`, so a corresponding statement type is also introduced
@aharpervc aharpervc force-pushed the mssql-create-function branch from bfd767e to b451b2d Compare April 21, 2025 18:46
- no need for special IF logic to bleed over to another function
@aharpervc aharpervc force-pushed the mssql-create-function branch from e04f4b7 to ca28413 Compare April 21, 2025 18:52
@aharpervc aharpervc requested a review from iffyio April 21, 2025 21:40
Copy link
Contributor

@iffyio iffyio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks @aharpervc!
cc @alamb

@iffyio iffyio merged commit 2eb1e7b into apache:main Apr 23, 2025
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants