Skip to content

Rust: Add LiteralExpr sub classes #19475

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
May 13, 2025
Merged

Conversation

hvitved
Copy link
Contributor

@hvitved hvitved commented May 12, 2025

In preparation for #19474.

Eventually we will also want to have predicates for retrieving the literal values, e.g. int IntegerLiteralExpr.getValue(), but that is left as future work.

@github-actions github-actions bot added the Rust Pull requests that update Rust code label May 12, 2025
@hvitved hvitved force-pushed the rust/literal-sub-classes branch from f93c86f to cd01bd0 Compare May 12, 2025 18:35
@hvitved hvitved marked this pull request as ready for review May 13, 2025 07:12
@hvitved hvitved requested a review from a team as a code owner May 13, 2025 07:12
Copy link
Contributor

@paldepind paldepind left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few minor comments and I haven't dug into all the specifics of the regular expressions — otherwise looks good to me :)

@@ -0,0 +1,15 @@
/** Provides sub classes of literal expressions. */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does the Ext suffix in the filename indicate? I can't find any existing files with that suffix.

My first thought was it meant that these could be _ext_ended, but that's not the case.

Copy link
Contributor

@geoffw0 geoffw0 May 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it just means the file is an extension to the functionality provided in LiteralExpr.qll.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed.

s = this.getTextValue() and
reg = IntegerLiteralRegexs::integerLiteral() and
last = strictcount(reg.indexOf("(")) and
result = s.regexpCapture(reg, last)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about a helper predicate for getting the last capture group? Something like:

bindingset[s, reg]
string regexpCaptureLast(string s, string reg) {
  exists(int i | result = s.regexpCapture(reg, i) and not exists(s.regexpCapture(reg, i + 1)))
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that will work; it will give you the last group that matches, but the suffix part is optional.

Comment on lines 211 to 215
reg =
[
FloatLiteralRegexs::floatLiteralSuffix1(), FloatLiteralRegexs::floatLiteralSuffix2(),
FloatLiteralRegexs::integerSuffixLiteral()
] and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using the helper predicate above we could instead do a disjunction here:

Suggested change
reg =
[
FloatLiteralRegexs::floatLiteralSuffix1(), FloatLiteralRegexs::floatLiteralSuffix2(),
FloatLiteralRegexs::integerSuffixLiteral()
] and
reg =
FloatLiteralRegexs::floatLiteralSuffix1() + "|" +
FloatLiteralRegexs::floatLiteralSuffix2() + "|" +
FloatLiteralRegexs::integerSuffixLiteral()
and

Copy link
Contributor

@geoffw0 geoffw0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good but I have some questions and comments...

*/
class CharLiteralExpr extends LiteralExpr {
CharLiteralExpr() {
// todo: proper implementation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What needs to be done to make this a "proper implementation"?

I was initially concerned about escaped quote characters, but I think with lazy matching they may just work. Testing will confirm.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic is a lot simpler than the official spec, which is why I added the comment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest we remove the comment, or rephrase it away from "todo", if there is nothing specific we want to do here.

class StringLiteralExpr extends LiteralExpr {
StringLiteralExpr() {
// todo: proper implementation
this.getTextValue().regexpMatch("r?#*\".*\"#*")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What needs to be done to make this a "proper implementation"?

I think it's OK that this matches some invalid string literals (e.g. r##"foo"#) as long as it matches all valid ones and there's no overlap with matching for literals of other types.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as for char literals, the spec is more involved, but perhaps as you say the above is good enough.

s = this.getTextValue() and
reg = IntegerLiteralRegexs::integerLiteral() and
last = strictcount(reg.indexOf("(")) and
result = s.regexpCapture(reg, last)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit of a scary approach, I'm not convinced it will be reliable.

Can't we just match against ".*(" + IntegerLiteralRegexs::suffix() + ")" here and grab the first group?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would really like to be able to just reuse the existing regexes; one way this could be made better is if QL supported named groups.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general the number of opening brackets isn't going to be equal to the number of capture groups, for example the regex (?is)(\\() has three ( but only one group. You might be able to stick to simpler regexs where the relationship holds but it feels a bit fragile to me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. Would it be possible to write the regex'es such that the thing we want is always in a fixed capture group? Then we could write something like s.regexpCapture(reg, 3) (or whatever the number it would be).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I can hard-code the numbers instead.

FloatLiteralRegexs::integerSuffixLiteral()
] and
last = strictcount(reg.indexOf("(")) and
result = s.regexpCapture(reg, last)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same thoughts as for IntegerLiteralExpr::getSuffix.

.regexpMatch([
FloatLiteralRegexs::floatLiteral(), FloatLiteralRegexs::integerSuffixLiteral()
]) and
// E.g. `0x01_f32` is an integer, not a float
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused by the cases 0x01_f32 and 0x01_e3. I understand they're supposed to be understood as integers, I'm not sure why this is so (in the Rust language) and I'm not sure how it happens (in your QL).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is exactly the restriction on this line that makes it an integer only; otherwise it would be consider a float as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see how IntegerLiteralExpr accepts 0x01_f32 when its suffix() function doesn't include f32.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0x01_f32 is just a HEX number, underscores are allowed as separators.

@hvitved hvitved force-pushed the rust/literal-sub-classes branch from 6ec1565 to 7494eac Compare May 13, 2025 12:26
paldepind
paldepind previously approved these changes May 13, 2025
Copy link
Contributor

@paldepind paldepind left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉

Copy link
Contributor

@paldepind paldepind left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice refactor and TIL about non-capturing groups ;)

IntegerLiteralRegexs::paren(FloatLiteralRegexs::floatLiteralSuffix2()) + "|" +
IntegerLiteralRegexs::paren(FloatLiteralRegexs::integerSuffixLiteral()) and
s = this.getTextValue() and
result = s.regexpCapture(reg, [1, 2, 3])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does [1, 2, 3] not end up mixing things up? Do they all three always correspond to the suffix for all the three disjuncts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They each correspond to a suffix capturing group for one of the disjuncts, so all should be good.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for being dense, but if 3 is the right capture group for one of the disjuncts why won't trying 1 or 2 give the wrong thing for that disjunct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trying 1 for, say, the third disjunct will simply not give a result, since capture group 1 does not exist in that disjunct, it exists in the first disjunct.

@hvitved hvitved merged commit 3fc9da7 into github:main May 13, 2025
17 checks passed
@hvitved hvitved deleted the rust/literal-sub-classes branch May 13, 2025 14:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Rust Pull requests that update Rust code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants