-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Rust: Add LiteralExpr
sub classes
#19475
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
f93c86f
to
cd01bd0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few minor comments and I haven't dug into all the specifics of the regular expressions — otherwise looks good to me :)
@@ -0,0 +1,15 @@ | |||
/** Provides sub classes of literal expressions. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does the Ext
suffix in the filename indicate? I can't find any existing files with that suffix.
My first thought was it meant that these could be _ext_ended, but that's not the case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it just means the file is an extension to the functionality provided in LiteralExpr.qll
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed.
s = this.getTextValue() and | ||
reg = IntegerLiteralRegexs::integerLiteral() and | ||
last = strictcount(reg.indexOf("(")) and | ||
result = s.regexpCapture(reg, last) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about a helper predicate for getting the last capture group? Something like:
bindingset[s, reg]
string regexpCaptureLast(string s, string reg) {
exists(int i | result = s.regexpCapture(reg, i) and not exists(s.regexpCapture(reg, i + 1)))
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that will work; it will give you the last group that matches, but the suffix part is optional.
reg = | ||
[ | ||
FloatLiteralRegexs::floatLiteralSuffix1(), FloatLiteralRegexs::floatLiteralSuffix2(), | ||
FloatLiteralRegexs::integerSuffixLiteral() | ||
] and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using the helper predicate above we could instead do a disjunction here:
reg = | |
[ | |
FloatLiteralRegexs::floatLiteralSuffix1(), FloatLiteralRegexs::floatLiteralSuffix2(), | |
FloatLiteralRegexs::integerSuffixLiteral() | |
] and | |
reg = | |
FloatLiteralRegexs::floatLiteralSuffix1() + "|" + | |
FloatLiteralRegexs::floatLiteralSuffix2() + "|" + | |
FloatLiteralRegexs::integerSuffixLiteral() | |
and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good but I have some questions and comments...
*/ | ||
class CharLiteralExpr extends LiteralExpr { | ||
CharLiteralExpr() { | ||
// todo: proper implementation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What needs to be done to make this a "proper implementation"?
I was initially concerned about escaped quote characters, but I think with lazy matching they may just work. Testing will confirm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic is a lot simpler than the official spec, which is why I added the comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest we remove the comment, or rephrase it away from "todo", if there is nothing specific we want to do here.
class StringLiteralExpr extends LiteralExpr { | ||
StringLiteralExpr() { | ||
// todo: proper implementation | ||
this.getTextValue().regexpMatch("r?#*\".*\"#*") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What needs to be done to make this a "proper implementation"?
I think it's OK that this matches some invalid string literals (e.g. r##"foo"#
) as long as it matches all valid ones and there's no overlap with matching for literals of other types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as for char literals, the spec is more involved, but perhaps as you say the above is good enough.
s = this.getTextValue() and | ||
reg = IntegerLiteralRegexs::integerLiteral() and | ||
last = strictcount(reg.indexOf("(")) and | ||
result = s.regexpCapture(reg, last) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit of a scary approach, I'm not convinced it will be reliable.
Can't we just match against ".*(" + IntegerLiteralRegexs::suffix() + ")"
here and grab the first group?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would really like to be able to just reuse the existing regexes; one way this could be made better is if QL supported named groups.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general the number of opening brackets isn't going to be equal to the number of capture groups, for example the regex (?is)(\\()
has three (
but only one group. You might be able to stick to simpler regexs where the relationship holds but it feels a bit fragile to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree. Would it be possible to write the regex'es such that the thing we want is always in a fixed capture group? Then we could write something like s.regexpCapture(reg, 3)
(or whatever the number it would be).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I can hard-code the numbers instead.
FloatLiteralRegexs::integerSuffixLiteral() | ||
] and | ||
last = strictcount(reg.indexOf("(")) and | ||
result = s.regexpCapture(reg, last) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same thoughts as for IntegerLiteralExpr::getSuffix
.
.regexpMatch([ | ||
FloatLiteralRegexs::floatLiteral(), FloatLiteralRegexs::integerSuffixLiteral() | ||
]) and | ||
// E.g. `0x01_f32` is an integer, not a float |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused by the cases 0x01_f32
and 0x01_e3
. I understand they're supposed to be understood as integers, I'm not sure why this is so (in the Rust language) and I'm not sure how it happens (in your QL).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is exactly the restriction on this line that makes it an integer only; otherwise it would be consider a float as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see how IntegerLiteralExpr
accepts 0x01_f32
when its suffix()
function doesn't include f32
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
0x01_f32
is just a HEX number, underscores are allowed as separators.
6ec1565
to
7494eac
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice refactor and TIL about non-capturing groups ;)
IntegerLiteralRegexs::paren(FloatLiteralRegexs::floatLiteralSuffix2()) + "|" + | ||
IntegerLiteralRegexs::paren(FloatLiteralRegexs::integerSuffixLiteral()) and | ||
s = this.getTextValue() and | ||
result = s.regexpCapture(reg, [1, 2, 3]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does [1, 2, 3]
not end up mixing things up? Do they all three always correspond to the suffix for all the three disjuncts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They each correspond to a suffix capturing group for one of the disjuncts, so all should be good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for being dense, but if 3 is the right capture group for one of the disjuncts why won't trying 1 or 2 give the wrong thing for that disjunct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trying 1
for, say, the third disjunct will simply not give a result, since capture group 1 does not exist in that disjunct, it exists in the first disjunct.
In preparation for #19474.
Eventually we will also want to have predicates for retrieving the literal values, e.g.
int IntegerLiteralExpr.getValue()
, but that is left as future work.