Fix escaped character handling in regex #191

Merged
merged 1 commit into from
Nov 29, 2021

msm-code
Contributor

I've tried to fix issue #186.

This was suspiciously easy (especially compared to my previous PR, #93), so please check that I'm not doing something obviously wrong. FWIW, this fixes the issue at hand. Using the PoC from the linked issue:

import yaramod

y = yaramod.Yaramod(yaramod.Features.AllCurrent)
yara_file = y.parse_string(r'''
rule regex_test {
    strings:
        $a = /1\x32?3/
    condition:
        $a
}''')
for rule in yara_file.rules:
    for string in rule.strings:
        for unit in string.unit.units:
            print(type(unit), unit.text)

Previous results:

<class 'yaramod.RegexpText'> 1
<class 'yaramod.RegexpText'> \x
<class 'yaramod.RegexpText'> 3
<class 'yaramod.RegexpOptional'> 2?
<class 'yaramod.RegexpText'> 3

Current (correct) results:

<class 'yaramod.RegexpText'> 1
<class 'yaramod.RegexpOptional'> \x32?
<class 'yaramod.RegexpText'> 3
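
The difference between the two outputs comes down to what the `?` operator attaches to. As a rough illustration only, the following Python sketch treats a `\xNN` escape as one atom before looking for a trailing `?`. It is not yaramod's parser: `ATOM` and `split_units` are hypothetical names, and the sketch handles only plain characters, hex escapes and `?`.

```python
import re

# Hypothetical toy tokenizer, NOT yaramod's implementation.
# An atom is either a whole "\xNN" escape or any single character.
ATOM = re.compile(r"\\x[a-zA-Z0-9]{2}|.", re.DOTALL)

def split_units(pattern: str) -> list[tuple[str, str]]:
    """Split a tiny regex subset into ("text" | "optional", source) units."""
    units = []
    pos = 0
    while pos < len(pattern):
        # The escape alternative is tried first, so "\x32" stays together.
        atom = ATOM.match(pattern, pos).group(0)
        pos += len(atom)
        if pos < len(pattern) and pattern[pos] == "?":
            # "?" applies to the whole preceding atom, escape included.
            units.append(("optional", atom + "?"))
            pos += 1
        else:
            units.append(("text", atom))
    return units

for kind, text in split_units(r"1\x32?3"):
    print(kind, text)
```

For the pattern from the PoC, the sketch yields three units, with `\x32?` as a single optional unit, which mirrors the shape of the corrected output above.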

I've also added a test.

Fixes #186.

@@ -375,6 +375,9 @@ void ParserDriver::defineTokens()

return std::make_pair(range, range);
});
_parser.token(R"(\\x[a-zA-Z0-9]{2})").states("$regexp").symbol("REGEXP_ESCAPE").description("regexp escaped character").action([](std::string_view str) -> Value {
Contributor Author

Matches hex escaped characters, like "\x61", as a single token.
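
To make the token pattern concrete, here is a small Python sketch that applies the same expression with Python's `re` module. It assumes the parser's token regex engine interprets this character class and repetition the same way Python does, which the PR does not confirm; `REGEXP_ESCAPE_PATTERN` and `is_hex_escape_token` are names chosen for illustration.

```python
import re

# Same expression as in the token definition above, evaluated with Python's
# `re` module (assumption: the parser's regex engine may differ in details).
REGEXP_ESCAPE_PATTERN = re.compile(r"\\x[a-zA-Z0-9]{2}")

def is_hex_escape_token(text: str) -> bool:
    """Return True if `text` is matched in full by the pattern."""
    return REGEXP_ESCAPE_PATTERN.fullmatch(text) is not None

print(is_hex_escape_token(r"\x61"))  # lowercase x, two hex digits
print(is_hex_escape_token(r"\X61"))  # uppercase X
print(is_hex_escape_token(r"\xzz"))  # letters outside the hex range
```

Under that assumption, `\x61` is matched and `\X61` is not. `\xzz` is matched too, because the class `[a-zA-Z0-9]` is broader than the hex digits `[0-9a-fA-F]`; whether that matters depends on how later stages validate the escape.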

Should we handle other escape encodings as well? It seems that \65? is also a valid regex, though I'm not sure about the YARA side 🤔

Contributor Author

I think yara doesn't support that:

rule kot {
    strings: $a = /kot\61kot/
    condition: $a
}
❯ yara -rs kot.yar /tmp/a.txt
kot.yar(2): error: invalid regular expression "$a": backreferences are not allowed

But sure, we should check for other alternative encodings (if they exist). By the way, I've checked, and \X61 is not a valid encoding (the x must be lowercase).

@TadeasKucera
Contributor

Thank you very much for this very nicely written PR. It is good, I will merge it.

@TadeasKucera TadeasKucera merged commit ef3a54a into avast:master Nov 29, 2021
@msm-code msm-code deleted the fix-optional-regex-rules branch November 29, 2021 13:36
Linked issue: Escaped characters with operators are parsed incorrectly