base64 and widebase64 modifiers #1185

wxsBSD · 2019-12-23T01:16:29Z

Add two new modifiers: base64 and widebase64.

These modifiers take the given text string and generate 3 different strings
being careful to trim off the bytes which are dependent upon leading or
trailing bytes in the larger search space.

I've implemented it by slightly cheating. I generate all the search strings in
a list and then creating one large string suitable for the RE compiler to deal
with. For example, the string "This program cannot" generates these three
base64 encoded strings:

VGhpcyBwcm9ncmFtIGNhbm5vd
RoaXMgcHJvZ3JhbSBjYW5ub3
UaGlzIHByb2dyYW0gY2Fubm90

Those three strings are then transformed into a RE that looks like this:

(VGhpcyBwcm9ncmFtIGNhbm5vd|RoaXMgcHJvZ3JhbSBjYW5ub3|UaGlzIHByb2dyYW0gY2Fubm90)

That string is then passed to the RE compiler for parsing and AST generation.
The AST is then emitted into the appropriate spot and YARA believes it was
given a regex from that point on.

I've also implemented support for specifying custom alphabets:

base64("!@#$%^&*(){}[].,|ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstu")

This means that I have to be careful to escape any special RE characters in the
string passed to the compiler. The base64 alphabet has to be 64 bytes long, and
does support escaped bytes properly too.

To avoid the need to deal with escaping I had a first implementation which
attempted to generate the AST by hand, which was mostly working but was very
cumbersome to maintain. In doing this I ended up improving yr_re_print_node()
so that it indents the tree properly, which made debugging that attempt easier.
My apologies for including it in this diff. I can split it out if you want.

Lastly, I did most of the work in re.c but I think it belongs in it's own
base64.c. I'm happy to move it out of re.c if you would prefer.

Add two new modifiers: base64 and widebase64. These modifiers take the given text string and generate 3 different strings being careful to trim off the bytes which are dependent upon leading or trailing bytes in the larger search space. I've implemented it by slightly cheating. I generate all the search strings in a list and then creating one large string suitable for the RE compiler to deal with. For example, the string "This program cannot" generates these three base64 encoded strings: VGhpcyBwcm9ncmFtIGNhbm5vd RoaXMgcHJvZ3JhbSBjYW5ub3 UaGlzIHByb2dyYW0gY2Fubm90 Those three strings are then transformed into a RE that looks like this: (VGhpcyBwcm9ncmFtIGNhbm5vd|RoaXMgcHJvZ3JhbSBjYW5ub3|UaGlzIHByb2dyYW0gY2Fubm90) That string is then passed to the RE compiler for parsing and AST generation. The AST is then emitted into the appropriate spot and YARA believes it was given a regex from that point on. I've also implemented support for specifying custom alphabets: base64("!@#$%^&*(){}[].,|ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstu") This means that I have to be careful to escape any special RE characters in the string passed to the compiler. The base64 alphabet has to be 64 bytes long, and does support escaped bytes properly too. To avoid the need to deal with escaping I had a first implementation which attempted to generate the AST by hand, which was mostly working but was very cumbersome to maintain. In doing this I ended up improving yr_re_print_node() so that it indents the tree properly, which made debugging that attempt easier. My apologies for including it in this diff. I can split it out if you want. Lastly, I did most of the work in re.c but I think it belongs in it's own base64.c. I'm happy to move it out of re.c if you would prefer.

wxsBSD · 2019-12-23T01:24:10Z

I'm putting this up for review so I can gather comments and iterate on it. I think this belongs in its own set of source files, instead of forcing it into re.(c|h) for the most part. I'll move things over to base64.(c|h) and get them into the build system, but would first like to gather feedback on the design.

The details are in the documentation (included in this PR) but the short version is that:

$a = "This program cannot" base64 will search for the three base64 representations of that string and $a = "This program cannot" base64 wide will apply the wide modifier and then the base64 - it does not include the plaintext version if you use base64.

The widebase64 modifier works just like the base64 modifier but inserts null bytes between each character in the base64 strings.

My first implementation of this was generating the RE AST by hand, but it was far too fragile so I switched over to the current method (generating an RE string and letting the RE compiler do the heavy lifting for me). This is an easy approach to this but I'm not super happy with it either. I think a better long-term answer here is to have a way to have different representations of strings but it is still only referenced by a single string internally. This is a lot more work and may not be worth the effort until a more concrete use-case is found.

Specifically, you can only use these modifiers with text strings (hex strings and regular expressions are not supported) and you can't use the xor or nocase modifiers in combination with these.

plusvic · 2020-01-07T10:14:21Z

I have two concerns about these modifiers, which are:

The more modifiers we have, the larger the number of possible combinations and therefore it's more difficult to understand how they interact between them. For example, a user may wonder what happens if base64 and nocase are used together. Does it take all possible case combinations for the original string and apply base64 to them? Does it find any case combination of the resulting base64 string? It's an error?. In this case it will be an error, but the more intuitive options if probably the first one. I'm afraid that by adding more modifiers is going to be difficult both for the user and for ourselves to reason about how the different modifiers interact between them.
These modifiers are convenient, but not necessary. Other modifiers like nocase and xor are necessary because the alternative would be writing hundreds or even thousands of possible strings, but base64 can be replaced by just three strings instead of one. It's certainly easier for the user to use base64 than having to deal with writing the three strings by herself, but considering the maintenance burden and the added complexity (even for the user) I'm not convinced about adding them.

Thoughts?

wxsBSD · 2020-01-07T20:02:25Z

The more modifiers we have, the larger the number of possible combinations and therefore it's more difficult to understand how they interact between them. For example, a user may wonder what happens if base64 and nocase are used together. Does it take all possible case combinations for the original string and apply base64 to them? Does it find any case combination of the resulting base64 string? It's an error?. In this case it will be an error, but the more intuitive options if probably the first one. I'm afraid that by adding more modifiers is going to be difficult both for the user and for ourselves to reason about how the different modifiers interact between them.

I understand this concern but I believe the benefit of a more expressive and readable language is worth it. You are right that it does increase the cognitive burden on users, but with good documentation (which I tried to provide a start to) we can help users understand what is and what is not supported. With tests, which I tried to make very thorough, I think we can lessen the burden on ourselves for maintenance.

These modifiers are convenient, but not necessary. Other modifiers like nocase and xor are necessary because the alternative would be writing hundreds or even thousands of possible strings, but base64 can be replaced by just three strings instead of one. It's certainly easier for the user to use base64 than having to deal with writing the three strings by herself, but considering the maintenance burden and the added complexity (even for the user) I'm not convinced about adding them.

You're right that these are more for convenience but not having them means each user needs to write code (or find someone else's) which does the base64 encoding with proper padding and trimming of the encoded strings. It's not hard to do, and there are a handful of implementations out there already. In fact, when I was writing this I was using one which turned out to be buggy. The fact that we can provide a convenient way for users to express this while also increasing readability makes this at least worth considering. The alternative is users needing to write or find a script to do this, and any bugs that may include. Also, this is more expressive than the versions I've seen that don't let you specify a custom alphabet or even wide support.

I'd much rather see $a = "This program cannot" base64 instead of the alternatives. :)

thehellu · 2020-01-08T15:19:41Z

Jumping on the debate as a Yara user, as some feedback was asked on Twitter.

At the moment, and probably as other folks, I use a third-party script to generate the 3 possible base64 strings for every input. This technique proved to be successful to monitor some threat actors.
When you search a specific string without knowing if it will be ascii or wide, you need to generate not only 3 but 6 strings. In Yara, this could be replaced by one line, with the "ascii" and "wide" modifier in addition to "base64" and "widebase64". So, as a user, those new modifiers would be useful, convenient, and would make rules more readable as they would turn 6 lines into 1. The ability to specify the alphabet is a nice bonus.

Regarding the risk of "user confusion", my feeling is that the basic user will not bother with the modifier, and the advanced user will read the documentation.

I cannot say how much burden it adds as a developer, my first thought was that the base64 algorithm is not going to change, so once the code is written and tested you should be fine, but then I realized you will need to handle the modifier against every new modifier that you may add in the future. You are in the best position to answer this question, and I will still cherish Yara if you decide not to implement the feature :)

JohnLaTwC · 2020-01-08T16:14:58Z

I strongly support this addition.

Base64 is a very common encoding format found across most major languages and platforms. It is used to encode multi-line parameters & scripts, encode binary content (e.g. compressed data), and to support common file encodings (e.g. X509 certificate encodings).

Attackers make use of Base64 encoding very often because it allows them to utilize native functionality to achieve simple obfuscation objectives. Payloads across .NET, PowerShell, Python, PHP, VBA Macros, and beyond have used base64. It is probably the #1 encoding used by attackers for simple evasion and obfuscation goals.

Matching encoded content is also the way an entry infosec practitioner can progress beyond matching on "plaintext strings" to find more advanced payloads. These payloads require additional analysis skills and tools (e.g. CyberChef) that help practitioners gain the skills necessary to defend their environments. Giving defenders more power in a convenient way simplifies the progression of mastery.

While this feature can be approximated by generating the appropriate strings in standalone tools, native support for this encoding would simplify the effort required by defenders to match content and eliminate points of error (e.g. forgetting to including multiple base64 strings to account for leading and trailing bytes, neglecting unicode, etc).

plusvic · 2020-01-08T16:44:44Z

Given the arguments, and how passionately @wxsBSD has defended this feature, I'm changing my mind. Let me do the final review before merging it.

libyara/grammar.y

libyara/parser.c

wxsBSD · 2020-01-13T01:34:55Z

I think all of the code I added in re.c belongs in a new file (I'm thinking base64.c) - really the majority of the code has nothing to do with the re code. By moving it to it's own file it will make things cleaner. Do you want me to do that?

plusvic · 2020-01-13T08:48:05Z

I totally agree on creating the base64.c file.

I haven't tested the build with bazel but the normal build process works and tests are passing.

plusvic · 2020-02-04T21:29:58Z

I was about to merge this PR and noticed that for some reason the tests in Travis CI are failing. They pass in my local machine though.

wxsBSD · 2020-02-04T23:12:34Z

Not sure why those are failing and I can’t find a way to get the test logs to see what failed exactly. :(

wxsBSD · 2020-02-05T01:18:32Z

The windows builds are broken because I didn't update the VS solution files - I can download a copy of VS in a VM if you want me to fix that, but it may be a day or two before I can get to it.

I have no idea why those tests are failing and I can't seem to get travis to give me the logs I need to debug it. Maybe it makes sense to modify the Makefile so it cats out the test-suite.log if tests fail? This way we can see the test failures in travis (and it just makes it easier to not have to cat that file all the time during development).

plusvic · 2020-02-11T22:16:35Z

The tests fail because of memory leaks:

=================================================================
==27990==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 585 byte(s) in 9 object(s) allocated from:
    #0 0x7fbe984d8602 in malloc (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x98602)
    #1 0x4b1010 in _yr_base64_create_nodes (/home/travis/build/VirusTotal/yara/test-rules+0x4b1010)
    #2 0x4b245b in yr_base64_ast_from_string (/home/travis/build/VirusTotal/yara/test-rules+0x4b245b)
    #3 0x4f1f7f  (/home/travis/build/VirusTotal/yara/test-rules+0x4f1f7f)

Direct leak of 585 byte(s) in 9 object(s) allocated from:
    #0 0x7fbe984d8602 in malloc (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x98602)
    #1 0x4b1010 in _yr_base64_create_nodes (/home/travis/build/VirusTotal/yara/test-rules+0x4b1010)
    #2 0x4b21a1 in yr_base64_ast_from_string (/home/travis/build/VirusTotal/yara/test-rules+0x4b21a1)
    #3 0x4f1f7f  (/home/travis/build/VirusTotal/yara/test-rules+0x4f1f7f)

Direct leak of 360 byte(s) in 9 object(s) allocated from:
    #0 0x7fbe984d8602 in malloc (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x98602)
    #1 0x4b1010 in _yr_base64_create_nodes (/home/travis/build/VirusTotal/yara/test-rules+0x4b1010)
    #2 0x4b23b8 in yr_base64_ast_from_string (/home/travis/build/VirusTotal/yara/test-rules+0x4b23b8)
    #3 0x4f1f7f  (/home/travis/build/VirusTotal/yara/test-rules+0x4f1f7f)

Direct leak of 360 byte(s) in 9 object(s) allocated from:
    #0 0x7fbe984d8602 in malloc (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x98602)
    #1 0x4b1010 in _yr_base64_create_nodes (/home/travis/build/VirusTotal/yara/test-rules+0x4b1010)
    #2 0x4b231b in yr_base64_ast_from_string (/home/travis/build/VirusTotal/yara/test-rules+0x4b231b)
    #3 0x4f1f7f  (/home/travis/build/VirusTotal/yara/test-rules+0x4f1f7f)

Direct leak of 360 byte(s) in 9 object(s) allocated from:
    #0 0x7fbe984d8602 in malloc (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x98602)
    #1 0x4b1010 in _yr_base64_create_nodes (/home/travis/build/VirusTotal/yara/test-rules+0x4b1010)
    #2 0x4b21d8 in yr_base64_ast_from_string (/home/travis/build/VirusTotal/yara/test-rules+0x4b21d8)
    #3 0x4f1f7f  (/home/travis/build/VirusTotal/yara/test-rules+0x4f1f7f)

Direct leak of 240 byte(s) in 6 object(s) allocated from:
    #0 0x7fbe984d8602 in malloc (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x98602)
    #1 0x4b1010 in _yr_base64_create_nodes (/home/travis/build/VirusTotal/yara/test-rules+0x4b1010)
    #2 0x4b227b in yr_base64_ast_from_string (/home/travis/build/VirusTotal/yara/test-rules+0x4b227b)
    #3 0x4f1f7f  (/home/travis/build/VirusTotal/yara/test-rules+0x4f1f7f)

Direct leak of 228 byte(s) in 3 object(s) allocated from:
    #0 0x7fbe984d8602 in malloc (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x98602)
    #1 0x44f420 in sized_string_new (/home/travis/build/VirusTotal/yara/test-rules+0x44f420)

Direct leak of 72 byte(s) in 3 object(s) allocated from:
    #0 0x7fbe984d8602 in malloc (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x98602)
    #1 0x4b0f6a in _yr_base64_create_nodes (/home/travis/build/VirusTotal/yara/test-rules+0x4b0f6a)
    #2 0x4b245b in yr_base64_ast_from_string (/home/travis/build/VirusTotal/yara/test-rules+0x4b245b)
    #3 0x4f1f7f  (/home/travis/build/VirusTotal/yara/test-rules+0x4f1f7f)

Direct leak of 72 byte(s) in 3 object(s) allocated from:
    #0 0x7fbe984d8602 in malloc (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x98602)
    #1 0x4b0f6a in _yr_base64_create_nodes (/home/travis/build/VirusTotal/yara/test-rules+0x4b0f6a)
    #2 0x4b21d8 in yr_base64_ast_from_string (/home/travis/build/VirusTotal/yara/test-rules+0x4b21d8)
    #3 0x4f1f7f  (/home/travis/build/VirusTotal/yara/test-rules+0x4f1f7f)

Direct leak of 48 byte(s) in 2 object(s) allocated from:
    #0 0x7fbe984d8602 in malloc (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x98602)
    #1 0x4b0f6a in _yr_base64_create_nodes (/home/travis/build/VirusTotal/yara/test-rules+0x4b0f6a)
    #2 0x4b23b8 in yr_base64_ast_from_string (/home/travis/build/VirusTotal/yara/test-rules+0x4b23b8)
    #3 0x4f1f7f  (/home/travis/build/VirusTotal/yara/test-rules+0x4f1f7f)

Direct leak of 48 byte(s) in 2 object(s) allocated from:
    #0 0x7fbe984d8602 in malloc (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x98602)
    #1 0x4b0f6a in _yr_base64_create_nodes (/home/travis/build/VirusTotal/yara/test-rules+0x4b0f6a)
    #2 0x4b21a1 in yr_base64_ast_from_string (/home/travis/build/VirusTotal/yara/test-rules+0x4b21a1)
    #3 0x4f1f7f  (/home/travis/build/VirusTotal/yara/test-rules+0x4f1f7f)

Direct leak of 30 byte(s) in 2 object(s) allocated from:
    #0 0x7fbe984d8602 in malloc (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x98602)
    #1 0x42a91d in yara_yylex (/home/travis/build/VirusTotal/yara/test-rules+0x42a91d)

Direct leak of 24 byte(s) in 1 object(s) allocated from:
    #0 0x7fbe984d8602 in malloc (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x98602)
    #1 0x4b0f6a in _yr_base64_create_nodes (/home/travis/build/VirusTotal/yara/test-rules+0x4b0f6a)
    #2 0x4b231b in yr_base64_ast_from_string (/home/travis/build/VirusTotal/yara/test-rules+0x4b231b)
    #3 0x4f1f7f  (/home/travis/build/VirusTotal/yara/test-rules+0x4f1f7f)

Direct leak of 24 byte(s) in 1 object(s) allocated from:
    #0 0x7fbe984d8602 in malloc (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x98602)
    #1 0x4b0f6a in _yr_base64_create_nodes (/home/travis/build/VirusTotal/yara/test-rules+0x4b0f6a)
    #2 0x4b227b in yr_base64_ast_from_string (/home/travis/build/VirusTotal/yara/test-rules+0x4b227b)

Indirect leak of 1008 byte(s) in 9 object(s) allocated from:
    #0 0x7fbe984d8602 in malloc (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x98602)
    #1 0x44f504 in sized_string_convert_to_wide (/home/travis/build/VirusTotal/yara/test-rules+0x44f504)
    #2 0x4b21a1 in yr_base64_ast_from_string (/home/travis/build/VirusTotal/yara/test-rules+0x4b21a1)
    #3 0x4f1f7f  (/home/travis/build/VirusTotal/yara/test-rules+0x4f1f7f)

Indirect leak of 558 byte(s) in 9 object(s) allocated from:
    #0 0x7fbe984d8602 in malloc (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x98602)
    #1 0x4b1502 in _yr_base64_create_nodes (/home/travis/build/VirusTotal/yara/test-rules+0x4b1502)
    #2 0x4b245b in yr_base64_ast_from_string (/home/travis/build/VirusTotal/yara/test-rules+0x4b245b)
    #3 0x4f1f7f  (/home/travis/build/VirusTotal/yara/test-rules+0x4f1f7f)

Indirect leak of 552 byte(s) in 9 object(s) allocated from:
    #0 0x7fbe984d8602 in malloc (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x98602)
    #1 0x44f504 in sized_string_convert_to_wide (/home/travis/build/VirusTotal/yara/test-rules+0x44f504)
    #2 0x4b231b in yr_base64_ast_from_string (/home/travis/build/VirusTotal/yara/test-rules+0x4b231b)
    #3 0x4f1f7f  (/home/travis/build/VirusTotal/yara/test-rules+0x4f1f7f)

Indirect leak of 368 byte(s) in 6 object(s) allocated from:
    #0 0x7fbe984d8602 in malloc (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x98602)
    #1 0x44f504 in sized_string_convert_to_wide (/home/travis/build/VirusTotal/yara/test-rules+0x44f504)
    #2 0x4b227b in yr_base64_ast_from_string (/home/travis/build/VirusTotal/yara/test-rules+0x4b227b)
    #3 0x4f1f7f  (/home/travis/build/VirusTotal/yara/test-rules+0x4f1f7f)

Indirect leak of 330 byte(s) in 9 object(s) allocated from:
    #0 0x7fbe984d8602 in malloc (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x98602)
    #1 0x4b1502 in _yr_base64_create_nodes (/home/travis/build/VirusTotal/yara/test-rules+0x4b1502)
    #2 0x4b21d8 in yr_base64_ast_from_string (/home/travis/build/VirusTotal/yara/test-rules+0x4b21d8)
    #3 0x4f1f7f  (/home/travis/build/VirusTotal/yara/test-rules+0x4f1f7f)

Indirect leak of 330 byte(s) in 9 object(s) allocated from:
    #0 0x7fbe984d8602 in malloc (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x98602)
    #1 0x4b1502 in _yr_base64_create_nodes (/home/travis/build/VirusTotal/yara/test-rules+0x4b1502)
    #2 0x4b23b8 in yr_base64_ast_from_string (/home/travis/build/VirusTotal/yara/test-rules+0x4b23b8)
    #3 0x4f1f7f  (/home/travis/build/VirusTotal/yara/test-rules+0x4f1f7f)

Indirect leak of 192 byte(s) in 8 object(s) allocated from:
    #0 0x7fbe984d8602 in malloc (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x98602)
    #1 0x4b0f6a in _yr_base64_create_nodes (/home/travis/build/VirusTotal/yara/test-rules+0x4b0f6a)
    #2 0x4b231b in yr_base64_ast_from_string (/home/travis/build/VirusTotal/yara/test-rules+0x4b231b)
    #3 0x4f1f7f  (/home/travis/build/VirusTotal/yara/test-rules+0x4f1f7f)

Indirect leak of 168 byte(s) in 7 object(s) allocated from:
    #0 0x7fbe984d8602 in malloc (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x98602)
    #1 0x4b0f6a in _yr_base64_create_nodes (/home/travis/build/VirusTotal/yara/test-rules+0x4b0f6a)
    #2 0x4b23b8 in yr_base64_ast_from_string (/home/travis/build/VirusTotal/yara/test-rules+0x4b23b8)
    #3 0x4f1f7f  (/home/travis/build/VirusTotal/yara/test-rules+0x4f1f7f)

Indirect leak of 168 byte(s) in 7 object(s) allocated from:
    #0 0x7fbe984d8602 in malloc (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x98602)
    #1 0x4b0f6a in _yr_base64_create_nodes (/home/travis/build/VirusTotal/yara/test-rules+0x4b0f6a)
    #2 0x4b21a1 in yr_base64_ast_from_string (/home/travis/build/VirusTotal/yara/test-rules+0x4b21a1)
    #3 0x4f1f7f  (/home/travis/build/VirusTotal/yara/test-rules+0x4f1f7f)

Indirect leak of 144 byte(s) in 6 object(s) allocated from:
    #0 0x7fbe984d8602 in malloc (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x98602)
    #1 0x4b0f6a in _yr_base64_create_nodes (/home/travis/build/VirusTotal/yara/test-rules+0x4b0f6a)
    #2 0x4b245b in yr_base64_ast_from_string (/home/travis/build/VirusTotal/yara/test-rules+0x4b245b)
    #3 0x4f1f7f  (/home/travis/build/VirusTotal/yara/test-rules+0x4f1f7f)

Indirect leak of 144 byte(s) in 6 object(s) allocated from:
    #0 0x7fbe984d8602 in malloc (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x98602)
    #1 0x4b0f6a in _yr_base64_create_nodes (/home/travis/build/VirusTotal/yara/test-rules+0x4b0f6a)
    #2 0x4b21d8 in yr_base64_ast_from_string (/home/travis/build/VirusTotal/yara/test-rules+0x4b21d8)
    #3 0x4f1f7f  (/home/travis/build/VirusTotal/yara/test-rules+0x4f1f7f)

Indirect leak of 120 byte(s) in 5 object(s) allocated from:
    #0 0x7fbe984d8602 in malloc (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x98602)
    #1 0x4b0f6a in _yr_base64_create_nodes (/home/travis/build/VirusTotal/yara/test-rules+0x4b0f6a)
    #2 0x4b227b in yr_base64_ast_from_string (/home/travis/build/VirusTotal/yara/test-rules+0x4b227b)
    #3 0x4f1f7f  (/home/travis/build/VirusTotal/yara/test-rules+0x4f1f7f)

SUMMARY: AddressSanitizer: 7118 byte(s) leaked in 158 allocation(s).

plusvic · 2020-02-11T22:21:13Z

In Mac OS X the leaks are not detected by AddressSanitizer :(

wxsBSD · 2020-02-11T22:29:12Z

Sigh, will fix these tonight.

Most of these memory leaks were silly and were because I was forgetting to free the encoded string or the nodes upon successful creation of the AST. There were some leaks when the compiler would error in the case of incorrect alphabet size. Lastly, I realized that if you specify two different alphabets in your modifier the last one used would apply to the other base64 modifier, which is not intuitive behavior. Fix it by making different alphabets a compiler error. Add a test case for this while I'm here.

wxsBSD · 2020-02-12T04:43:52Z

OK, I've fixed the leaks (thanks for pointing them out) and also added a fix for an edge case I hadn't considered.

I've verified the leaks are fixed and all tests pass on a linux system and on OS X.

plusvic

Great! I'm just adding two more comments about things that discovered while looking again at the code.

plusvic · 2020-02-12T09:05:11Z

libyara/base64.c

+      FAIL_ON_ERROR_WITH_CLEANUP(
+          _yr_base64_create_nodes(wide_str, modifier.alphabet, 0, &head, &tail),
+          {
+            strcpy(error->message, "Failure encoding base64 wide string");


Are these errors necessary? I mean, if _yr_base64_create_nodes fails only on out-of-memory conditions, we don't need to add an error message here, and this function doesn't need to receive the pointer to RE_ERROR.

I did it this way so it matches yr_re_parse_hex() and yr_re_parse() when called from parser.c. Also, I eventually call into yr_re_parse() to do the final parsing, which requires an RE_ERROR*. This is arguably the more important piece since if the RE parsing fails we would like to get detailed error information.

libyara/base64.c

Dynamically allocate string to do base64 conversion. While here, remove trailing new lines.

malvidin · 2020-04-08T21:11:18Z

In the current (3.12.0 or 4.0.0) docs, I see that nocase and xor cannot be used with the base64 modifiers, and that base64 cannot be used with base64wide.

Should it also include a restriction on use with the fullword modifier?

Add two new modifiers: base64 and widebase64. These modifiers take the given text string and generate 3 different strings being careful to trim off the bytes which are dependent upon leading or trailing bytes in the larger search space. I've implemented it by slightly cheating. I generate all the search strings in a list and then creating one large string suitable for the RE compiler to deal with. For example, the string "This program cannot" generates these three base64 encoded strings: VGhpcyBwcm9ncmFtIGNhbm5vd RoaXMgcHJvZ3JhbSBjYW5ub3 UaGlzIHByb2dyYW0gY2Fubm90 Those three strings are then transformed into a RE that looks like this: (VGhpcyBwcm9ncmFtIGNhbm5vd|RoaXMgcHJvZ3JhbSBjYW5ub3|UaGlzIHByb2dyYW0gY2Fubm90) That string is then passed to the RE compiler for parsing and AST generation. The AST is then emitted into the appropriate spot and YARA believes it was given a regex from that point on. I've also implemented support for specifying custom alphabets: base64("!@#$%^&*(){}[].,|ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstu") This means that I have to be careful to escape any special RE characters in the string passed to the compiler. The base64 alphabet has to be 64 bytes long, and does support escaped bytes properly too. To avoid the need to deal with escaping I had a first implementation which attempted to generate the AST by hand, which was mostly working but was very cumbersome to maintain. In doing this I ended up improving yr_re_print_node() so that it indents the tree properly, which made debugging that attempt easier.

wxsBSD added 4 commits December 14, 2019 21:57

Add base64 and widebase64 documentation.

e1ae86d

Merge branch 'master' into base64

69902a9

Free base64 alphabets used in compiler.

a4be6d7

wxsBSD added 2 commits December 30, 2019 12:08

Add clarifications around what is not supported.

0c4238e

Specifically, you can only use these modifiers with text strings (hex strings and regular expressions are not supported) and you can't use the xor or nocase modifiers in combination with these.

Remove unused variable.

a060439

plusvic reviewed Jan 8, 2020

View reviewed changes

libyara/grammar.y Show resolved Hide resolved

libyara/parser.c Outdated Show resolved Hide resolved

wxsBSD added 2 commits January 12, 2020 20:09

Merge branch 'master' into base64

9236718

Switch back to 'base64wide' and make flags check explicit.

da15beb

wxsBSD added 2 commits January 13, 2020 08:00

Move base64 code to its own file.

c791556

I haven't tested the build with bazel but the normal build process works and tests are passing.

Merge branch 'master' into base64

7892d0c

plusvic reviewed Feb 12, 2020

View reviewed changes

wxsBSD added 2 commits February 12, 2020 09:05

Keep C89 compatibility.

987d2a3

Dynamically allocate string to do base64 conversion. While here, remove trailing new lines.

Use correct type for allocation.

6475fb2

Merge branch 'master' into base64

32e4802

plusvic merged commit 2e755ac into VirusTotal:master Feb 17, 2020

wxsBSD deleted the base64 branch February 18, 2020 00:58

base64 and widebase64 modifiers #1185

base64 and widebase64 modifiers #1185

Uh oh!

Conversation

wxsBSD commented Dec 23, 2019

Uh oh!

wxsBSD commented Dec 23, 2019

Uh oh!

plusvic commented Jan 7, 2020

Uh oh!

wxsBSD commented Jan 7, 2020

Uh oh!

thehellu commented Jan 8, 2020

Uh oh!

JohnLaTwC commented Jan 8, 2020

Uh oh!

plusvic commented Jan 8, 2020

Uh oh!

Uh oh!

Uh oh!

wxsBSD commented Jan 13, 2020

Uh oh!

plusvic commented Jan 13, 2020

Uh oh!

plusvic commented Feb 4, 2020

Uh oh!

wxsBSD commented Feb 4, 2020

Uh oh!

wxsBSD commented Feb 5, 2020

Uh oh!

plusvic commented Feb 11, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

plusvic commented Feb 11, 2020

Uh oh!

wxsBSD commented Feb 11, 2020

Uh oh!

wxsBSD commented Feb 12, 2020

Uh oh!

plusvic left a comment

Choose a reason for hiding this comment

Uh oh!

plusvic Feb 12, 2020

Choose a reason for hiding this comment

Uh oh!

wxsBSD Feb 12, 2020

Choose a reason for hiding this comment

Uh oh!

Uh oh!

malvidin commented Apr 8, 2020

Uh oh!

Uh oh!

plusvic commented Feb 11, 2020 •

edited

Loading