This repository was archived by the owner on Jun 12, 2024. It is now read-only.

Commit 76acf68

base64 and widebase64 modifiers (VirusTotal#1185)
Add two new modifiers: base64 and widebase64. These modifiers take the given text string and generate three different strings, taking care to trim off the bytes that depend on leading or trailing bytes in the larger search space.

I've implemented it by slightly cheating: I generate all the search strings in a list and then create one large string suitable for the RE compiler to deal with. For example, the string "This program cannot" generates these three base64 encoded strings:

VGhpcyBwcm9ncmFtIGNhbm5vd
RoaXMgcHJvZ3JhbSBjYW5ub3
UaGlzIHByb2dyYW0gY2Fubm90

Those three strings are then transformed into a RE that looks like this:

(VGhpcyBwcm9ncmFtIGNhbm5vd|RoaXMgcHJvZ3JhbSBjYW5ub3|UaGlzIHByb2dyYW0gY2Fubm90)

That string is then passed to the RE compiler for parsing and AST generation. The AST is then emitted into the appropriate spot, and YARA believes it was given a regex from that point on.

I've also implemented support for specifying custom alphabets:

base64("!@#$%^&*(){}[].,|ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstu")

This means that I have to be careful to escape any special RE characters in the string passed to the compiler. The base64 alphabet has to be 64 bytes long, and it supports escaped bytes properly too.

To avoid the need to deal with escaping, my first implementation attempted to generate the AST by hand. It was mostly working but was very cumbersome to maintain. In doing this I ended up improving yr_re_print_node() so that it indents the tree properly, which made debugging that attempt easier.
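
The trimming step described in the commit message can be sketched in Python. This is an illustration of the technique only, not the C code added in base64.c: the helper names `base64_variants` and `alternation` are hypothetical, and the zero pad byte is an arbitrary choice, since every output character it influences is trimmed away.

```python
import base64
import re

# Number of leading output characters influenced by 0, 1, or 2 unknown
# preceding bytes (each base64 character carries 6 bits).
LEADING_TRIM = (0, 2, 3)


def base64_variants(text):
    """Return the three alignment-dependent base64 encodings of text."""
    variants = []
    for offset in range(3):
        # Simulate 0, 1, or 2 unknown bytes in front of the text.
        padded = b"\x00" * offset + text
        encoded = base64.b64encode(padded).decode("ascii")
        # Keep only characters whose 6 bits come entirely from known bytes;
        # this also drops any '=' padding at the end.
        end = (8 * len(padded)) // 6
        variants.append(encoded[LEADING_TRIM[offset]:end])
    return variants


def alternation(variants):
    """Join the variants into one regex, escaping special characters."""
    return "(" + "|".join(re.escape(v) for v in variants) + ")"


variants = base64_variants(b"This program cannot")
pattern = alternation(variants)
```

The `re.escape` call has no effect on these three strings, but it matters once a custom alphabet contains characters such as `(`, `|`, or `.`, which is the escaping concern the commit message raises.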
1 parent 0fa5e97 commit 76acf68

19 files changed, +2736 -1384 lines changed

docs/writingrules.rst

+62 -1
@@ -66,6 +66,8 @@ keywords are reserved and cannot be used as an identifier:
      - uint32be
      - wide
      - xor
+     - base64
+     - base64wide
      -

 Rules are generally composed of two sections: strings definition and condition.
@@ -362,7 +364,7 @@ The following rule will search for every single byte xor applied to the string
             $xor_string = "This program cannot" xor

         condition:
-            $xor_string
+            $xor_string
     }

 The above rule is logically equivalent to:
@@ -435,6 +437,65 @@ If you want more control over the range of bytes used with the xor modifier use:
 The above example will apply the bytes from 0x01 to 0xff, inclusively, to the
 string when searching. The general syntax is ``xor(minimum-maximum)``.

+base64 strings
+^^^^^^^^^^^^^^
+
+The ``base64`` modifier can be used to search for strings that have been base64
+encoded. A good explanation of the technique is at:
+
+https://www.leeholmes.com/blog/2019/12/10/searching-for-content-in-base-64-strings-2/
+
+The following rule will search for the three base64 permutations of the string
+"This program cannot":
+
+.. code-block:: yara
+
+    rule Base64Example1
+    {
+        strings:
+            $a = "This program cannot" base64
+
+        condition:
+            $a
+    }
+
+This will cause YARA to search for these three permutations:
+
+VGhpcyBwcm9ncmFtIGNhbm5vd
+RoaXMgcHJvZ3JhbSBjYW5ub3
+UaGlzIHByb2dyYW0gY2Fubm90
+
+The ``base64wide`` modifier works just like the base64 modifier but the results
+of the base64 modifier are converted to wide.
+
+The interaction between ``base64`` (or ``base64wide``) and ``wide`` and
+``ascii`` is as you might expect. ``wide`` and ``ascii`` are applied to the
+string first, and then the ``base64`` and ``base64wide`` modifiers are applied.
+At no point is the plaintext of the ``ascii`` or ``wide`` versions of the
+strings included in the search. If you want to also include those you can put
+them in a secondary string.
+
+The ``base64`` and ``base64wide`` modifiers also support a custom alphabet. For
+example:
+
+.. code-block:: yara
+
+    rule Base64Example2
+    {
+        strings:
+            $a = "This program cannot" base64("!@#$%^&*(){}[].,|ABCDEFGHIJ\x09LMNOPQRSTUVWXYZabcdefghijklmnopqrstu")
+
+        condition:
+            $a
+    }
+
+The alphabet must be 64 bytes long.
+
+The ``base64`` and ``base64wide`` modifiers are only supported with text
+strings. Using these modifiers with a hexadecimal string or a regular expression
+will cause a compiler error. Also, the ``xor`` and ``nocase`` modifiers used in
+combination with ``base64`` or ``base64wide`` will cause a compiler error.
+
 Searching for full words
 ^^^^^^^^^^^^^^^^^^^^^^^^

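
The ordering of ``wide`` and the base64 modifiers, and the effect of a custom alphabet, can be sketched in Python. This is a rough illustration under stated assumptions, not YARA's implementation: `encode_with_alphabet` and `wide` are hypothetical helpers, whole strings are encoded without the three-permutation trimming, and `wide` is modeled as interleaving a zero byte after each byte.

```python
import base64

STANDARD = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
# Custom alphabet taken from the Base64Example2 rule in the documentation.
CUSTOM = "!@#$%^&*(){}[].,|ABCDEFGHIJ\x09LMNOPQRSTUVWXYZabcdefghijklmnopqrstu"


def encode_with_alphabet(data, alphabet=STANDARD):
    """Base64-encode data, then substitute a custom 64-character alphabet."""
    if len(alphabet) != 64:
        raise ValueError("the alphabet must be 64 bytes long")
    table = str.maketrans(STANDARD, alphabet)
    # '=' padding is not in the table, so it passes through unchanged.
    return base64.b64encode(data).decode("ascii").translate(table)


def wide(data):
    """Interleave a zero byte after every byte (assumed model of 'wide')."""
    return b"".join(bytes((b, 0)) for b in data)


text = b"Hi"
# Roughly what `"Hi" wide base64` looks for: wide first, then base64.
wide_then_base64 = encode_with_alphabet(wide(text))
# Roughly what `"Hi" base64wide` looks for: base64 first, result made wide.
base64_then_wide = wide(encode_with_alphabet(text).encode("ascii"))
```

The two results differ: the first is an ASCII string encoding the widened text, while the second is the ordinary base64 text with zero bytes interleaved.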
libyara/Makefile.am

+2
@@ -95,6 +95,7 @@ yarainclude_HEADERS = \
   include/yara/ahocorasick.h \
   include/yara/arena.h \
   include/yara/atoms.h \
+  include/yara/base64.h \
   include/yara/bitmask.h \
   include/yara/compiler.h \
   include/yara/error.h \
@@ -158,6 +159,7 @@ libyara_la_SOURCES = \
   ahocorasick.c \
   arena.c \
   atoms.c \
+  base64.c \
   bitmask.c \
   compiler.c \
   endian.c \

libyara/atoms.c

+6 -2
@@ -812,7 +812,7 @@ static int _yr_atoms_case_insensitive(
 // _yr_atoms_xor
 //
 // For a given list of atoms returns another list after a single byte xor
-// has been applied to it (0x01 - 0xff).
+// has been applied to it.
 //

 static int _yr_atoms_xor(
@@ -1411,7 +1411,11 @@ int yr_atoms_extract_from_re(
       *atoms = NULL;
     });

-  if (modifier.flags & STRING_GFLAGS_WIDE)
+  // Don't convert atoms to wide here if either base64 modifier is used.
+  // This is to avoid the situation where we have "base64 wide" because
+  // the wide has already been applied BEFORE the base64 encoding.
+  if (modifier.flags & STRING_GFLAGS_WIDE &&
+      !(modifier.flags & STRING_GFLAGS_BASE64 || modifier.flags & STRING_GFLAGS_BASE64_WIDE))
   {
     FAIL_ON_ERROR_WITH_CLEANUP(
         _yr_atoms_wide(*atoms, &wide_atoms),
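
The new condition in this hunk can be restated as a small predicate. The Python sketch below is only an illustration: the bit values are hypothetical stand-ins chosen for the example (the real `STRING_GFLAGS_*` values are not shown in this diff), and `should_widen_atoms` is not a libyara function.

```python
# Hypothetical bit values for illustration only; not the real
# STRING_GFLAGS_* constants from libyara.
WIDE = 0x01
BASE64 = 0x02
BASE64_WIDE = 0x04


def should_widen_atoms(flags):
    """Widen atoms only when 'wide' is set and no base64 modifier is present.

    With a base64 modifier, 'wide' was already applied to the text before
    encoding, so widening the atoms again would be wrong.
    """
    return bool(flags & WIDE) and not (flags & (BASE64 | BASE64_WIDE))
```

For example, `should_widen_atoms(WIDE)` is true, while `should_widen_atoms(WIDE | BASE64)` is false.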
