Commit 0a6fcc5

Merge pull request #401 from fandango-fuzzer/dev
Dev
2 parents dd980ed + 9b32a59 commit 0a6fcc5

16 files changed

+375
-140
lines changed

docs/Conversion.md

Lines changed: 204 additions & 9 deletions
Original file line number | Diff line number | Diff line change
@@ -11,19 +11,21 @@ kernelspec:
1111
---
1212

1313
(sec:conversion)=
14-
# Conversion and Compression
14+
# Data Conversions
1515

1616
When defining a complex input format, some parts may be the result of applying an _operation_ on another, more structured part.
17-
Most importantly, content may be _encoded_, _compressed_, or _sanitized_.
17+
Most importantly, content may be _encoded_, _compressed_, or _converted_.
1818

19-
Fandango uses a special form of [generators](sec:generators) to handle these, namely generators with _symbols_.
19+
Fandango uses a special form of [generators](sec:generators) to handle these, called _converters_.
20+
These are generator expressions with _symbols_, mostly functions that take symbols as arguments.
2021
Let's have a look at how these work.
2122

2223

23-
## Encoding Data
24+
## Encoding Data During Fuzzing
2425

2526
In Fandango, a [generator](sec:generators) expression can contain _symbols_ (enclosed in `<...>`) as elements.
26-
When fuzzing, this has the effect of Fandango using the grammar to
27+
Such generators are called _converters_.
28+
When fuzzing, converters have the effect of Fandango using the grammar to
2729

2830
* instantiate each symbol from the grammar,
2931
* evaluate the resulting expression, and
@@ -45,7 +47,8 @@ Of course, these can be decoded again:
4547
base64.b64decode(encoded)
4648
```
4749

48-
Let us assume we have a `<data>` field that contains a number of bytes:
50+
Let us make use of these functions.
51+
Assume we have a `<data>` field that contains a number of bytes:
4952

5053
```{code-cell}
5154
:tags: ["remove-input"]
@@ -73,10 +76,202 @@ In a third step, we embed the `<item>` into a (binary) string:
7376
!grep '^<start>' encode.fan
7477
```
7578

76-
The resulting [`encode.fan`](encode.fan) spec allows us to encode and embed binary data:
79+
The full resulting [`encode.fan`](encode.fan) spec looks like this:
7780

7881
```{code-cell}
79-
!fandango fuzz -f encode.fan -n 1
82+
:tags: ["remove-input"]
83+
!cat encode.fan
84+
```
85+
86+
With this, we can encode and embed binary data:
87+
88+
```shell
89+
$ fandango fuzz -f encode.fan -n 1
90+
```
91+
92+
```{code-cell}
93+
:tags: ["remove-input"]
94+
!fandango fuzz -f encode.fan -n 1 --random-seed 7
95+
assert _exit_code == 0
96+
```
97+
98+
In the same vein, one can use functions for compressing data or any other kind of conversion.
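
As a minimal sketch of what such a conversion could look like, the following plain-Python pair compresses and decompresses a byte string with the standard `zlib` module. The helper names `compress_data` and `decompress_data` are hypothetical and not part of Fandango; the assumption is that functions like these could take the place of `base64.b64encode()` in a generator expression.

```python
import zlib


def compress_data(data: bytes) -> bytes:
    """Hypothetical helper: compress raw bytes with zlib."""
    return zlib.compress(data)


def decompress_data(blob: bytes) -> bytes:
    """Hypothetical helper: invert compress_data()."""
    return zlib.decompress(blob)


payload = b"Fandango" * 4
blob = compress_data(payload)

# The conversion is lossless, so it can be inverted.
assert decompress_data(blob) == payload
```

Because the compression can be inverted, the same pair of functions could in principle serve both directions, encoding and decoding.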
99+
100+
101+
## Sources, Encoders, and Constraints
102+
103+
When Fandango produces an input using a generator, it _saves_ the generated arguments as a _source_ in the produced derivation tree.
104+
Sources become visible as soon as the input is shown as a grammar:
105+
106+
```shell
107+
$ fandango fuzz -f encode.fan -n 1 --format=grammar
80108
```
81109

82-
In the same vein, one can use functions for compressing data or any other kind of conversion.
110+
```{code-cell}
111+
:tags: ["remove-input"]
112+
!fandango fuzz -f encode.fan -n 1 --format=grammar --random-seed 7
113+
assert _exit_code == 0
114+
```
115+
116+
In the definition of `<item>`, we see a generic converter `f(<data>)` as well as the definition of `<data>` that went into the generator.
117+
(The actual generator code, `base64.b64encode(bytes(<data>))`, is not saved in the derivation tree.)
118+
119+
We can visualize the resulting tree, using a double arrow between `<item>` and its source `<data>`, indicating that their values depend on each other:
120+
121+
```{code-cell}
122+
:tags: ["remove-input"]
123+
from Tree import Tree
124+
125+
tree = Tree('<start>',
126+
Tree(b'Data: '),
127+
Tree('<item>',
128+
Tree(b'RmFuZGFuZ29Nyhg='),
129+
sources=[
130+
Tree('<data>',
131+
Tree(b'Fandango'),
132+
Tree('<byte>', Tree('<_byte>', Tree(b'M'))),
133+
Tree('<byte>', Tree('<_byte>', Tree(b'\xca'))),
134+
Tree('<byte>', Tree('<_byte>', Tree(b'\x18')))
135+
),
136+
]
137+
)
138+
)
139+
tree.visualize()
140+
```
141+
142+
Since sources like `<data>` are preserved, we can use them in [constraints](sec:constraints).
143+
For instance, we can produce a string with specific values for `<data>`:
144+
145+
```shell
146+
$ fandango fuzz -f encode.fan -n 1 -c '<data> == b"Fandango author"'
147+
```
148+
149+
```{code-cell}
150+
:tags: ["remove-input"]
151+
!fandango fuzz -f encode.fan -n 1 -c '<data> == b"Fandango author"' --population-size 1
152+
assert _exit_code == 0
153+
```
154+
155+
Is this string a correct encoding of a correct string?
156+
We will see in the next section.
157+
158+
159+
## Decoding Parsed Data
160+
161+
So far, we can only _encode_ data during fuzzing.
162+
But what if we also want to _decode_ data, say during [parsing](sec:parsing)?
163+
Our `encode.fan` will help us _parse_ the data, but not decode it:
164+
165+
```shell
166+
$ echo -n 'Data: RmFuZGFuZ28gYXV0aG9y' | fandango parse -f encode.fan
167+
```
168+
169+
```{code-cell}
170+
:tags: ["remove-input"]
171+
!echo -n 'Data: RmFuZGFuZ28gYXV0aG9y' | fandango parse -f encode.fan
172+
assert _exit_code == 1
173+
```
174+
175+
The fact that parsing fails is not a big surprise, as we have only specified an _encoder_, but not a _decoder_.
176+
As the error message suggests, we need to add a generator for `<data>` - a decoder that converts `<item>` elements into `<data>`.
177+
178+
We can achieve this by providing a generator for `<data>` that builds on `<item>`:
179+
180+
```{code-cell}
181+
:tags: ["remove-input"]
182+
!grep '^<data>' encode-decode.fan
183+
```
184+
185+
Here, `base64.b64decode(bytes(<item>))` takes an `<item>` (which has already been parsed) and decodes it.
186+
The decoded result is parsed and placed in `<data>`.
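
The two directions can be checked in plain Python, independent of Fandango. The sketch below uses only the standard `base64` module and the byte strings from the examples in this section:

```python
import base64

data = b"Fandango author"

# Encoding direction: what the <item> generator computes from <data>.
item = base64.b64encode(data)
assert item == b"RmFuZGFuZ28gYXV0aG9y"

# Decoding direction: what the <data> generator computes from <item>.
assert base64.b64decode(item) == data
```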
187+
188+
The resulting [`encode-decode.fan`](encode-decode.fan) file now looks like this:
189+
190+
```{code-cell}
191+
:tags: ["remove-input"]
192+
!cat encode-decode.fan
193+
```
194+
195+
```{margin}
196+
Fandango allows generators in both directions so one `.fan` file can be used for fuzzing and parsing.
197+
```
198+
199+
If this looks like a mutually recursive definition, that is because it is.
200+
During fuzzing and parsing, Fandango tracks the _dependencies_ between generators and uses them to decide which generators to use first:
201+
202+
* When fuzzing, Fandango operates _top-down_, starting with the topmost generator encountered; their arguments are _produced_.
203+
In our case, this is the `<item>` generator, generating a value for `<data>`.
204+
* When parsing, Fandango operates _bottom-up_, starting with the lowest generators encountered; their arguments are _parsed_.
205+
In our case, this is the `<data>` generator, parsing a value for `<item>`.
206+
207+
In both cases, when Fandango encounters a recursion, _it stops evaluating the generator_:
208+
209+
* When parsing an `<item>`, Fandango does not invoke the generator for `<data>` because `<data>` is being processed already.
210+
* Likewise, when producing `<data>`, Fandango does not invoke the generator for `<item>` because `<item>` is being processed already.
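
The idea of cutting a cycle between two mutually dependent generators can be illustrated with a toy model. This is only a sketch under stated assumptions and not Fandango's implementation: the `CONVERTERS` table and the `resolve()` function are hypothetical names, and each symbol is assumed to depend on exactly one source.

```python
import base64

# Hypothetical model: each symbol has one converter that computes its value
# from the value of one source symbol. This is not Fandango's implementation.
CONVERTERS = {
    "<item>": ("<data>", base64.b64encode),  # encoder
    "<data>": ("<item>", base64.b64decode),  # decoder
}


def resolve(symbol, known, in_progress=frozenset()):
    """Return the value of `symbol`, or None if it cannot be computed."""
    if symbol in known:
        return known[symbol]
    if symbol in in_progress:
        # Recursion detected: stop evaluating instead of looping forever.
        return None
    source, convert = CONVERTERS[symbol]
    value = resolve(source, known, in_progress | {symbol})
    return None if value is None else convert(value)


# "Fuzzing": <data> was produced, so <item> is computed by the encoder.
assert resolve("<item>", {"<data>": b"Fandango author"}) == b"RmFuZGFuZ28gYXV0aG9y"

# "Parsing": <item> was read, so <data> is computed by the decoder.
assert resolve("<data>", {"<item>": b"RmFuZGFuZ28gYXV0aG9y"}) == b"Fandango author"

# With nothing known, the cycle is cut rather than followed endlessly.
assert resolve("<item>", {}) is None
```

The `in_progress` set plays the role of "is being processed already": a symbol that is currently being resolved is never resolved a second time.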
211+
212+
Let us see if all of this works and if this input is indeed properly parsed and decoded.
213+
214+
```shell
215+
$ echo -n 'Data: RmFuZGFuZ28gYXV0aG9y' | fandango parse -f encode-decode.fan -o - --format=grammar
216+
```
217+
218+
```{code-cell}
219+
:tags: ["remove-input"]
220+
!echo -n 'Data: RmFuZGFuZ28gYXV0aG9y' | fandango parse -f encode-decode.fan -o - --format=grammar
221+
assert _exit_code == 0
222+
```
223+
224+
We see that the `<data>` element contains the `"Fandango author"` string we provided as a constraint during generation.
225+
This is what the parsed derivation tree looks like:
226+
227+
```{code-cell}
228+
:tags: ["remove-input"]
229+
from Tree import Tree
230+
231+
tree = Tree('<start>',
232+
Tree(b'Data: '),
233+
Tree('<item>',
234+
Tree(b'RmFuZGFuZ28gYXV0aG9y'),
235+
sources=[
236+
Tree('<data>',
237+
Tree(b'Fandango'),
238+
Tree('<byte>', Tree('<_byte>', Tree(b' '))),
239+
Tree('<byte>', Tree('<_byte>', Tree(b'a'))),
240+
Tree('<byte>', Tree('<_byte>', Tree(b'u'))),
241+
Tree('<byte>', Tree('<_byte>', Tree(b't'))),
242+
Tree('<byte>', Tree('<_byte>', Tree(b'h'))),
243+
Tree('<byte>', Tree('<_byte>', Tree(b'o'))),
244+
Tree('<byte>', Tree('<_byte>', Tree(b'r')))
245+
),
246+
]
247+
)
248+
)
249+
tree.visualize()
250+
```
251+
252+
With a constraint, we can check that the decoded string is correct:
253+
254+
```shell
255+
$ echo -n 'Data: RmFuZGFuZ28gYXV0aG9y' | fandango parse -f encode-decode.fan -c '<data> == b"Fandango author"'
256+
```
257+
258+
```{code-cell}
259+
:tags: ["remove-input"]
260+
!echo -n 'Data: RmFuZGFuZ28gYXV0aG9y' | fandango parse -f encode-decode.fan -c '<data> == b"Fandango author"'
261+
assert _exit_code == 0
262+
```
263+
264+
We get no error, so the parse was successful and all constraints hold.
265+
266+
267+
## Applications
268+
269+
The above scheme can be used for all kinds of encodings and compressions, and thus allows _translations between abstraction layers_.
270+
Typical applications include:
271+
272+
* _Compressed_ data (e.g. pixels in a GIF or PNG file)
273+
* _Encoded_ data (e.g. binary input as ASCII chars in MIME encodings)
274+
* _Converted_ data (e.g. ASCII to UTF-8 to UTF-16 and back)
275+
276+
Even though parts of the input are encoded (or compressed), you can still use _constraints_ to shape them.
277+
And if the encoding or compression can be inverted, you can also use it to _parse_ inputs again.
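
As a small illustration of the _converted_ data case, here is a plain-Python sketch of invertible text conversions. It uses only built-in `str.encode()` and `bytes.decode()`; how such calls would be wired into a `.fan` spec is left open here.

```python
text = "Fandango"

ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")
utf16_bytes = text.encode("utf-16-le")

# ASCII text is encoded identically in UTF-8 ...
assert ascii_bytes == utf8_bytes
# ... while UTF-16 uses two bytes per character here.
assert len(utf16_bytes) == 2 * len(text)

# Each conversion can be inverted, which is what makes parsing possible.
assert utf16_bytes.decode("utf-16-le") == text
```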

docs/DerivationTree.md

Lines changed: 12 additions & 3 deletions
@@ -181,19 +181,22 @@ Invoking methods (`<SYMBOL>.METHOD()`), as well as operators (say, `<SYMBOL> + .
181181
Since any `<SYMBOL>` has the type `DerivationTree`, one must convert it first into a standard Python type before passing it as argument to a standard Python function.
182182

183183
`str(<SYMBOL>) -> str`
184-
: Convert `<SYMBOL>` into a Unicode string. If `<SYMBOL>` is a byte string, its contents are converted using `latin-1` encoding.
184+
: Convert `<SYMBOL>` into a Unicode string. Byte strings in `<SYMBOL>` are converted using `latin-1` encoding.
185+
186+
`bytes(<SYMBOL>) -> bytes`
187+
: Convert `<SYMBOL>` into a byte string. Unicode strings in `<SYMBOL>` are converted using `utf-8` encoding.
185188

186189
`int(<SYMBOL>) -> int`
187190
: Convert `<SYMBOL>` into an integer, like the Python `int()` function.
188191
`<SYMBOL>` must be an `int`, or a Unicode string or byte string representing an integer literal.
189192

190193
`float(<SYMBOL>) -> float`
191194
: Convert `<SYMBOL>` into a floating-point number, like the Python `float()` function.
192-
`<SYMBOL>` must be an int, or a Unicode string or byte string representing a float literal.
195+
`<SYMBOL>` must be an `int`, or a Unicode string or byte string representing a float literal.
193196

194197
`complex(<SYMBOL>) -> complex`
195198
: Convert `<SYMBOL>` into a complex number, like the Python `complex()` function.
196-
`<SYMBOL>` must be an int, or a Unicode string or byte string representing a float literal.
199+
`<SYMBOL>` must be an `int`, or a Unicode string or byte string representing a float or complex literal.
197200

198201
`bool(<SYMBOL>) -> bool`
199202
: Convert `<SYMBOL>` into a truth value:
@@ -276,6 +279,12 @@ Each element of the list can have a different type, depending on the type the `v
276279
: Return the parent of the current node, or `None` for the root node.
277280

278281

282+
### Accessing Sources
283+
284+
`<SYMBOL>.sources() -> list[DerivationTree]`
285+
: Return a list containing all sources of `<SYMBOL>`. Sources are symbols used in generator expressions out of which the value of `<SYMBOL>` was created; see [the section on data conversions](sec:conversion) for details.
286+
287+
279288
### Comparisons
280289

281290
`<SYMBOL_1> == <SYMBOL_2>`

docs/Tree.py

Lines changed: 8 additions & 5 deletions
@@ -36,18 +36,21 @@ def visualize(self, title="Derivation Tree"):
3636
def _visualize(self):
3737
name = f"node-{Tree.id_counter}"
3838
Tree.id_counter += 1
39-
if isinstance(self.symbol, int):
40-
label = str(self.symbol)
41-
else:
39+
if str(self.symbol).startswith("<"):
4240
label = self.symbol
41+
else:
42+
label = repr(self.symbol)
4343

4444
# https://graphviz.org/doc/info/colors.html
45+
# Colors checked against color vision deficiency
4546
if isinstance(self.symbol, int):
4647
color = "bisque4"
48+
elif isinstance(self.symbol, bytes):
49+
color = "darkblue"
4750
elif self.symbol.startswith("<"):
4851
color = "firebrick"
4952
else:
50-
color = "darkblue"
53+
color = "olivedrab4"
5154

5255
label = label.replace("<", "\\<")
5356
label = label.replace(">", "\\>")
@@ -59,6 +62,6 @@ def _visualize(self):
5962

6063
for source in self.sources():
6164
source_name = source._visualize()
62-
Tree.dot.edge(name, source_name)
65+
Tree.dot.edge(name, source_name, style="dotted", color="gray", dir="both")
6366

6467
return name

evaluation/experiments/generator_params/generator_params.fan

Lines changed: 2 additions & 2 deletions
@@ -5,8 +5,8 @@ def un_convert(input):
55
return input[::-1]
66

77
<start> ::= <number> <rev_number>
8-
<rev_number> ::= <number_tail>{0, 2} <number_start> :: convert(<source_number>.to_string())
9-
<source_number> ::= <number> :: un_convert(<rev_number>.to_string())
8+
<rev_number> ::= <number_tail>{0, 2} <number_start> := convert(<source_number>.to_string())
9+
<source_number> ::= <number> := un_convert(<rev_number>.to_string())
1010

1111
<number> ::= <number_start> <number_tail>{0, 2}
1212
<number_start> ::= '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'

evaluation/experiments/generator_params/generator_params.py

Lines changed: 2 additions & 2 deletions
@@ -5,11 +5,11 @@
55

66
def count_g_params(tree: DerivationTree):
77
count = 0
8-
if len(tree.generator_params) > 0:
8+
if len(tree.sources) > 0:
99
count += 1
1010
for child in tree.children:
1111
count += count_g_params(child)
12-
for child in tree.generator_params:
12+
for child in tree.sources:
1313
count += count_g_params(child)
1414
return count
1515

evaluation/experiments/transactions/transactions.fan

Lines changed: 2 additions & 2 deletions
@@ -15,13 +15,13 @@ def compute_end_balance_sender(start_balance, amount):
1515
<info> ::= ' <info>\n' <currency> <stmt_date> <amount>' </info>'
1616
<currency> ::= ' <currency>' 'EUR' '</currency>\n' | ' <currency>' 'USD' '</currency>\n'
1717
<stmt_date> ::= ' <stmt_date>' <timestamp> '</stmt_date>\n'
18-
<timestamp> ::= <digit>+ :: str(int(time.time()))
18+
<timestamp> ::= <digit>+ := str(int(time.time()))
1919
<amount> ::= ' <amount>' <am> '</amount>\n'
2020
<am> ::= <digit>+ ;
2121
<sender> ::= '\n <sender>' <name> <account_no> <bank_key> <start_balance> <end_balance> ' </sender>'
2222
<receiver> ::= '\n <receiver>' <name> <account_no> <bank_key> <start_balance> <end_balance> ' </receiver>\n'
2323
<name> ::= '\n <name>' <name_str> '</name>'
24-
<name_str> ::= <letter>* :: str(fake.name()).upper();
24+
<name_str> ::= <letter>* := str(fake.name()).upper();
2525
<account_no> ::= '\n <account_no>' <account_number> '</account_no>\n'
2626
<account_number> ::= <digit><digit><digit><digit><digit> <digit><digit><digit><digit><digit> <digit><digit><digit><digit><digit> <digit><digit><digit><digit><digit> <digit><digit>
2727
<bank_key> ::= ' <bank_key>' <bank_id> '</bank_key>\n'

evaluation/vs_isla/tar_evaluation/tar.fan

Lines changed: 5 additions & 5 deletions
@@ -21,7 +21,7 @@
2121
<file_name_prefix>
2222
<header_padding>
2323
;
24-
<file_name> ::= <file_name_first_char> <file_name_chars> <NULs> :: generate_file_name() ;
24+
<file_name> ::= <file_name_first_char> <file_name_chars> <NULs> := generate_file_name() ;
2525
<file_name_chars> ::= <file_name_char> <file_name_chars> | "" ;
2626
<NULs> ::= <NUL> <NULs> | "" ;
2727
<file_mode> ::= <octal_digit>{6} <SPACE> <NUL>;
@@ -31,10 +31,10 @@
3131
<mod_time> ::= <octal_digit>{11} <SPACE>;
3232
<checksum> ::= <octal_digit>{6} <NUL> <SPACE> ;
3333
<typeflag> ::= '0' ;
34-
<linked_file_name> ::= <file_name_first_char> <file_name_char_or_nul>{99} | <NUL>{100} :: generate_linked_file_name();
34+
<linked_file_name> ::= <file_name_first_char> <file_name_char_or_nul>{99} | <NUL>{100} := generate_linked_file_name();
3535
<file_name_char_or_nul> ::= <file_name_char> | <NUL> ;
36-
<uname> ::= <uname_first_char> <name_char_dollar_nul>{31} :: generate_uname("<uname>") ;
37-
<gname> ::= <uname_first_char> <name_char_dollar_nul>{31} :: generate_uname("<gname>") ;
36+
<uname> ::= <uname_first_char> <name_char_dollar_nul>{31} := generate_uname("<uname>") ;
37+
<gname> ::= <uname_first_char> <name_char_dollar_nul>{31} := generate_uname("<gname>") ;
3838
<name_char_dollar_nul> ::= <uname_char> | '$' | <NUL> ;
3939
<uname_first_char> ::=
4040
'a' | 'b' | 'c' | 'd' | 'e' | 'f' | 'g' | 'h' | 'i' | 'j' | 'k' | 'l' | 'm'
@@ -49,7 +49,7 @@
4949
<dev_min_num> ::= <octal_digit>{6} <SPACE> <NUL> ;
5050
<file_name_prefix> ::= <NUL>{155};
5151
<header_padding> ::= <NUL>{12};
52-
<content> ::= <char_or_nul>{512} :: generate_content() ;
52+
<content> ::= <char_or_nul>{512} := generate_content() ;
5353
<char_or_nul> ::= <character> | <NUL> ;
5454
<final_entry> ::= <NUL>{1024};
5555
<octal_digit> ::= '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' ;

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@ build-backend = "setuptools.build_meta"
77

88
[project]
99
name = "fandango-fuzzer"
10-
version = "0.1.8"
10+
version = "0.2.0"
1111
authors = [
1212
{ name = "José Antonio Zamudio Amaya", email = "[email protected]" },
1313
{ name = "Marius Smytzek", email = "[email protected]" },
