Commit fce82c4

Added README for merge_drcov
1 parent c15dc5d commit fce82c4

1 file changed

+144
-0
lines changed

utils/drcov/README.md

Lines changed: 144 additions & 0 deletions
@@ -0,0 +1,144 @@
1+
# About
2+
This directory contains utilities for collecting and processing coverage information in DrCov format with additional capabilities.
3+
4+
Capabilities:
5+
1. Counting the number of times each block was executed
6+
1. Aggregating coverage information over time
7+
1. Extracting BB addresses and size from PCTable
8+
1. Symbolization using the debug_line info
9+
10+
Original DrCov files have a rigid format that does not include counters, so the counters are stored in companion `.drcov.cnt` files in a binary format. On the other hand, DrCov comes with a number of useful utilities that visualise coverage, so we keep the original format. Specifically, `drcov2lcov` converts drcov files to lcov-formatted files, which can be converted to HTML using `genhtml`.
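
To illustrate why the counters need a companion file, here is a minimal Python sketch that reads the basic-block table of a drcov file. It assumes the commonly documented layout: a text header, a `BB Table: N bbs` line, then N packed 8-byte records (32-bit offset from the module base, 16-bit size, 16-bit module id). The names `BB_ENTRY` and `parse_bb_table` are hypothetical and are not part of `merge_drcov.py`.

```python
import struct

# One basic-block record: offset from module base, size, module id.
# Assumed little-endian, 8 bytes total; there is no field for a hit counter.
BB_ENTRY = struct.Struct("<IHH")


def parse_bb_table(data):
    """Return (header_lines, [(start, size, mod_id), ...]) for a drcov blob."""
    marker = b"BB Table: "
    pos = data.index(marker)
    eol = data.index(b"\n", pos)
    # The marker line looks like "BB Table: 3 bbs".
    count = int(data[pos + len(marker):eol].split()[0])
    header = data[:pos].decode("ascii", errors="replace").splitlines()
    body = data[eol + 1:eol + 1 + count * BB_ENTRY.size]
    blocks = [BB_ENTRY.unpack_from(body, i * BB_ENTRY.size) for i in range(count)]
    return header, blocks
```

Each record only says that a block was seen, which is why the execution counts have to live elsewhere.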
11+
12+
# Collecting coverage
13+
DynamoRIO collects DrCov using its runtime instrumentation. In LibAFL, we support collecting it using both compile-time and runtime instrumentation. This allows for unified backend processing and analytics.
14+
15+
## Static instrumentation
16+
LibAFL uses LLVM coverage instrumentation when compiling from sources. This is done by the compiler wrapper `libafl_cc`, which adds the needed options to clang's command line. While regular LibAFL coverage adds just `-fsanitize-coverage=trace-pc-guard`, for DrCov coverage `libafl_cc` should add `-fsanitize-coverage=pc-table,bb,trace-pc-guard`.
17+
The meaning of the additions is:
18+
- `bb` causes the compiler to instrument each basic block (the default is 'edge' which introduces artificial BBs to track the edges that are known statically).
19+
- `pc-table` adds a section to the resulting ELF file that holds the offsets of each BB. Without it, trace-pc-guard instrumentation does not provide any address information; it just inserts callbacks to the coverage instrumentation.
20+
21+
It is also important to compile the source with debug information (`-g`) so that the addresses can be mapped to source code lines.
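
The flag combinations described above can be summarised in a short Python sketch. The helper `coverage_flags` is hypothetical and only illustrates which options end up on the clang command line; it is not how `libafl_cc` is implemented.

```python
# Sanitizer-coverage kinds used by regular LibAFL coverage.
BASE_COVERAGE = ["trace-pc-guard"]
# Extra kinds needed for DrCov output: the PC table and per-basic-block mode.
DRCOV_EXTRA = ["pc-table", "bb"]


def coverage_flags(drcov, debug_info=True):
    """Return the clang flags for regular or DrCov-style coverage."""
    kinds = (DRCOV_EXTRA if drcov else []) + BASE_COVERAGE
    flags = ["-fsanitize-coverage=" + ",".join(kinds)]
    # Debug info is what lets addresses be mapped back to source lines.
    if drcov and debug_info:
        flags.append("-g")
    return flags
```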
22+
23+
The accumulated counter files are created by the `AccMapObserver`, which collects the data from the same in-memory map that regular coverage uses.
24+
See the `fuzzers/libfuzzer_libpng` in this repo for an example of how to use static DrCov collection and look into its README for more information.
25+
See also the `fuzzers/forkserver_libafl_cc` for an example of collecting coverage from a forked process with static instrumentation.
26+
27+
The static instrumentation flow:
28+
```puml
29+
file foo.c as src #yellow
30+
node libafl_cc
31+
file foo_exe
32+
folder coverage{
33+
artifact id_from_to.drcov
34+
artifact id_from_to.drcov.cnt
35+
}
36+
note as foo_exe_note
37+
Contains PCGuard callbacks
38+
and PCTable
39+
end note
40+
src -d-> libafl_cc
41+
libafl_cc -> foo_exe : Compilation
42+
foo_exe .. foo_exe_note
43+
foo_exe -> coverage : Created during execution
44+
```
45+
## Dynamic instrumentation
46+
LibAFL uses Frida to instrument and collect coverage information. This is done by the `CoverageRuntime`. In this repo, I extended the original `CoverageRuntime` so that it can accumulate the coverage info between runs and save the counter files alongside the original DrCov files.
47+
48+
See `fuzzers/frida_gdiplus` for how to use this. Mostly, you need to pass two additional parameters:
49+
```
50+
let coverage = CoverageRuntime::new()
51+
.save_dr_cov(options.save_bb_coverage)
52+
.max_cnt(options.drcov_max_execution_cnt);
53+
```
54+
55+
The dynamic instrumentation flow:
56+
57+
```puml
58+
file foo.c as src #yellow
59+
component FridaCoverageRuntime #purple
60+
node compiler
61+
file foo_exe
62+
folder coverage{
63+
artifact id_from_to.drcov
64+
artifact id_from_to.drcov.cnt
65+
}
66+
src -d-> compiler
67+
compiler -> foo_exe : Compilation
68+
foo_exe <-d- FridaCoverageRuntime : Embedded/Injected
69+
foo_exe -> coverage : Created during execution
70+
```
71+
72+
## Post processing
73+
Once the .drcov and .drcov.cnt files are created, they can be processed to extract and analyze the coverage information.
74+
75+
The main utility for post processing is merge_drcov.py:
76+
```
77+
usage: merge_drcov.py [-h] [-d DIRECTORY] [-od OUTPUT_DIRECTORY] [-a {s,m,h,d}] [-k] [-v] [-c] [-f] {convert,symbolize,pc-table} ...
78+
79+
Utility to merge files in DrCov format
80+
81+
positional arguments:
82+
{convert,symbolize,pc-table}
83+
84+
optional arguments:
85+
-h, --help show this help message and exit
86+
-d DIRECTORY, --directory DIRECTORY
87+
Directory to process
88+
-od OUTPUT_DIRECTORY, --output-directory OUTPUT_DIRECTORY
89+
Output directory
90+
-a {s,m,h,d}, --aggregate {s,m,h,d}
91+
Aggregate per second|minute|hour|day
92+
-k, --keep Keep the original files
93+
-v, --verbose Verbose output
94+
-c, --counters Generate block counters for a merged file
95+
-f, --fix-sizes Disassembles the .text section of the module and fixes the basic block sizes in the drcov file
96+
```
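
As an illustration of what per-period aggregation (`-a {s,m,h,d}`) means, the Python sketch below groups timestamps into buckets of one second, minute, hour, or day. It is an assumption-laden example: the function name `group_by_period` and the use of plain Unix timestamps are hypothetical, and this is not necessarily how `merge_drcov.py` derives or groups file times.

```python
from collections import defaultdict

# Bucket width in seconds for each value accepted by -a.
SECONDS_PER_UNIT = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def group_by_period(timestamps, unit):
    """Group integer timestamps by the start of the period they fall into."""
    width = SECONDS_PER_UNIT[unit]
    buckets = defaultdict(list)
    for ts in timestamps:
        # Round down to the start of the enclosing period.
        buckets[ts - ts % width].append(ts)
    return dict(buckets)
```

Files whose timestamps land in the same bucket would then be merged into one aggregated result per period.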
97+
Post processing consists of the following steps:
98+
1. Merge drcov files. The files are written every X executions. We might be interested in the total coverage, or in its aggregation over time.
99+
The simplest way to merge the files:
100+
`python3 ../../utils/drcov/merge_drcov.py -d ./coverage`
101+
**Notes:**
102+
1. Merging will produce a `merged.drcov` file that can be processed by DrCov utilities. The produced files are in the old, version 2 format, so some utilities may warn about it.
103+
1. To preserve the counter data so that it can be processed and displayed in HTML, specify `-c`. This will produce a `merged.drcov.json` file that contains the counters for each executed basic block.
104+
1. When using static instrumentation, the sizes of the basic blocks are unknown. This results in wrong line coverage being displayed in HTML, as only the first line of each block will have the correct counter. The `-f` option fixes the problem by extracting the BB information from the binary. It is therefore recommended to use the `-f` option when working with static instrumentation.
105+
1. By default, the original .drcov and .drcov.cnt files are deleted. `-k` will keep them after the merge.
106+
1. Convert DrCov to lcov. Use the DynamoRIO utility for that.
107+
**Notes:**
108+
1. Make sure to use the 64-bit utility for 64-bit targets. Example: `~/DynamoRIO-Linux-10.0.19798/tools/bin64/drcov2lcov -input ./coverage/merged.drcov -output ./coverage.info -src_filter libpng-1.6.37`
109+
1. Version 10.0 has a bug; use a newer version.
110+
1. Correct the counters in the generated .info file. Use the `symbolize` command of merge_drcov.py. This requires llvm-symbolizer to be installed and should be done with the `coverage.info` produced by drcov2lcov.
111+
`python3 ../../utils/drcov/merge_drcov.py symbolize -i ./coverage/merged.drcov.json -s /usr/lib/llvm-17/bin/llvm-symbolizer -c ./coverage.info -co ./coverage.cnt.info`
112+
**Notes**: Use the symbolizer from a recent LLVM. Old versions (e.g. 10) will not work. Tested with llvm-17.
113+
1. Generate the HTML with line coverage. Example: `perl ~/dynamorio/third_party/lcov/genhtml --ignore-errors=source ./coverage.cnt.info -o /tmp/cov`. Browse to `/tmp/cov/index.html` to view the files.
114+
**Notes**: Make sure the source files are accessible at the same path as they were during the collection. Look for error messages that list the missing files, e.g. `genhtml: WARNING: cannot read /libafl-fuzz/fuzz/fuzz.c!`
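
The four steps above can be chained from a script. The Python sketch below only builds the command lines, using the options and file names from the examples in this README; the function name `pipeline_commands` and its parameters are hypothetical, and the tool paths must be supplied by the caller because they depend on the local DynamoRIO and LLVM installations.

```python
def pipeline_commands(drcov2lcov, symbolizer, genhtml,
                      merge_py="../../utils/drcov/merge_drcov.py",
                      cov_dir="./coverage", src_filter=None, fix_sizes=True):
    """Return the post-processing commands as argument lists, in order."""
    merged = cov_dir + "/merged.drcov"

    # Step 1: merge, keeping counters (-c); -f is meant for static instrumentation.
    merge = ["python3", merge_py, "-d", cov_dir, "-c"]
    if fix_sizes:
        merge.append("-f")

    # Step 2: convert the merged drcov file to lcov format.
    to_lcov = [drcov2lcov, "-input", merged, "-output", "./coverage.info"]
    if src_filter:
        to_lcov += ["-src_filter", src_filter]

    # Step 3: correct the counters in the .info file.
    symbolize = ["python3", merge_py, "symbolize",
                 "-i", merged + ".json", "-s", symbolizer,
                 "-c", "./coverage.info", "-co", "./coverage.cnt.info"]

    # Step 4: render HTML.
    html = ["perl", genhtml, "--ignore-errors=source",
            "./coverage.cnt.info", "-o", "/tmp/cov"]

    return [merge, to_lcov, symbolize, html]
```

Each list can be passed to `subprocess.run(cmd, check=True)` so that the chain stops at the first failing step. Paths starting with `~` need `os.path.expanduser` first, since no shell is involved.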
115+
116+
The post processing flow:
117+
118+
```puml
119+
file merged.drcov
120+
file merged.drcov.json #lightgreen
121+
file coverage.info as info
122+
file coverage.cnt.info as info_cnt
123+
cloud HTML
124+
node merge_drcov.py as merge_drcov
125+
node "merge_drcov symbolize" as merge_drcov_sym
126+
node drcov2lcov #lightblue
127+
node genhtml #lightblue
128+
129+
folder coverage{
130+
artifact "id_from_to.drcov" as drcov
131+
artifact "id_from_to.drcov.cnt" as drcov_cnt
132+
}
133+
drcov --> merge_drcov
134+
drcov_cnt --> merge_drcov
135+
merge_drcov --> merged.drcov
136+
merge_drcov --> merged.drcov.json
137+
merged.drcov --> drcov2lcov
138+
drcov2lcov -> info
139+
info --> merge_drcov_sym
140+
merged.drcov.json --> merge_drcov_sym
141+
merge_drcov_sym -> info_cnt
142+
info_cnt --> genhtml
143+
genhtml -> HTML
144+
```
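
To make the counter-correction step in the flow more concrete, here is a minimal Python sketch that rewrites the `DA:<line>,<count>` records of an lcov tracefile from a mapping of `(source file, line)` to execution count. It is only an illustration of the idea: `apply_counters` is a hypothetical name, the mapping is assumed to be already symbolized, and this is not the implementation of the `symbolize` command. Summary records such as `LH:` and `LF:` are left untouched.

```python
def apply_counters(info_text, counters):
    """Replace DA hit counts in lcov text using {(source_file, line): count}."""
    out = []
    current = None
    for line in info_text.splitlines():
        if line.startswith("SF:"):
            # Start of a new per-file record.
            current = line[3:]
        elif line.startswith("DA:") and current is not None:
            fields = line[3:].split(",")
            key = (current, int(fields[0]))
            if key in counters:
                # Keep the line number (and optional checksum), swap the count.
                fields[1] = str(counters[key])
                line = "DA:" + ",".join(fields)
        out.append(line)
    return "\n".join(out) + "\n"
```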
