| 1 | +# About |
| 2 | +This directory contains utilities for collecting and processing coverage information in DrCov format with additional capabilities. |
| 3 | + |
| 4 | +Capabilities: |
| 5 | +1. Counting the number of times each block was executed |
| 6 | +1. Aggregating coverage information over time |
| 7 | +1. Extracting BB addresses and size from PCTable |
| 8 | +1. Symbolization using the debug_line info |
| 9 | + |
| 10 | +Original DrCov files have a rigid format that does not include the counters, so the counters are stored in companion `.drcov.cnt` files in a binary format. On the other hand, DrCov comes with a number of useful utilities that visualise coverage, which is why we keep the original format. In particular, `drcov2lcov` converts drcov files to lcov-formatted files, which can then be converted to HTML using `genhtml`. |
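
To illustrate why the counters need a companion file, here is a minimal sketch of reading the basic-block table of a version 2 drcov file. It assumes the commonly documented layout: a text header ending in a `BB Table: N bbs` line, followed by N packed 8-byte entries (32-bit module-relative start, 16-bit size, 16-bit module id). The function name and the sample bytes are illustrative and are not part of the utilities in this directory.

```python
import struct

# Assumed drcov BB entry layout: uint32 start (module-relative),
# uint16 size, uint16 module id. There is no field for a hit count.
BB_ENTRY = struct.Struct("<IHH")


def parse_bb_table(data: bytes):
    """Return a list of (start, size, mod_id) tuples from raw drcov bytes."""
    marker = b"BB Table: "
    pos = data.index(marker)
    line_end = data.index(b"\n", pos)
    # The header line looks like "BB Table: 2 bbs".
    count = int(data[pos + len(marker):line_end].decode("ascii").split()[0])
    body = data[line_end + 1:]
    return [BB_ENTRY.unpack_from(body, i * BB_ENTRY.size) for i in range(count)]


# Synthetic example file built in memory (not real coverage data).
sample = (
    b"DRCOV VERSION: 2\n"
    b"DRCOV FLAVOR: drcov\n"
    b"Module Table: version 2, count 1\n"
    b"Columns: id, base, end, entry, checksum, timestamp, path\n"
    b" 0, 0x400000, 0x401000, 0x0, 0x0, 0x0, /tmp/foo_exe\n"
    b"BB Table: 2 bbs\n"
    + BB_ENTRY.pack(0x1000, 12, 0)
    + BB_ENTRY.pack(0x1010, 5, 0)
)

blocks = parse_bb_table(sample)
```

Each entry only says that a block was seen, which is the gap the `.drcov.cnt` companion files fill.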
| 11 | + |
| 12 | +# Collecting coverage |
| 13 | +DynamoRIO collects DrCov using its runtime instrumentation. In LibAFL, we support collecting it using both compile-time and runtime instrumentation. This allows unified backend processing and analytics. |
| 14 | + |
| 15 | +## Static instrumentation |
| 16 | +LibAFL uses LLVM coverage instrumentation when compiling from sources. This is done with the compiler wrapper `libafl_cc`, which adds the needed options to clang's command line. While regular LibAFL coverage adds just `-fsanitize-coverage=trace-pc-guard`, for DrCov coverage `libafl_cc` should add `-fsanitize-coverage=pc-table,bb,trace-pc-guard`. |
| 17 | +The meaning of the additions is: |
| 18 | + - `bb` causes the compiler to instrument each basic block (the default is 'edge' which introduces artificial BBs to track the edges that are known statically). |
| 19 | + - `pc-table` adds to the resulting ELF file a section that holds the offsets of each BB. Without it, the trace-pc-guard instrumentation does not provide any address information; it only inserts callbacks to the coverage instrumentation. |
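
As a rough illustration of what `pc-table` provides, the sketch below decodes a raw PC table as documented for clang's SanitizerCoverage: an array of pointer-sized `(PC, PCFlags)` pairs, where bit 0 of the flags marks a function entry block. It assumes a 64-bit little-endian target and that the section bytes (named `__sancov_pcs` in ELF files) have already been extracted. `parse_pc_table` and the sample bytes are hypothetical and do not come from this repo.

```python
import struct

# One table entry: (PC, PCFlags); 64-bit little-endian is an assumption.
PC_ENTRY = struct.Struct("<QQ")


def parse_pc_table(section: bytes):
    """Decode raw PC table bytes into (pc, is_function_entry) tuples."""
    if len(section) % PC_ENTRY.size != 0:
        raise ValueError("section size is not a multiple of the entry size")
    return [(pc, bool(flags & 1)) for pc, flags in PC_ENTRY.iter_unpack(section)]


# Synthetic table: a function entry block followed by two ordinary blocks.
raw = (
    PC_ENTRY.pack(0x401000, 1)
    + PC_ENTRY.pack(0x401010, 0)
    + PC_ENTRY.pack(0x401024, 0)
)
entries = parse_pc_table(raw)
```

The table carries block start addresses and flags only, so block sizes have to be recovered by other means.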
| 20 | + |
| 21 | +It is also important to compile the source with the debug information (`-g`) to be able to map the addresses to the source code lines. |
| 22 | + |
| 23 | +The accumulated counter files are created by the `AccMapObserver`, which collects the data from the same in-memory map that the regular coverage uses. |
| 24 | +See the `fuzzers/libfuzzer_libpng` in this repo for an example of how to use static DrCov collection and look into its README for more information. |
| 25 | +See also the `fuzzers/forkserver_libafl_cc` for an example of collecting coverage from a forked process with static instrumentation. |
| 26 | + |
| 27 | +The static instrumentation flow: |
| 28 | +```puml |
| 29 | + file foo.c as src #yellow |
| 30 | + node libafl_cc |
| 31 | + file foo_exe |
| 32 | + folder coverage{ |
| 33 | + artifact id_from_to.drcov |
| 34 | + artifact id_from_to.drcov.cnt |
| 35 | + } |
| 36 | + note as foo_exe_note |
| 37 | + Contains PCGuard callbacks |
| 38 | + and PCTable |
| 39 | + end note |
| 40 | + src -d-> libafl_cc |
| 41 | + libafl_cc -> foo_exe : Compilation |
| 42 | + foo_exe .. foo_exe_note |
| 43 | + foo_exe -> coverage : Created during execution |
| 44 | +``` |
| 45 | +## Dynamic instrumentation |
| 46 | +LibAFL uses Frida to instrument and collect coverage information. This is done by the `CoverageRuntime`. In this repo, I extended the original `CoverageRuntime` so that it can accumulate the coverage info between runs and save the counter files alongside the original DrCov files. |
| 47 | + |
| 48 | +See `fuzzers/frida_gdiplus` for how to use this. You mainly need to pass two additional parameters: |
| 49 | +```rust |
| 50 | +let coverage = CoverageRuntime::new() |
| 51 | + .save_dr_cov(options.save_bb_coverage) |
| 52 | + .max_cnt(options.drcov_max_execution_cnt); |
| 53 | +``` |
| 54 | + |
| 55 | +The dynamic instrumentation flow: |
| 56 | + |
| 57 | +```puml |
| 58 | + file foo.c as src #yellow |
| 59 | + component FridaCoverageRuntime #purple |
| 60 | + node compiler |
| 61 | + file foo_exe |
| 62 | + folder coverage{ |
| 63 | + artifact id_from_to.drcov |
| 64 | + artifact id_from_to.drcov.cnt |
| 65 | + } |
| 66 | + src -d-> compiler |
| 67 | + compiler -> foo_exe : Compilation |
| 68 | + foo_exe <-d- FridaCoverageRuntime : Embedded/Injected |
| 69 | + foo_exe -> coverage : Created during execution |
| 70 | +``` |
| 71 | + |
| 72 | +## Post processing |
| 73 | +Once the `.drcov` and `.drcov.cnt` files are created, they can be processed to extract and analyze the coverage information. |
| 74 | + |
| 75 | +The main utility for the post processing is merge_drcov.py: |
| 76 | +``` |
| 77 | +usage: merge_drcov.py [-h] [-d DIRECTORY] [-od OUTPUT_DIRECTORY] [-a {s,m,h,d}] [-k] [-v] [-c] [-f] {convert,symbolize,pc-table} ... |
| 78 | +
|
| 79 | +Utility to merge files in DrCov format |
| 80 | +
|
| 81 | +positional arguments: |
| 82 | + {convert,symbolize,pc-table} |
| 83 | +
|
| 84 | +optional arguments: |
| 85 | + -h, --help show this help message and exit |
| 86 | + -d DIRECTORY, --directory DIRECTORY |
| 87 | + Directory to process |
| 88 | + -od OUTPUT_DIRECTORY, --output-directory OUTPUT_DIRECTORY |
| 89 | + Output directory |
| 90 | + -a {s,m,h,d}, --aggregate {s,m,h,d} |
| 91 | + Aggregate per second|minute|hour|day |
| 92 | + -k, --keep Keep the original files |
| 93 | + -v, --verbose Verbose output |
| 94 | + -c, --counters Generate block counters for a merged file |
| 95 | + -f, --fix-sizes Disassembles the .text section of the module and fixes the basic block sizes in the drcov file |
| 96 | +``` |
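
Conceptually, merging sums the per-block counters of several coverage files, and `-a` groups the files by time unit first. The sketch below shows that idea on in-memory records. It is not the `merge_drcov.py` implementation, and the record shape (a timestamp in seconds plus a `{block_address: hits}` dictionary) is an assumption made for illustration.

```python
from collections import Counter, defaultdict

# Seconds per aggregation unit, mirroring the s|m|h|d choices of -a.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def merge_counts(records, unit=None):
    """Sum block hit counts, optionally grouped into time buckets.

    records: iterable of (timestamp_seconds, {block_address: hits}) pairs.
    Returns {bucket_start: Counter}. With unit=None everything is summed
    into a single bucket keyed by None.
    """
    merged = defaultdict(Counter)
    for timestamp, hits in records:
        if unit is None:
            bucket = None
        else:
            step = UNIT_SECONDS[unit]
            bucket = timestamp - timestamp % step  # start of the time bucket
        merged[bucket].update(hits)  # Counter.update adds the counts
    return dict(merged)


# Hypothetical records: two written in the same minute, one in the next.
records = [
    (0, {0x1000: 3, 0x1010: 1}),
    (30, {0x1000: 2}),
    (75, {0x1010: 4, 0x1020: 1}),
]
total = merge_counts(records)
per_minute = merge_counts(records, unit="m")
```

Keeping one `Counter` per bucket means the total can always be rebuilt from the aggregated buckets by summing them again.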
| 97 | +Post processing contains the following steps: |
| 98 | +1. Merge the drcov files. The files are written every X executions, so you may want either the total coverage or its aggregation over time. |
| 99 | +The simplest way to merge the files: |
| 100 | +`python3 ../../utils/drcov/merge_drcov.py -d ./coverage` |
| 101 | +**Notes:** |
| 102 | + 1. Merging produces a `merged.drcov` file that can be processed by DrCov utilities. The produced files use the old version 2 format, so some utilities may warn about it. |
| 103 | + 1. To keep the counter data so that it can be processed and displayed in HTML, specify `-c`. This produces a `merged.drcov.json` file that contains the counters for each executed basic block. |
| 104 | + 1. When using static instrumentation, the sizes of the basic blocks are unknown. This results in wrong line coverage being displayed in HTML, as only the first line of each block will have the correct counter. The `-f` option fixes the problem by extracting the BB information from the binary. It is therefore recommended to use the `-f` option when working with static instrumentation. |
| 105 | + 1. By default, the original `.drcov` and `.drcov.cnt` files are deleted. `-k` will keep them after the merge. |
| 106 | +1. Convert DrCov to lcov using the DynamoRIO `drcov2lcov` utility. |
| 107 | +**Notes:** |
| 108 | + 1. Make sure to use the 64-bit utility for 64-bit targets. Example: `~/DynamoRIO-Linux-10.0.19798/tools/bin64/drcov2lcov -input ./coverage/merged.drcov -output ./coverage.info -src_filter libpng-1.6.37` |
| 109 | + 1. Version 10.0 has a bug; use a newer version. |
| 110 | +1. Correct the counters in the generated .info file. Use the `symbolize` command of `merge_drcov.py`. This requires `llvm-symbolizer` to be installed and should be run on the `coverage.info` file produced by `drcov2lcov`. |
| 111 | +`python3 ../../utils/drcov/merge_drcov.py symbolize -i ./coverage/merged.drcov.json -s /usr/lib/llvm-17/bin/llvm-symbolizer -c ./coverage.info -co ./coverage.cnt.info` |
| 112 | +**Notes**: Use the symbolizer from a recent LLVM release. Old ones (e.g. LLVM 10) will not work. Tested with llvm-17. |
| 113 | +1. Generate the HTML with line coverage. Example: `perl ~/dynamorio/third_party/lcov/genhtml --ignore-errors=source ./coverage.cnt.info -o /tmp/cov`. Browse to `/tmp/cov/index.html` to view the files. |
| 114 | +**Notes**: Make sure the source files are accessible at the same path as they were during collection. Look for error messages that list the missing files, e.g. `genhtml: WARNING: cannot read /libafl-fuzz/fuzz/fuzz.c!` |
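
The counter correction in step 3 amounts to rewriting the `DA:<line>,<count>` records of the lcov tracefile. The sketch below shows only that rewriting step, assuming the per-line counts have already been obtained through symbolization. The `line_counts` mapping and the `apply_counters` helper are hypothetical; they do not reflect the actual layout of `merged.drcov.json` or the code of the `symbolize` command.

```python
def apply_counters(info_text: str, line_counts: dict) -> str:
    """Replace DA hit counts using {(source_file, line_number): count}.

    Records without a known count are left unchanged.
    """
    out = []
    current_file = None
    for line in info_text.splitlines():
        if line.startswith("SF:"):
            # SF starts a new per-file record in the tracefile.
            current_file = line[3:]
        elif line.startswith("DA:"):
            fields = line[3:].split(",")
            key = (current_file, int(fields[0]))
            if key in line_counts and len(fields) >= 2:
                fields[1] = str(line_counts[key])
                line = "DA:" + ",".join(fields)
        elif line == "end_of_record":
            current_file = None
        out.append(line)
    return "\n".join(out) + "\n"


# Synthetic tracefile with placeholder hit counts.
info = "SF:/src/foo.c\nDA:10,1\nDA:11,1\nDA:12,0\nend_of_record\n"
fixed = apply_counters(info, {("/src/foo.c", 10): 7, ("/src/foo.c", 11): 7})
```

Lines that belong to the same basic block receive the same count here, which matches the intent of displaying per-block counters on every source line of the block.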
| 115 | + |
| 116 | +The post processing flow: |
| 117 | + |
| 118 | +```puml |
| 119 | + file merged.drcov |
| 120 | + file merged.drcov.json #lightgreen |
| 121 | + file coverage.info as info |
| 122 | + file coverage.cnt.info as info_cnt |
| 123 | + cloud HTML |
| 124 | + node merge_drcov.py as merge_drcov |
| 125 | + node "merge_drcov symbolize" as merge_drcov_sym |
| 126 | + node drcov2lcov #lightblue |
| 127 | + node genhtml #lightblue |
| 128 | +
|
| 129 | + folder coverage{ |
| 130 | + artifact "id_from_to.drcov" as drcov |
| 131 | + artifact "id_from_to.drcov.cnt" as drcov_cnt |
| 132 | + } |
| 133 | + drcov --> merge_drcov |
| 134 | + drcov_cnt --> merge_drcov |
| 135 | + merge_drcov --> merged.drcov |
| 136 | + merge_drcov --> merged.drcov.json |
| 137 | + merged.drcov --> drcov2lcov |
| 138 | + drcov2lcov -> info |
| 139 | + info --> merge_drcov_sym |
| 140 | + merged.drcov.json --> merge_drcov_sym |
| 141 | + merge_drcov_sym -> info_cnt |
| 142 | + info_cnt --> genhtml |
| 143 | + genhtml -> HTML |
| 144 | +``` |
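
The whole flow can also be scripted. The sketch below only assembles the command lines shown in the steps above so that they can be reviewed or passed to `subprocess.run`. All default paths (the DynamoRIO and LLVM install locations, `../../utils/drcov`, the `libpng-1.6.37` source filter) are copied from the examples and will differ on other setups. Combining `-c` and `-f` in the merge step is an assumption based on the notes, and `build_pipeline` and `run_pipeline` are hypothetical helpers.

```python
import os
import subprocess


def build_pipeline(
    cov_dir="./coverage",
    html_dir="/tmp/cov",
    merge_script="../../utils/drcov/merge_drcov.py",
    drcov2lcov="~/DynamoRIO-Linux-10.0.19798/tools/bin64/drcov2lcov",
    symbolizer="/usr/lib/llvm-17/bin/llvm-symbolizer",
    genhtml="~/dynamorio/third_party/lcov/genhtml",
    src_filter="libpng-1.6.37",
):
    """Return the four post-processing commands as argument lists, in order."""
    # subprocess does not expand "~" without a shell, so expand it here.
    drcov2lcov = os.path.expanduser(drcov2lcov)
    genhtml = os.path.expanduser(genhtml)
    return [
        # 1. Merge, keeping counters (-c) and fixing block sizes (-f).
        ["python3", merge_script, "-d", cov_dir, "-c", "-f"],
        # 2. Convert the merged drcov file to an lcov tracefile.
        [drcov2lcov, "-input", f"{cov_dir}/merged.drcov",
         "-output", "./coverage.info", "-src_filter", src_filter],
        # 3. Correct the counters in the tracefile.
        ["python3", merge_script, "symbolize",
         "-i", f"{cov_dir}/merged.drcov.json", "-s", symbolizer,
         "-c", "./coverage.info", "-co", "./coverage.cnt.info"],
        # 4. Render the HTML report.
        ["perl", genhtml, "--ignore-errors=source",
         "./coverage.cnt.info", "-o", html_dir],
    ]


def run_pipeline(commands):
    """Run each command in order and stop at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


commands = build_pipeline()
```

Calling `run_pipeline(commands)` would execute the steps; it is left out of the sketch so that the commands can be inspected first.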