FoRREST is structured as a set of Python classes. The primary class, FoRREST, contains basic features for manipulating the object (load a file, call the shell, etc.). The remaining classes are referred to as Plugins, and are divided according to the levels of information representation described below.
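One way to picture this layout is the minimal sketch below. The names `Core`, `Plugin`, `RawPlugin`, `load`, and `shell` are hypothetical and chosen only for illustration; they are not taken from the FoRREST source and may not match its real interface.

```python
import subprocess


class Plugin:
    """Base class for one level of information (hypothetical name)."""

    def __init__(self, core):
        # Keep a back-reference so the plugin can reach the loaded file.
        self.core = core


class RawPlugin(Plugin):
    """Hypothetical plugin for the lowest level: facts about the file itself."""

    def size(self):
        return len(self.core.data)


class Core:
    """Hypothetical stand-in for the primary class."""

    def __init__(self):
        self.data = b""
        # Each level of information is exposed through its own plugin object.
        self.raw = RawPlugin(self)

    def load(self, path):
        """Read the target file into memory."""
        with open(path, "rb") as handle:
            self.data = handle.read()

    def shell(self, argv):
        """Run an external tool and return its standard output as text."""
        result = subprocess.run(argv, capture_output=True, text=True, check=True)
        return result.stdout
```

In this arrangement the primary object owns the file and the shell access, while each plugin only adds queries for its own level.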
The information found in a binary can be represented in many ways and on many different levels. Analysis starts at a low level with simple facts about the file, and each extraction step builds on the previous one to yield higher-level information.
- Filename - The name of the file that you are working with.
- Extension - The file's extension, if present.
- Size - The size of the file you are working with.
- inode - The index node that identifies the file and holds its metadata on the file system.
- Path - The path that leads to the file.
- Checksums - The digest value obtained by running a hashing algorithm on the raw data.
- Filetype - The type of file, denoted by a sequence of bytes at the start of the file (its magic number). This is reported as the MIME type.
- Version - The version numbers of the program (if used).
- Architecture - The architecture that the program was compiled for.
- Compiler - The compiler that was used to generate the file (if known).
- Sections - The names and offsets that divide the file into meaningful partitions.
- Opcodes - The bytes from the .text section that can be decoded into a stream of opcodes. The .text section holds the binary's executable code.
- Strings - A list of strings from the .data section and other sections.
- Imports - A list of functions the program references from a linked object file.
- Exports - A list of variables and function addresses made available to outside programs.
- Header Information - Information about how the program is organized.
- Disassembly - The output from the transformation from a binary file to an assembly file.
- Mnemonics - The portion of the assembly instruction that names the action being performed.
- Functions - The list of addresses that are discovered from analyzing the targets of call instructions.
- Basic Blocks - The directed graph of basic blocks that can be constructed by analyzing the boundaries and control flow of instructions.
- Data References - The location of data values derived from interpreting memory store and load operations.
- Jump Targets - The memory address offsets obtained by analyzing the targets of conditional and unconditional control flow change instructions.
- Control Flow Graph - Control flow graphs (CFGs) are visual representations of basic blocks, with arrows showing which blocks a given block can transfer control to or receive control from.
- System Calls - System call traces determine what operating system functions the program attempts to access.
- Function Traces - Function traces note sequences of functions that are executed, and in what order.
- Program Slices - Program slices are sub-sequences of code that affect the value of a variable through assignment or some other operation.
- Intermediate Representation - Intermediate representation (IR) is a form of expanded assembly code that breaks each processor instruction into multiple steps, explicitly defining every action necessary to perform the instruction.
- Decompilation - Decompiled source code is the final step in the transition from the original string of bytes to a high-level, human-readable programming language. It is, ideally, very similar to the original source code that created the executable.
- Stack Frames - Stack frames are the records built on the program stack in memory when functions are called. They hold parameter values, local variables, and return addresses.
- Packet Captures - Packet captures are sets of data that are collected whenever a program tries to communicate over a network.
- Symbolic Execution - Symbolic execution of programs allows for analysis of potential paths without using concrete input. This is done by automated analysis tools which substitute symbols for the values of variables.
- Deobfuscation - Obfuscation makes executables very hard to reverse engineer. Developers may obfuscate their code for many reasons, from protecting their intellectual property to disguising the true intent of a malicious program. Obfuscation can also happen unintentionally when a compiler optimizes code. Deobfuscation attempts to simplify the code so that the developer's intent becomes clear.
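
The lowest levels in the list above (Filename, Extension, Size, inode, Path, Checksums, a magic-byte check, and Strings) can be gathered with the Python standard library alone. The sketch below is illustrative only: `raw_level` and `printable_strings` are hypothetical names, not FoRREST functions. The choice of SHA-1 and SHA-256 is an assumption, the magic check recognizes only ELF, and the string scan covers the whole byte buffer rather than individual sections.

```python
import hashlib
import os
import re


def raw_level(path):
    """Collect file-level facts that need no parsing of the contents."""
    full = os.path.abspath(path)
    directory, filename = os.path.split(full)
    extension = os.path.splitext(filename)[1]
    info = os.stat(full)
    with open(full, "rb") as handle:
        data = handle.read()
    return {
        "filename": filename,
        "extension": extension or None,  # None when no extension is present
        "size": info.st_size,
        "inode": info.st_ino,
        "path": directory,
        # Assumption: SHA-1 and SHA-256 stand in for whatever digests are wanted.
        "checksums": {
            name: hashlib.new(name, data).hexdigest()
            for name in ("sha1", "sha256")
        },
        # ELF files begin with the four magic bytes 0x7f 'E' 'L' 'F'.
        "is_elf": data[:4] == b"\x7fELF",
    }


def printable_strings(data, minimum=4):
    """Return runs of at least `minimum` printable ASCII bytes."""
    pattern = rb"[\x20-\x7e]{%d,}" % minimum
    return [match.decode("ascii") for match in re.findall(pattern, data)]
```

Higher levels, such as disassembly or control flow graphs, need an instruction decoder and are beyond what a standard-library sketch can show.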