Fast Linear Predictor

By David Trapp (CherryDT) - MIT License

fast-linear-predictor is a high-performance tool for cracking and predicting future outputs of GF(2)-linear pseudorandom number generators (PRNGs), such as the Mersenne Twister or linear congruential generators, even when only a masked subset of output bits is available.

For example, if you have some application that uses random.getrandbits in Python (which is a linear PRNG) in combination with some bitmask, xor or shift, e.g. random.getrandbits(32) & 0xFF, and you can gather ~40,000 of such consecutive outputs, you can then predict the next outputs.

(This was created for a CTF in which I encountered such a situation where symbolic Mersenne “untwisters” failed because the runtime became extremely high when only a few bits per input value were available and the upper bits were unknown.)

Features

Per-bit Berlekamp-Massey: Recovers, for each observed output bit position, the minimal linear feedback shift register (LFSR) that reproduces the bit-stream.
LFSR Stepping: Uses the recovered connection polynomials and seed states to advance each bit’s LFSR and predict arbitrary‑length future bit sequences.
Extremely fast: Each bit‑position is recovered and stepped in parallel, using several CPU cores.
Low memory footprint: Requires just a few MB of RAM, even for 64-bit streams and tens of thousands of samples.

How It Works

Sampling: It reads n observed outputs (one integer per line). The tool only uses the lower b bits of each value (max. b = 64).
Bit‑Stream Extraction: For each bit position 0 ... b-1, it extracts the binary sequence of that bit across all samples.
Berlekamp-Massey: For each binary sequence, it runs the Berlekamp-Massey algorithm in O(n²) time to recover the shortest LFSR (connection polynomial and degree L).
Seed State: It takes the final L bits of each sequence as the initial register state.
Parallel Prediction: It steps each LFSR forward by c steps, collecting the new bits.
Reassembly: It combines the predicted bits across positions to form the full predicted integer outputs.

Build & Usage

# Build with OpenMP support
make all

# Basic usage: predict 16 future values
./fast-linear-predictor -c 16 samples.txt

# Specify bit-width to speed up processing (e.g. only low 8 bits)
./fast-linear-predictor -b 8 -c 16 samples.txt

You can run make test for a self-test that uses Python's PRNG.

If you omit the input file, the samples are taken from stdin instead.

Only tested on 64-Bit Linux!

Performance on an AMD Ryzen 7 Pro 4750U

Scenario	Command	Time
64-bit values	`-b 64 -c 16`	3-5 s
32-bit values	`-b 32 -c 16`	2-3 s
16-bit values	`-b 16 -c 16`	1-2 s
8-bit values	`-b 8 -c 16`	<1 s

Memory usage remains under 6 MB across all scenarios.

Capabilities and Limitations

Unlike symbolic solutions based on Z3 or similar, this tool does not reconstruct the full internal state of the PRNG. Instead, it looks at each bit position in the inputs/outputs at a time and “learns its pattern” to predict future bits.

What it can do
- Recover and predict future outputs very efficiently when the PRNG’s output function is a linear Boolean function of its internal state bits (e.g. raw getrandbits in Python, raw rand in C, xorshift, LCG output, MT tempering is linear).
- Handle up to 64 independent bit‑streams in one run (the different bits in input and output values), using multicore parallelism for significant speedup.
What it cannot do
- Crack sequences generated by non-linear transformations (e.g. multiplication with carries, rejection sampling loops like in Python's randint, modulo bias, cryptographic ciphers) break the GF(2)-linearity assumption and are not recoverable by Berlekamp-Massey.
- Work on non-consecutive samples in the input (unless they are spaced in always the same interval, e.g. always each second output, but then the predicted values will also be only every second next value).
- Work on samples where some bits are unknown but those unknown bits are not always the same and the bits should be still predicted.

It requires at least 2 * b samples to fully recover an LFSR of degree up to b. In practice, you will need twice as many samples as the internal state of the PRNG has. For example, for Python's MT19937 which has 624x32 bits of internal state, you will need at least 39,936 (2x624x32) samples.

If you need to work with something that does not fulfill the requirements this tool has, you could look into Z3-based solutions instead, such as symbolic_mersenne_cracker. Such a tool will need a lot more RAM and CPU but can work with incomplete or transformed data and can work with less samples as well because bits are processed together and not in isolation like this tool does.

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
.gitignore		.gitignore
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
fast-linear-predictor.c		fast-linear-predictor.c
test.py		test.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Fast Linear Predictor

Features

How It Works

Build & Usage

Performance on an AMD Ryzen 7 Pro 4750U

Capabilities and Limitations

About

Uh oh!

Releases

Packages

Uh oh!

Languages

License

CherryDT/fast-linear-predictor

Folders and files

Latest commit

History

Repository files navigation

Fast Linear Predictor

Features

How It Works

Build & Usage

Performance on an AMD Ryzen 7 Pro 4750U

Capabilities and Limitations

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages