| 1 | +* rust_1brc |
| 2 | +** Introduction |
| 3 | +*rust_1brc* is a Rust implementation of the [[https://1brc.dev/][One Billion Row Challenge (1BRC)]]. The challenge involves processing one billion temperature measurements to calculate the minimum, mean, and maximum temperature per weather station. This project aims to explore Rust's capabilities for efficient data handling and processing. The main motivation for undertaking the challenge was to work with Rust's =std::sync::mpsc= and =std::thread= libraries; the workload parallelizes well, which makes it a good fit for exploring both. |
| 4 | + |
| 5 | +The input file can be generated with one of the official scripts, such as this [[https://github.com/gunnarmorling/1brc/blob/main/src/main/python/create_measurements.py][Python version]]. The result is a text file containing one billion temperature measurements, approximately 13 GB in size. We refer to this file as =measurements.txt=. *This project will load the /entire/ file into memory and process it using all available CPU cores to maximize performance.* |
| 6 | + |
| 7 | +** Usage |
| 8 | +To use this project, start by cloning the repository. |
| 9 | +#+begin_src bash |
| 10 | + git clone https://github.com/hesampakdaman/rust_1brc.git |
| 11 | + cd rust_1brc |
| 12 | +#+end_src |
| 13 | + |
| 14 | +Then you can build and run the project with the following command. |
| 15 | +#+begin_src bash |
| 16 | + cargo run --release /path/to/measurements.txt |
| 17 | +#+end_src |
| 18 | + |
| 19 | +** Results |
| 20 | +On a MacBook M1 Pro (2021), the project processes the input file in approximately 2.8 seconds. This demonstrates the efficiency of the implementation and Rust's capability to handle large-scale data processing tasks effectively. |
| 21 | + |
| 22 | +** Architecture |
| 23 | +This section provides a high-level overview of the system's architecture: the main components and their interactions. |
| 24 | +*** Main Components |
| 25 | +- =main.rs= :: The entry point of the application. Handles command-line arguments and initiates the processing workflow. |
| 26 | +- =compute.rs= :: Contains the core logic for processing the temperature data, including calculations for min, mean, and max temperatures. |
| 27 | +- =pre_processing.rs= :: Manages the initial parsing and preparation of data, utilizing memory-mapped files for efficient access. |
| 28 | +- =weather.rs= :: Defines data structures and utility functions that are used throughout the application. |
| 29 | +- =aggregate.rs= :: Responsible for aggregating the results of the temperature data processing. |
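To make the min/mean/max bookkeeping concrete, here is a minimal sketch of a per-station record that can be updated one measurement at a time and merged across workers. The =Stats= type and its method names are hypothetical and chosen for illustration; they are not taken from =weather.rs= or =compute.rs=.

```rust
/// Running statistics for one weather station.
/// Hypothetical type for illustration, not the repository's own definition.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Stats {
    pub min: f64,
    pub max: f64,
    pub sum: f64,
    pub count: u64,
}

impl Stats {
    /// Start a record from a single measurement.
    pub fn new(t: f64) -> Self {
        Stats { min: t, max: t, sum: t, count: 1 }
    }

    /// Fold one more measurement into the record.
    pub fn add(&mut self, t: f64) {
        self.min = self.min.min(t);
        self.max = self.max.max(t);
        self.sum += t;
        self.count += 1;
    }

    /// Combine two partial records, e.g. produced by different threads.
    pub fn merge(&mut self, other: &Stats) {
        self.min = self.min.min(other.min);
        self.max = self.max.max(other.max);
        self.sum += other.sum;
        self.count += other.count;
    }

    /// The mean is derived from the sum and count on demand.
    pub fn mean(&self) -> f64 {
        self.sum / self.count as f64
    }
}
```

Storing the sum and count rather than a running mean keeps =merge= exact up to floating-point rounding, which matters once partial results from several threads are combined.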
| 30 | + |
| 31 | +*** Workflow |
| 32 | +1. *Initialization*: The application starts in =main.rs=, where it parses command-line arguments to get the path of the input file. |
| 33 | +2. *Data Loading*: =pre_processing.rs= handles the loading of the input data file using memory-mapped files to efficiently manage large data volumes. |
| 34 | +3. *Data Processing*: =compute.rs= processes the loaded data, calculating the required statistics (min, mean, max) for each weather station. |
| 35 | +4. *Aggregation*: =aggregate.rs= aggregates the computed results for final output. |
| 36 | +5. *Output*: Results are then output in the format specified by the challenge requirements. |
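The processing and aggregation steps above can be sketched with =std::thread= and =std::sync::mpsc= alone. This is an illustration under stated assumptions, not the project's actual code: the function names =process_chunk= and =summarize= are hypothetical, the input is passed as already-split string chunks instead of a memory-mapped file, and each line is assumed to have the form =station;temperature=.

```rust
use std::collections::HashMap;
use std::sync::mpsc;
use std::thread;

/// Per-station partial result: (min, max, sum, count).
type Partial = (f64, f64, f64, u64);

/// Parse lines of the assumed form `station;temperature` into per-station partials.
fn process_chunk(chunk: &str) -> HashMap<String, Partial> {
    let mut map: HashMap<String, Partial> = HashMap::new();
    for line in chunk.lines() {
        // Skip lines that do not match the expected format.
        let Some((name, value)) = line.split_once(';') else { continue };
        let Ok(t) = value.trim().parse::<f64>() else { continue };
        map.entry(name.to_string())
            .and_modify(|p| {
                p.0 = p.0.min(t);
                p.1 = p.1.max(t);
                p.2 += t;
                p.3 += 1;
            })
            .or_insert((t, t, t, 1));
    }
    map
}

/// Process each chunk on its own scoped thread and merge the partial maps
/// received over an mpsc channel. Returns station -> (min, mean, max).
pub fn summarize(chunks: &[&str]) -> HashMap<String, (f64, f64, f64)> {
    let (tx, rx) = mpsc::channel();
    let mut total: HashMap<String, Partial> = HashMap::new();
    thread::scope(|s| {
        for &chunk in chunks {
            let tx = tx.clone();
            s.spawn(move || {
                // The receiver lives until the loop below ends, so this send succeeds.
                tx.send(process_chunk(chunk)).expect("receiver alive");
            });
        }
        // Drop the original sender so the receive loop ends once all workers finish.
        drop(tx);
        for partial in rx {
            for (name, p) in partial {
                total
                    .entry(name)
                    .and_modify(|q| {
                        q.0 = q.0.min(p.0);
                        q.1 = q.1.max(p.1);
                        q.2 += p.2;
                        q.3 += p.3;
                    })
                    .or_insert(p);
            }
        }
    });
    total
        .into_iter()
        .map(|(name, (min, max, sum, n))| (name, (min, sum / n as f64, max)))
        .collect()
}
```

Scoped threads let the workers borrow the input directly, so no copy of the data is needed; the channel carries only the small per-chunk maps back to the aggregating thread.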
| 37 | + |
| 38 | +*** Architectural Invariants |
| 39 | +- Efficiency :: The system must handle the entire 13GB input file in memory, leveraging Rust's performance capabilities and concurrency features to utilize all available CPU cores. |
| 40 | +- Scalability :: The architecture should be scalable to ensure performance remains consistent even as data volume grows. |
| 41 | + |
| 42 | +*** Boundaries |
| 43 | +- Module Boundaries :: Clear separation between data loading (=pre_processing.rs=), data processing (=compute.rs=), aggregation (=aggregate.rs=), and utility functions (=weather.rs=). |
| 44 | +- Concurrency Management :: Concurrency is managed at the processing level to ensure maximum utilization of CPU resources. |
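
One way to keep workers independent is to split the input into byte ranges that end on line boundaries, one range per available core. The sketch below assumes newline-terminated records; =chunk_ranges= and =worker_count= are hypothetical helpers for illustration and may differ from how =pre_processing.rs= actually partitions the file.

```rust
use std::ops::Range;

/// Number of worker threads to use, falling back to 1 if it cannot be queried.
pub fn worker_count() -> usize {
    std::thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1)
}

/// Split `data` into roughly `n` byte ranges that each end just after a
/// newline (or at the end of the data), so no line straddles two ranges.
pub fn chunk_ranges(data: &[u8], n: usize) -> Vec<Range<usize>> {
    let n = n.max(1);
    // Target chunk size, rounded up; at least 1 so the loop always advances.
    let target = ((data.len() + n - 1) / n).max(1);
    let mut ranges = Vec::new();
    let mut start = 0;
    while start < data.len() {
        let mut end = (start + target).min(data.len());
        // Extend the chunk to the next newline so its last line stays whole.
        while end < data.len() && data[end - 1] != b'\n' {
            end += 1;
        }
        ranges.push(start..end);
        start = end;
    }
    ranges
}
```

Each range can then be handed to one thread as =&data[range]=; because the ranges never overlap and never cut a line, the workers need no coordination beyond sending their results back.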