
Commit 885e7a3

Add README.org
1 parent e583ca1 commit 885e7a3

1 file changed

+44
-0
lines changed

README.org

@@ -0,0 +1,44 @@
1+
* rust_1brc
2+
** Introduction
3+
*rust_1brc* is a Rust implementation of the [[https://1brc.dev/][One Billion Row Challenge (1BRC)]]. The challenge involves processing one billion temperature measurements to calculate the minimum, mean, and maximum temperature per weather station. The main motivation for taking it on was to explore Rust's =std::sync::mpsc= and =std::thread= modules: the workload parallelizes well, which makes it a good fit for both.
4+
5+
The input file can be generated with one of the official scripts, such as this [[https://github.com/gunnarmorling/1brc/blob/main/src/main/python/create_measurements.py][Python version]]. The result is a text file of one billion temperature measurements, approximately 13 GB in size, which we refer to as =measurements.txt=. *This project will load the /entire/ file into memory and process it using all available CPU cores to maximize performance.*
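
Each line of =measurements.txt= has the form =<station>;<temperature>=, with exactly one fractional digit in the temperature. A minimal sketch of parsing one such line follows. The function name =parse_line= and the choice to store temperatures as integer tenths of a degree are assumptions for illustration, not necessarily what this project does.

#+begin_src rust
/// Parses one line such as `Hamburg;12.0` into the station name and the
/// temperature in tenths of a degree (hypothetical helper, not from the repo).
fn parse_line(line: &str) -> Option<(&str, i32)> {
    let (station, temp) = line.split_once(';')?;
    // Handle the sign separately so that values like "-0.5" keep their sign.
    let (negative, digits) = match temp.strip_prefix('-') {
        Some(rest) => (true, rest),
        None => (false, temp),
    };
    let (whole, frac) = digits.split_once('.')?;
    // The challenge guarantees exactly one fractional digit.
    if frac.len() != 1 {
        return None;
    }
    let tenths = whole.parse::<i32>().ok()? * 10 + frac.parse::<i32>().ok()?;
    Some((station, if negative { -tenths } else { tenths }))
}
#+end_src

Working in integer tenths avoids floating-point parsing in the hot loop; the value only needs to be converted back to a decimal when the results are printed.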
6+
7+
** Usage
8+
To use this project, start by cloning the repository.
9+
#+begin_src bash
10+
git clone https://github.com/hesampakdaman/rust_1brc.git
11+
cd rust_1brc
12+
#+end_src
13+
14+
Then you can build and run the project with the following command.
15+
#+begin_src bash
16+
cargo run --release /path/to/measurements.txt
17+
#+end_src
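
As a rough illustration of how the path argument above might be picked up, here is a sketch using only the standard library. The helper =input_path= is hypothetical and not taken from =main.rs=.

#+begin_src rust
use std::path::PathBuf;

/// Extracts the input path from an argument list whose first item is the
/// program name (hypothetical helper for illustration).
fn input_path<I>(mut args: I) -> Result<PathBuf, String>
where
    I: Iterator<Item = String>,
{
    let program = args.next().unwrap_or_else(|| "rust_1brc".to_string());
    // Accept exactly one argument after the program name.
    match (args.next(), args.next()) {
        (Some(path), None) => Ok(PathBuf::from(path)),
        _ => Err(format!("usage: {} /path/to/measurements.txt", program)),
    }
}
#+end_src

In a real binary this would be called as =input_path(std::env::args())=.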
18+
19+
** Results
20+
On a 2021 MacBook Pro with an M1 Pro, the project processes the input file in approximately 2.8 seconds, which works out to roughly 360 million rows (about 4.6 GB) per second.
21+
22+
** Architecture
23+
This section provides a high-level overview of the system's architecture: the main components and their interactions.
24+
*** Main Components
25+
- =main.rs= :: The entry point of the application. Handles command-line arguments and initiates the processing workflow.
26+
- =compute.rs= :: Contains the core logic for processing the temperature data, including calculations for min, mean, and max temperatures.
27+
- =pre_processing.rs= :: Manages the initial parsing and preparation of data, utilizing memory-mapped files for efficient access.
28+
- =weather.rs= :: Defines data structures and utility functions that are used throughout the application.
29+
- =aggregate.rs= :: Responsible for aggregating the results of the temperature data processing.
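
To make the per-station statistics concrete, the following sketch shows one way a running min/mean/max record could look. The =Stats= type, its fields, and the integer-tenths representation are assumptions for illustration; the actual definitions in =weather.rs= may differ.

#+begin_src rust
/// Running statistics for one station, in tenths of a degree
/// (hypothetical type, not copied from the repository).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Stats {
    min: i32,
    max: i32,
    sum: i64,
    count: u64,
}

impl Stats {
    /// Starts a record from a first measurement.
    fn new(t: i32) -> Self {
        Stats { min: t, max: t, sum: t as i64, count: 1 }
    }

    /// Folds one more measurement into the record.
    fn record(&mut self, t: i32) {
        self.min = self.min.min(t);
        self.max = self.max.max(t);
        self.sum += t as i64;
        self.count += 1;
    }

    /// Combines two partial records, e.g. from different threads.
    fn merge(&mut self, other: &Stats) {
        self.min = self.min.min(other.min);
        self.max = self.max.max(other.max);
        self.sum += other.sum;
        self.count += other.count;
    }

    /// Mean temperature in degrees.
    fn mean(&self) -> f64 {
        self.sum as f64 / 10.0 / self.count as f64
    }
}
#+end_src

Keeping =sum= and =count= instead of a running mean makes =merge= exact, which matters when partial results from several workers are combined.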
30+
31+
*** Workflow
32+
1. *Initialization*: The application starts in =main.rs=, where it parses command-line arguments to get the path of the input file.
33+
2. *Data Loading*: =pre_processing.rs= handles the loading of the input data file using memory-mapped files to efficiently manage large data volumes.
34+
3. *Data Processing*: =compute.rs= processes the loaded data, calculating the required statistics (min, mean, max) for each weather station.
35+
4. *Aggregation*: =aggregate.rs= aggregates the computed results for final output.
36+
5. *Output*: Results are then output in the format specified by the challenge requirements.
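
The steps above can be sketched end to end with =std::thread= and =std::sync::mpsc=. This is a simplified illustration that works on an in-memory string rather than a memory-mapped file. The names =process_chunk=, =run=, =format_results=, and the tuple-based =Stats= alias are hypothetical, and the rounding only approximates the challenge's output rules.

#+begin_src rust
use std::collections::{BTreeMap, HashMap};
use std::sync::mpsc;
use std::thread;

/// (min, max, sum, count); plain f64 temperatures keep the sketch short.
type Stats = (f64, f64, f64, u64);

/// Data processing: fold one slice of lines into per-station statistics.
fn process_chunk<'a>(lines: &[&'a str]) -> HashMap<&'a str, Stats> {
    let mut map: HashMap<&'a str, Stats> = HashMap::new();
    for &line in lines {
        // Skip malformed lines instead of failing the whole run.
        let Some((station, temp)) = line.split_once(';') else { continue };
        let Ok(t) = temp.parse::<f64>() else { continue };
        let e = map.entry(station).or_insert((t, t, 0.0, 0));
        e.0 = e.0.min(t);
        e.1 = e.1.max(t);
        e.2 += t;
        e.3 += 1;
    }
    map
}

/// Spawns one scoped thread per chunk and aggregates the partial maps
/// that arrive over the channel.
fn run(input: &str, workers: usize) -> BTreeMap<String, Stats> {
    let lines: Vec<&str> = input.lines().collect();
    let workers = workers.max(1);
    let chunk_len = ((lines.len() + workers - 1) / workers).max(1);
    let (tx, rx) = mpsc::channel();
    let mut totals: BTreeMap<String, Stats> = BTreeMap::new();

    thread::scope(|scope| {
        for chunk in lines.chunks(chunk_len) {
            let tx = tx.clone();
            scope.spawn(move || tx.send(process_chunk(chunk)).unwrap());
        }
        // Drop the original sender so the receive loop ends once all workers finish.
        drop(tx);
        for partial in rx {
            for (station, s) in partial {
                totals
                    .entry(station.to_string())
                    .and_modify(|e| {
                        e.0 = e.0.min(s.0);
                        e.1 = e.1.max(s.1);
                        e.2 += s.2;
                        e.3 += s.3;
                    })
                    .or_insert(s);
            }
        }
    });
    totals
}

/// Output: `{name=min/mean/max, ...}`, sorted by station name.
fn format_results(totals: &BTreeMap<String, Stats>) -> String {
    let body: Vec<String> = totals
        .iter()
        .map(|(name, s)| format!("{}={:.1}/{:.1}/{:.1}", name, s.0, s.2 / s.3 as f64, s.1))
        .collect();
    format!("{{{}}}", body.join(", "))
}
#+end_src

For the input =Hamburg;12.0=, =Oslo;-3.4=, =Hamburg;8.0=, =Oslo;1.4= (one per line), =format_results(&run(input, 2))= yields ={Hamburg=8.0/10.0/12.0, Oslo=-3.4/-1.0/1.4}=. The =BTreeMap= is used only to get alphabetical ordering for free at output time.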
37+
38+
*** Architectural Invariants
39+
- Efficiency :: The system must handle the entire 13GB input file in memory, leveraging Rust's performance capabilities and concurrency features to utilize all available CPU cores.
40+
- Scalability :: Throughput should stay roughly constant as data volume grows, so that processing time scales about linearly with input size.
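
One way to size the worker pool to the machine, assuming the standard library's =std::thread::available_parallelism= is used (the helper name =worker_count= is hypothetical):

#+begin_src rust
use std::thread;

/// Number of worker threads to start: one per available core,
/// falling back to a single worker if the count cannot be determined.
fn worker_count() -> usize {
    thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1)
}
#+end_src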
41+
42+
*** Boundaries
43+
- Module Boundaries :: Clear separation between data loading (=pre_processing.rs=), data processing (=compute.rs=), aggregation (=aggregate.rs=), and utility functions (=weather.rs=).
44+
- Concurrency Management :: Concurrency is managed at the processing level to ensure maximum utilization of CPU resources.
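
For the work to be divided among threads, the input buffer has to be cut at line boundaries so that no measurement is split between two workers. A minimal sketch of that idea follows, with =split_at_newlines= as a hypothetical helper operating on a plain byte slice (which could equally be the contents of a memory map).

#+begin_src rust
/// Splits `data` into roughly `parts` chunks, each ending just after a
/// newline (or at the end of the buffer). May return one chunk more or
/// fewer than requested, depending on where the newlines fall.
fn split_at_newlines(data: &[u8], parts: usize) -> Vec<&[u8]> {
    let target = (data.len() / parts.max(1)).max(1);
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < data.len() {
        // Tentative end of this chunk, clamped to the buffer length.
        let tentative = (start + target).min(data.len());
        // Extend to just past the next newline so no line is cut in half.
        let end = match data[tentative..].iter().position(|&b| b == b'\n') {
            Some(offset) => tentative + offset + 1,
            None => data.len(),
        };
        chunks.push(&data[start..end]);
        start = end;
    }
    chunks
}
#+end_src

Because the chunks are sub-slices of the original buffer, splitting copies no data; each worker only receives a pointer and a length.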
