Commit 739e2a9

Microoptimizations (#1153)
1 parent 51198c2 commit 739e2a9

23 files changed, +366 -420 lines changed

Changelog.md

+2 -1
Original file line number | Diff line number | Diff line change
@@ -5,6 +5,7 @@
55

66
### CLI
77
- Providing full static rust binary with [Eyra](https://github.com/sunfishcode/eyra) - [#1102](https://github.com/qarmin/czkawka/pull/1102)
8+
- Fixed duplicated `-c` argument, now saving as compact json is handled via `-C` - ????
89

910
### Krokiet GUI
1011
- Initial release of new gui - [#1102](https://github.com/qarmin/czkawka/pull/1102)
@@ -17,7 +18,7 @@
1718
- Added bigger stack size by default(fixes stack overflow in some musl apps) - [#1102](https://github.com/qarmin/czkawka/pull/1102)
1819
- Added optional libraw dependency(better single-core performance and support more raw files) - [#1102](https://github.com/qarmin/czkawka/pull/1102)
1920
- Speedup checking for wildcards and fix invalid recognizing long excluded items - [#1152](https://github.com/qarmin/czkawka/pull/1152)
20-
- Even 10x speedup when searching for empty folders - [#1152](https://github.com/qarmin/czkawka/pull/1152)
21+
- Big speedup when searching for empty folders(especially with multithreading + cached FS schema) - [#1152](https://github.com/qarmin/czkawka/pull/1152)
2122
- Collecting files for scan can be a lot of faster due lazy file metadata gathering - [#1152](https://github.com/qarmin/czkawka/pull/1152)
2223
- Fixed recognizing not accessible folders as non-empty - [#1152](https://github.com/qarmin/czkawka/pull/1152)
2324

README.md

+43 -29
@@ -2,14 +2,16 @@
22

33
**Czkawka** (_tch•kav•ka_ (IPA: [ˈʧ̑kafka]), "hiccup" in Polish) is a simple, fast and free app to remove unnecessary files from your computer.
44

5+
**Krokiet** ((IPA: [ˈkrɔcɛt]), "croquet" in Polish) same as above, but uses Slint frontend.
6+
57
## Features
68
- Written in memory-safe Rust
79
- Amazingly fast - due to using more or less advanced algorithms and multithreading
810
- Free, Open Source without ads
911
- Multiplatform - works on Linux, Windows, macOS, FreeBSD and many more
1012
- Cache support - second and further scans should be much faster than the first one
1113
- CLI frontend - for easy automation
12-
- GUI frontend - uses GTK 4 framework and looks similar to FSlint
14+
- GUI frontend - uses GTK 4 or Slint frameworks
1315
- No spying - Czkawka does not have access to the Internet, nor does it collect any user information or statistics
1416
- Multilingual - support multiple languages like Polish, English or Italian
1517
- Multiple tools to use:
@@ -36,9 +38,18 @@ Each tool uses different technologies, so you can find instructions for each of
3638

3739
## Benchmarks
3840

39-
Since Czkawka is written in Rust and it aims to be a faster alternative to FSlint or DupeGuru which are written in Python, we need to compare the speed of these tools.
41+
Previous benchmark was done mostly with two python project - dupeguru and fslint.
42+
Both were written in python so it was mostly obvious that Czkawka will be faster due using more low-level functions and faster language.
43+
44+
I tried to use rmlint gui but it not even started on my computer, so instead I used Detwinner, fclones-gui and dupeguru.
45+
46+
I tested it on a 1024 GB SSD(Sata 3) and an i7-4770 CPU(4/8HT), disk contains 1742102 files which took 850 GB
47+
Minimum file size 64KB, with search in hidden folders without any excluded folders/files.
4048

41-
I tested it on a 256 GB SSD and an i7-4770 CPU.
49+
Czkawka 7.0.0
50+
Detwinner 0.4.2
51+
Dupeguru 4.3.1
52+
Fclones-gui 0.2.0
4253

4354
I prepared a disk and performed a test without any folder exceptions and with disabled ignoring of hard links. The disk contained 363 215 files, took 221,8 GB and had 62093 duplicate files in 31790 groups which occupied 4,1 GB.
4455

@@ -83,38 +94,40 @@ Similar images which check 349 image files that occupied 1.7 GB
8394
| DupeGuru 4.1.1 (First Run) | 55s |
8495
| DupeGuru 4.1.1 (Second Run) | 1s |
8596

97+
Of course there are multiple tools that offer even better performance, but usually are only specialized in one simple area.
98+
8699
## Comparison to other tools
87100

88101
Bleachbit is a master at finding and removing temporary files, while Czkawka only finds the most basic ones. So these two apps shouldn't be compared directly or be considered as an alternative to one another.
89102

90103
In this comparison remember, that even if app have same features they may work different(e.g. one app may have more options to choose than other).
91104

92-
| | Czkawka | Krokiet | FSlint | DupeGuru | Bleachbit |
93-
|:------------------------:|:-----------:|:-----------:|:------:|:------------------:|:-----------:|
94-
| Language | Rust | Rust | Python | Python/Obj-C | Python |
95-
| Framework base language | C | Rust | C | C/C++/Obj-C/Swift | C |
96-
| Framework | GTK 4 | Slint | PyGTK2 | Qt 5 (PyQt)/Cocoa | PyGTK3 |
97-
| OS | Lin,Mac,Win | Lin,Mac,Win | Lin | Lin,Mac,Win | Lin,Mac,Win |
98-
| Duplicate finder | ✔ | ✔ | ✔ | | |
99-
| Empty files | ✔ | ✔ | ✔ | | |
100-
| Empty folders | ✔ | ✔ | ✔ | | |
101-
| Temporary files | ✔ | ✔ | ✔ | | ✔ |
102-
| Big files | ✔ | ✔ | | | |
103-
| Similar images | ✔ | ✔ | | | |
104-
| Similar videos | ✔ | ✔ | | | |
105-
| Music duplicates(tags) | ✔ | ✔ | | | |
106-
| Invalid symlinks | ✔ | ✔ | ✔ | | |
107-
| Broken files | ✔ | ✔ | | | |
108-
| Names conflict | ✔ | ✔ | ✔ | | |
109-
| Invalid names/extensions | ✔ | ✔ | ✔ | | |
110-
| Installed packages | | | ✔ | | |
111-
| Bad ID | | | ✔ | | |
112-
| Non stripped binaries | | | ✔ | | |
113-
| Redundant whitespace | | | ✔ | | |
114-
| Overwriting files | | | ✔ | | ✔ |
115-
| Multiple languages | ✔ | | ✔ | | ✔ |
116-
| Cache support | ✔ | ✔ | | | |
117-
| In active development | Yes | | No | Yes | Yes |
105+
| | Czkawka | Krokiet | FSlint | DupeGuru | Bleachbit |
106+
|:------------------------:|:-----------:|:-----------:|:------:|:-----------------:|:-----------:|
107+
| Language | Rust | Rust | Python | Python/Obj-C | Python |
108+
| Framework base language | C | Rust | C | C/C++/Obj-C/Swift | C |
109+
| Framework | GTK 4 | Slint | PyGTK2 | Qt 5 (PyQt)/Cocoa | PyGTK3 |
110+
| OS | Lin,Mac,Win | Lin,Mac,Win | Lin | Lin,Mac,Win | Lin,Mac,Win |
111+
| Duplicate finder | ✔ | ✔ | ✔ | ✔ | |
112+
| Empty files | ✔ | ✔ | ✔ | | |
113+
| Empty folders | ✔ | ✔ | ✔ | | |
114+
| Temporary files | ✔ | ✔ | ✔ | | ✔ |
115+
| Big files | ✔ | ✔ | | | |
116+
| Similar images | ✔ | ✔ | | ✔ | |
117+
| Similar videos | ✔ | ✔ | | | |
118+
| Music duplicates(tags) | ✔ | ✔ | | ✔ | |
119+
| Invalid symlinks | ✔ | ✔ | ✔ | | |
120+
| Broken files | ✔ | ✔ | | | |
121+
| Names conflict | ✔ | ✔ | ✔ | | |
122+
| Invalid names/extensions | ✔ | ✔ | ✔ | | |
123+
| Installed packages | | | ✔ | | |
124+
| Bad ID | | | ✔ | | |
125+
| Non stripped binaries | | | ✔ | | |
126+
| Redundant whitespace | | | ✔ | | |
127+
| Overwriting files | | | ✔ | | ✔ |
128+
| Multiple languages | ✔ | | ✔ | ✔ | ✔ |
129+
| Cache support | ✔ | ✔ | | ✔ | |
130+
| In active development | Yes | Yes | No | Yes | Yes |
118131

119132
## Other apps
120133
There are many similar applications to Czkawka on the Internet, which do some things better and some things worse:
@@ -123,6 +136,7 @@ There are many similar applications to Czkawka on the Internet, which do some th
123136
- [FSlint](https://github.com/pixelb/fslint) - A little outdated, but still have some tools not available in Czkawka
124137
- [AntiDupl.NET](https://github.com/ermig1979/AntiDupl) - Shows a lot of metadata of compared images
125138
- [Video Duplicate Finder](https://github.com/0x90d/videoduplicatefinder) - Finds similar videos(surprising, isn't it), supports video thumbnails
139+
126140
### CLI
127141
Due to limited time, the biggest emphasis is on the GUI version so if you are looking for really good and feature-packed console apps, then take a look at these:
128142
- [Fclones](https://github.com/pkolaczk/fclones) - One of the fastest tools to find duplicates; it is written also in Rust

ci_tester/src/main.rs

+4 -4
@@ -319,12 +319,12 @@ fn collect_all_files_and_dirs(dir: &str) -> std::io::Result<CollectedFiles> {
319319
let path = entry.path();
320320

321321
if path.is_dir() {
322-
folders.insert(path.display().to_string());
323-
folders_to_check.push(path.display().to_string());
322+
folders.insert(path.to_string_lossy().to_string());
323+
folders_to_check.push(path.to_string_lossy().to_string());
324324
} else if path.is_symlink() {
325-
symlinks.insert(path.display().to_string());
325+
symlinks.insert(path.to_string_lossy().to_string());
326326
} else if path.is_file() {
327-
files.insert(path.display().to_string());
327+
files.insert(path.to_string_lossy().to_string());
328328
} else {
329329
panic!("Unknown type of file {:?}", path);
330330
}
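
Note on this hunk: both the old `path.display().to_string()` and the new `path.to_string_lossy().to_string()` yield a `String` in which invalid Unicode is replaced with U+FFFD, so the collected names should be unchanged. The new form skips the `Display` formatting step. A minimal sketch of the two conversions side by side (the helper names `via_display` and `via_lossy` are hypothetical and not part of the commit):

```rust
use std::path::Path;

/// Old form: goes through the `Display` adapter and the formatting machinery.
fn via_display(path: &Path) -> String {
    path.display().to_string()
}

/// New form: `to_string_lossy` returns a `Cow<str>` that borrows when the
/// path is already valid UTF-8; `to_string` then makes the owned copy.
fn via_lossy(path: &Path) -> String {
    path.to_string_lossy().to_string()
}
```

`into_owned()` would be an equivalent way to turn the `Cow<str>` into a `String`.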

czkawka_cli/src/commands.rs

+1 -1
@@ -675,7 +675,7 @@ pub struct FileToSave {
675675

676676
#[derive(Debug, clap::Args)]
677677
pub struct JsonCompactFileToSave {
678-
#[clap(short, long, value_name = "json-file-name", help = "Saves the results into the compact json file")]
678+
#[clap(short = 'C', long, value_name = "json-file-name", help = "Saves the results into the compact json file")]
679679
pub compact_file_to_save: Option<PathBuf>,
680680
}
681681

czkawka_core/src/bad_extensions.rs

+1 -1
@@ -426,7 +426,7 @@ impl PrintResults for BadExtensions {
426426
writeln!(writer, "Found {} files with invalid extension.\n", self.information.number_of_files_with_bad_extension)?;
427427

428428
for file_entry in &self.bad_extensions_files {
429-
writeln!(writer, "{} ----- {}", file_entry.path.display(), file_entry.proper_extensions)?;
429+
writeln!(writer, "{:?} ----- {}", file_entry.path, file_entry.proper_extensions)?;
430430
}
431431

432432
Ok(())
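
Note on this hunk: switching from `{}` with `.display()` to `{:?}` changes the printed text, not only how it is produced. The `Debug` form of a path is wrapped in double quotes and escapes special characters. A small sketch that shows the difference (the helper names `display_form` and `debug_form` are hypothetical):

```rust
use std::path::Path;

/// What the old `writeln!` produced for the path part.
fn display_form(path: &Path) -> String {
    format!("{}", path.display())
}

/// What the new `writeln!` produces: a quoted, escaped path.
fn debug_form(path: &Path) -> String {
    format!("{:?}", path)
}
```

For `/tmp/a.txt` the first helper returns the bare path and the second returns the path surrounded by double quotes, so any script that parses the saved results may need to strip them.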

czkawka_core/src/big_file.rs

+15 -27
@@ -13,8 +13,8 @@ use log::debug;
1313
use rayon::prelude::*;
1414
use serde::{Deserialize, Serialize};
1515

16-
use crate::common::{check_folder_children, check_if_stop_received, prepare_thread_handler_common, send_info_and_wait_for_ending_all_threads, split_path};
17-
use crate::common_dir_traversal::{common_read_dir, get_lowercase_name, get_modified_time, CheckingMethod, ProgressData, ToolType};
16+
use crate::common::{check_folder_children, check_if_stop_received, prepare_thread_handler_common, send_info_and_wait_for_ending_all_threads, split_path_compare};
17+
use crate::common_dir_traversal::{common_read_dir, get_modified_time, CheckingMethod, ProgressData, ToolType};
1818
use crate::common_tool::{CommonData, CommonToolData, DeleteMethod};
1919
use crate::common_traits::{DebugPrint, PrintResults};
2020

@@ -68,13 +68,9 @@ impl BigFile {
6868

6969
#[fun_time(message = "look_for_big_files", level = "debug")]
7070
fn look_for_big_files(&mut self, stop_receiver: Option<&Receiver<()>>, progress_sender: Option<&Sender<ProgressData>>) -> bool {
71-
let mut folders_to_check: Vec<PathBuf> = Vec::with_capacity(1024 * 2);
7271
let mut old_map: BTreeMap<u64, Vec<FileEntry>> = Default::default();
7372

74-
// Add root folders for finding
75-
for id in &self.common_data.directories.included_directories {
76-
folders_to_check.push(id.clone());
77-
}
73+
let mut folders_to_check: Vec<PathBuf> = self.common_data.directories.included_directories.clone();
7874

7975
let (progress_thread_handle, progress_thread_run, atomic_counter, _check_was_stopped) =
8076
prepare_thread_handler_common(progress_sender, 0, 0, 0, CheckingMethod::None, self.common_data.tool_type);
@@ -87,13 +83,13 @@ impl BigFile {
8783
}
8884

8985
let segments: Vec<_> = folders_to_check
90-
.par_iter()
86+
.into_par_iter()
9187
.map(|current_folder| {
9288
let mut dir_result = vec![];
9389
let mut warnings = vec![];
9490
let mut fe_result = vec![];
9591

96-
let Some(read_dir) = common_read_dir(current_folder, &mut warnings) else {
92+
let Some(read_dir) = common_read_dir(&current_folder, &mut warnings) else {
9793
return (dir_result, warnings, fe_result);
9894
};
9995

@@ -110,22 +106,22 @@ impl BigFile {
110106
check_folder_children(
111107
&mut dir_result,
112108
&mut warnings,
113-
current_folder,
109+
&current_folder,
114110
&entry_data,
115111
self.common_data.recursive_search,
116112
&self.common_data.directories,
117113
&self.common_data.excluded_items,
118114
);
119115
} else if file_type.is_file() {
120-
self.collect_file_entry(&atomic_counter, &entry_data, &mut fe_result, &mut warnings, current_folder);
116+
self.collect_file_entry(&atomic_counter, &entry_data, &mut fe_result, &mut warnings, &current_folder);
121117
}
122118
}
123119
(dir_result, warnings, fe_result)
124120
})
125121
.collect();
126122

127-
// Advance the frontier
128-
folders_to_check.clear();
123+
let required_size = segments.iter().map(|(segment, _, _)| segment.len()).sum::<usize>();
124+
folders_to_check = Vec::with_capacity(required_size);
129125

130126
// Process collected data
131127
for (segment, warnings, fe_result) in segments {
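
Note on this hunk: the frontier vector is now consumed by value, and the next frontier is allocated once using the summed number of child folders instead of clearing and regrowing the old vector. The sketch below illustrates that level-by-level pattern under stated assumptions. It is not the commit's code: `count_folders` and the in-memory `tree` map are hypothetical, and a sequential `into_iter()` stands in for rayon's `into_par_iter()` so that only the standard library is needed.

```rust
use std::collections::HashMap;

/// Walks a hypothetical in-memory folder tree (folder -> child folders)
/// one level at a time and returns how many folders were visited.
fn count_folders(tree: &HashMap<&str, Vec<&str>>, roots: Vec<String>) -> usize {
    let mut frontier: Vec<String> = roots;
    let mut visited = 0;

    while !frontier.is_empty() {
        // Consume the frontier by value; each folder yields its children.
        let segments: Vec<Vec<String>> = frontier
            .into_iter()
            .map(|folder| {
                tree.get(folder.as_str())
                    .map(|children| children.iter().map(|c| c.to_string()).collect::<Vec<String>>())
                    .unwrap_or_default()
            })
            .collect();
        visited += segments.len();

        // Size the next frontier exactly once, as the hunk does.
        let required_size: usize = segments.iter().map(Vec::len).sum();
        frontier = Vec::with_capacity(required_size);
        for segment in segments {
            frontier.extend(segment);
        }
    }
    visited
}
```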
@@ -155,12 +151,7 @@ impl BigFile {
155151
current_folder: &Path,
156152
) {
157153
atomic_counter.fetch_add(1, Ordering::Relaxed);
158-
159-
let Some(file_name_lowercase) = get_lowercase_name(entry_data, warnings) else {
160-
return;
161-
};
162-
163-
if !self.common_data.allowed_extensions.matches_filename(&file_name_lowercase) {
154+
if !self.common_data.allowed_extensions.check_if_entry_ends_with_extension(entry_data) {
164155
return;
165156
}
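
Note on this hunk: the removed code built a lowercased copy of every file name before testing the extension. The body of `check_if_entry_ends_with_extension` is not part of this diff, so the following is only a sketch of one way such a check can avoid the extra allocation: compare the extension case-insensitively in place. The function `has_allowed_extension`, its slice-based signature, and the rule that an empty list allows everything are all assumptions made for illustration.

```rust
use std::path::Path;

/// Returns true when `path` has one of the `allowed` extensions,
/// ignoring ASCII case and without allocating a lowercased name.
/// Assumption: an empty `allowed` list means "no filtering".
fn has_allowed_extension(path: &Path, allowed: &[&str]) -> bool {
    if allowed.is_empty() {
        return true;
    }
    match path.extension().and_then(|ext| ext.to_str()) {
        Some(ext) => allowed.iter().any(|a| a.eq_ignore_ascii_case(ext)),
        None => false,
    }
}
```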
166157

@@ -178,9 +169,9 @@ impl BigFile {
178169
}
179170

180171
let fe: FileEntry = FileEntry {
181-
path: current_file_name.clone(),
182-
size: metadata.len(),
183172
modified_date: get_modified_time(&metadata, warnings, &current_file_name, false),
173+
path: current_file_name,
174+
size: metadata.len(),
184175
};
185176

186177
fe_result.push((fe.size, fe));
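
Note on this hunk: the `clone()` can be dropped because fields in a Rust struct expression are evaluated in the order they are written. Putting `modified_date` first lets it borrow `current_file_name` before `path` takes ownership of it. A minimal sketch of the same reordering, using a hypothetical `Entry` type and `name_len` helper in place of `FileEntry` and `get_modified_time`:

```rust
use std::path::{Path, PathBuf};

struct Entry {
    path: PathBuf,
    name_len: usize,
}

/// Stand-in for a helper that only needs to borrow the path.
fn name_len(path: &Path) -> usize {
    path.as_os_str().len()
}

fn build_entry(current: PathBuf) -> Entry {
    Entry {
        // Evaluated first: the borrow of `current` ends with this expression.
        name_len: name_len(&current),
        // Evaluated second: `current` is moved, so no clone is needed.
        path: current,
    }
}
```

Writing `path: current` first would not compile, since the later borrow would use a moved value.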
@@ -198,10 +189,7 @@ impl BigFile {
198189
for (_size, mut vector) in iter {
199190
if self.information.number_of_real_files < self.number_of_files_to_check {
200191
if vector.len() > 1 {
201-
vector.sort_unstable_by_key(|e| {
202-
let t = split_path(e.path.as_path());
203-
(t.0, t.1)
204-
});
192+
vector.sort_unstable_by(|a, b| split_path_compare(a.path.as_path(), b.path.as_path()));
205193
}
206194
for file in vector {
207195
if self.information.number_of_real_files < self.number_of_files_to_check {
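
Note on this hunk: `sort_unstable_by_key` recomputes the key for each comparison, and the old key was a tuple built by `split_path`, so owned values were produced repeatedly during the sort. A comparator passed to `sort_unstable_by` can compare borrowed parts directly. The sketch below shows the idea under stated assumptions: `compare_parent_then_name` is a hypothetical stand-in, since the body of `split_path_compare` is not part of this diff and may order paths differently.

```rust
use std::cmp::Ordering;
use std::path::{Path, PathBuf};

/// Hypothetical comparator: order by parent directory, then by file name,
/// using only borrowed views of the two paths.
fn compare_parent_then_name(a: &Path, b: &Path) -> Ordering {
    a.parent()
        .cmp(&b.parent())
        .then_with(|| a.file_name().cmp(&b.file_name()))
}

/// Sorts in place without building a temporary key per element.
fn sort_paths(paths: &mut [PathBuf]) {
    paths.sort_unstable_by(|a, b| compare_parent_then_name(a, b));
}
```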
@@ -222,7 +210,7 @@ impl BigFile {
222210
DeleteMethod::Delete => {
223211
for file_entry in &self.big_files {
224212
if fs::remove_file(&file_entry.path).is_err() {
225-
self.common_data.text_messages.warnings.push(file_entry.path.display().to_string());
213+
self.common_data.text_messages.warnings.push(file_entry.path.to_string_lossy().to_string());
226214
}
227215
}
228216
}
@@ -271,7 +259,7 @@ impl PrintResults for BigFile {
271259
writeln!(writer, "{} the smallest files.\n\n", self.information.number_of_real_files)?;
272260
}
273261
for file_entry in &self.big_files {
274-
writeln!(writer, "{} ({}) - {}", format_size(file_entry.size, BINARY), file_entry.size, file_entry.path.display())?;
262+
writeln!(writer, "{} ({}) - {:?}", format_size(file_entry.size, BINARY), file_entry.size, file_entry.path)?;
275263
}
276264
} else {
277265
write!(writer, "Not found any files.").unwrap();
