Skip to content

Microoptimizations #1153

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Dec 7, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 2 additions & 1 deletion Changelog.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@

### CLI
- Providing full static rust binary with [Eyra](https://github.com/sunfishcode/eyra) - [#1102](https://github.com/qarmin/czkawka/pull/1102)
- Fixed duplicated `-c` argument, now saving as compact json is handled via `-C` - ????

### Krokiet GUI
- Initial release of new gui - [#1102](https://github.com/qarmin/czkawka/pull/1102)
Expand All @@ -17,7 +18,7 @@
- Added bigger stack size by default(fixes stack overflow in some musl apps) - [#1102](https://github.com/qarmin/czkawka/pull/1102)
- Added optional libraw dependency(better single-core performance and support more raw files) - [#1102](https://github.com/qarmin/czkawka/pull/1102)
- Speedup checking for wildcards and fix invalid recognizing long excluded items - [#1152](https://github.com/qarmin/czkawka/pull/1152)
- Even 10x speedup when searching for empty folders - [#1152](https://github.com/qarmin/czkawka/pull/1152)
- Big speedup when searching for empty folders(especially with multithreading + cached FS schema) - [#1152](https://github.com/qarmin/czkawka/pull/1152)
- Collecting files for scan can be a lot of faster due lazy file metadata gathering - [#1152](https://github.com/qarmin/czkawka/pull/1152)
- Fixed recognizing not accessible folders as non-empty - [#1152](https://github.com/qarmin/czkawka/pull/1152)

Expand Down
72 changes: 43 additions & 29 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,14 +2,16 @@

**Czkawka** (_tch•kav•ka_ (IPA: [ˈʧ̑kafka]), "hiccup" in Polish) is a simple, fast and free app to remove unnecessary files from your computer.

**Krokiet** ((IPA: [ˈkrɔcɛt]), "croquet" in Polish) same as above, but uses Slint frontend.

## Features
- Written in memory-safe Rust
- Amazingly fast - due to using more or less advanced algorithms and multithreading
- Free, Open Source without ads
- Multiplatform - works on Linux, Windows, macOS, FreeBSD and many more
- Cache support - second and further scans should be much faster than the first one
- CLI frontend - for easy automation
- GUI frontend - uses GTK 4 framework and looks similar to FSlint
- GUI frontend - uses GTK 4 or Slint frameworks
- No spying - Czkawka does not have access to the Internet, nor does it collect any user information or statistics
- Multilingual - support multiple languages like Polish, English or Italian
- Multiple tools to use:
Expand All @@ -36,9 +38,18 @@ Each tool uses different technologies, so you can find instructions for each of

## Benchmarks

Since Czkawka is written in Rust and it aims to be a faster alternative to FSlint or DupeGuru which are written in Python, we need to compare the speed of these tools.
Previous benchmark was done mostly with two python project - dupeguru and fslint.
Both were written in python so it was mostly obvious that Czkawka will be faster due using more low-level functions and faster language.

I tried to use rmlint gui but it not even started on my computer, so instead I used Detwinner, fclones-gui and dupeguru.

I tested it on a 1024 GB SSD(Sata 3) and an i7-4770 CPU(4/8HT), disk contains 1742102 files which took 850 GB
Minimum file size 64KB, with search in hidden folders without any excluded folders/files.

I tested it on a 256 GB SSD and an i7-4770 CPU.
Czkawka 7.0.0
Detwinner 0.4.2
Dupeguru 4.3.1
Fclones-gui 0.2.0

I prepared a disk and performed a test without any folder exceptions and with disabled ignoring of hard links. The disk contained 363 215 files, took 221,8 GB and had 62093 duplicate files in 31790 groups which occupied 4,1 GB.

Expand Down Expand Up @@ -83,38 +94,40 @@ Similar images which check 349 image files that occupied 1.7 GB
| DupeGuru 4.1.1 (First Run) | 55s |
| DupeGuru 4.1.1 (Second Run) | 1s |

Of course there are multiple tools that offer even better performance, but usually are only specialized in one simple area.

## Comparison to other tools

Bleachbit is a master at finding and removing temporary files, while Czkawka only finds the most basic ones. So these two apps shouldn't be compared directly or be considered as an alternative to one another.

In this comparison remember, that even if app have same features they may work different(e.g. one app may have more options to choose than other).

| | Czkawka | Krokiet | FSlint | DupeGuru | Bleachbit |
|:------------------------:|:-----------:|:-----------:|:------:|:------------------:|:-----------:|
| Language | Rust | Rust | Python | Python/Obj-C | Python |
| Framework base language | C | Rust | C | C/C++/Obj-C/Swift | C |
| Framework | GTK 4 | Slint | PyGTK2 | Qt 5 (PyQt)/Cocoa | PyGTK3 |
| OS | Lin,Mac,Win | Lin,Mac,Win | Lin | Lin,Mac,Win | Lin,Mac,Win |
| Duplicate finder | ✔ | ✔ | ✔ | ✔ | |
| Empty files | ✔ | ✔ | ✔ | | |
| Empty folders | ✔ | ✔ | ✔ | | |
| Temporary files | ✔ | ✔ | ✔ | | ✔ |
| Big files | ✔ | ✔ | | | |
| Similar images | ✔ | ✔ | | ✔ | |
| Similar videos | ✔ | ✔ | | | |
| Music duplicates(tags) | ✔ | ✔ | | ✔ | |
| Invalid symlinks | ✔ | ✔ | ✔ | | |
| Broken files | ✔ | ✔ | | | |
| Names conflict | ✔ | ✔ | ✔ | | |
| Invalid names/extensions | ✔ | ✔ | ✔ | | |
| Installed packages | | | ✔ | | |
| Bad ID | | | ✔ | | |
| Non stripped binaries | | | ✔ | | |
| Redundant whitespace | | | ✔ | | |
| Overwriting files | | | ✔ | | ✔ |
| Multiple languages | ✔ | | ✔ | ✔ | ✔ |
| Cache support | ✔ | ✔ | | ✔ | |
| In active development | Yes | | No | Yes | Yes |
| | Czkawka | Krokiet | FSlint | DupeGuru | Bleachbit |
|:------------------------:|:-----------:|:-----------:|:------:|:-----------------:|:-----------:|
| Language | Rust | Rust | Python | Python/Obj-C | Python |
| Framework base language | C | Rust | C | C/C++/Obj-C/Swift | C |
| Framework | GTK 4 | Slint | PyGTK2 | Qt 5 (PyQt)/Cocoa | PyGTK3 |
| OS | Lin,Mac,Win | Lin,Mac,Win | Lin | Lin,Mac,Win | Lin,Mac,Win |
| Duplicate finder | ✔ | ✔ | ✔ | ✔ | |
| Empty files | ✔ | ✔ | ✔ | | |
| Empty folders | ✔ | ✔ | ✔ | | |
| Temporary files | ✔ | ✔ | ✔ | | ✔ |
| Big files | ✔ | ✔ | | | |
| Similar images | ✔ | ✔ | | ✔ | |
| Similar videos | ✔ | ✔ | | | |
| Music duplicates(tags) | ✔ | ✔ | | ✔ | |
| Invalid symlinks | ✔ | ✔ | ✔ | | |
| Broken files | ✔ | ✔ | | | |
| Names conflict | ✔ | ✔ | ✔ | | |
| Invalid names/extensions | ✔ | ✔ | ✔ | | |
| Installed packages | | | ✔ | | |
| Bad ID | | | ✔ | | |
| Non stripped binaries | | | ✔ | | |
| Redundant whitespace | | | ✔ | | |
| Overwriting files | | | ✔ | | ✔ |
| Multiple languages | ✔ | | ✔ | ✔ | ✔ |
| Cache support | ✔ | ✔ | | ✔ | |
| In active development | Yes | Yes | No | Yes | Yes |

## Other apps
There are many similar applications to Czkawka on the Internet, which do some things better and some things worse:
Expand All @@ -123,6 +136,7 @@ There are many similar applications to Czkawka on the Internet, which do some th
- [FSlint](https://github.com/pixelb/fslint) - A little outdated, but still have some tools not available in Czkawka
- [AntiDupl.NET](https://github.com/ermig1979/AntiDupl) - Shows a lot of metadata of compared images
- [Video Duplicate Finder](https://github.com/0x90d/videoduplicatefinder) - Finds similar videos(surprising, isn't it), supports video thumbnails

### CLI
Due to limited time, the biggest emphasis is on the GUI version so if you are looking for really good and feature-packed console apps, then take a look at these:
- [Fclones](https://github.com/pkolaczk/fclones) - One of the fastest tools to find duplicates; it is written also in Rust
Expand Down
8 changes: 4 additions & 4 deletions ci_tester/src/main.rs
Original file line number Diff line number Diff line change
Expand Up @@ -319,12 +319,12 @@ fn collect_all_files_and_dirs(dir: &str) -> std::io::Result<CollectedFiles> {
let path = entry.path();

if path.is_dir() {
folders.insert(path.display().to_string());
folders_to_check.push(path.display().to_string());
folders.insert(path.to_string_lossy().to_string());
folders_to_check.push(path.to_string_lossy().to_string());
} else if path.is_symlink() {
symlinks.insert(path.display().to_string());
symlinks.insert(path.to_string_lossy().to_string());
} else if path.is_file() {
files.insert(path.display().to_string());
files.insert(path.to_string_lossy().to_string());
} else {
panic!("Unknown type of file {:?}", path);
}
Expand Down
2 changes: 1 addition & 1 deletion czkawka_cli/src/commands.rs
Original file line number Diff line number Diff line change
Expand Up @@ -675,7 +675,7 @@ pub struct FileToSave {

#[derive(Debug, clap::Args)]
pub struct JsonCompactFileToSave {
#[clap(short, long, value_name = "json-file-name", help = "Saves the results into the compact json file")]
#[clap(short = 'C', long, value_name = "json-file-name", help = "Saves the results into the compact json file")]
pub compact_file_to_save: Option<PathBuf>,
}

Expand Down
2 changes: 1 addition & 1 deletion czkawka_core/src/bad_extensions.rs
Original file line number Diff line number Diff line change
Expand Up @@ -426,7 +426,7 @@ impl PrintResults for BadExtensions {
writeln!(writer, "Found {} files with invalid extension.\n", self.information.number_of_files_with_bad_extension)?;

for file_entry in &self.bad_extensions_files {
writeln!(writer, "{} ----- {}", file_entry.path.display(), file_entry.proper_extensions)?;
writeln!(writer, "{:?} ----- {}", file_entry.path, file_entry.proper_extensions)?;
}

Ok(())
Expand Down
42 changes: 15 additions & 27 deletions czkawka_core/src/big_file.rs
Original file line number Diff line number Diff line change
Expand Up @@ -13,8 +13,8 @@ use log::debug;
use rayon::prelude::*;
use serde::{Deserialize, Serialize};

use crate::common::{check_folder_children, check_if_stop_received, prepare_thread_handler_common, send_info_and_wait_for_ending_all_threads, split_path};
use crate::common_dir_traversal::{common_read_dir, get_lowercase_name, get_modified_time, CheckingMethod, ProgressData, ToolType};
use crate::common::{check_folder_children, check_if_stop_received, prepare_thread_handler_common, send_info_and_wait_for_ending_all_threads, split_path_compare};
use crate::common_dir_traversal::{common_read_dir, get_modified_time, CheckingMethod, ProgressData, ToolType};
use crate::common_tool::{CommonData, CommonToolData, DeleteMethod};
use crate::common_traits::{DebugPrint, PrintResults};

Expand Down Expand Up @@ -68,13 +68,9 @@ impl BigFile {

#[fun_time(message = "look_for_big_files", level = "debug")]
fn look_for_big_files(&mut self, stop_receiver: Option<&Receiver<()>>, progress_sender: Option<&Sender<ProgressData>>) -> bool {
let mut folders_to_check: Vec<PathBuf> = Vec::with_capacity(1024 * 2);
let mut old_map: BTreeMap<u64, Vec<FileEntry>> = Default::default();

// Add root folders for finding
for id in &self.common_data.directories.included_directories {
folders_to_check.push(id.clone());
}
let mut folders_to_check: Vec<PathBuf> = self.common_data.directories.included_directories.clone();

let (progress_thread_handle, progress_thread_run, atomic_counter, _check_was_stopped) =
prepare_thread_handler_common(progress_sender, 0, 0, 0, CheckingMethod::None, self.common_data.tool_type);
Expand All @@ -87,13 +83,13 @@ impl BigFile {
}

let segments: Vec<_> = folders_to_check
.par_iter()
.into_par_iter()
.map(|current_folder| {
let mut dir_result = vec![];
let mut warnings = vec![];
let mut fe_result = vec![];

let Some(read_dir) = common_read_dir(current_folder, &mut warnings) else {
let Some(read_dir) = common_read_dir(&current_folder, &mut warnings) else {
return (dir_result, warnings, fe_result);
};

Expand All @@ -110,22 +106,22 @@ impl BigFile {
check_folder_children(
&mut dir_result,
&mut warnings,
current_folder,
&current_folder,
&entry_data,
self.common_data.recursive_search,
&self.common_data.directories,
&self.common_data.excluded_items,
);
} else if file_type.is_file() {
self.collect_file_entry(&atomic_counter, &entry_data, &mut fe_result, &mut warnings, current_folder);
self.collect_file_entry(&atomic_counter, &entry_data, &mut fe_result, &mut warnings, &current_folder);
}
}
(dir_result, warnings, fe_result)
})
.collect();

// Advance the frontier
folders_to_check.clear();
let required_size = segments.iter().map(|(segment, _, _)| segment.len()).sum::<usize>();
folders_to_check = Vec::with_capacity(required_size);

// Process collected data
for (segment, warnings, fe_result) in segments {
Expand Down Expand Up @@ -155,12 +151,7 @@ impl BigFile {
current_folder: &Path,
) {
atomic_counter.fetch_add(1, Ordering::Relaxed);

let Some(file_name_lowercase) = get_lowercase_name(entry_data, warnings) else {
return;
};

if !self.common_data.allowed_extensions.matches_filename(&file_name_lowercase) {
if !self.common_data.allowed_extensions.check_if_entry_ends_with_extension(entry_data) {
return;
}

Expand All @@ -178,9 +169,9 @@ impl BigFile {
}

let fe: FileEntry = FileEntry {
path: current_file_name.clone(),
size: metadata.len(),
modified_date: get_modified_time(&metadata, warnings, &current_file_name, false),
path: current_file_name,
size: metadata.len(),
};

fe_result.push((fe.size, fe));
Expand All @@ -198,10 +189,7 @@ impl BigFile {
for (_size, mut vector) in iter {
if self.information.number_of_real_files < self.number_of_files_to_check {
if vector.len() > 1 {
vector.sort_unstable_by_key(|e| {
let t = split_path(e.path.as_path());
(t.0, t.1)
});
vector.sort_unstable_by(|a, b| split_path_compare(a.path.as_path(), b.path.as_path()));
}
for file in vector {
if self.information.number_of_real_files < self.number_of_files_to_check {
Expand All @@ -222,7 +210,7 @@ impl BigFile {
DeleteMethod::Delete => {
for file_entry in &self.big_files {
if fs::remove_file(&file_entry.path).is_err() {
self.common_data.text_messages.warnings.push(file_entry.path.display().to_string());
self.common_data.text_messages.warnings.push(file_entry.path.to_string_lossy().to_string());
}
}
}
Expand Down Expand Up @@ -271,7 +259,7 @@ impl PrintResults for BigFile {
writeln!(writer, "{} the smallest files.\n\n", self.information.number_of_real_files)?;
}
for file_entry in &self.big_files {
writeln!(writer, "{} ({}) - {}", format_size(file_entry.size, BINARY), file_entry.size, file_entry.path.display())?;
writeln!(writer, "{} ({}) - {:?}", format_size(file_entry.size, BINARY), file_entry.size, file_entry.path)?;
}
} else {
write!(writer, "Not found any files.").unwrap();
Expand Down
Loading