Commit 7389ccb

Merge pull request #74 from elisemercury/dev-v4
Updates to v4
2 parents 96478f5 + 497a091 commit 7389ccb

23 files changed, +975 -679 lines changed

.readthedocs.yaml

+4-3
Original file line number | Diff line number | Diff line change
@@ -14,10 +14,11 @@ build:
1414
sphinx:
1515
configuration: docs/conf.py
1616

17-
#formats:
18-
# - pdf
17+
formats:
18+
- pdf
19+
- epub
1920

2021
# Python requirements
2122
python:
2223
install:
23-
- requirements: requirements.txt
24+
- requirements: docs/requirements.txt

MANIFEST.in

+1
@@ -0,0 +1 @@
1+
include difPy/difPy.bat

README.md

+91-51
@@ -1,3 +1,7 @@
1+
<p align="center">
2+
<img src="static/difPy_logo_3.png" width="300" title="Example Output: Duplicate Image Finder">
3+
</p>
4+
15
# Duplicate Image Finder (difPy)
26

37
[![PyPIv](https://img.shields.io/pypi/v/difPy)](https://pypi.org/project/difPy/)
@@ -16,11 +20,9 @@
1620
pip install difPy
1721
```
1822

19-
> :iphone: **Try the new [difPy Web App](https://difpy.app/)**!
20-
21-
> :point_right: :new: With **difPy v3.x** you can count on significant **performance increases**, **new features** and **bug fixes**. Check out the [release notes](https://github.com/elisemercury/Duplicate-Image-Finder/releases/) for details.
23+
> :point_right: :new: **difPy v4** is out! difPy v4 comes with up to **10x more performance** than previous difPy versions. Check out the [release notes](https://github.com/elisemercury/Duplicate-Image-Finder/releases/) for details.
2224
23-
> :open_hands: Our motto? We :heart: Open Source! **Contributions and new ideas for difPy are always welcome** - check our [Contributor Guidelines](https://github.com/elisemercury/Duplicate-Image-Finder/wiki/Contributing-to-difPy) for more information.
25+
> :open_hands: Our motto? We :heart: Open Source! **Contributions and new ideas for difPy are always welcome** - check our [Contributor Guidelines](https://difpy.readthedocs.io/en/latest/contributing.html) for more information.
2426
2527
Read more on how the algorithm of difPy works in my Medium article [Finding Duplicate Images with Python](https://towardsdatascience.com/finding-duplicate-images-with-python-71c04ec8051).
2628

@@ -31,32 +33,43 @@ Check out the [difPy package on PyPI.org](https://pypi.org/project/difPy/)
3133
## Description
3234
difPy searches for images in **one or more different folders**, compares the images it found and checks whether these are duplicates. It then outputs the **image files classified as duplicates** as well as the **images having the lowest resolutions**, so you know which of the duplicate images are safe to be deleted. You can then either delete them manually, or let difPy delete them for you.
3335

34-
<p align="center">
35-
<img src="example_output.png" width="400" title="Example Output: Duplicate Image Finder">
36-
</p>
37-
3836
difPy does not compare images based on their hashes. It compares them based on their tensors i. e. the image content - this allows difPy to **not only search for duplicate images, but also for similar images**.
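To make the tensor-based comparison concrete, here is a minimal sketch of scoring two images by mean squared error (MSE). It assumes MSE is the metric, which the `similarity_mse` field in the stats output suggests. The helpers `mse` and `is_match` and the nested-list image format are hypothetical illustrations, not difPy's implementation.

```python
def mse(a, b):
    """Mean squared error between two equally shaped 2-D grids of pixel values."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images must have the same shape")
    pixel_count = sum(len(row) for row in a)
    squared_error = sum(
        (pa - pb) ** 2
        for ra, rb in zip(a, b)
        for pa, pb in zip(ra, rb)
    )
    return squared_error / pixel_count


def is_match(a, b, threshold=0):
    """Treat two images as matching when their MSE does not exceed the threshold.

    threshold=0 (assumed here to mean "exact duplicates only") accepts
    identical pixel grids; a larger threshold also accepts similar images.
    """
    return mse(a, b) <= threshold


original = [[0, 255], [128, 64]]
slightly_different = [[0, 255], [128, 68]]

print(mse(original, original))            # identical grids
print(mse(original, slightly_different))  # one pixel differs by 4
```

A hash-based comparison would only flag byte-identical files, whereas a distance such as MSE can be thresholded, which is what makes a search for similar images possible.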
3937

38+
difPy leverages Python's **multiprocessing capabilities** and therefore performs well even on large datasets.
39+
40+
:notebook: For a **detailed usage guide**, please view the official **[difPy Usage Documentation](https://difpy.readthedocs.io/)**.
41+
42+
## Table of Contents
43+
1. [Basic Usage](https://github.com/elisemercury/Duplicate-Image-Finder#basic-usage)
44+
2. [Output](https://github.com/elisemercury/Duplicate-Image-Finder#output)
45+
3. [Additional Parameters](https://github.com/elisemercury/Duplicate-Image-Finder#additional-parameters)
46+
4. [CLI Usage](https://github.com/elisemercury/Duplicate-Image-Finder#cli-usage)
47+
5. [difPy Web App](https://github.com/elisemercury/Duplicate-Image-Finder#difpy-web-app)
48+
4049
## Basic Usage
4150
To make difPy search for duplicates **within one folder**:
4251

4352
```python
44-
from difPy import dif
45-
search = dif("C:/Path/to/Folder/")
53+
import difPy
54+
dif = difPy.build("C:/Path/to/Folder/")
55+
search = difPy.search(dif)
4656
```
4757
To search for duplicates **within multiple folders**:
4858

4959
```python
50-
from difPy import dif
51-
search = dif(["C:/Path/to/Folder_A/", "C:/Path/to/Folder_B/", "C:/Path/to/Folder_C/", ... ])
60+
import difPy
61+
dif = difPy.build(["C:/Path/to/Folder_A/", "C:/Path/to/Folder_B/", "C:/Path/to/Folder_C/", ... ])
62+
search = difPy.search(dif)
5263
```
53-
Folder paths can be specified as standalone Python strings, or within a list.
64+
65+
Folder paths can be specified as standalone Python strings, or within a list. With `difPy.build()`, difPy first scans the images in the provided folders and builds a collection of images by generating image tensors. `difPy.search()` then starts the search for duplicate images.
5466

5567
:notebook: For a **detailed usage guide**, please view the official **[difPy Usage Documentation](https://difpy.readthedocs.io/)**.
5668

5769
## Output
5870
difPy returns various types of output that you may use depending on your use case:
5971

72+
### I. Search Result Dictionary
6073
A **JSON formatted collection** of duplicates/similar images (i. e. **match groups**) that were found, where each key is a **randomly generated unique id** for an image file:
6174

6275
```python
@@ -73,46 +86,65 @@ search.result
7386
}
7487
```
7588

76-
A **list** of duplicates/similar images that have the **lowest quality** among match groups:
89+
### II. Lower Quality Files
90+
A **JSON formatted collection** of duplicates/similar images that have the **lowest quality** among match groups:
7791

7892
```python
7993
search.lower_quality
8094

8195
> Output:
82-
["C:/Path/to/Image/duplicate_image1.jpg",
83-
"C:/Path/to/Image/duplicate_image2.jpg", ...]
96+
{"lower_quality" : ["C:/Path/to/Image/duplicate_image1.jpg",
97+
"C:/Path/to/Image/duplicate_image2.jpg", ...]}
8498
```
85-
A **JSON formatted collection** with statistics on the completed difPy process:
99+
100+
Lower quality images can then be **moved** to a different location:
101+
102+
```python
103+
search.move_to(destination_path="C:/Path/to/Destination/")
104+
```
105+
Or **deleted**:
106+
107+
```python
108+
search.delete(silent_del=False)
109+
```
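Because moving and deleting change files on disk, it can help to preview the affected paths first. The sketch below assumes the `lower_quality` dictionary has the shape shown above. `preview_moves` is a hypothetical helper and not part of difPy, and it touches no files.

```python
from pathlib import PureWindowsPath


def preview_moves(lower_quality, destination):
    """Map each lower-quality file to the path it would have in `destination`.

    Pure path arithmetic only: nothing is read from or written to disk.
    Note that two source files with the same name map to the same target.
    """
    dest = PureWindowsPath(destination)
    return {
        src: (dest / PureWindowsPath(src).name).as_posix()
        for src in lower_quality["lower_quality"]
    }


# Example data in the same shape as `search.lower_quality`
lower_quality = {"lower_quality": ["C:/Path/to/Image/duplicate_image1.jpg",
                                   "C:/Path/to/Image/duplicate_image2.jpg"]}

for src, target in preview_moves(lower_quality, "C:/Path/to/Destination/").items():
    print(src, "->", target)
```

`PureWindowsPath` is used so the Windows-style example paths parse the same way on any operating system.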
110+
111+
### III. Process Statistics
112+
113+
A **JSON formatted collection** with statistics on the completed difPy processes:
86114

87115
```python
88116
search.stats
89117

90118
> Output:
91119
{"directory" : ("C:/Path/to/Folder_A/", "C:/Path/to/Folder_B/", ... ),
92-
"duration" : {"start_date" : "2023-02-15",
93-
"start_time" : "18:44:19",
94-
"end_date" : "2023-02-15",
95-
"end_time" : "18:44:38",
96-
"seconds_elapsed" : 18.6113},
97-
"fast_search" : True,
98-
"recursive" : True,
99-
"match_mse" : 0,
100-
"px_size" : 50,
101-
"files_searched" : 1032,
102-
"matches_found" : {"duplicates" : 52,
103-
"similar" : 0},
104-
"invalid_files" : {"count" : 4},
105-
"deleted_files" : {"count" : 0},
106-
"skipped_files" : {"count" : 0}}
120+
"process" : {"build" : {"duration" : {"start" : "2023-08-28T21:22:48.691008",
121+
"end" : "2023-08-28T21:23:59.104351",
122+
"seconds_elapsed" : "70.4133"},
123+
"parameters" : {"recursive" : True,
124+
"in_folder" : False,
125+
"limit_extensions" : True,
126+
"px_size" : 50}},
127+
"search" : {"duration" : {"start" : "2023-08-28T21:23:59.106351",
128+
"end" : "2023-08-28T21:25:17.538015",
129+
"seconds_elapsed" : "78.4317"},
130+
"parameters" : {"similarity_mse" : 0},
131+
"files_searched" : 5225,
132+
"matches_found" : {"duplicates" : 5,
133+
"similar" : 0}}},
134+
"invalid_files" : {"count" : 230,
135+
"logs" : {...}}}
107136
```
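The stats are nested per process, and in the example output `seconds_elapsed` is a string rather than a number. A minimal sketch of summing the build and search durations, assuming the structure shown above (`total_seconds` is a hypothetical helper, not part of difPy):

```python
def total_seconds(stats):
    """Sum the elapsed seconds of every process recorded in the stats."""
    return sum(
        # seconds_elapsed appears as a string in the example output, so convert it
        float(step["duration"]["seconds_elapsed"])
        for step in stats["process"].values()
    )


# Subset of the example output above, limited to the fields used here
stats = {
    "process": {
        "build": {"duration": {"seconds_elapsed": "70.4133"}},
        "search": {"duration": {"seconds_elapsed": "78.4317"}},
    }
}

print(f"total runtime: {total_seconds(stats):.4f} s")
```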
108137

109138
## Additional Parameters
110139
difPy supports the following parameters:
111140

112141
```python
113-
dif(*directory, fast_search=True, recursive=True, similarity='duplicates', px_size=50, move_to=None,
114-
limit_extensions=False, show_progress=True, show_output=False, delete=False, silent_del=False,
115-
logs=False)
142+
difPy.build(*directory, recursive=True, in_folder=False, limit_extensions=True,
143+
px_size=50, show_progress=False, logs=True)
144+
```
145+
146+
```python
147+
difPy.search(difpy_obj, similarity='duplicates', show_progress=False, logs=True)
116148
```
117149

118150
:notebook: For a **detailed usage guide**, please view the official **[difPy Usage Documentation](https://difpy.readthedocs.io/)**.
@@ -121,41 +153,49 @@ dif(*directory, fast_search=True, recursive=True, similarity='duplicates', px_si
121153
difPy can also be invoked through the CLI by using the following commands:
122154

123155
```python
156+
python dif.py #working directory
157+
124158
python dif.py -D "C:/Path/to/Folder/"
125159

126160
python dif.py -D "C:/Path/to/Folder_A/" "C:/Path/to/Folder_B/" "C:/Path/to/Folder_C/"
127161
```
128-
It supports the following arguments:
162+
163+
> :point_right: Windows users can add difPy to their [PATH system variables](https://www.computerhope.com/issues/ch000549.htm) by pointing it to their difPy package installation folder containing the [`difPy.bat`](https://github.com/elisemercury/Duplicate-Image-Finder/difPy/difPy.bat) file. This adds `difPy` as a command in the CLI and will allow direct invocation of `difPy` from anywhere on the device.
164+
165+
difPy CLI supports the following arguments:
129166

130167
```python
131-
dif.py [-h] -D DIRECTORY [-Z OUTPUT_DIRECTORY] [-f {True,False}] [-r {True,False}] [-s SIMILARITY]
132-
[-px PX_SIZE] [-mv MOVE_TO] [-le {True,False}] [-p {True,False}] [-o {True,False}]
133-
[-d {True,False}] [-sd {True,False}] [-l {True,False}]
168+
dif.py [-h] [-D DIRECTORY [DIRECTORY ...]] [-Z OUTPUT_DIRECTORY]
169+
[-r {True,False}] [-i {True,False}] [-le {True,False}]
170+
[-px PX_SIZE] [-p {True,False}] [-s SIMILARITY]
171+
[-mv MOVE_TO] [-d {True,False}] [-sd {True,False}]
172+
[-l {True,False}]
134173
```
135174

136175
| | Parameter | | Parameter |
137176
| :---: | ------ | :---: | ------ |
138-
| `-D` | directory | `-p` | show_progress |
139-
| `-Z` | output_directory | `-o` | show_output |
140-
| `-f`| fast_search | `-mv` | move_to |
141-
| `-r`| recursive | `-d` | delete |
142-
| `-s` | similarity | `-sd` | silent_del |
177+
| `-D` | directory | `-le` | limit_extensions |
178+
| `-Z` | output_directory | `-p` | show_progress |
179+
| `-r`| recursive | `-mv` | move_to |
180+
| `-i`| in_folder | `-d` | delete |
181+
| `-s`| similarity | `-sd` | silent_del |
143182
| `-px` | px_size | `-l` | logs |
144-
| `-le` | limit_extensions | | |
145183

146-
When running from the CLI, the output of difPy is written to files and saved in the working directory by default. To change the default output directory, specify the `-Z / -output_directory` parameter. The "xxx" in the output filenames is a unique timestamp:
184+
If no directory parameter is given in the CLI, difPy will **run on the current working directory**.
185+
186+
When running from the CLI, the output of difPy is written to files and **saved in the working directory** by default. To change the default output directory, specify the `-Z / -output_directory` parameter. The "xxx" in the output filenames is the current timestamp:
147187

148188
```python
149-
difPy_results_xxx.json
150-
difPy_lower_quality_xxx.csv
151-
difPy_stats_xxx.json
189+
difPy_xxx_results.json
190+
difPy_xxx_lower_quality.json
191+
difPy_xxx_stats.json
152192
```
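Since every CLI run writes a new set of timestamped files, a small script can collect them by kind. This sketch assumes only the `difPy_xxx_<kind>.json` naming shown above and makes no assumption about the timestamp format. `find_cli_outputs` is a hypothetical helper, not part of difPy.

```python
from pathlib import Path


def find_cli_outputs(directory):
    """Group difPy CLI output files found in `directory` by kind."""
    kinds = ("results", "lower_quality", "stats")
    folder = Path(directory)
    return {
        # "*" stands in for the timestamp part of the filename
        kind: sorted(p.name for p in folder.glob(f"difPy_*_{kind}.json"))
        for kind in kinds
    }


if __name__ == "__main__":
    # Scan the current working directory, the default CLI output location
    for kind, names in find_cli_outputs(".").items():
        print(kind, names)
```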
153193

154194
:notebook: For a **detailed usage guide**, please view the official **[difPy Usage Documentation](https://difpy.readthedocs.io/)**.
155195

156196
## difPy Web App
157197

158-
difPy can also be accessed via its new **web interface**. With difPy Web, you can compare **up to 200 images** and download a **deduplicated ZIP file** - all powered by difPy. [Read more](https://github.com/elisemercury/difPy-app).
198+
difPy can also be accessed via a browser. With difPy Web, you can compare **up to 200 images** and download a **deduplicated ZIP file** - all powered by difPy. [Read more](https://github.com/elisemercury/difPy-app).
159199

160200
:iphone: **Try the new [difPy Web App](https://difpy.app/)**!
161201

@@ -166,5 +206,5 @@ difPy can also be accessed via its new **web interface**. With difPy Web, you ca
166206
-------
167207

168208
<p align="center"><b>
169-
We :heart: Open Source
209+
:heart: Open Source
170210
</b></p>

difPy/__init__.py

+1-1
@@ -1,2 +1,2 @@
11
from .version import __version__
2-
from .dif import dif
2+
from .dif import build, search
