Commit 6445ef0

Updates for v4
1 parent 5641078 commit 6445ef0

6 files changed

+707
-452
lines changed

README.md

+82-46
@@ -1,3 +1,7 @@
1+
<p align="center">
2+
<img src="static/difPy_logo_3.png" width="300" title="Example Output: Duplicate Image Finder">
3+
</p>
4+
15
# Duplicate Image Finder (difPy)
26

37
[![PyPIv](https://img.shields.io/pypi/v/difPy)](https://pypi.org/project/difPy/)
@@ -16,9 +20,7 @@
1620
pip install difPy
1721
```
1822

19-
> :iphone: **Try the new [difPy Web App](https://difpy.app/)**!
20-
21-
> :point_right: :new: With **difPy v3.x** you can count on significant **performance increases**, **new features** and **bug fixes**. Check out the [release notes](https://github.com/elisemercury/Duplicate-Image-Finder/releases/) for details.
23+
> :point_right: :new: **difPy v4-beta** is out! difPy v4 is up to **10x as fast** as previous difPy versions. Check out the [release notes](https://github.com/elisemercury/Duplicate-Image-Finder/releases/) for details.
2224
2325
> :open_hands: Our motto? We :heart: Open Source! **Contributions and new ideas for difPy are always welcome** - check our [Contributor Guidelines](https://github.com/elisemercury/Duplicate-Image-Finder/wiki/Contributing-to-difPy) for more information.
2426
@@ -31,32 +33,40 @@ Check out the [difPy package on PyPI.org](https://pypi.org/project/difPy/)
3133
## Description
3234
difPy searches for images in **one or more different folders**, compares the images it found and checks whether these are duplicates. It then outputs the **image files classified as duplicates** as well as the **images having the lowest resolutions**, so you know which of the duplicate images are safe to be deleted. You can then either delete them manually, or let difPy delete them for you.
3335

34-
<p align="center">
35-
<img src="example_output.png" width="400" title="Example Output: Duplicate Image Finder">
36-
</p>
37-
3836
difPy does not compare images based on their hashes. It compares them based on their tensors i. e. the image content - this allows difPy to **not only search for duplicate images, but also for similar images**.
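
To illustrate what comparing images by content rather than by hash can mean, here is a minimal sketch that computes the mean squared error (MSE) between two tiny grayscale pixel grids. It is not difPy's implementation: the function names `mse` and `classify`, the labels, and the threshold handling are assumptions made for this example only.

```python
# Illustrative only: a pure-Python MSE between two small grayscale "images".
# This is NOT difPy's actual code; names and labels are hypothetical.

def mse(image_a, image_b):
    """Return the mean squared error between two equally sized pixel grids."""
    if len(image_a) != len(image_b) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(image_a, image_b)
    ):
        raise ValueError("images must have the same dimensions")
    squared_errors = [
        (pixel_a - pixel_b) ** 2
        for row_a, row_b in zip(image_a, image_b)
        for pixel_a, pixel_b in zip(row_a, row_b)
    ]
    return sum(squared_errors) / len(squared_errors)


def classify(image_a, image_b, threshold=0):
    """Label a pair as 'duplicate', 'similar' or 'different' (assumed labels)."""
    error = mse(image_a, image_b)
    if error == 0:
        return "duplicate"
    if error <= threshold:
        return "similar"
    return "different"


original = [[0, 50], [100, 150]]
exact_copy = [[0, 50], [100, 150]]
slightly_edited = [[0, 52], [100, 150]]

print(classify(original, exact_copy))                    # duplicate
print(classify(original, slightly_edited, threshold=5))  # similar
print(classify(original, slightly_edited))               # different
```

A hash comparison would treat `slightly_edited` as a completely different file, whereas a content-based error measure can still report it as close to the original.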
3937

38+
:notebook: For a **detailed usage guide**, please view the official **[difPy Usage Documentation](https://difpy.readthedocs.io/)**.
39+
40+
## Table of Contents
41+
1. [Basic Usage](https://github.com/elisemercury/Duplicate-Image-Finder#basic-usage)
42+
2. [Output](https://github.com/elisemercury/Duplicate-Image-Finder#output)
43+
3. [Additional Parameters](https://github.com/elisemercury/Duplicate-Image-Finder#additional-parameters)
44+
4. [CLI Usage](https://github.com/elisemercury/Duplicate-Image-Finder#cli-usage)
45+
5. [difPy Web App](https://github.com/elisemercury/Duplicate-Image-Finder#difpy-web-app)
46+
4047
## Basic Usage
4148
To make difPy search for duplicates **within one folder**:
4249

4350
```python
44-
from difPy import dif
45-
search = dif("C:/Path/to/Folder/")
51+
import difPy
52+
dif = difPy.build("C:/Path/to/Folder/")
53+
search = difPy.search(dif)
4654
```
4755
To search for duplicates **within multiple folders**:
4856

4957
```python
50-
from difPy import dif
51-
search = dif(["C:/Path/to/Folder_A/", "C:/Path/to/Folder_B/", "C:/Path/to/Folder_C/", ... ])
58+
import difPy
59+
dif = difPy.build(["C:/Path/to/Folder_A/", "C:/Path/to/Folder_B/", "C:/Path/to/Folder_C/", ... ])
60+
search = difPy.search(dif)
5261
```
53-
Folder paths can be specified as standalone Python strings, or within a list.
62+
Folder paths can be specified as standalone Python strings, or within a list. `difPy.build()` first builds a collection of images by scanning the provided folders and generating image tensors. `difPy.search()` then starts the search for duplicate images.
5463

5564
:notebook: For a **detailed usage guide**, please view the official **[difPy Usage Documentation](https://difpy.readthedocs.io/)**.
5665

5766
## Output
5867
difPy returns various types of output that you may use depending on your use case:
5968

69+
### I. Result Dictionary
6070
A **JSON formatted collection** of duplicates/similar images (i. e. **match groups**) that were found, where the keys are a **randomly generated unique id** for each image file:
6171

6272
```python
@@ -72,7 +82,7 @@ search.result
7282
...
7383
}
7484
```
75-
85+
### II. Lower Quality List
7686
A **list** of duplicates/similar images that have the **lowest quality** among match groups:
7787

7888
```python
@@ -82,37 +92,57 @@ search.lower_quality
8292
["C:/Path/to/Image/duplicate_image1.jpg",
8393
"C:/Path/to/Image/duplicate_image2.jpg", ...]
8494
```
85-
A **JSON formatted collection** with statistics on the completed difPy process:
95+
96+
Lower quality images can then be moved to a different location:
97+
98+
```python
99+
from difPy.actions import move_to
100+
move_to(search, destination_path="C:/Path/to/Destination/")
101+
```
102+
Or automatically deleted:
103+
104+
```python
105+
from difPy.actions import delete
106+
delete(search, silent_del=False)
107+
```
108+
109+
### III. Statistics
110+
111+
A **JSON formatted collection** with statistics on the completed difPy processes:
86112

87113
```python
88114
search.stats
89115

90116
> Output:
91117
{"directory" : ("C:/Path/to/Folder_A/", "C:/Path/to/Folder_B/", ... ),
92-
"duration" : {"start_date" : "2023-02-15",
93-
"start_time" : "18:44:19",
94-
"end_date" : "2023-02-15",
95-
"end_time" : "18:44:38",
96-
"seconds_elapsed" : 18.6113},
97-
"fast_search" : True,
98-
"recursive" : True,
99-
"match_mse" : 0,
100-
"px_size" : 50,
101-
"files_searched" : 1032,
102-
"matches_found" : {"duplicates" : 52,
103-
"similar" : 0},
104-
"invalid_files" : {"count" : 4},
105-
"deleted_files" : {"count" : 0},
106-
"skipped_files" : {"count" : 0}}
118+
"process" : {"build" : {"duration" : {"start" : "2023-08-27T22:41:42.741440",
119+
"end" : "2023-08-27T22:42:45.781104",
120+
"seconds_elapsed" : "0.185"},
121+
"parameters" : {"recursive" : True,
122+
"in_folder" : False,
123+
"limit_extensions" : True,
124+
"px_size" : 50}},
125+
"search" : {"duration" : {"start" : "2023-08-27T22:41:42.741440",
126+
"end" : "2023-08-27T22:42:45.781104",
127+
"seconds_elapsed" : "0.185"},
128+
"parameters" : {"similarity_mse" : 0},
129+
"files_searched" : 537,
130+
"matches_found" : {"duplicates" : 5,
131+
"similar" : 0}}},
132+
"invalid_files" : {"count" : 5,
133+
"logs" : {}}}
107134
```
108135

109136
## Additional Parameters
110137
difPy supports the following parameters:
111138

112139
```python
113-
dif(*directory, fast_search=True, recursive=True, similarity='duplicates', px_size=50, move_to=None,
114-
limit_extensions=False, show_progress=True, show_output=False, delete=False, silent_del=False,
115-
logs=False)
140+
difPy.build(*directory, recursive=True, in_folder=False, limit_extensions=True,
141+
px_size=50, show_progress=False, logs=True)
142+
```
143+
144+
```python
145+
difPy.search(difpy_obj, similarity='duplicates', show_progress=False, logs=True)
116146
```
117147

118148
:notebook: For a **detailed usage guide**, please view the official **[difPy Usage Documentation](https://difpy.readthedocs.io/)**.
@@ -121,41 +151,47 @@ dif(*directory, fast_search=True, recursive=True, similarity='duplicates', px_si
121151
difPy can also be invoked through the CLI by using the following commands:
122152

123153
```python
154+
python dif.py #working directory
155+
124156
python dif.py -D "C:/Path/to/Folder/"
125157

126158
python dif.py -D "C:/Path/to/Folder_A/" "C:/Path/to/Folder_B/" "C:/Path/to/Folder_C/"
127159
```
128-
It supports the following arguments:
160+
161+
> :point_right: Windows users can add difPy to their [PATH system variables](https://www.computerhope.com/issues/ch000549.htm) by pointing it to their difPy package installation folder containing the [`difPy.bat`](https://github.com/elisemercury/Duplicate-Image-Finder/difPy/difPy.bat) file. This adds `difPy` as a command in the CLI and will allow direct invocation of `difPy` from anywhere on the device.
162+
163+
difPy CLI supports the following arguments:
129164

130165
```python
131-
dif.py [-h] -D DIRECTORY [-Z OUTPUT_DIRECTORY] [-f {True,False}] [-r {True,False}] [-s SIMILARITY]
132-
[-px PX_SIZE] [-mv MOVE_TO] [-le {True,False}] [-p {True,False}] [-o {True,False}]
133-
[-d {True,False}] [-sd {True,False}] [-l {True,False}]
166+
dif.py [-h] [-D DIRECTORY] [-Z OUTPUT_DIRECTORY] [-r {True,False}] [-s SIMILARITY] [-px PX_SIZE]
167+
[-mv MOVE_TO] [-le {True,False}] [-p {True,False}] [-d {True,False}] [-sd {True,False}]
168+
[-l {True,False}]
134169
```
135170

136171
| | Parameter | | Parameter |
137172
| :---: | ------ | :---: | ------ |
138173
| `-D` | directory | `-p` | show_progress |
139-
| `-Z` | output_directory | `-o` | show_output |
140-
| `-f`| fast_search | `-mv` | move_to |
174+
| `-Z` | output_directory | `-mv` | move_to |
141175
| `-r`| recursive | `-d` | delete |
142-
| `-s` | similarity | `-sd` | silent_del |
176+
| `-s`| similarity | `-sd` | silent_del |
143177
| `-px` | px_size | `-l` | logs |
144-
| `-le` | limit_extensions | | |
178+
| `-le` | limit_extensions | | |
179+
180+
When no directory parameter is given in the CLI, difPy will **run on the current working directory**.
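
The fallback to the working directory could be sketched with `argparse` roughly as follows. Only the `-D` flag is taken from the usage string above; the helper name `parse_directories` and the way the default is resolved are assumptions for illustration and may differ from what `dif.py` actually does.

```python
import argparse
import os


def parse_directories(argv=None):
    """Parse -D and fall back to the current working directory (assumed behavior)."""
    parser = argparse.ArgumentParser(prog="dif.py")
    # -D accepts one or more folders; it is optional, as in the usage string above.
    parser.add_argument("-D", dest="directory", nargs="+", default=None)
    args = parser.parse_args(argv)
    if args.directory is None:
        # No -D given: search the folder the command was started from.
        args.directory = [os.getcwd()]
    return args.directory


print(parse_directories([]))
print(parse_directories(["-D", "C:/Path/to/Folder_A/", "C:/Path/to/Folder_B/"]))
```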
145181

146-
When running from the CLI, the output of difPy is written to files and saved in the working directory by default. To change the default output directory, specify the `-Z / -output_directory` parameter. The "xxx" in the output filenames is a unique timestamp:
182+
When running from the CLI, the output of difPy is written to files and **saved in the working directory** by default. To change the default output directory, specify the `-Z / -output_directory` parameter. The "xxx" in the output filenames is the current timestamp:
147183

148184
```python
149-
difPy_results_xxx.json
150-
difPy_lower_quality_xxx.csv
151-
difPy_stats_xxx.json
185+
difPy_xxx_results.json
186+
difPy_xxx_lower_quality.csv
187+
difPy_xxx_stats.json
152188
```
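
As a rough sketch of how such timestamped filenames can be built: the helper name `output_paths` and the `%Y%m%d%H%M%S` timestamp format below are assumptions, since the README only says that "xxx" is the current timestamp.

```python
from datetime import datetime
from pathlib import Path


def output_paths(output_directory=".", now=None):
    """Build the three output file paths for one run (hypothetical helper)."""
    now = now or datetime.now()
    stamp = now.strftime("%Y%m%d%H%M%S")  # assumed format for the "xxx" part
    base = Path(output_directory)
    return {
        "results": base / f"difPy_{stamp}_results.json",
        "lower_quality": base / f"difPy_{stamp}_lower_quality.csv",
        "stats": base / f"difPy_{stamp}_stats.json",
    }


paths = output_paths("out", datetime(2023, 8, 27, 22, 41, 42))
for name, path in paths.items():
    print(name, path.name)
```

Placing the timestamp directly after the `difPy_` prefix means the three files of one run sort next to each other in a directory listing.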
153189

154190
:notebook: For a **detailed usage guide**, please view the official **[difPy Usage Documentation](https://difpy.readthedocs.io/)**.
155191

156192
## difPy Web App
157193

158-
difPy can also be accessed via its new **web interface**. With difPy Web, you can compare **up to 200 images** and download a **deduplicated ZIP file** - all powered by difPy. [Read more](https://github.com/elisemercury/difPy-app).
194+
difPy can also be accessed via a browser. With difPy Web, you can compare **up to 200 images** and download a **deduplicated ZIP file** - all powered by difPy. [Read more](https://github.com/elisemercury/difPy-app).
159195

160196
:iphone: **Try the new [difPy Web App](https://difpy.app/)**!
161197

@@ -166,5 +202,5 @@ difPy can also be accessed via its new **web interface**. With difPy Web, you ca
166202
-------
167203

168204
<p align="center"><b>
169-
We :heart: Open Source
205+
:heart: Open Source
170206
</b></p>

difPy/actions.py

+97
@@ -0,0 +1,97 @@
'''
difPy - Python package for finding duplicate and similar images
2023 Elise Landman
https://github.com/elisemercury/Duplicate-Image-Finder
'''
import os
from pathlib import Path
import numpy as np

class delete:
    '''
    A class used to delete the lower quality images of a difPy search.
    '''
    def __init__(self, difpy_obj, silent_del=False):
        self.silent_del = _validate._silent_del(silent_del)
        # Show the files that are about to be deleted
        print(difpy_obj.lower_quality)
        self.lower_quality = _validate._file_list(difpy_obj.lower_quality)
        self._main()

    def _main(self):
        # Delete the lower quality images that were found by the search
        if not self.silent_del:
            # Ask for confirmation unless silent deletion was requested
            usr = input('Are you sure you want to delete all lower quality matched images? \n! This cannot be undone. (y/n)')
            if str(usr).lower() != 'y':
                print('Deletion canceled.')
                return
        deleted_files = []
        for file in self.lower_quality:
            print('\nDeletion in progress...', end='\r')
            try:
                os.remove(file)
                deleted_files.append(file)
            except OSError:
                print(f'Could not delete file: {file}', end='\r')
        print(f'Deleted {len(deleted_files)} file(s).')
        return deleted_files

class move_to:
    '''
    A class used to move the lower quality images of a difPy search.
    '''
    def __init__(self, difpy_obj, destination_path):
        self.destination_path = _validate._move_to(destination_path)
        self.lower_quality = difpy_obj.lower_quality
        self.lower_quality = self._main()

    def _main(self):
        # Move the lower quality images and return their new paths
        new_lower_quality = []
        for file in self.lower_quality:
            try:
                tail = os.path.basename(file)
                new_path = os.path.join(self.destination_path, tail)
                os.replace(file, new_path)
                new_lower_quality = np.append(new_lower_quality, str(Path(new_path)))
            except OSError:
                print(f'Could not move file: {file}', end='\r')
        # Report only the files that were actually moved
        print(f'Moved {len(new_lower_quality)} file(s) to {str(Path(self.destination_path))}')
        return new_lower_quality

class _validate:
    '''
    A class used to validate action input parameters.
    '''
    @staticmethod
    def _silent_del(silent_del):
        # Validate the 'silent_del' input parameter
        if not isinstance(silent_del, bool):
            raise Exception('Invalid value for "silent_del" parameter: must be of type BOOL.')
        return silent_del

    @staticmethod
    def _file_list(file_list):
        # Validate the list of files to act on
        if not isinstance(file_list, list):
            raise Exception('Invalid value for "file_list" parameter: must be of type LIST.')
        return file_list

    @staticmethod
    def _move_to(move_to):
        # Validate the 'move_to' input parameter and create the folder if it is missing
        if not isinstance(move_to, str):
            if move_to is not None:
                raise Exception('Invalid value for "move_to" parameter: must be of type str or "None"')
        else:
            if not os.path.exists(move_to):
                try:
                    os.makedirs(move_to)
                except OSError:
                    raise Exception(f'Invalid value for "move_to" parameter: "{str(move_to)}" does not exist.')
            elif not os.path.isdir(move_to):
                raise ValueError(f'Invalid value for "move_to" parameter: "{str(move_to)}" is not a directory.')
        return move_to
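
The move logic can be exercised without difPy or real images. The sketch below is a standalone illustration, not part of the commit: the helper `move_files` and the placeholder file names are hypothetical, and it only mirrors the idea of moving each file with `os.replace` and collecting the new paths of the files that were actually moved.

```python
import os
import tempfile
from pathlib import Path


def move_files(files, destination):
    """Move files into destination and return the new paths (hypothetical helper)."""
    os.makedirs(destination, exist_ok=True)
    moved = []
    for file in files:
        target = os.path.join(destination, os.path.basename(file))
        try:
            os.replace(file, target)
        except OSError as error:
            # A missing or locked file is reported and skipped, not fatal.
            print(f"Could not move file: {file} ({error})")
            continue
        moved.append(str(Path(target)))
    return moved


with tempfile.TemporaryDirectory() as workspace:
    source = os.path.join(workspace, "duplicate_image1.jpg")
    Path(source).write_bytes(b"placeholder")                # stand-in for an image
    missing = os.path.join(workspace, "missing.jpg")        # never created
    destination = os.path.join(workspace, "lower_quality")

    moved = move_files([source, missing], destination)
    moved_count = len(moved)
    source_still_exists = os.path.exists(source)
    print(f"Moved {moved_count} file(s)")
```

Note that `os.replace` silently overwrites an existing file with the same name in the destination and may fail when source and destination are on different filesystems, so a production version might need `shutil.move` and a collision check.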
