Commit b0f8851

Merge pull request #107 from elisemercury/v4.1.3
V4.1.3
2 parents f9eeafb + e359b15 commit b0f8851

16 files changed (+150, -415 lines changed)

README.md

+3-3
@@ -69,7 +69,7 @@ Folder paths can be specified as standalone Python strings, or within a list. Wi
 ## Output
 difPy returns various types of output that you may use depending on your use case:

-### I. Search Result Dictionary
+### I. Search Result
 A **JSON formatted collection** of duplicates/similar images (i. e. **match groups**) that were found. Each match group has a primary image (the key of the dictionary) which holds the list of its duplicates including their filename and MSE (Mean Squared Error). The lower the MSE, the more similar the primary image and the matched images are. Therefore, an MSE of 0 indicates that two images are exact duplicates.

 ```python
@@ -84,7 +84,7 @@ search.result
 ```

 ### II. Lower Quality Files
-A **list** of duplicates/similar images that have the **lowest quality** among match groups:
+A **list** of duplicates/similar images that have the **lowest quality** (image resolution) among match groups:

 ```python
 search.lower_quality
@@ -105,7 +105,7 @@ Or **deleted**:
 search.delete(silent_del=False)
 ```

-### III. Process Statistics
+### III. Search Statistics

 A **JSON formatted collection** with statistics on the completed difPy processes:

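The README text above ranks matches by MSE (Mean Squared Error). As a rough illustration of that metric only, the sketch below computes the MSE of two equally sized pixel sequences in plain Python. The function name `mean_squared_error` and the flat-list input are assumptions made for this example; they are not taken from difPy, which operates on image tensors.

```python
def mean_squared_error(pixels_a, pixels_b):
    """Average of the squared per-pixel differences (hypothetical helper, not difPy's API)."""
    if len(pixels_a) != len(pixels_b):
        raise ValueError("both inputs must have the same number of pixels")
    if not pixels_a:
        raise ValueError("inputs must not be empty")
    squared_diffs = [(a - b) ** 2 for a, b in zip(pixels_a, pixels_b)]
    return sum(squared_diffs) / len(squared_diffs)


# Identical inputs give an MSE of 0, i.e. an exact duplicate.
identical = mean_squared_error([0, 10, 20], [0, 10, 20])

# Differences of 3 and 4 give (9 + 16) / 2 = 12.5.
different = mean_squared_error([0, 0], [3, 4])
```

A lower value means the two inputs are closer, which matches the README's statement that an MSE of 0 indicates exact duplicates.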

difPy/dif.py

+5-2
@@ -349,6 +349,7 @@ def _search_union(self):

         # format the end result
         result = self._group_result_union(result_raw)
+
         return result

     def _search_infolder(self):
@@ -734,9 +735,11 @@ def _sort_imgs_by_size(img_list):
         # Function for sorting a list of images based on their file sizes
         imgs_sizes = []
         for img in img_list:
-            img_size = (os.stat(str(img)).st_size, img)
+            with Image.open(img) as image:
+                resolution = image.size
+                img_size = (sum(resolution), img)
             imgs_sizes.append(img_size)
-        sort_by_size = [file for size, file in sorted(imgs_sizes, reverse=True)]
+        sort_by_size = [file for size, file in sorted(imgs_sizes, reverse=True)] # Highest first
         return sort_by_size

 class _generate_stats:
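The hunk above scores each image with `sum(resolution)`, i.e. width plus height, while the updated docs in this commit describe resolution as "pixel width x pixel height". The two scores agree for images with the same aspect ratio but can disagree otherwise. The sketch below is a plain-Python illustration of that difference under assumed inputs: the helper names (`rank_by_score`, `edge_sum`, `pixel_area`) and the file names are hypothetical and not part of difPy.

```python
def rank_by_score(sizes, score):
    """Return paths ordered from highest to lowest score (hypothetical helper)."""
    ranked = sorted(sizes.items(), key=lambda item: score(item[1]), reverse=True)
    return [path for path, _ in ranked]


def edge_sum(size):
    # Mirrors the committed scoring: width + height.
    width, height = size
    return width + height


def pixel_area(size):
    # Alternative scoring, assumed from the docs' wording: width * height.
    width, height = size
    return width * height


# Made-up (width, height) pairs chosen so the two scores disagree.
sizes = {"a_square.png": (300, 300), "b_strip.png": (1000, 10)}

by_sum = rank_by_score(sizes, edge_sum)     # strip scores 1010, square scores 600
by_area = rank_by_score(sizes, pixel_area)  # square scores 90000, strip scores 10000
```

For near-duplicates that share an aspect ratio, which is the common case in a match group, both scores produce the same order.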

difPy/version.py

+1-1
@@ -1 +1 @@
-__version__ = '4.1.2'
+__version__ = '4.1.3'

docs/app.rst

-67
This file was deleted.

docs/concepts.rst

-10
This file was deleted.

docs/conf.py

+2-2
@@ -6,8 +6,8 @@
 copyright = '2024, Elise Landman'
 author = 'Elise Landman'

-release = 'v4.1.2'
-version = 'v4.1.2'
+release = 'v4.1.3'
+version = 'v4.1.3'

 # -- General configuration


docs/output/lower_quality.rst

+22-1
@@ -1,7 +1,28 @@
+.. _search.lower_quality:
+
+Lower Quality Files
+^^^^^^^^^^
+
+A **list** of duplicates/similar images that have the **lowest resolution** among match groups:
+
 .. code-block:: python

    search.lower_quality

    > Output:
    ['C:/Path/duplicate_image1.jpg',
-    'C:/Path/duplicate_image2.jpg', ...]
+    'C:/Path/duplicate_image2.jpg', ...]
+
+To find the lower quality images, difPy compares the **image resolutions** (pixel width x pixel height) within a match group and selects all images that have lowest image file resolutions among the group.
+
+Lower quality images then can be **moved** to a different location (see :ref:`search.move_to`):
+
+.. code-block:: python
+
+   search.move_to(destination_path='C:/Path/to/Destination/')
+
+Or **deleted** (see :ref:`search.delete`):
+
+.. code-block:: python
+
+   search.delete(silent_del=False)
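The new docs text says difPy compares resolutions within a match group and selects the lower-resolution images. One way to read that, sketched below, is to keep the highest-resolution image of a group and flag every other member. This is an assumption about the selection rule, not a copy of difPy's code; the function `lower_quality_in_group`, the dictionary layout, and the example sizes are all hypothetical.

```python
def lower_quality_in_group(group):
    """Return every path except the highest-resolution one.

    `group` maps a file path to a (width, height) tuple. Hypothetical helper,
    assuming resolution means width * height as described in the docs text.
    """
    ranked = sorted(
        group,
        key=lambda path: group[path][0] * group[path][1],
        reverse=True,  # highest resolution first
    )
    return ranked[1:]


# Made-up match group: one primary image and two smaller duplicates.
group = {
    "C:/Path/image1.jpg": (1920, 1080),
    "C:/Path/duplicate_image1.jpg": (1280, 720),
    "C:/Path/duplicate_image2.jpg": (640, 360),
}

flagged = lower_quality_in_group(group)
```

In this reading, the flagged paths are the ones a later `move_to` or `delete` call would act on.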

docs/output/main.rst

+6
@@ -0,0 +1,6 @@
+.. _output:
+
+Output
+----------------
+
+difPy returns various types of output:

docs/output/result.rst

+7
@@ -1,3 +1,10 @@
+.. _search.result:
+
+Search Result
+^^^^^^^^^^
+
+A **dictionary** of duplicates/similar images (i. e. **match groups**) that were found. Each match group has a primary image (the key of the dictionary) which holds the list of its duplicates including their filename and MSE (Mean Squared Error). The lower the MSE, the more similar the primary image and the matched images are. Therefore, an MSE of 0 indicates that two images are exact duplicates.
+
 .. code-block:: python

    search.result

docs/output/result_infolder.rst

+2
@@ -1,3 +1,5 @@
+When :ref:`in_folder` is set to ``True``, the result output is slightly modified and matches are grouped in their separate folders, with the key of the dictionary being the folder path.
+
 .. code-block:: python

    search.result

docs/output/stats.rst

+7
@@ -1,3 +1,10 @@
+.. _search.stats:
+
+Search Statistics
+^^^^^^^^^^
+
+A **JSON formatted collection** with statistics on the completed difPy process:
+
 .. code-block:: python

    search.stats

docs/parameters/in_folder.rst

+1-1
@@ -1,7 +1,7 @@
 in_folder (bool)
 ++++++++++++

-By default, difPy will search for matches in the union of all directories specified in the :ref:`directory` parameter. To have difPy only search for matches within each folder separately, set ``in_folder`` to ``True``.
+By default, difPy will search for matches in the union of all directories specified in the :ref:`directory` parameter. To have difPy only search for matches within each folder separately, set ``in_folder`` to ``True``. The structure of the ``search.result`` output will be slightly different if ``in_folder`` is set to ``True`` (see :ref:`output`).

 ``True`` = searches for matches only among each individual directory, including subdirectories


docs/parameters.rst renamed to docs/parameters/main.rst

+9-10
@@ -1,11 +1,12 @@
+.. _parameters:
+
 Parameters
-=====
+----------------

-.. _parameters:
 .. _difPy.build:

 difPy.build
-------------
+^^^^^^^^^^

 Before difPy can perform any search, it needs to build its image repository and transform the images in the provided directory into tensors. This is what is done when ``difPy.build()`` is invoked.

@@ -73,7 +74,7 @@ Upon completion, ``difPy.build()`` returns a ``dif`` object that can be used in
 .. _difPy.search:

 difPy.search
-------------
+^^^^^^^^^^

 After the ``dif`` object has been built using :ref:`difPy.build`, the search can be initiated with ``difPy.search``.

@@ -103,20 +104,18 @@ After the search is completed, further actions can be performed using :ref:`sear
 .. _difPy_obj:

 difPy_obj
-^^^^^^^^^^^^
+++++++++++++

 The required ``difPy_obj`` parameter should be pointing to the ``dif`` object that was built during the invocation of :ref:`difPy.build`.

 .. _similarity:

 .. include:: /parameters/similarity.rst

-
 .. _lazy:

 .. include:: /parameters/lazy.rst

-
 .. _rotate:

 .. include:: /parameters/rotate.rst
@@ -144,7 +143,7 @@ The required ``difPy_obj`` parameter should be pointing to the ``dif`` object th
 .. _search.move_to:

 search.move_to
-------------
+^^^^^^^^^^

 difPy can automatically move the lower quality duplicate/similar images it found to another directory. Images can be moved by invoking ``move_to`` on the difPy search:

@@ -171,11 +170,11 @@ difPy can automatically move the lower quality duplicate/similar images it found
 .. _search.delete:

 search.delete
-------------
+^^^^^^^^^^

 difPy can automatically delete the lower quality duplicate/similar images it found. Images can be deleted by invoking ``delete`` on the difPy search:

-.. note::
+.. warning::

    Please use with care, as this cannot be undone.
