Commit b0c3e89

Merge pull request #109 from elisemercury/docs
Revamped difpy documentation
2 parents c22be40 + 0968880 commit b0c3e89

49 files changed

+544
-531
lines changed

docs/conf.py

Lines changed: 5 additions & 0 deletions
@@ -18,8 +18,11 @@
1818
'sphinx.ext.autosummary',
1919
'sphinx.ext.intersphinx',
2020
'sphinx_rtd_theme',
21+
#-- 'sphinxcontrib.googleanalytics'
2122
]
2223

24+
# -- googleanalytics_id = 'G-X002SSZTWC'
25+
2326
intersphinx_mapping = {
2427
'python': ('https://docs.python.org/3/', None),
2528
'sphinx': ('https://www.sphinx-doc.org/en/master/', None),
@@ -32,6 +35,8 @@
3235

3336
html_theme = 'sphinx_rtd_theme'
3437

38+
html_static_path = ['_static']
39+
3540
# -- Options for EPUB output
3641
epub_show_urls = 'footnote'
3742

docs/contributing.rst renamed to docs/contributing/contributing.rst

Lines changed: 1 addition & 7 deletions
@@ -3,8 +3,6 @@ Contributing to difPy
33

44
.. _Contributing:
55

6-
.. include:: /misc/support_difpy.rst
7-
86
difPy is continuously updated with code improvements, new features and changes requested by the community. Contributions are a good way to give feedback and to improve the functionality and quality of the package.
97

108
**Do you feel like difPy is missing a certain feature? Or do you have an idea of how to improve difPy?**
@@ -35,8 +33,4 @@ Your pull request and implementation will be reviewed and approved if it passes
3533

3634
👉 comment your code |br|
3735
👉 follow the code style of the project, including indentation |br|
38-
👉 update the `README.md <https://github.com/elisemercury/Duplicate-Image-Finder/blob/main/README.md>`_ instructions
39-
40-
------------
41-
42-
.. include:: /misc/support_difpy.rst
36+
👉 update the `README.md <https://github.com/elisemercury/Duplicate-Image-Finder/blob/main/README.md>`_ instructions

docs/contributing/support.rst

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
1+
Support difPy
2+
=============
3+
4+
.. _Support:
5+
6+
.. include:: /misc/support_difpy.rst
File renamed without changes.

docs/usage.rst renamed to docs/getting_started/cli_usage.rst

Lines changed: 1 addition & 40 deletions
@@ -1,19 +1,3 @@
1-
Using difPy
2-
=====
3-
4-
.. _using difPy:
5-
6-
**difPy** is a Python package that automates the search for duplicate/similar images.
7-
8-
.. include:: /using/installation.rst
9-
10-
.. include:: /using/basic_usage.rst
11-
12-
.. raw:: html
13-
14-
<hr>
15-
16-
171
.. _cli usage:
182

193
CLI Usage
@@ -66,27 +50,4 @@ The output of difPy is written to files and **saved in the working directory** b
6650
6751
difPy_xxx_results.json
6852
difPy_xxx_lower_quality.txt
69-
difPy_xxx_stats.json
70-
71-
72-
.. raw:: html
73-
74-
<hr>
75-
76-
77-
.. include:: /parameters/main.rst
78-
79-
80-
.. raw:: html
81-
82-
<hr>
83-
84-
.. include:: /output/main.rst
85-
86-
.. include:: /output/result.rst
87-
88-
.. include:: /output/result_infolder.rst
89-
90-
.. include:: /output/lower_quality.rst
91-
92-
.. include:: /output/stats.rst
53+
difPy_xxx_stats.json
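The result files listed above can be read back into Python for further processing. The following is a minimal sketch, assuming the files contain plain JSON and that the `xxx` part of the filename is a run-specific identifier. The helper name `load_result_files` is hypothetical and not part of difPy.

```python
import json
from pathlib import Path


def load_result_files(directory="."):
    """Load every difPy_*_results.json file found in `directory`.

    Hypothetical helper, not part of difPy. Assumes each file holds
    one JSON document.
    """
    loaded = {}
    # The glob pattern stands in for the elided "xxx" part of the filename.
    for path in sorted(Path(directory).glob("difPy_*_results.json")):
        with path.open(encoding="utf-8") as handle:
            loaded[path.name] = json.load(handle)
    return loaded
```

The function returns a dictionary keyed by filename, so results from several runs in the same working directory stay separate.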
File renamed without changes.

docs/getting_started/output.rst

Lines changed: 104 additions & 0 deletions
@@ -0,0 +1,104 @@
1+
.. _output:
2+
3+
Output
4+
----------------
5+
6+
difPy returns various types of output:
7+
8+
.. _search.result:
9+
10+
Search Result
11+
^^^^^^^^^^^^^
12+
13+
A **dictionary** of the duplicate/similar images (i.e. **match groups**) that were found. Each match group has a primary image (the dictionary key) whose value is the list of its matches, each given as a filename and an MSE (Mean Squared Error). The lower the MSE, the more similar the primary image and the matched image are. An MSE of 0 therefore indicates that two images are exact duplicates.
14+
15+
.. code-block:: python
16+
17+
search.result
18+
19+
> Output:
20+
{'C:/Path/image1.jpg' : [['C:/Path/duplicate_image1a.jpg', 0.0],
21+
['C:/Path/duplicate_image1b.jpg', 0.0]],
22+
'C:/Path/image2.jpg' : [['C:/Path/duplicate_image2a.jpg', 0.0]],
23+
...
24+
}
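Since the result is an ordinary dictionary, it can be processed with standard Python. The following sketch, which uses placeholder sample data in the shape shown above rather than a real difPy run, flattens the match groups into rows sorted by MSE. The function name `summarize_matches` is hypothetical.

```python
# Sample data in the documented shape; the paths are placeholders.
result = {
    "C:/Path/image1.jpg": [
        ["C:/Path/duplicate_image1a.jpg", 0.0],
        ["C:/Path/duplicate_image1b.jpg", 0.0],
    ],
    "C:/Path/image2.jpg": [["C:/Path/duplicate_image2a.jpg", 0.0]],
}


def summarize_matches(result):
    """Return (primary, match, mse) tuples, lowest MSE first."""
    rows = [
        (primary, match, mse)
        for primary, matches in result.items()
        for match, mse in matches
    ]
    return sorted(rows, key=lambda row: row[2])


for primary, match, mse in summarize_matches(result):
    print(f"{match} matches {primary} (MSE {mse})")
```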
25+
26+
When :ref:`in_folder` is set to ``True``, the result is structured slightly differently: matches are grouped by folder, and the keys of the outer dictionary are the folder paths.
27+
28+
.. code-block:: python
29+
30+
search.result
31+
32+
> Output:
33+
{'C:/Path1/' : {'C:/Path1/image1.jpg' : [['C:/Path1/duplicate_image1a.jpg', 0.0],
34+
['C:/Path1/duplicate_image1b.jpg', 0.0]],
35+
'C:/Path1/image2.jpg' : [['C:/Path1/duplicate_image2a.jpg', 0.0]]},
36+
'C:/Path2/' : {'C:/Path2/image1.jpg' : [['C:/Path2/duplicate_image1a.jpg', 0.0]]},
37+
...
38+
}
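Code written for the flat result shape can be reused with the per-folder shape by merging the inner dictionaries first. This is a minimal sketch on placeholder data. `flatten_in_folder_result` is a hypothetical helper, not a difPy function.

```python
# Placeholder data in the per-folder shape described above.
in_folder_result = {
    "C:/Path1/": {
        "C:/Path1/image1.jpg": [
            ["C:/Path1/duplicate_image1a.jpg", 0.0],
            ["C:/Path1/duplicate_image1b.jpg", 0.0],
        ],
        "C:/Path1/image2.jpg": [["C:/Path1/duplicate_image2a.jpg", 0.0]],
    },
    "C:/Path2/": {
        "C:/Path2/image1.jpg": [["C:/Path2/duplicate_image1a.jpg", 0.0]],
    },
}


def flatten_in_folder_result(result):
    """Merge the per-folder match groups into one flat dictionary."""
    flat = {}
    for folder, groups in result.items():
        for primary, matches in groups.items():
            flat[primary] = matches
    return flat


flat = flatten_in_folder_result(in_folder_result)
print(len(flat), "match groups across", len(in_folder_result), "folders")
```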
39+
40+
.. _search.lower_quality:
41+
42+
Lower Quality Files
43+
^^^^^^^^^^^^^^^^^^^
44+
45+
A **list** of the duplicate/similar images that have the **lowest resolution** within their match group:
46+
47+
.. code-block:: python
48+
49+
search.lower_quality
50+
51+
> Output:
52+
['C:/Path/duplicate_image1.jpg',
53+
'C:/Path/duplicate_image2.jpg', ...]
54+
55+
To find the lower quality images, difPy compares the **image resolutions** (pixel width x pixel height) within a match group and selects all images that have the lowest resolution in the group.
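The selection rule can be illustrated with a small sketch. It is not difPy's implementation. It assumes that every image except the highest-resolution one in a group counts as lower quality, and it takes the resolutions from a hypothetical lookup table instead of reading image files.

```python
def select_lower_quality(match_groups, resolutions):
    """Return every image except the highest-resolution one per group.

    match_groups: list of lists of file paths, one list per match group.
    resolutions: dict mapping path -> (width, height) in pixels.
    Both inputs are hypothetical stand-ins for illustration only.
    """
    flagged = []
    for group in match_groups:
        # Rank by pixel count (width x height), largest first.
        ranked = sorted(
            group,
            key=lambda path: resolutions[path][0] * resolutions[path][1],
            reverse=True,
        )
        # Keep the first (largest) image and flag the rest.
        flagged.extend(ranked[1:])
    return flagged


groups = [["a.jpg", "b.jpg", "c.jpg"]]
sizes = {"a.jpg": (100, 100), "b.jpg": (50, 50), "c.jpg": (200, 100)}
print(select_lower_quality(groups, sizes))
```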
56+
57+
Lower quality images can then be **moved** to a different location (see :ref:`search.move_to`):
58+
59+
.. code-block:: python
60+
61+
search.move_to(destination_path='C:/Path/to/Destination/')
62+
63+
Or **deleted** (see :ref:`search.delete`):
64+
65+
.. code-block:: python
66+
67+
search.delete(silent_del=False)
68+
69+
.. _search.stats:
70+
71+
Search Statistics
72+
^^^^^^^^^^^^^^^^^
73+
74+
A **JSON formatted collection** with statistics on the completed difPy process:
75+
76+
.. code-block:: python
77+
78+
search.stats
79+
80+
> Output:
81+
{'directory': ['C:/Path1/', 'C:/Path2/', ... ],
82+
'process': {'build': {'duration': {'start': '2024-02-18T19:52:39.479548',
83+
'end': '2024-02-18T19:52:41.630027',
84+
'seconds_elapsed': 2.1505},
85+
'parameters': {'recursive': True,
86+
'in_folder': False,
87+
'limit_extensions': True,
88+
'px_size': 50,
89+
'processes': 5}},
90+
'search': {'duration': {'start': '2024-02-18T19:52:41.630027',
91+
'end': '2024-02-18T19:52:46.770077',
92+
'seconds_elapsed': 5.14},
93+
'parameters': {'similarity_mse': 0,
94+
'rotate': True,
95+
'lazy': True,
96+
'processes': 5,
97+
'chunksize': None},
98+
'files_searched': 3228,
99+
'matches_found': {'duplicates': 3030,
100+
'similar': 0}}},
101+
'total_files': 3232,
102+
'invalid_files': {'count': 4,
103+
'logs': {'C:/Path/invalid_File.pdf': 'Unsupported file type',
104+
... }}}
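The statistics are a nested dictionary, so individual values can be read by key. The sketch below condenses a trimmed copy of the sample output above into a short summary. `summarize_stats` is a hypothetical helper and only touches keys that appear in the sample.

```python
# Trimmed copy of the sample output above; only the keys used below are kept.
stats = {
    "process": {
        "build": {"duration": {"seconds_elapsed": 2.1505}},
        "search": {
            "duration": {"seconds_elapsed": 5.14},
            "files_searched": 3228,
            "matches_found": {"duplicates": 3030, "similar": 0},
        },
    },
    "total_files": 3232,
    "invalid_files": {"count": 4},
}


def summarize_stats(stats):
    """Condense the nested statistics into a flat summary dictionary."""
    process = stats["process"]
    total_seconds = (
        process["build"]["duration"]["seconds_elapsed"]
        + process["search"]["duration"]["seconds_elapsed"]
    )
    matches = process["search"]["matches_found"]
    return {
        "total_seconds": round(total_seconds, 4),
        "files_searched": process["search"]["files_searched"],
        "matches": matches["duplicates"] + matches["similar"],
        "invalid_files": stats["invalid_files"]["count"],
    }


print(summarize_stats(stats))
```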

docs/index.rst

Lines changed: 43 additions & 15 deletions
@@ -1,3 +1,41 @@
1+
.. toctree::
2+
:maxdepth: 2
3+
:hidden:
4+
:caption: Getting started
5+
6+
/getting_started/installation
7+
/getting_started/basic_usage
8+
/getting_started/cli_usage
9+
/getting_started/output
10+
11+
.. toctree::
12+
:maxdepth: 2
13+
:hidden:
14+
:caption: Methods and parameters
15+
16+
/methods/build
17+
/methods/search
18+
/methods/search_moveto
19+
/methods/search_delete
20+
21+
.. toctree::
22+
:maxdepth: 2
23+
:hidden:
24+
:caption: Contributing
25+
26+
/contributing/contributing
27+
/contributing/support
28+
29+
.. toctree::
30+
:maxdepth: 2
31+
:hidden:
32+
:caption: Further Resources
33+
34+
/resources/desktop
35+
/resources/large_datasets
36+
/resources/supported_filetypes
37+
/resources/report_bug
38+
139
difPy Guide
240
===================================
341

@@ -8,27 +46,17 @@ difPy Guide
846

947
**difPy** is a Python package that automates the search for duplicate/similar images.
1048

11-
.. note::
12-
13-
✨ Update to `difPy v4 <https://pypi.org/project/difPy/>`_ for up to **10x performance increases** to previous versions! :ref:`What's new in v4?`
14-
15-
difPy searches for images in **one or more different directories**, compares the images it found and checks whether these are duplicates. It then outputs the **image files classified as duplicates** as well as the **images having the lowest resolutions**, so you know which of the duplicate images are safe to be deleted. You can then either delete them manually, or let difPy delete them for you.
49+
difPy searches for images in **one or more directories**, compares the images it finds and checks whether they are duplicates. It then outputs the **image files classified as duplicates**, as well as the **images with the lowest resolutions**, so that you know which of the duplicate images are safe to move or delete. You can then either move/delete them manually, or let difPy do this for you.
1650

17-
difPy does not compare images based on their hashes. It compares them based on their tensors i. e. the image content. This allows difPy to **not only search for duplicate images, but also for similar images**.
51+
difPy does not compare images based on their hashes. It compares them based on their tensors, i.e. the image content. This lets difPy **search not only for duplicate images, but also for similar images**.
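To make the content-based comparison concrete: the output documentation scores matches with the MSE (Mean Squared Error), where 0 means identical content. The pure-Python sketch below computes an MSE between two small grids of pixel intensities. It is an illustration of the metric under simplified assumptions (equally sized, single-channel grids), not difPy's actual implementation.

```python
def mean_squared_error(pixels_a, pixels_b):
    """MSE between two equally sized 2D grids of pixel intensities."""
    same_shape = len(pixels_a) == len(pixels_b) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(pixels_a, pixels_b)
    )
    if not same_shape:
        raise ValueError("images must have the same dimensions")
    squared = [
        (a - b) ** 2
        for row_a, row_b in zip(pixels_a, pixels_b)
        for a, b in zip(row_a, row_b)
    ]
    return sum(squared) / len(squared)


original = [[10, 20], [30, 40]]
brighter = [[12, 22], [32, 42]]

print(mean_squared_error(original, original))  # identical content
print(mean_squared_error(original, brighter))  # similar, but not identical
```

A hash of the file bytes would differ completely between `original` and `brighter`, whereas the MSE stays small, which is why a content-based score can also surface similar images.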
1852

1953
difPy leverages Python's multiprocessing capabilities and can therefore maintain high performance even on large datasets.
2054

21-
View difPy on `GitHub <https://github.com/elisemercury/Duplicate-Image-Finder>`_ and `PyPi <https://pypi.org/project/difPy/>`_.
55+
.. note::
56+
✨ difPy will soon be available as an app for your desktop! `Learn more <https://difpy-dev.readthedocs.io/en/latest/resources/desktop.html>`_.
2257

23-
Guide Content
24-
--------
2558

26-
.. toctree::
27-
:maxdepth: 3
28-
29-
usage
30-
contributing
31-
faq
59+
View difPy on `GitHub <https://github.com/elisemercury/Duplicate-Image-Finder>`_ and `PyPI <https://pypi.org/project/difPy/>`_.
3260

3361
------------
3462
