Enhancement - Optional parameter set for source folder / comparison folder mode #72
Comments
I would like this as well, so I can make better use of difPy in my projects.
@elisemercury, I just noticed this issue and had a quick look to see how it might be implemented. I haven't coded or tested this yet. I only did a very quick code review and noted down the idea, so it may not work (and may well contain idiotic mistakes!). But if you think it's a valid approach and want to add this feature, let me know and I'll code it, test it, and open a pull request.
I had implemented a very rough way to do pairwise comparison between folders in difPy V3, but I don't have the knowledge to do it in V4. It only works for two folders, but it was sometimes useful when you need to compare a small number of files (500) against a much larger set (20,000) and don't want the number of comparisons to grow quadratically with the total number of images. The break point (bp) between the folders is hard-coded here and is the number of images in the smaller folder. `def _matches(imgs_matrices, id_by_location, similarity, show_output, show_progress, fast_search):`
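The break-point idea can be sketched independently of difPy's internals. The snippet below is only an illustration, not the V3 code from the comment above: `cross_folder_pairs`, `n_images`, and the assumption that the first `bp` entries of a single image list come from the smaller folder are all hypothetical.

```python
from itertools import product


def cross_folder_pairs(n_images, bp):
    """Yield index pairs (i, j) with i in the first folder and j in the second.

    Assumption (hypothetical layout): all images sit in one list, and the
    first `bp` entries come from the smaller folder.
    """
    if not 0 <= bp <= n_images:
        raise ValueError("bp must lie between 0 and n_images")
    # Every pair crosses the break point, so no image is ever compared
    # against another image from its own folder.
    yield from product(range(bp), range(bp, n_images))


pairs = list(cross_folder_pairs(n_images=5, bp=2))
print(pairs)
# [(0, 2), (0, 3), (0, 4), (1, 2), (1, 3), (1, 4)]
```

With 2 images in the smaller folder and 3 in the larger one, this yields 6 pairs instead of the 10 that comparing all 5 images against each other would need.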
My apologies if this is already possible, but I didn't find a way to do it in the documentation at https://difpy.readthedocs.io/en/latest/usage.html.
Currently, the multiple folder method within difPy searches for duplicates amongst all the folders listed. However, once you have de-duplicated the images within a folder, if you then look to compare an additional folder against those that you have already de-duplicated, the de-duplicated folder's contents again get compared against themselves.
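To get a feel for how much of that work is redundant, here is a rough count. It is a back-of-the-envelope sketch that assumes one comparison per image pair, which may not match how difPy actually schedules its comparisons; the function names and folder sizes are made up for illustration.

```python
from math import comb


def comparisons_all_pairs(n_unique, n_new):
    """Pairs checked when every image is compared against every other image."""
    return comb(n_unique + n_new, 2)


def comparisons_skipping_known(n_unique, n_new):
    """Pairs checked when the already de-duplicated images are not re-compared.

    New images are still checked against the de-duplicated set and against
    each other.
    """
    return n_unique * n_new + comb(n_new, 2)


n_unique, n_new = 20_000, 500  # illustrative folder sizes
print(comparisons_all_pairs(n_unique, n_new))       # 210114750
print(comparisons_skipping_known(n_unique, n_new))  # 10124750
```

The difference, `comb(20_000, 2)` = 199,990,000 pairs, is exactly the set of comparisons between images that were already known to be unique.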
It would be nice if there were an optional set of parameters for specifying a source folder that one or more comparison folders are checked against. Basically, difPy would assume the images in the source folder are unique, so it would only need to look for duplicates among the images in the provided comparison folders.
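A minimal sketch of what such a mode could do is shown below. This is not difPy's API: `find_new_duplicates`, its parameters, and the toy `mse` helper are hypothetical, and the "images" are short tuples of pixel values rather than real image tensors.

```python
from itertools import chain


def mse(a, b):
    """Mean squared error between two equally sized pixel sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def find_new_duplicates(source, comparison, is_duplicate):
    """Return (original, duplicate) name pairs for images in `comparison`.

    `source` is assumed to be duplicate-free, so its images are never
    compared against each other. Both arguments map a name to image data.
    """
    matches = []
    unique_new = {}  # comparison images that matched nothing seen so far
    for name, image in comparison.items():
        # Check the trusted source images first, then the new unique ones.
        candidates = chain(source.items(), unique_new.items())
        original = next(
            (known for known, known_image in candidates
             if is_duplicate(known_image, image)),
            None,
        )
        if original is None:
            unique_new[name] = image
        else:
            matches.append((original, name))
    return matches


source = {"project/a.jpg": (0, 0, 0, 0), "project/b.jpg": (255, 255, 255, 255)}
comparison = {
    "dump/a_copy.jpg": (0, 0, 0, 0),
    "dump/c.jpg": (10, 200, 30, 40),
    "dump/c_copy.jpg": (10, 200, 30, 40),
}
result = find_new_duplicates(source, comparison, lambda a, b: mse(a, b) == 0)
print(result)
# [('project/a.jpg', 'dump/a_copy.jpg'), ('dump/c.jpg', 'dump/c_copy.jpg')]
```

The similarity test is passed in as a function so the sketch does not depend on any particular threshold or image representation.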
I believe this would help in larger image projects like mine. In my scenario, I've downloaded photos from my partner's and my phones and tablets multiple times over the years. As I started de-duplicating with difPy, I would move the de-duplicated files into a central project folder and then scan the next photo dump folder against the project folder. Since the project folder already contains only unique images, I don't need difPy to check those images against each other again, but I'm not seeing any existing method to do that. I think this would be a noticeable performance improvement, especially as image sets get larger and larger.