Reimplementation of DifPy for performance #113

AliSot2000 · 2024-12-19T04:02:35Z

Hi 👋

I've been running into issues with your implementation. Unfortunately, I have a very large dataset of images that I want to deduplicate, and this implementation is hitting its limits in both memory footprint and the size of the JSON files.

I've implemented the same basic idea of using the MSE to compare compressed image tensors, but with a full focus on performance, using every trick I know to make the computation faster. To accommodate larger datasets, I've implemented a cache on the file system, I'm using SQLite as the backend to store the file indexes and the pairs of diffs, and I've added checkpoints so that the computation can be interrupted on extremely large datasets.
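To make the idea concrete, here is a minimal sketch of MSE comparison with an SQLite store and resumable checkpoints. It is not FastDiffPy's actual code: the table layout (`files`, `diffs`), the function names, and the `batch` parameter are all assumptions made for illustration, and flat lists of pixel values stand in for the compressed image tensors so that the example needs only the standard library.

```python
import sqlite3
from itertools import combinations


def mse(a, b):
    """Mean squared error between two equally sized flat pixel sequences."""
    if len(a) != len(b):
        raise ValueError("thumbnails must have the same size")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def open_store(path=":memory:"):
    """Create the hypothetical schema: one table for files, one for diffs."""
    con = sqlite3.connect(path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS files (id INTEGER PRIMARY KEY, path TEXT UNIQUE)"
    )
    con.execute(
        "CREATE TABLE IF NOT EXISTS diffs ("
        "a INTEGER, b INTEGER, mse REAL, PRIMARY KEY (a, b))"
    )
    return con


def compare_all(con, thumbs, batch=2):
    """Compare every pair once, committing every `batch` pairs as a checkpoint.

    Pairs already stored in `diffs` are skipped, so an interrupted run can be
    resumed by calling this function again on the same database.
    Returns the number of pairs computed in this call.
    """
    ids = {}
    for path in sorted(thumbs):
        con.execute("INSERT OR IGNORE INTO files (path) VALUES (?)", (path,))
        row = con.execute("SELECT id FROM files WHERE path = ?", (path,)).fetchone()
        ids[path] = row[0]
    con.commit()

    done = set(con.execute("SELECT a, b FROM diffs"))
    computed = 0
    for p, q in combinations(sorted(thumbs), 2):
        key = (ids[p], ids[q])
        if key in done:
            continue  # already computed before the last checkpoint
        con.execute(
            "INSERT INTO diffs VALUES (?, ?, ?)", (*key, mse(thumbs[p], thumbs[q]))
        )
        computed += 1
        if computed % batch == 0:
            con.commit()  # checkpoint: work up to here survives an interrupt
    con.commit()
    return computed


def duplicates(con, threshold=0.0):
    """Return path pairs whose stored MSE is at or below `threshold`."""
    rows = con.execute(
        "SELECT fa.path, fb.path FROM diffs "
        "JOIN files fa ON fa.id = diffs.a "
        "JOIN files fb ON fb.id = diffs.b "
        "WHERE diffs.mse <= ? ORDER BY fa.path, fb.path",
        (threshold,),
    )
    return rows.fetchall()
```

Keeping the pairs in a database rather than in memory or in one large JSON file means a frontend could query only the pairs under a given threshold, which is the kind of access pattern the SQLite backend is meant to allow.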

Your difPy is already tied into frontends, and I'm not entirely sure whether it would be worth the effort of converting everything to this new implementation. Additionally, the way FastDiffPy is written at the moment, it's more of a framework than a script like this repo. Output to results.json and lower_quality.txt is implemented, but I'd think the frontends would benefit from using the DB in their backend instead of those files.

This is FastDiffPy at the moment.

Best

AliSot2000

elisemercury · 2024-12-19T16:41:15Z

Hi @AliSot2000,

Great project, congrats! Thanks for sharing it with us - it's always great to see when adaptations or improvements of difPy are developed :) Looking forward to seeing the benchmark results!

Best
Elise

AliSot2000 · 2024-12-25T20:11:50Z

Hi

The first benchmark results are now published.

elisemercury added the comment/feedback label on Dec 19, 2024
