Hi 👋
I've been running into issues with your implementation. Sadly, I have a very large dataset of images which I want to deduplicate, and this implementation is running into its limits with both the memory footprint and the size of the JSON files.
I've implemented the same basic idea of using MSE to compare compressed image tensors, but with a full focus on performance, using every trick I know to speed up the computation. To accommodate larger datasets, I've implemented a cache on the file system, I'm using SQLite as the backend to store the file indexes and the pairs of diffs, and I've added checkpoints so the computation can be interrupted and resumed on extremely large datasets.
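For reference, a minimal sketch of that core comparison (not FastDiffPy's actual code; the tensor size and duplicate threshold below are illustrative assumptions):

```python
# Sketch: downscale both images to the same small size, then compare
# the resulting tensors with mean squared error (MSE).
import numpy as np
from PIL import Image

def compressed_tensor(path: str, size: int = 64) -> np.ndarray:
    """Load an image and downscale it to a small, fixed-size RGB tensor."""
    img = Image.open(path).convert("RGB").resize((size, size))
    return np.asarray(img, dtype=np.float64)

def mse(a: np.ndarray, b: np.ndarray) -> float:
    """Mean squared error between two equally sized image tensors."""
    return float(np.mean(np.square(a - b)))

# Usage: a pair with a low MSE is a duplicate candidate.
# diff = mse(compressed_tensor("a.jpg"), compressed_tensor("b.jpg"))
# is_duplicate = diff < 200.0  # threshold is an assumption, tune per dataset
```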
Your `difPy` is already tied into frontends, and I'm not entirely sure it would be worth the effort of converting everything to this new implementation. Additionally, the way `FastDiffPy` is written at the moment, it's more a framework than a script like this repo. Output to `results.json` and `lower_quality.txt` is implemented, but I'd think the frontends would profit from using the db in their backend instead of those files.

This is FastDiffPy at the moment.
Best
AliSot2000
Great project, congrats! Thanks for sharing it with us - it's always great to see when adaptations or improvements of difPy are developed :) Looking forward to seeing the benchmark results!