Open
Description
Today, I use web scraping and LLMs to extract data from URLs. If the LLM step fails for some reason, my plan is to retry processing on the next scheduled run (I schedule URLs every 15 minutes).
The dupefilter is very helpful, but once a URL has been processed by Scrapy it is filtered out on later runs, so to retry I need that one specific URL to be downloaded again.
Is there a way to remove a specific URL from the dupefilter Redis set?
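To make the question concrete, here is a rough sketch of what I have in mind. It assumes the dupefilter stores request fingerprints in a Redis set named by the default key template `%(spider)s:dupefilter`, and that `server` is a redis-py style client exposing `srem`. The helper names `dupefilter_key` and `forget_fingerprint` are hypothetical, not part of scrapy-redis.

```python
# Sketch only: assumes the dupefilter is a Redis set of request fingerprints
# stored under the key template "%(spider)s:dupefilter" (the default, as far
# as I can tell). Both helper names below are made up for illustration.

DEFAULT_KEY_TEMPLATE = "%(spider)s:dupefilter"


def dupefilter_key(spider_name, template=DEFAULT_KEY_TEMPLATE):
    """Build the Redis key that holds the fingerprints for one spider."""
    return template % {"spider": spider_name}


def forget_fingerprint(server, spider_name, fingerprint,
                       template=DEFAULT_KEY_TEMPLATE):
    """Remove a single fingerprint from the spider's dupefilter set.

    `server` is any object with a redis-py style `srem(key, *values)` method
    that returns the number of members removed.
    Returns True if the fingerprint was present and has been removed.
    """
    key = dupefilter_key(spider_name, template)
    return server.srem(key, fingerprint) == 1
```

The part I am unsure about is the fingerprint itself: the set holds fingerprints rather than raw URLs, so the value passed in would have to be computed exactly the way the running dupefilter computes it, and that seems to depend on the installed Scrapy and scrapy-redis versions. If removing the entry is not the intended approach, yielding the retry as `scrapy.Request(url, dont_filter=True)` would bypass the dupefilter for that one request instead.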