create a model for removing old sites #1493

Open
gbinal opened this issue May 21, 2025 · 0 comments
@gbinal
Member

gbinal commented May 21, 2025

Idea:

  • For a recent scan date, take the "all" snapshot and keep only the rows where every one of the following scan statuses is a DNS resolution error: primary, robots_txt, sitemap_xml, and www.
  • Take the resulting list of initial domains.
  • Repeat this for one scan date per month for the past 12 months.
  • Append all of the resulting lists of initial domains into one list.
  • Count how many times each initial domain appears. If the count is 12, the domain goes into the end product file.
  • Put the list here.
  • As the second-to-last step in the index building process (right before the forced-add-in step), remove these initial domains.
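
A rough sketch of the steps above, in case it helps the discussion. It is not tied to the actual index builder: the column names (`initial_domain`, `primary_scan_status`, and so on), the status value `dns_resolution_error`, and all three function names are assumptions for illustration, and each monthly snapshot is assumed to be already loaded as a list of row dictionaries.

```python
from collections import Counter

# Assumed column names and status value; the real snapshot may differ.
STATUS_COLUMNS = (
    "primary_scan_status",
    "robots_txt_scan_status",
    "sitemap_xml_scan_status",
    "www_scan_status",
)
DNS_ERROR = "dns_resolution_error"


def dead_domains(snapshot_rows):
    """Return the initial domains whose four scans all failed DNS resolution."""
    return {
        row["initial_domain"]
        for row in snapshot_rows
        if all(row.get(col) == DNS_ERROR for col in STATUS_COLUMNS)
    }


def removal_candidates(monthly_snapshots, required=12):
    """Keep the domains that show up as dead in `required` monthly snapshots."""
    counts = Counter()
    for rows in monthly_snapshots:
        # dead_domains returns a set, so each domain counts once per month.
        counts.update(dead_domains(rows))
    return sorted(d for d, n in counts.items() if n == required)


def remove_domains(index_domains, to_remove):
    """Drop the removal candidates from the index, preserving order."""
    blocked = set(to_remove)
    return [d for d in index_domains if d not in blocked]
```

Counting with a set per month means a domain listed twice in one snapshot cannot reach 12 without failing in all 12 months. `remove_domains` would run right before the forced-add-in step, so anything force-added afterwards is unaffected.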

I'm open to another model though...

@gbinal gbinal added this to the Sprint 215 (5/22-5/28) milestone May 21, 2025
@gbinal gbinal self-assigned this May 21, 2025