Add ZIP as export option #245

Closed
tokee opened this issue Sep 9, 2022 · 0 comments · Fixed by #390
tokee commented Sep 9, 2022

At the Royal Danish Library we have had multiple requests for exporting specific resources for a query, e.g. "All PDFs from domain X". Currently this can be done by exporting to WARC and then using a separate tool to convert the WARC to individual files. We should add the option of exporting directly to a ZIP for easier use.

There are some problems with this approach:

  • What should the file structure be? domain/url-path/file seems logical enough, but what if the URL is http://example.com/foo?bar=87?type=pdf?
  • How should duplicate names be handled? We could use the hash as part of every file path, but that is a hassle when there are no or very few duplicates. Maybe store the first instance under its plain name and subsequent instances with their hashes appended to the file name?
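
To make the two questions above concrete, here is a minimal Python sketch of one possible answer, not a proposal for the actual implementation. It assumes the export receives `(url, payload)` pairs, folds the query string into the file name in percent-encoded form, and uses SHA-1 as "the hash". The function names `zip_entry_name` and `write_zip`, the `index` fallback name and the `_` separators are all hypothetical choices made for illustration.

```python
import hashlib
import io
import zipfile
from urllib.parse import quote, urlsplit


def zip_entry_name(url):
    """Map a URL to a domain/url-path/file style ZIP entry name (assumed scheme)."""
    parts = urlsplit(url)
    host = parts.hostname or "unknown-host"
    path = parts.path.lstrip("/")
    if not path or path.endswith("/"):
        path += "index"  # hypothetical name for directory-like URLs
    # Percent-encode each segment and drop "." / ".." to avoid path traversal.
    segments = [quote(s, safe="") for s in path.split("/") if s not in ("", ".", "..")]
    name = "/".join([host] + segments)
    if parts.query:
        # Fold the query into the file name so "?" and "=" never reach the path.
        name += "_" + quote(parts.query, safe="")
    return name


def write_zip(records, target):
    """Write (url, payload) pairs to a ZIP and return the entry names in order."""
    digests_by_name = {}
    written = []
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as archive:
        for url, payload in records:
            base = zip_entry_name(url)
            digest = hashlib.sha1(payload).hexdigest()
            seen = digests_by_name.setdefault(base, set())
            if digest in seen:
                continue  # same name and identical content: already stored
            # The first instance keeps the plain name, later ones get the hash appended.
            name = base if not seen else f"{base}_{digest}"
            seen.add(digest)
            archive.writestr(name, payload)
            written.append(name)
    return written


if __name__ == "__main__":
    url = "http://example.com/foo?bar=87?type=pdf"
    buffer = io.BytesIO()
    for entry in write_zip([(url, b"first"), (url, b"second")], buffer):
        print(entry)
```

Appending the hash after the extension keeps the rule simple but hides the file type from tools that rely on the extension; inserting it before the extension would be an equally reasonable variant.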

We should check how other tools handle this.

@tokee tokee mentioned this issue Sep 9, 2022
@thomasegense thomasegense self-assigned this Nov 17, 2022
@VictorHarbo VictorHarbo linked a pull request Jul 24, 2023 that will close this issue