Add ZIP as export option #245

tokee · 2022-09-09T09:56:35Z

At the Royal Danish Library we have had multiple requests for exporting the specific resources matching a query, e.g. "All PDFs from domain X". Currently this can be done by exporting to WARC and then using a separate tool to extract the individual files. We should add the option of exporting to a ZIP for easier use.

There are some problems with this approach:

What should the file structure be? domain/url-path/file seems logical enough, but what if the URL is http://example.com/foo?bar=87?type=pdf?
How to handle name duplicates? We could include the content hash in every file path, but that adds clutter when there are few or no duplicates. Maybe store the first instance directly and subsequent instances with their hash appended to the file name?
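A minimal sketch of the two ideas above, assuming Python's standard `zipfile` module and an exporter that receives `(url, payload)` pairs. The names `url_to_entry_path`, `with_hash_suffix` and `ZipExport` are hypothetical and not taken from the eventual implementation in #390. Further assumptions: the query string is kept in the file name and percent-encoded, directory-style URLs are stored as `index`, `.`/`..` segments are escaped so entries cannot leave the archive root, a name clash gets the first 12 hex characters of the SHA-1 appended, and byte-identical content under an already used name is skipped.

```python
import hashlib
import io
import zipfile
from typing import Dict, Optional
from urllib.parse import quote, urlsplit


def url_to_entry_path(url: str) -> str:
    """Map a URL to a ZIP entry name of the form domain/url-path/file (hypothetical scheme)."""
    parts = urlsplit(url)
    host = parts.hostname or "no-host"  # lower-cased host; port and credentials are dropped
    path = parts.path
    if not path or path.endswith("/"):
        path += "index"  # assumption: directory-style URLs are stored as ".../index"
    segments = [s for s in path.split("/") if s]
    if parts.query:
        # Keep the query in the file name so /foo?a=1 and /foo?a=2 map to different entries
        segments[-1] += "?" + parts.query
    safe_segments = []
    for segment in segments:
        # Percent-encode all but unreserved characters, so '?', '=', ':' and '/' become %XX
        encoded = quote(segment, safe="")
        if encoded in (".", ".."):
            # Neutralise relative segments so an entry cannot escape the archive root
            encoded = encoded.replace(".", "%2E")
        safe_segments.append(encoded)
    return "/".join([host] + safe_segments)


def with_hash_suffix(name: str, digest: str) -> str:
    """Insert a shortened content hash before the file extension: report.pdf -> report_<hash>.pdf."""
    directory, _, filename = name.rpartition("/")
    stem, dot, ext = filename.rpartition(".")
    if not dot or not stem:  # no extension, or a dot-file such as ".htaccess"
        stem, ext = filename, ""
    else:
        ext = "." + ext
    renamed = f"{stem}_{digest[:12]}{ext}"
    return f"{directory}/{renamed}" if directory else renamed


class ZipExport:
    """Writes payloads into a ZIP; the first instance keeps the plain name, later clashes get a hash."""

    def __init__(self, zf: zipfile.ZipFile) -> None:
        self.zf = zf
        self.names: Dict[str, str] = {}  # entry name -> SHA-1 of the content stored under it

    def add(self, url: str, payload: bytes) -> Optional[str]:
        digest = hashlib.sha1(payload).hexdigest()
        name = url_to_entry_path(url)
        if name in self.names:
            if self.names[name] == digest:
                return None  # identical content already stored under this name
            name = with_hash_suffix(name, digest)
            if name in self.names:
                return None  # this duplicate was already stored with its hash suffix
        self.names[name] = digest
        self.zf.writestr(name, payload)
        return name
```

With this sketch, `http://example.com/foo?bar=87?type=pdf` becomes the entry `example.com/foo%3Fbar%3D87%3Ftype%3Dpdf`, and two different payloads for `http://example.com/docs/report.pdf` end up as `report.pdf` and `report_<hash>.pdf` in the same directory. Percent-encoding keeps names reversible and filesystem-safe at the cost of some readability; a tool that prefers readable names could map the query to something like `foo_bar-87_type-pdf` instead, but would then need the hash fallback more often.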

We should check how other tools handle this.


tokee added the enhancement label Sep 9, 2022

tokee mentioned this issue Sep 9, 2022

Rethink export #246

Open

thomasegense self-assigned this Nov 17, 2022

tokee mentioned this issue Jul 1, 2023

Batch content export #382

Closed

VictorHarbo linked a pull request Jul 24, 2023 that will close this issue

Zip export. #390

Merged

VictorHarbo mentioned this issue Jul 24, 2023

Zip export. #390

Merged

thomasegense assigned VictorHarbo and unassigned thomasegense Jul 25, 2023

thomasegense closed this as completed in #390 Jul 25, 2023
