At the Royal Danish Library we have had multiple requests for exporting specific resources for a query, e.g. "All PDFs from domain X". Currently this can be done by exporting to WARC and then using a tool to convert the WARC to individual files. We should add the option of exporting to a ZIP for easier use.
There are some problems with this approach:
What should the file structure be? domain/url-path/file seems logical enough, but what about a URL such as http://example.com/foo?bar=87?type=pdf, where the query string has to end up somewhere in the path?
How should duplicate names be handled? We could choose to use the hash as part of the file path, but that is a hassle when there are no or very few duplicates. Maybe store the first instance directly and subsequent instances with their hashes appended to the file name?
We should check how other tools handle that.
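To make the two questions above concrete, here is a minimal Python sketch of one possible answer. It is not how the export is implemented: the function names (`zip_entry_path`, `unique_entry_path`, `export_zip`), the choice to fold the query string into the file name, the `index` fallback for empty paths, the truncated SHA-1 suffix, and placing the hash before the extension are all assumptions made for illustration.

```python
import hashlib
import io
import posixpath
import re
import zipfile
from typing import Optional
from urllib.parse import urlsplit


def _safe(name: str) -> str:
    # Replace anything outside a conservative character set with "_".
    return re.sub(r"[^A-Za-z0-9._-]", "_", name)


def zip_entry_path(url: str) -> str:
    """Map a URL to domain/url-path/file, folding the query into the file name."""
    parts = urlsplit(url)
    domain = parts.hostname or "unknown-host"
    # Drop empty and relative segments so entries cannot escape their folder.
    segments = [s for s in parts.path.split("/") if s not in ("", ".", "..")]
    filename = segments.pop() if segments else "index"
    if parts.query:
        filename += "_" + parts.query
    return "/".join([_safe(domain)] + [_safe(s) for s in segments] + [_safe(filename)])


def unique_entry_path(path: str, content: bytes, seen: dict) -> Optional[str]:
    """Return the name to store under, or None if this exact content is already stored.

    The first instance keeps the plain name; later instances with different
    content get a short hash inserted before the extension.
    """
    digest = hashlib.sha1(content).hexdigest()[:12]
    digests = seen.setdefault(path, set())
    if digest in digests:
        return None
    is_first = not digests
    digests.add(digest)
    if is_first:
        return path
    root, ext = posixpath.splitext(path)
    return f"{root}_{digest}{ext}"


def export_zip(resources, out) -> None:
    """Write (url, content) pairs to a ZIP archive on the file-like object `out`."""
    seen = {}
    with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as zf:
        for url, content in resources:
            name = unique_entry_path(zip_entry_path(url), content, seen)
            if name is not None:
                zf.writestr(name, content)
```

With this mapping, http://example.com/foo?bar=87?type=pdf becomes the entry example.com/foo_bar_87_type_pdf. The sketch does not guard against a hash-suffixed name colliding with an unrelated original name, and it loses the distinction between URLs that differ only in replaced characters, so a real implementation would need to decide how much of that matters.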