Add ZIP explorer to import individual files from local or remote ZIP archives #20054
Really beautiful - I'm in awe!
This is really awesome, nice work! I have an alternative suggestion for the UI. Rather than having "Explore Zip" as a top-level option, would it be feasible to have it as a contextual option that appears as you drill down? E.g., you browse into a remote file source, and if it's a zip file, you get an option to drill down into the zip file and select nested files. Uploading a local file would work the same way: if the file is detected to be a zip, there's a contextual option to drill down into it. I don't know how workflow imports would be handled, but presumably in a similar way? This would reduce the number of top-level options, which may not be relevant to the user unless they are specifically browsing a zip file. Not sure whether this is feasible, but just putting it out there for discussion.
Thanks for the suggestions @nuwang! I appreciate it!
Unfortunately, in my testing, this is really hard to tell: from most URLs you cannot know whether the file at the end is a ZIP file. Most servers don't send the correct MIME type and just use "application/octet-stream", or use redirects until you get to the actual file, etc. And inspecting the last 65 KB of every URL to search for the ZIP EOCD (end of central directory record) would be a bit too much 😞 I see the value in tapping into this functionality for file sources to drill down into ZIPs, but to me, that would be an "additional" feature, not a replacement for this UI. I also understand the concern about increasing the number of top-level options, and I already simplified the first approach, which took a whole section in the Uploader, to just a single button 😅 Anyway, I'm happy to look for alternatives to reduce the number of options not relevant to the user, but I don't know how to make this discoverable and explicit for the user at the same time (since we cannot rely on auto-detection of ZIPs) 🤔
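For context, the 65 KB figure follows from the ZIP format: the end of central directory (EOCD) record is 22 bytes long plus a trailing archive comment of at most 65,535 bytes, so it always lies within the last 65,557 bytes of the file. The following minimal Python sketch shows the check on such a tail buffer. `find_eocd` and the constants are hypothetical names, ZIP64 archives are not handled, and this is not the code used by the PR (which lives in the TypeScript ro-crate-zip-explorer library).

```python
import struct
from typing import Optional

EOCD_SIGNATURE = b"PK\x05\x06"  # 0x06054b50 stored little-endian
EOCD_MIN_SIZE = 22               # fixed-size part of the EOCD record
MAX_COMMENT_SIZE = 0xFFFF        # the comment length is a 16-bit field
MAX_TAIL_SIZE = EOCD_MIN_SIZE + MAX_COMMENT_SIZE  # 65,557 bytes


def find_eocd(tail: bytes) -> Optional[int]:
    """Return the offset of the EOCD record in `tail`, or None if absent.

    `tail` is assumed to be the last bytes of the file (up to MAX_TAIL_SIZE).
    """
    if len(tail) < EOCD_MIN_SIZE:
        return None
    # The record cannot start later than EOCD_MIN_SIZE bytes before the end.
    end = len(tail) - EOCD_MIN_SIZE + len(EOCD_SIGNATURE)
    while True:
        pos = tail.rfind(EOCD_SIGNATURE, 0, end)
        if pos < 0:
            return None
        # Bytes 20-21 hold the comment length; a genuine record is followed by
        # exactly that many bytes, which rejects signatures inside the comment.
        (comment_len,) = struct.unpack_from("<H", tail, pos + 20)
        if pos + EOCD_MIN_SIZE + comment_len == len(tail):
            return pos
        end = pos + len(EOCD_SIGNATURE) - 1  # keep searching further left
```

Scanning backwards and validating the comment-length field means a signature that happens to appear inside the archive comment is not mistaken for the real record.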
@davelopez Thanks for clarifying. If the main purpose is to explore rich archive files, what do you think of "Explore Archive" as an alternative name? (Assuming future archive formats might include tar archives or formats altogether different to zip).
Youhouhou! So amazing!!! Really amazing and important need!!!
Waouh, this is super useful!! Can't wait to use it!
This is finally ready for review. I added a few API and Selenium tests. I've changed the terminology in some places to refer to "compressed archives" instead of ZIP in case more compressed formats are supported in the future, and updated the screenshot in the PR description.
Test failures are unrelated.
Wouldn't it be more useful to add this functionality right into those places (i.e., history and invocation imports)?
This seems bad. Is there no way to make this a lazy process so we only send chunks? If this is not possible, I think I'd rather we don't add the local zip functionality (or limit it to something like 100 MB?) in favor of not killing the user's tab (or is it even the whole browser?). From a UX perspective, I think the remote file exploration is a much bigger win.
Could we hook remote file sources into this? That would be one way to handle non-public data, with a possible future extension making it possible to prompt users for authentication?
I thought about this briefly, but then I thought that users who want to use that import functionality are likely to want to import the whole thing all the time rather than just a single file 🤔
I can try to limit the file size to 100MB and then also add some kind of warning or make files above this size non-selectable. Not sure about the chunking... I'll need to investigate more.
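On the chunking question, the ZIP layout means an explorer does not have to hold the whole archive in memory: it can read the tail to locate the central directory and later read only the byte range of each selected entry. Here is a minimal Python sketch of such lazy range reads on a local file. `read_range` and `read_tail` are hypothetical helpers (in the browser, `Blob.slice()` on the selected `File` plays a similar role), and this is not a description of what the PR currently does.

```python
import os

# Hypothetical helpers: read only the byte ranges that are needed instead of
# loading the whole archive into memory.


def read_range(path: str, offset: int, length: int) -> bytes:
    """Read `length` bytes starting at `offset` (fewer if the file ends first)."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


def read_tail(path: str, max_bytes: int) -> bytes:
    """Read at most the last `max_bytes` bytes of the file."""
    size = os.path.getsize(path)
    start = max(0, size - max_bytes)
    return read_range(path, start, size - start)
```

With this approach, memory use depends on the size of the central directory and of the selected entries rather than on the size of the whole archive.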
We could try to do that, but I'm not sure how right now. If so, would we only allow exploring archives that are stored in those file sources, and not ones hosted arbitrarily elsewhere?
makes sense
That's not part of the PR now though, right? It would certainly be pretty cool if we could select which datasets from a history/invocation we materialize. Is @nuwang's suggestion off the table? What would be the problem with attempting to fetch the last bytes of pasted URLs? Aren't you doing that in the component anyway?
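For reference, this is roughly what reading the tail of an archive enables: the EOCD record points to the central directory, which lists every entry together with the values needed later for extraction (local header offset, compressed size and compression method). Below is a minimal Python sketch. It assumes the fetched tail contains the whole central directory. `list_entries` and `ZipEntry` are hypothetical names, and ZIP64 and multi-disk archives are not handled.

```python
import struct
from dataclasses import dataclass
from typing import List

EOCD_SIGNATURE = b"PK\x05\x06"
CENTRAL_HEADER_SIGNATURE = 0x02014B50
CENTRAL_HEADER_FORMAT = "<IHHHHHHIIIHHHHHII"  # fixed 46-byte part
EOCD_FORMAT = "<IHHHHIIH"  # fixed 22-byte part


@dataclass
class ZipEntry:
    name: str
    header_offset: int       # offset of the entry's local file header
    compress_size: int       # size of the compressed data
    compression_method: int  # 0 = stored, 8 = deflate


def _find_eocd(tail: bytes) -> int:
    # Scan backwards; the 22-byte record cannot start after len(tail) - 22.
    end = len(tail) - 18
    while end > 0:
        pos = tail.rfind(EOCD_SIGNATURE, 0, end)
        if pos < 0:
            break
        (comment_len,) = struct.unpack_from("<H", tail, pos + 20)
        if pos + 22 + comment_len == len(tail):
            return pos
        end = pos + 3
    raise ValueError("no end of central directory record found")


def list_entries(tail: bytes, archive_size: int) -> List[ZipEntry]:
    """List entries from the last len(tail) bytes of an archive of archive_size bytes."""
    eocd = _find_eocd(tail)
    (_, _, _, _, total_entries, _, cd_offset, _) = struct.unpack_from(EOCD_FORMAT, tail, eocd)
    pos = cd_offset - (archive_size - len(tail))  # directory position inside `tail`
    if pos < 0:
        raise ValueError("central directory starts before the fetched tail")
    entries = []
    for _ in range(total_entries):
        fields = struct.unpack_from(CENTRAL_HEADER_FORMAT, tail, pos)
        if fields[0] != CENTRAL_HEADER_SIGNATURE:
            raise ValueError("corrupt central directory")
        flags, method, compress_size = fields[3], fields[4], fields[8]
        name_len, extra_len, comment_len = fields[10], fields[11], fields[12]
        raw_name = tail[pos + 46 : pos + 46 + name_len]
        # Bit 11 marks UTF-8 names; otherwise the format's default is CP437.
        name = raw_name.decode("utf-8" if flags & 0x800 else "cp437")
        entries.append(ZipEntry(name, fields[16], compress_size, method))
        pos += 46 + name_len + extra_len + comment_len
    return entries
```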
Sorry for the verbosity, I'll try to explain my reasoning 😅
I might be using the word "metadata" too loosely 😅 The Galaxy "metadata" coming from the This is all happening in the client code (in client/src/composables/zipExplorer.ts), not in this "RemoteZipFilesSource".
Maybe I did not completely understand Nuwang's suggestion. What I understood is closer to the other kind of file source we discussed earlier: from an existing file source (S3, POSIX, Google Drive, Dropbox, etc.) we could add an extension to detect a ZIP or compressed archive, display it as a folder, and drill down from there to select individual files. That is completely on the table, but probably not in this PR, since it serves a different use case and could be a project of its own. The use case I designed this for is: "The user has a URL to a remote ZIP file containing an RO-Crate and wants to import some files (maybe workflows) from it without having to download or upload the entire archive. It would also be helpful to show a preview with the RO-Crate manifest information before importing." From the suggestion, the part I did not understand well is how to handle the use case of "a random URL pointing to a ZIP" with the file source approach. For a file inside another file source, then yes, we could simply check whether the file ends with ".zip", change the icon to a folder, attempt to read the ZIP EOCD on drill-down, and so on. (Still, I don't know how to handle or preserve the state in the file source while navigating the ZIP without requesting the EOCD each time; hopefully we could figure out some caching mechanism, otherwise it sounds a bit too inefficient 🤔)
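On the caching question, one simple option is to parse the central directory once per archive and keep the resulting entry names in memory, so navigating folders inside the ZIP never repeats the EOCD request. Here is a minimal Python sketch. `CentralDirectoryCache` and the injected `load_names` callable (for example, something that fetches the tail and parses the central directory) are hypothetical.

```python
from typing import Callable, Dict, List


class CentralDirectoryCache:
    """Hypothetical per-archive cache of entry names (unbounded, for brevity)."""

    def __init__(self, load_names: Callable[[str], List[str]]):
        self._load = load_names
        self._names: Dict[str, List[str]] = {}

    def names(self, archive_url: str) -> List[str]:
        # Only the first access for a given archive triggers a (remote) load.
        if archive_url not in self._names:
            self._names[archive_url] = self._load(archive_url)
        return self._names[archive_url]

    def list_folder(self, archive_url: str, folder: str) -> List[str]:
        """Return the direct children of `folder` ("" for the root); folders end in "/"."""
        prefix = f"{folder.rstrip('/')}/" if folder else ""
        children = set()
        for name in self.names(archive_url):
            if name.startswith(prefix) and name != prefix:
                # Keep only the first path component below the folder.
                head, sep, _ = name[len(prefix):].partition("/")
                children.add(head + sep)
        return sorted(children)
```

Folder names are derived from entry paths, so this also works for archives that do not store explicit directory entries.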
I understood @nuwang's comment as making this available everywhere (no problem if file sources are out of scope for now) instead of requiring the top-level button. So if you paste a URL, we could just check whether it is a zip file and, if so, offer the wizard?
Oh! I see now... so just hide the button, and on URL paste make the button visible again, or something similar?
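On checking a pasted URL: the tail of a remote file can be requested with an HTTP suffix byte range (`Range: bytes=-N`). If the server supports ranges and answers `206 Partial Content`, only the last N bytes are transferred. A server that ignores the header replies `200` with the whole body, which a client needs to guard against. Below is a minimal Python sketch that only builds the request. `is_valid_url` and `build_tail_request` are hypothetical stand-ins, not the client's `isValidUrl`/`getProxiedUrl`. In the browser, such a request would presumably have to go through the new `/api/proxy` endpoint to avoid CORS issues.

```python
from urllib.parse import urlparse
from urllib.request import Request

MAX_TAIL_SIZE = 22 + 0xFFFF  # EOCD record plus the maximum comment length


def is_valid_url(url: str) -> bool:
    """Hypothetical check: accept only absolute http(s) URLs."""
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def build_tail_request(url: str, tail_size: int = MAX_TAIL_SIZE) -> Request:
    if not is_valid_url(url):
        raise ValueError(f"not an http(s) URL: {url!r}")
    # A suffix byte range asks for the last `tail_size` bytes of the resource.
    return Request(url, headers={"Range": f"bytes=-{tail_size}"}, method="GET")
```

When sending it with `urllib.request.urlopen(req)`, checking `resp.status == 206` before reading the body keeps the transfer bounded to the requested tail.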
…omponents So other formats might be supported in the future.
From "zip archive" to "compressed archive" for consistency
Co-authored-by: Marius van den Beek <[email protected]>
Move isValidUrl and getProxiedUrl to module
Co-authored-by: Marius van den Beek <[email protected]>
Do not reopen if the same zip is "explored" again and already open
Thank you @davelopez!
This pull request introduces a new feature for previewing the content of ZIP archives and importing individual files from them. This is useful when you only need a particular file or set of small files from the ZIP archive and don't want to upload the entire archive. It also works with remote ZIP archives, meaning you don't need to download the ZIP to your computer or upload it to Galaxy to be able to select and extract individual files and send them to Galaxy.
Features
Works with local and remote ZIP archives (with some limitations)
Preview ZIP archives
Import Galaxy workflow files
How to use it
Go to the upload dialog in Galaxy.
Click the "Explore ZIP" button. The ZIP Explorer wizard will be shown.
Select a ZIP archive from your local file system or enter the URL of a remote ZIP archive.
Click the "Next" button to preview the contents of the ZIP archive. Once the ZIP preview is loaded, you can decide which files you want to import into Galaxy in the next step.
Click the "Next" button to select the files you want to import. Before clicking "Import", you will have the chance to review what elements will be imported.
Click "Import" and the selected files will be extracted from the ZIP archive and sent to Galaxy. After the upload, the files will be available in your current history, and any workflow files will be available in your workflow list, ready to run or edit.
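Under the hood, the last step only needs the bytes of the selected entries. Below is a minimal Python sketch of extracting one entry. It assumes that the entry's local header offset, compressed size and compression method are already known from the central directory. These are the same three values that the Remote ZIP archive extractor file source described below receives. `extract_entry` and the `read_range` callable are hypothetical, and this is not that file source's code.

```python
import struct
import zlib
from typing import Callable

LOCAL_HEADER_SIGNATURE = 0x04034B50
LOCAL_HEADER_FORMAT = "<IHHHHHIIIHH"  # fixed 30-byte part of the local header
LOCAL_HEADER_SIZE = 30


def extract_entry(
    read_range: Callable[[int, int], bytes],  # (offset, length) -> bytes
    header_offset: int,
    compress_size: int,
    compression_method: int,
) -> bytes:
    header = read_range(header_offset, LOCAL_HEADER_SIZE)
    signature, *_, name_len, extra_len = struct.unpack(LOCAL_HEADER_FORMAT, header)
    if signature != LOCAL_HEADER_SIGNATURE:
        raise ValueError("no local file header at the given offset")
    # The data follows the fixed header, the file name and the extra field.
    data_start = header_offset + LOCAL_HEADER_SIZE + name_len + extra_len
    data = read_range(data_start, compress_size)
    if compression_method == 0:  # stored
        return data
    if compression_method == 8:  # deflate: raw stream, hence negative wbits
        return zlib.decompress(data, -15)
    raise ValueError(f"unsupported compression method: {compression_method}")
```

The local header is read first because its file name and extra field lengths can differ from those in the central directory, so the start of the data cannot be computed from the central directory alone. `read_range` could be backed by a local file or by an HTTP `Range: bytes=start-end` request (both ends inclusive).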

Technical details
New "Remote ZIP archive extractor" file source
The new Remote ZIP archive extractor file source lets you extract files from a remote ZIP archive without downloading the entire archive. It handles URIs like:
Where:
remote_zip_url is the URL of the remote ZIP archive.
header_offset is the offset of the file's local file header in the ZIP archive.
compress_size is the size of the compressed file data in the ZIP archive.
compression_method is the compression method used for the file in the ZIP archive (only 0, no compression, and 8, deflate, are supported).
These parameters must be known in advance, so the ZIP explorer extracts them from the ZIP archive when you preview it. The Remote ZIP archive extractor file source then uses them to extract the files from the remote ZIP archive without downloading the entire archive.
New API endpoint for proxying requests
The new /api/proxy API endpoint allows HEAD and GET requests to remote ZIP archives to be proxied through Galaxy to avoid CORS issues.
New client dependency: ro-crate-zip-explorer
Most of the code for the ZIP explorer is in a new TypeScript library, ro-crate-zip-explorer, which handles the preview and extraction of files from local and remote ZIP archives.
How to test the changes?
License