-
Notifications
You must be signed in to change notification settings - Fork 66
Added format validation to retrieved data manifest #349
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hi @afshinmhCLG — I'm taking over maintenance of pins and since I'm still coming up to speed with it, I'd really appreciate if you could give me a bit more context. When does this cause a problem? Can you generate a reprex, even if it's something that I can't run locally? |
Hi @hadley ,
The above works perfectly. Now, if you are dealing with a different API (e.g. Labkey) where "File doesn't exist" is not encoded as an error but rather the missingness is encoded in the body of the message, step 3 above passes (i.e. no error is caught). The workflow will then wrongly assumes successful retrieval of a data.txt and proceeds to parse the content per anticipated structure of the data.txt. Here, clearly as the content is not the content of a data.txt file (rather just text indicating missingness), we encounter an error. |
Why not make the wrapper around labkey throw an error? I don't think it's pins responsibility to figure out whether the body of an HTTP response represents an error or not (indeed, I have no idea how you could do that because errors could be represented in any number of ways). |
Thanks @hadley!
The PR simply augments step 2 to make sure that dt is indeed a list containing element "path" before calling dt$path on step 3. No other checking is done as you are absolutely right: "errors could be represented in any number of ways". |
Now that I understand the datatxt format, unfortunately I have determined that it is not a good fit for pins' needs. The problem is maintaining a single central list of pins — if two pins should be written at the same time, there is a substantial chance of a race condition so that only one pin will recorded in the site metadata. This means that I don't think it's worthwhile to patch up smaller bugs in (what is now) |
Thanks @hadley! Is support for extending pins within the vision for pins going forward? IMHO, it is the single most valuable feature of pins. Picture large orgs with lots of historical places to store your data objects, from well-established cloud-based solns like S3 to less known data platforms like labkey etc. Try to move your data workflow from one to another and you are left dealing with API differences and here comes pins with a uniform API, so you can move in between all those seamlessly and expect the same behavior (as far as versioning etc.) |
Yes, definitely, I just want to make sure the extension interface is solid before encouraging others to use it. |
This pull request has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue. |
Issue: currently retrieval of manifest is considered successful when http call does not return an error response. However, there are no checks about the implied format of the manifest (e.g. data.txt).
Proposed solution: Rather than relying only on http call’s success, an additional verification of the “implied” format of the manifest is added (i.e. checking whether “path” is element of the retrieved manifest).
Use-case: This is being proposed in support of making custom board for labkey, an open-source data platform used in clinical research setting (see https://www.labkey.org). When data.txt does not exist, the API for labkey, rather than returning and error response, returns a success response but encodes the missingness of data.txt in the body of the response. As it is, casting the custom board_labkey as datatxt fails because the call to retrieve manifest returns successfully however the body of the message lacks the anticipated structure of data.txt. By making the implicit assumption about the structure of manifest explicit, board_datatxt remains functional (e.g. for S3board which I have tested) but can also add utility for labkey (and possibly other similar APIs) that encode missing files in the body of the response.
I intend to also submit a PR for board_labkey
Thank you!