
[EPIC] Only call discover schema when necessary #9895


Closed
5 of 6 tasks
cgardens opened this issue Jan 31, 2022 · 1 comment · Fixed by #10226
Labels: area/platform (issues related to the platform), Epic, team/compose, team/platform-move, type/enhancement (New feature or request)


cgardens commented Jan 31, 2022

Tell us about the problem you're trying to solve

Discover schema can be expensive, both in rate-limit usage and in time.

Additionally, because we do not store the output of discover schema (only the configured version, which is lossy), we lose information about the schema in the UI after a connection is configured. For example, if the catalog from discover_schema has stream1 and stream2, but the connection only configures stream1, then the user will never see stream2 again in the UI (unless they click force refresh schema). This is very confusing, and we should be able to keep track of this information after configuration.
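To illustrate the stream1/stream2 example, here is a minimal sketch of keeping the full discovered catalog next to the user's selection. The `StoredConnection` class and its field names are hypothetical and are not Airbyte's actual data model.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class StoredConnection:
    """Hypothetical record that keeps both the full and the configured catalog."""

    # Every stream returned by discover schema, kept even after configuration.
    discovered_streams: List[str]
    # Only the streams the user chose to sync.
    selected_streams: Set[str] = field(default_factory=set)

    def unselected_streams(self) -> List[str]:
        # Streams the UI can still offer because the full catalog was retained.
        return [s for s in self.discovered_streams if s not in self.selected_streams]


conn = StoredConnection(
    discovered_streams=["stream1", "stream2"],
    selected_streams={"stream1"},
)
print(conn.unselected_streams())  # ['stream2']
```

If only `selected_streams` were stored, stream2 could not be shown again without re-running discover schema.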

Finally, if setting up a connection gets interrupted, the user is forced to re-pull the schema. If the schema takes a long time to pull (or the slow pull was itself the problem), this can leave the user stuck. While we should give the user feedback when discover schema is taking a long time, it is at least slightly better to store the output so that they do not need to repeat the expensive operation.

Describe the solution you’d like

We should persist the output of discover schema, so that if there is a failure in the middle of configuration the user does not need to refetch. This will also allow us to not lose information about the schema in the UI.

Acceptance Criteria

  • Retain information about the full catalog, even after configuring a connector.
  • Avoid calling discover schema again if the connector version and configuration have not changed.
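
One possible shape for these criteria is a store keyed by connector version plus a hash of the configuration. This is a rough sketch, not the implementation that closed this issue: `DiscoverCache`, `run_discover`, and the in-memory dict are all assumptions made for illustration, and a real version would persist to the database.

```python
import hashlib
import json
from typing import Any, Callable, Dict


class DiscoverCache:
    """Hypothetical cache of discover schema output."""

    def __init__(self, run_discover: Callable[[Dict[str, Any]], Dict[str, Any]]):
        # run_discover stands in for the expensive discover schema call.
        self._run_discover = run_discover
        # In-memory stand-in for a persistent store.
        self._store: Dict[str, Dict[str, Any]] = {}

    @staticmethod
    def _key(connector_version: str, config: Dict[str, Any]) -> str:
        # sort_keys makes the hash independent of dict key order.
        payload = json.dumps(
            {"version": connector_version, "config": config}, sort_keys=True
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get_catalog(
        self,
        connector_version: str,
        config: Dict[str, Any],
        force_refresh: bool = False,
    ) -> Dict[str, Any]:
        key = self._key(connector_version, config)
        # Call discover only on a cache miss or an explicit refresh.
        if force_refresh or key not in self._store:
            self._store[key] = self._run_discover(config)
        return self._store[key]
```

A changed connector version or configuration produces a different key, so discover runs again. The `force_refresh` flag mirrors the existing "force refresh schema" action in the UI.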

Execution Plan

cgardens commented

Potentially a good series of tasks for a new hire.

