[EPIC] Only call discover schema when necessary #9895
Labels
area/platform
issues related to the platform
Epic
team/compose
team/platform-move
type/enhancement
New feature or request
Tell us about the problem you're trying to solve
Discover schema can be expensive, both in time and in rate-limit consumption.
Additionally, because we do not store the output of discover schema (only the configured version, which is lossy), we lose information about the schema in the UI after a connection gets configured. For example, if the catalog from discover_schema has stream1 and stream2, but the connection only configures stream1, then the user will never be able to see stream2 again in the UI (unless they click force refresh schema). This is very confusing, and we should be able to keep track of this information after configuration.
Finally, if setting up a connection gets interrupted, the user is forced to re-pull the schema. If the schema takes a long time to pull (or the slow pull was the problem in the first place), this can leave the user stuck. While we should also give the user feedback when discover schema is taking a long time, it is at least slightly better to store the output so that they do not need to repeat the expensive operation.
Describe the solution you’d like
We should persist the output of discover schema, so that if there is a failure in the middle of configuration the user does not need to refetch. This will also allow us to not lose information about the schema in the UI.
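As a rough illustration of the idea (not the actual platform design), the sketch below caches discover output per source and configuration, and only calls discover when nothing is stored or a refresh is forced. Everything in it is hypothetical: the `DiscoveredCatalogStore` class, the `get_or_discover` method, the catalog shape, and the in-memory dict that stands in for a real database table.

```python
import hashlib
import json
from typing import Callable, Dict, List

# Hypothetical catalog shape for illustration, e.g. {"streams": ["stream1", "stream2"]}.
Catalog = Dict[str, List[str]]


class DiscoveredCatalogStore:
    """In-memory stand-in for a persistent table of discover schema outputs."""

    def __init__(self) -> None:
        self._catalogs: Dict[str, Catalog] = {}

    @staticmethod
    def _key(source_id: str, config: dict) -> str:
        # Hash the config with sorted keys so that the same configuration
        # always maps to the same stored catalog, regardless of key order.
        serialized = json.dumps(config, sort_keys=True).encode("utf-8")
        return f"{source_id}:{hashlib.sha256(serialized).hexdigest()}"

    def get_or_discover(
        self,
        source_id: str,
        config: dict,
        discover: Callable[[dict], Catalog],
        force_refresh: bool = False,
    ) -> Catalog:
        key = self._key(source_id, config)
        if not force_refresh and key in self._catalogs:
            # Reuse the stored full catalog; no expensive discover call.
            return self._catalogs[key]
        # Nothing stored (or the user asked for a refresh): run discover
        # once and persist the full, unfiltered output.
        catalog = discover(config)
        self._catalogs[key] = catalog
        return catalog
```

Because the full catalog is stored rather than only the configured subset, a connection that selects only stream1 could still show stream2 in the UI later, and an interrupted setup could resume without calling discover again.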
Acceptance Criteria
Execution Plan