-
Notifications
You must be signed in to change notification settings - Fork 4.6k
✨ Source Zendesk Support: migration posts child streams to low-code #55686
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: issue-11894/posts-migration-to-low-code
Are you sure you want to change the base?
✨ Source Zendesk Support: migration posts child streams to low-code #55686
Conversation
The latest updates on your projects. Learn more about Vercel for Git ↗︎
|
"states": [ | ||
{ | ||
"partition": {"parent_slice": {}, "post_id": post["id"]}, | ||
"cursor": {"updated_at": datetime_to_string(post_comments_updated_at)}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like this should be formatted as %s
but this uses DatetimeBasedCursor
instead of the concurrent cursor and it seems like there is a discrepancy where the value used for the state is the one matching the records...
I have updated this issue to mention this situation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding more context on this, it seems like we evaluate the cursor value using the keys of this dictionary but use the cursor value representation from the record as a cursor value.
This means that the behavior between the two cursors is different as to how we save the state where the concurrent framework uses the output format while the DatetimeBasedCursor uses the format of the record i.e. one of the cursor_datetime_formats
.
@@ -95,13 +95,11 @@ def test_given_403_error_when_read_posts_comments_then_skip_stream(self, http_mo | |||
|
|||
output = read_stream("post_comment_votes", SyncMode.full_refresh, self._config) | |||
assert len(output.records) == 0 | |||
|
|||
info_logs = get_log_messages_by_log_level(output.logs, LogLevel.INFO) | |||
assert output.get_stream_statuses("post_comment_votes")[-1] == AirbyteStreamStatus.INCOMPLETE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that this is a change in behavior. It felt weird to silently fail, the reasons were not documented and it felt weird. The 404 case makes a bit more sense but even then...
Given the low usage for now, I wouldn't add more complexity and just go with the same error handler as the other streams. We can change that very easily later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the mock server tests pass so i think this should all be working, but just want to make sure i under the post_comments_votes
stream implemenation a little, otherwise 👍
airbyte-integrations/connectors/source-zendesk-support/source_zendesk_support/manifest.yaml
Show resolved
Hide resolved
airbyte-integrations/connectors/source-zendesk-support/source_zendesk_support/manifest.yaml
Outdated
Show resolved
Hide resolved
airbyte-integrations/connectors/source-zendesk-support/source_zendesk_support/manifest.yaml
Outdated
Show resolved
Hide resolved
…94/posts-child-streams-migration-to-low-code
What
Partially addresses https://github.com/airbytehq/airbyte-internal-issues/issues/11894
Migrating post_comments, post_votes and post_comment_votes to low-code.
How
Implementing them in the manifest.yaml and removing the code in the python code.
We don't have data in prod for those streams so it will be hard to validate these streams using regression testing.
Review guide
User Impact
There were some cases where the connector would return a 404/403 and we would silently stop the stream and ignore the error. With the current change, the stream would fail. We don't have examples in cloud to validate why we have this behavior...
There might have been some rare cases of data loss that the new state format should handle. Other than that, this change is mostly to ease maintenance for devs.
Can this PR be safely reverted and rolled back?
The state before was a simple
{"updated_at": <cursor value>}
. This format could lead to data loss given the following case (example withpost_comments
):post_id_1
, cursor value at this point is T1post_id_1
post_id_2
post_id_2
, cursor value at this point is T3As we can see here, the comment created at T2 for
post_id_1
would never be synced. Hence the new format is more adequate.