Skip to content

✨ Source Zendesk Support: migration posts child streams to low-code #55686

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: issue-11894/posts-migration-to-low-code
Choose a base branch
from

Conversation

maxi297
Copy link
Contributor

@maxi297 maxi297 commented Mar 10, 2025

What

Partially addresses https://github.com/airbytehq/airbyte-internal-issues/issues/11894

Migrating post_comments, post_votes and post_comment_votes to low-code.

How

Implementing them in the manifest.yaml and removing the code in the python code.

We don't have data in prod for those streams so it will be hard to validate these streams using regression testing.

Review guide

User Impact

There were some cases where the connector would return a 404/403 and we would silently stop the stream and ignore the error. With the current change, the stream would fail. We don't have examples in cloud to validate why we have this behavior...

There might have been some rare cases of data loss that the new state format should handle. Other than that, this change is mostly to ease maintenance for devs.

Can this PR be safely reverted and rolled back?

  • YES 💚
  • NO ❌

The state before was a simple {"updated_at": <cursor value>}. This format could lead to data loss given the following case (example with post_comments):

  • T0: query for posts
  • T1: query for comments on post with id post_id_1, cursor value at this point is T1
  • T2: a new comment is added to post_id_1
  • T3: a new comment is added to post_id_2
  • T4: query for comments on post with id post_id_2, cursor value at this point is T3

As we can see here, the comment created at T2 for post_id_1 would never be synced. Hence the new format is more adequate.

Copy link

vercel bot commented Mar 10, 2025

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
airbyte-docs ✅ Ready (Inspect) Visit Preview 💬 Add feedback Mar 21, 2025 7:27pm

"states": [
{
"partition": {"parent_slice": {}, "post_id": post["id"]},
"cursor": {"updated_at": datetime_to_string(post_comments_updated_at)},
Copy link
Contributor Author

@maxi297 maxi297 Mar 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like this should be formatted as %s but this uses DatetimeBasedCursor instead of the concurrent cursor and it seems like there is a discrepancy where the value used for the state is the one matching the records...

I have updated this issue to mention this situation

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding more context on this, it seems like we evaluate the cursor value using the keys of this dictionary but use the cursor value representation from the record as a cursor value.

This means that the behavior between the two cursors is different as to how we save the state where the concurrent framework uses the output format while the DatetimeBasedCursor uses the format of the record i.e. one of the cursor_datetime_formats.

@@ -95,13 +95,11 @@ def test_given_403_error_when_read_posts_comments_then_skip_stream(self, http_mo

output = read_stream("post_comment_votes", SyncMode.full_refresh, self._config)
assert len(output.records) == 0

info_logs = get_log_messages_by_log_level(output.logs, LogLevel.INFO)
assert output.get_stream_statuses("post_comment_votes")[-1] == AirbyteStreamStatus.INCOMPLETE
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this is a change in behavior. It felt weird to silently fail, the reasons were not documented and it felt weird. The 404 case makes a bit more sense but even then...

Given the low usage for now, I wouldn't add more complexity and just go with the same error handler as the other streams. We can change that very easily later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After further research , the 404 comes from here. It is not clear what is the cause unfortunately.

The handling of the 403 comes from here.

It is weird because both have been migrated to fail by default here and nobody complained so I'll assume we can still fail on these cases too

Copy link
Contributor

@brianjlai brianjlai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the mock server tests pass so i think this should all be working, but just want to make sure i under the post_comments_votes stream implemenation a little, otherwise 👍

…94/posts-child-streams-migration-to-low-code
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants