[airbyte-cdk] add print parameter to flush the print buffer after each invocation #37000
Fixes https://github.com/airbytehq/airbyte-internal-issues/issues/7087
Identifying the fix
For a small number of syncs to `source-salesforce` and `source-stripe`, we were seeing a few records not make it to the platform. After validating that both source and platform counts were accurate and that there was a legitimate functional mismatch, we determined there's an issue with how we use `print()` in the entrypoint.
What the fix is
There is a Python article related to print buffers: https://realpython.com/python-flush-print-output/#set-the-flush-parameter-if-youve-disabled-newlines
In an earlier PR to fix concurrent CDK printing, we made a small change that replaced the default `end` value of `\n` with `""`. This had the side effect of disabling automatic flushing of the print buffer. Adding back `flush=True` gets us back to the previous behavior while still retaining the original fix that was made.
This does give me slight pause because always flushing can have performance implications, but I think our biggest performance bottleneck is still API request/response, and this fix just aligns back to how things were acting before. I wish it were super clear why `print()` at high volume isn't working, but right now we need to fix the dropped records.
Summary:
This PR fixes an issue with dropped records during syncs by adding `flush=True` to the `print()` call in the `launch()` function in `/airbyte-cdk/python/airbyte_cdk/entrypoint.py`, restoring automatic flushing of the print buffer.
Key points:
- Fixes an issue with `print()` in the entrypoint causing dropped records during syncs to `source-salesforce` and `source-stripe`.
- Adds `flush=True` to the `print()` call in the `launch()` function in `/airbyte-cdk/python/airbyte_cdk/entrypoint.py` to restore automatic flushing of the print buffer.
Generated with ❤️ by ellipsis.dev
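For reviewers who want to see the `flush=True` behavior in isolation, here is a minimal sketch. It is not the code in `entrypoint.py`: the `CountingStream` class and the `emit()` helper are hypothetical names made up for this illustration, and the assumption that each message carries its own trailing newline (which is why `end=""` is used) is mine. It only shows that `print()` leaves flushing to the stream's own buffering unless `flush=True` is passed, in which case it calls the stream's `flush()` after every write.

```python
import io


class CountingStream(io.StringIO):
    """Hypothetical in-memory text stream that counts flush() calls."""

    def __init__(self):
        super().__init__()
        self.flush_calls = 0

    def flush(self):
        self.flush_calls += 1
        super().flush()


def emit(messages, stream, flush):
    """Hypothetical stand-in for the entrypoint's print loop."""
    for message in messages:
        # Assumption: the message supplies its own newline, so end=""
        # prevents print() from appending a second one.
        print(f"{message}\n", end="", file=stream, flush=flush)


records = ["record-1", "record-2", "record-3"]

# Without flush=True, print() never asks the stream to flush.
unflushed = CountingStream()
emit(records, unflushed, flush=False)

# With flush=True, print() calls stream.flush() once per invocation.
flushed = CountingStream()
emit(records, flushed, flush=True)

print("flush calls without flush=True:", unflushed.flush_calls)
print("flush calls with flush=True:", flushed.flush_calls)
```

Both streams end up with identical contents here because nothing interrupts the process; the difference only matters when output sits in a buffer that is never drained, which is the failure mode this PR guards against.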