-
Notifications
You must be signed in to change notification settings - Fork 4.5k
Improve performance of interpolation in decalarative sources #44027
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve performance of interpolation in decalarative sources #44027
Conversation
The latest updates on your projects. Learn more about Vercel for Git ↗︎
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a huge gain 👏
@@ -28,6 +28,7 @@ def __post_init__(self, parameters: Mapping[str, Any]) -> None: | |||
self.default = self.default or self.string | |||
self._interpolation = JinjaInterpolation() | |||
self._parameters = parameters | |||
self._just_string = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self._just_string = None | |
self._is_plain_string = None |
Naming doesn't sit right with me — something like this perhaps? Up to @artem1205 and @girarda though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this one
@@ -37,6 +38,12 @@ def eval(self, config: Config, **kwargs: Any) -> Any: | |||
:param kwargs: Optional parameters used for interpolation | |||
:return: The interpolated string | |||
""" | |||
if self._just_string: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this does a bit of caching magic, I'd love a comment explaining why you're doing this in eval specifically. I understand you need config to actually interpolate, so this is not insane, but a comment for posterity would be good.
Also, are there scenarios where the first interpolation would render the same string, but later interpolation would render something different? I can't think of it, but is there a world where a jinja expression renders itself / morphs conditionally on an input?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1. The change makes sense to me, but let's describe the rationale for posterity.
That being said, thank you for taking the time to contribute to Airbyte @szubster ! <3
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there is any (which I doubt) then the output would still not look like input?
There are code-challenges, like https://en.wikipedia.org/wiki/Quine_(computing) but I doubt it's a real-life scenario.
And even if it is, then the logic is still correct :)
Thank you for the comments. |
Is there some synthetic benchmark or a test suite I could use to test those changes? |
@szubster there aren't performance benchmark tests, but depending on where your follow up changes land:
We'll also run regression tests with a few connectors to make sure this doesn't introduce regressions |
e128129
to
61a15d3
Compare
3610819
to
6d9ebc5
Compare
6d9ebc5
to
cc27a11
Compare
All checks are passing now :) |
Running regression tests for a few connectors. I'll merge after they run successfully 🎉 |
There is one failure, but I do not see how it could be related to my change? |
It should be quickbooks — must be a typo ;) |
@szubster this is now available in CDK 4.5.3 🎉 Thank you for the huge improvement <3 |
If you would have any numbers from synthetic tests or anything to share how it improved performance, I would be glad to hear :) |
What
interpolated_string.py
is called thousands or even millions of times in declarative streams.E.g. in
datetime_based_cursor.py
.It is often the case (e.g. in jira source) that there is no evaluation needed, and input is just a string.
When encountered with multiple parent streams this can lead to jinja evaluation taking most of the CPU time.
Jinja compiled templates are also compiled on each evaluation. Which makes them slow.
How
Check whether evaluated string equals input on first try. If so, use input string directly.
Add cache for compiled jinja templates.
Review guide
interpolated_string.py
User Impact
Improved performance.
Example of small run on our jira instance (tests from changes in
interpolated_string.py
only:pre:
post:
Difference in test execution for whole interpolated package (with jinja improvements):
Can this PR be safely reverted and rolled back?