Improve performance of interpolation in declarative sources #44027

szubster · 2024-08-14T12:11:05Z

What

The code in interpolated_string.py is called thousands or even millions of times in declarative streams, e.g. from datetime_based_cursor.py.
Often (e.g. in the Jira source) no evaluation is needed, because the input is just a plain string.
With multiple parent streams, Jinja evaluation can end up taking most of the CPU time.

Jinja templates are also recompiled on each evaluation, which makes them slow.
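
To illustrate the recompilation cost described above, here is a minimal sketch of memoizing the compile step so that each distinct template source is parsed only once. It is not the code from this PR: `compile_template`, `evaluate`, and the regex-based `{{ name }}` parser are hypothetical stand-ins so the example runs with the standard library alone. In the CDK the expensive step would be Jinja's template compilation.

```python
import re
from functools import lru_cache
from typing import Any, Callable, Mapping

# Stand-in syntax: only simple "{{ name }}" placeholders are recognised.
_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


@lru_cache(maxsize=1024)  # the cache size is an arbitrary choice for this sketch
def compile_template(source: str) -> Callable[[Mapping[str, Any]], str]:
    """Parse `source` once and return a reusable render function."""
    # With one capture group, re.split alternates literal text and placeholder names.
    parts = _PLACEHOLDER.split(source)

    def render(context: Mapping[str, Any]) -> str:
        out = []
        for index, part in enumerate(parts):
            # Odd positions hold placeholder names, even positions hold literals.
            out.append(str(context.get(part, "")) if index % 2 else part)
        return "".join(out)

    return render


def evaluate(source: str, context: Mapping[str, Any]) -> str:
    """Render `source`, reusing the cached compiled form when available."""
    return compile_template(source)(context)
```

Repeated calls to `evaluate` with the same template string then pay the parsing cost only on the first call, whatever the context is.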

How

Check whether the evaluated string equals the input on the first evaluation. If so, use the input string directly from then on.
Add a cache for compiled Jinja templates.
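
A rough sketch of the first point, under stated assumptions: `InterpolatedStringSketch`, `render`, and `render_calls` are hypothetical names, and the regex renderer is only a stand-in for the Jinja call so that the example is self-contained. The flag name follows the `_is_plain_string` suggestion from the review.

```python
import re
from dataclasses import dataclass, field
from typing import Any, Mapping, Optional

_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str, context: Mapping[str, Any]) -> str:
    """Stand-in renderer; missing names render as empty strings."""
    return _PLACEHOLDER.sub(lambda m: str(context.get(m.group(1), "")), template)


@dataclass
class InterpolatedStringSketch:
    string: str
    # None = not checked yet, True = plain string, False = needs rendering.
    _is_plain_string: Optional[bool] = field(default=None, init=False)
    # Counter kept only to show how often the renderer is actually invoked.
    render_calls: int = field(default=0, init=False)

    def eval(self, context: Mapping[str, Any]) -> str:
        if self._is_plain_string:
            # The first evaluation returned the input unchanged, so skip rendering.
            return self.string
        self.render_calls += 1
        evaluated = render(self.string, context)
        if self._is_plain_string is None:
            # Decide once, on the first evaluation only.
            self._is_plain_string = evaluated == self.string
        return evaluated
```

With this shape, a plain string such as `"issues"` hits the renderer once, while a templated string keeps being rendered with each new context.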

Review guide

interpolated_string.py

User Impact

Improved performance.
Example of a small run on our Jira instance (measured with the changes in interpolated_string.py only):
pre:

Executed in  267.01 secs    fish           external
   usr time  153.66 secs    0.11 millis  153.66 secs
   sys time    1.10 secs    1.35 millis    1.10 secs

post:

Executed in  130.13 secs    fish           external
   usr time   24.47 secs    0.18 millis   24.47 secs
   sys time    0.79 secs    2.05 millis    0.78 secs

Difference in test execution time for the whole interpolation package (with the Jinja improvements):

	before	after
single eval	0.75s	0.76s
loop 100 evals on single object	14.93s	1.36s
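
For anyone who wants to reproduce a "loop 100 evals" style measurement locally, a small timing harness could look like the sketch below. It is an assumption-laden illustration, not the test used for the numbers above: `time_evals` and the stand-in `render` are hypothetical, and the function under test should be swapped for the real interpolation call.

```python
import re
import timeit
from typing import Any, Callable, Mapping

_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str, context: Mapping[str, Any]) -> str:
    """Stand-in for the real interpolation call; swap in the code under test."""
    return _PLACEHOLDER.sub(lambda m: str(context.get(m.group(1), "")), template)


def time_evals(
    func: Callable[[str, Mapping[str, Any]], str],
    template: str,
    context: Mapping[str, Any],
    loops: int = 100,
    repeat: int = 5,
) -> float:
    """Return the best wall-clock time in seconds for `loops` evaluations."""
    timings = timeit.repeat(lambda: func(template, context), number=loops, repeat=repeat)
    # The minimum is the least noisy estimate across repeats.
    return min(timings)


if __name__ == "__main__":
    context = {"project": "AIRBYTE"}
    for template in ("issues", "{{ project }}/issues"):
        seconds = time_evals(render, template, context)
        print(f"{template!r}: {seconds:.6f}s for 100 evals")
```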

Can this PR be safely reverted and rolled back?

YES 💚
NO ❌

CLAassistant · 2024-08-14T12:11:12Z

All committers have signed the CLA.

natikgadzhi

This is a huge gain 👏

natikgadzhi · 2024-08-14T21:46:09Z

airbyte-cdk/python/airbyte_cdk/sources/declarative/interpolation/interpolated_string.py

@@ -28,6 +28,7 @@ def __post_init__(self, parameters: Mapping[str, Any]) -> None:
        self.default = self.default or self.string
        self._interpolation = JinjaInterpolation()
        self._parameters = parameters
+        self._just_string = None


Suggested change

-        self._just_string = None
+        self._is_plain_string = None

Naming doesn't sit right with me — something like this perhaps? Up to @artem1205 and @girarda though.

I like this one

natikgadzhi · 2024-08-14T21:47:49Z

airbyte-cdk/python/airbyte_cdk/sources/declarative/interpolation/interpolated_string.py

@@ -37,6 +38,12 @@ def eval(self, config: Config, **kwargs: Any) -> Any:
        :param kwargs: Optional parameters used for interpolation
        :return: The interpolated string
        """
+        if self._just_string:


Since this does a bit of caching magic, I'd love a comment explaining why you're doing this in eval specifically. I understand you need config to actually interpolate, so this is not insane, but a comment for posterity would be good.

Also, are there scenarios where the first interpolation would render the same string, but later interpolation would render something different? I can't think of it, but is there a world where a jinja expression renders itself / morphs conditionally on an input?

+1. The change makes sense to me, but let's describe the rationale for posterity.

That being said, thank you for taking the time to contribute to Airbyte @szubster ! <3

If there is one (which I doubt), the output would still not look like the input?
There are code challenges like quines (https://en.wikipedia.org/wiki/Quine_(computing)), but I doubt that's a real-life scenario.
And even if it is, the logic is still correct :)

szubster · 2024-08-15T20:21:10Z

Thank you for the comments.
I think I have found a few more tricks to speed up the whole Jinja interpolation, not just the simple case.
Give me some time and I will come back to this.

szubster · 2024-08-15T20:22:00Z

Is there some synthetic benchmark or a test suite I could use to test those changes?

girarda · 2024-08-15T20:25:56Z

@szubster there aren't performance benchmark tests, but depending on where your follow up changes land:

We'll also run regression tests with a few connectors to make sure this doesn't introduce regressions


szubster · 2024-08-21T11:56:10Z

All checks are passing now :)

girarda · 2024-08-21T20:07:46Z

Running regression tests for a few connectors. I'll merge after they run successfully 🎉

szubster · 2024-08-21T22:14:49Z

There is one failure, but I do not see how it could be related to my change?
Error: Invalid value for '--name': 'source-quickbook' in ....

natikgadzhi · 2024-08-22T00:05:58Z

It should be quickbooks — must be a typo ;)

girarda · 2024-08-22T00:39:02Z

@szubster this is now available in CDK 4.5.3 🎉

Thank you for the huge improvement <3

szubster · 2024-08-22T04:33:32Z

If you have any numbers from synthetic tests, or anything else showing how it improved performance, I would be glad to hear them :)

octavia-squidington-iii added CDK Connector Development Kit community labels Aug 14, 2024

natikgadzhi reviewed Aug 14, 2024

szubster changed the title ~~Check for simple strings in interpolated_string and disable eval for them~~ Improve performance of interpolation in decalarative sources Aug 19, 2024

szubster added 4 commits August 19, 2024 14:01

Check for simple strings in interpolated_string and disable eval for them

ecc1177

Fix formatting

1f7c8bf

Add cache and optimise Jinja templating

bdd0998

Add comments explaining functionality

61a15d3

szubster force-pushed the faster-interpolated-string branch from e128129 to 61a15d3 Compare August 19, 2024 12:01

Back off some performance improvements, for backward compatibility

4b0bbf0

szubster force-pushed the faster-interpolated-string branch from 3610819 to 6d9ebc5 Compare August 20, 2024 13:31

formatting

cc27a11

szubster force-pushed the faster-interpolated-string branch from 6d9ebc5 to cc27a11 Compare August 20, 2024 13:32

Merge branch 'master' into faster-interpolated-string

3c95b01

girarda merged commit 9e35a88 into airbytehq:master Aug 22, 2024
56 of 64 checks passed
