Skip to content

#1313 source google ads: write less logs #21517

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Jan 23, 2023

Conversation

davydov-d
Copy link
Collaborator

What

https://github.com/airbytehq/oncall/issues/1313

How

Filter out each 10th log record

@octavia-squidington-iv octavia-squidington-iv added area/connectors Connector related issues area/documentation Improvements or additions to documentation connectors/source/google-ads labels Jan 18, 2023
@davydov-d davydov-d requested a review from brianjlai January 18, 2023 12:10
@davydov-d
Copy link
Collaborator Author

davydov-d commented Jan 18, 2023

/test connector=connectors/source-google-ads

🕑 connectors/source-google-ads https://github.com/airbytehq/airbyte/actions/runs/3948722563
❌ connectors/source-google-ads https://github.com/airbytehq/airbyte/actions/runs/3948722563
🐛 https://gradle.com/s/kiz4dbndylt5a

Build Failed

Test summary info:

=========================== short test summary info ============================
FAILED test_core.py::TestBasicRead::test_read[inputs0] - Failed: Stream accou...
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/source_acceptance_test/tests/test_core.py:94: The previous and actual specifications are identical.
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/source_acceptance_test/tests/test_core.py:377: The previous and actual discovered catalogs are identical.
============= 1 failed, 28 passed, 2 skipped in 1608.20s (0:26:48) =============

@davydov-d
Copy link
Collaborator Author

davydov-d commented Jan 18, 2023

/test connector=connectors/source-google-ads

🕑 connectors/source-google-ads https://github.com/airbytehq/airbyte/actions/runs/3949865210
✅ connectors/source-google-ads https://github.com/airbytehq/airbyte/actions/runs/3949865210
Python tests coverage:

Name                                       Stmts   Miss  Cover
--------------------------------------------------------------
source_google_ads/models.py                   18      0   100%
source_google_ads/__init__.py                  2      0   100%
source_google_ads/streams.py                 196      6    97%
source_google_ads/source.py                   86      5    94%
source_google_ads/custom_query_stream.py      75      5    93%
source_google_ads/google_ads.py               73     12    84%
--------------------------------------------------------------
TOTAL                                        450     28    94%
	 Name                                                 Stmts   Miss  Cover   Missing
	 ----------------------------------------------------------------------------------
	 source_acceptance_test/base.py                          12      4    67%   16-19
	 source_acceptance_test/config.py                       141      5    96%   87, 93, 239, 243-244
	 source_acceptance_test/conftest.py                     211     95    55%   36, 42-44, 49, 54, 77, 83, 89-91, 110, 115-117, 123-125, 131-132, 137-138, 143, 149, 158-167, 173-178, 193, 217, 248, 254, 262-267, 275-285, 293-306, 311-317, 324-335, 342-358
	 source_acceptance_test/plugin.py                        69     25    64%   22-23, 31, 36, 120-140, 144-148
	 source_acceptance_test/tests/test_core.py              402    115    71%   53, 58, 93-104, 109-116, 120-121, 125-126, 308, 346-363, 376-387, 391-396, 402, 435-440, 478-485, 528-530, 533, 598-606, 618-621, 626, 682-683, 689, 692, 728-738, 751-776
	 source_acceptance_test/tests/test_incremental.py       160     14    91%   58-65, 70-83, 246
	 source_acceptance_test/utils/asserts.py                 39      2    95%   62-63
	 source_acceptance_test/utils/common.py                  94     10    89%   16-17, 32-38, 72, 75
	 source_acceptance_test/utils/compare.py                 62     23    63%   21-51, 68, 97-99
	 source_acceptance_test/utils/connector_runner.py       133     33    75%   24-27, 46-47, 50-54, 57-58, 73-75, 78-80, 83-85, 88-90, 93-95, 124-125, 159-161, 208
	 source_acceptance_test/utils/json_schema_helper.py     107     13    88%   30-31, 38, 41, 65-68, 96, 120, 192-194
	 ----------------------------------------------------------------------------------
	 TOTAL                                                 1609    339    79%

Build Passed

Test summary info:

=========================== short test summary info ============================
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/source_acceptance_test/tests/test_core.py:94: The previous and actual specifications are identical.
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/source_acceptance_test/tests/test_core.py:377: The previous and actual discovered catalogs are identical.
================== 29 passed, 2 skipped in 1419.82s (0:23:39) ==================

Copy link
Contributor

@brianjlai brianjlai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few questions about some specifics of how we use the sieve, but overall the approach looks good.

@@ -154,6 +177,7 @@ def current_state(self, customer_id, default=None):

def stream_slices(self, stream_state: Mapping[str, Any] = None, **kwargs) -> Iterable[Optional[MutableMapping[str, any]]]:
for customer in self.customers:
logger = cyclic_sieve(self.logger, 10)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we avoid initializing a new cyclic_sieve since it should have already been defined on the stream as self.incremental_sieve_logger?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically we can, but this was done on purpose. The idea was to filter out not just 9 out of 10 messages, but 9 out of 10 message cycles to have more consistent logs so an engineer shouldn't have to stick together messages from different routine calls / cycles / exceptions / if statements and so on when debugging. Since stream_slices() is called once per stream before reading records, and read_records() is called once per stream slice, these are two different log cycles.
If you still insist, I can reuse self.incremental_sieve_logger here, technically it is not a problem

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah got it, that makes sense and I agree that for a better experience we wouldn't want to share the same log cycles. Can you add a comment for that here just mentioning that's why a logger sieve is initialized here.

@@ -218,3 +219,14 @@ def test_retry_transient_errors(mocker, config, customers, error_cls):
records = list(stream.read_records(sync_mode=SyncMode.incremental, cursor_field=["segments.date"], stream_slice=stream_slice))
assert mocked_search.call_count == 5
assert records == []


def test_cyclic_sieve(caplog):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for adding tests for the new logger?

@davydov-d
Copy link
Collaborator Author

davydov-d commented Jan 19, 2023

/test connector=connectors/source-google-ads

🕑 connectors/source-google-ads https://github.com/airbytehq/airbyte/actions/runs/3956741759

@davydov-d davydov-d requested a review from brianjlai January 19, 2023 08:42
@davydov-d
Copy link
Collaborator Author

davydov-d commented Jan 19, 2023

/test connector=connectors/source-google-ads

🕑 connectors/source-google-ads https://github.com/airbytehq/airbyte/actions/runs/3957323653
✅ connectors/source-google-ads https://github.com/airbytehq/airbyte/actions/runs/3957323653
Python tests coverage:

Name                                       Stmts   Miss  Cover
--------------------------------------------------------------
source_google_ads/models.py                   18      0   100%
source_google_ads/__init__.py                  2      0   100%
source_google_ads/streams.py                 195      6    97%
source_google_ads/source.py                   86      5    94%
source_google_ads/custom_query_stream.py      75      5    93%
source_google_ads/google_ads.py               73     12    84%
--------------------------------------------------------------
TOTAL                                        449     28    94%
	 Name                                                 Stmts   Miss  Cover   Missing
	 ----------------------------------------------------------------------------------
	 source_acceptance_test/base.py                          12      4    67%   16-19
	 source_acceptance_test/config.py                       141      5    96%   87, 93, 239, 243-244
	 source_acceptance_test/conftest.py                     211     95    55%   36, 42-44, 49, 54, 77, 83, 89-91, 110, 115-117, 123-125, 131-132, 137-138, 143, 149, 158-167, 173-178, 193, 217, 248, 254, 262-267, 275-285, 293-306, 311-317, 324-335, 342-358
	 source_acceptance_test/plugin.py                        69     25    64%   22-23, 31, 36, 120-140, 144-148
	 source_acceptance_test/tests/test_core.py              402    115    71%   53, 58, 93-104, 109-116, 120-121, 125-126, 308, 346-363, 376-387, 391-396, 402, 435-440, 478-485, 528-530, 533, 598-606, 618-621, 626, 682-683, 689, 692, 728-738, 751-776
	 source_acceptance_test/tests/test_incremental.py       160     14    91%   58-65, 70-83, 246
	 source_acceptance_test/utils/asserts.py                 39      2    95%   62-63
	 source_acceptance_test/utils/common.py                  94     10    89%   16-17, 32-38, 72, 75
	 source_acceptance_test/utils/compare.py                 62     23    63%   21-51, 68, 97-99
	 source_acceptance_test/utils/connector_runner.py       133     33    75%   24-27, 46-47, 50-54, 57-58, 73-75, 78-80, 83-85, 88-90, 93-95, 124-125, 159-161, 208
	 source_acceptance_test/utils/json_schema_helper.py     107     13    88%   30-31, 38, 41, 65-68, 96, 120, 192-194
	 ----------------------------------------------------------------------------------
	 TOTAL                                                 1609    339    79%

Build Passed

Test summary info:

=========================== short test summary info ============================
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/source_acceptance_test/tests/test_core.py:94: The previous and actual specifications are identical.
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/source_acceptance_test/tests/test_core.py:377: The previous and actual discovered catalogs are identical.
================== 29 passed, 2 skipped in 1699.95s (0:28:19) ==================

Copy link
Contributor

@brianjlai brianjlai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall no blocking comments, but had one discussion point about whether we want a new sieve per-read_records invocation if that made interpreting the logs more intuitive

@@ -154,6 +177,7 @@ def current_state(self, customer_id, default=None):

def stream_slices(self, stream_state: Mapping[str, Any] = None, **kwargs) -> Iterable[Optional[MutableMapping[str, any]]]:
for customer in self.customers:
logger = cyclic_sieve(self.logger, 10)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah got it, that makes sense and I agree that for a better experience we wouldn't want to share the same log cycles. Can you add a comment for that here just mentioning that's why a logger sieve is initialized here.

@@ -189,7 +213,8 @@ def read_records(
and update `start_date` key in the `stream_slice` with the latest read record's cursor value, then retry the sync.
"""
while True:
self.logger.info("Starting a while loop iteration")
self.incremental_sieve_logger.bump()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going off the prior comment about separate slice and read_records log cycles, do you think it's also worth initializing a new sieve every time read_records is invoked? If the sieve is shared on the stream, we will emit exactly 1/10 of logs, but it could be a little confusing why certain cycles are skipped on different read_records invocations if a prior one has a partially bumped count.

I'm open to either option and this is non-blocking on approval, but I think worth highlighting when the sieve is shared at this point? If we do decide to have a separate sieve per read_records invocation, we could probably remove it from __init__() and do something similar to slices where it is created within this method.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a tricky point cause normally a while loop here contains a single iteration. It only goes further if there were errors during the read. So I do not think there's much sense in initializing new sieve each time read_records is invoked. On the other hand I agree that logs may become inconsistent if there are two or more while loop iterations, therefore I moved incremental_sieve_logger.bump() outside the loop to remain consistent

@davydov-d
Copy link
Collaborator Author

davydov-d commented Jan 20, 2023

/publish connector=connectors/source-google-ads

🕑 Publishing the following connectors:
connectors/source-google-ads
https://github.com/airbytehq/airbyte/actions/runs/3967044312


Connector Did it publish? Were definitions generated?
connectors/source-google-ads

if you have connectors that successfully published but failed definition generation, follow step 4 here ▶️

@octavia-squidington-iii octavia-squidington-iii temporarily deployed to more-secrets January 20, 2023 11:47 — with GitHub Actions Inactive
@octavia-squidington-iii octavia-squidington-iii temporarily deployed to more-secrets January 20, 2023 11:47 — with GitHub Actions Inactive
@github-actions
Copy link
Contributor

github-actions bot commented Jan 20, 2023

Airbyte Code Coverage

There is no coverage information present for the Files changed

Total Project Coverage 26.72% 🍏

@davydov-d davydov-d temporarily deployed to more-secrets January 20, 2023 12:05 — with GitHub Actions Inactive
@davydov-d davydov-d temporarily deployed to more-secrets January 20, 2023 12:05 — with GitHub Actions Inactive
@davydov-d davydov-d temporarily deployed to more-secrets January 20, 2023 12:38 — with GitHub Actions Inactive
@davydov-d davydov-d temporarily deployed to more-secrets January 20, 2023 12:38 — with GitHub Actions Inactive
@davydov-d davydov-d temporarily deployed to more-secrets January 20, 2023 12:49 — with GitHub Actions Inactive
@davydov-d davydov-d temporarily deployed to more-secrets January 20, 2023 12:49 — with GitHub Actions Inactive
@davydov-d davydov-d temporarily deployed to more-secrets January 20, 2023 13:51 — with GitHub Actions Inactive
@davydov-d davydov-d temporarily deployed to more-secrets January 20, 2023 13:51 — with GitHub Actions Inactive
@davydov-d davydov-d temporarily deployed to more-secrets January 20, 2023 15:17 — with GitHub Actions Inactive
@davydov-d davydov-d temporarily deployed to more-secrets January 20, 2023 15:17 — with GitHub Actions Inactive
@davydov-d davydov-d temporarily deployed to more-secrets January 20, 2023 17:58 — with GitHub Actions Inactive
@davydov-d davydov-d temporarily deployed to more-secrets January 20, 2023 17:58 — with GitHub Actions Inactive
@davydov-d davydov-d temporarily deployed to more-secrets January 23, 2023 07:40 — with GitHub Actions Inactive
@davydov-d davydov-d temporarily deployed to more-secrets January 23, 2023 07:40 — with GitHub Actions Inactive
@davydov-d davydov-d merged commit e35dc23 into master Jan 23, 2023
@davydov-d davydov-d deleted the ddavydov/#1313-source-google-ads-write-fewer-logs branch January 23, 2023 08:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/connectors Connector related issues area/documentation Improvements or additions to documentation connectors/source/google-ads
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants