Airbyte-ci: Add `--metadata-query` option #30330

bnchrch · 2023-09-12T01:22:58Z

Problem(s)

Selecting which category of connectors to run is too rigid and requires boilerplate for each new field.
Complex selections are impossible
We want to run weekly tests based on ab_internal

Solution

Allow filtering connectors by the contents of their metadata.yaml.

Related to #30218.

For example:

airbyte-ci connectors --metadata-query="data.name == 'Postgres'" test
airbyte-ci connectors --metadata-query="(data.ab_internal.ql > 100) & (data.ab_internal.sl < 200)" test
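
To illustrate how a query like the ones above could be evaluated against a connector's metadata, here is a minimal sketch. It is not this PR's implementation: the PR relies on `simpleeval`, whereas the sketch swaps in a small hand-written evaluator built on Python's standard `ast` module so that it runs without third-party packages. The names `query_match` and `_evaluate`, the sample metadata values, and the choice to treat a missing field as "no match" are all assumptions made for illustration.

```python
import ast
import operator

# Operators this sketch supports; anything else is rejected.
_COMPARE = {
    ast.Eq: operator.eq, ast.NotEq: operator.ne,
    ast.Gt: operator.gt, ast.GtE: operator.ge,
    ast.Lt: operator.lt, ast.LtE: operator.le,
    ast.In: lambda a, b: a in b,
}
_BINARY = {ast.BitAnd: operator.and_, ast.BitOr: operator.or_}


def _evaluate(node, names):
    """Recursively evaluate a restricted subset of Python expression nodes."""
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.Name):
        return names[node.id]
    if isinstance(node, ast.Attribute):
        # `data.ab_internal.ql` becomes nested dictionary lookups.
        return _evaluate(node.value, names)[node.attr]
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
        return _BINARY[type(node.op)](_evaluate(node.left, names), _evaluate(node.right, names))
    if isinstance(node, ast.Compare):
        left = _evaluate(node.left, names)
        for op, comparator in zip(node.ops, node.comparators):
            right = _evaluate(comparator, names)
            if not _COMPARE[type(op)](left, right):
                return False
            left = right
        return True
    raise ValueError(f"Unsupported expression: {ast.dump(node)}")


def query_match(metadata, query):
    """Return True if the metadata satisfies the query; False on any failure (sketch choice)."""
    try:
        tree = ast.parse(query, mode="eval")
        return bool(_evaluate(tree.body, {"data": metadata["data"]}))
    except (KeyError, TypeError, ValueError, SyntaxError):
        return False


# Hypothetical metadata, shaped like a parsed metadata.yaml.
metadata = {"data": {"name": "Postgres", "ab_internal": {"ql": 400, "sl": 100}}}
print(query_match(metadata, "data.name == 'Postgres'"))
print(query_match(metadata, "(data.ab_internal.ql > 100) & (data.ab_internal.sl < 200)"))
```

Both calls print `True` for this sample metadata. Function calls such as `__import__('os')` are rejected because only the listed node types are evaluated.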

bnchrch · 2023-09-12T01:25:44Z

airbyte-ci/connectors/connector_ops/tests/conftest.py

+
+
+@pytest.fixture()
+def set_working_dir_to_repo_root(monkeypatch):


@alafanechere I discovered that we are not currently running these tests.

I imagine it's because many of them depend on connectors being present in the main repo.

This was a shortcut I took to get them passing again, but I want to flag that it is a hack. I'm just not sure whether it's an acceptable short-term hack.

Thoughts?

Good catch.
I've set it in airbyte-ci/pipelines in a slightly different way:

import os
from pathlib import Path

import git
import pytest


@pytest.fixture(scope="session")
def airbyte_repo_path() -> Path:
    return Path(git.Repo(search_parent_directories=True).working_tree_dir)


@pytest.fixture(autouse=True, scope="session")
def from_airbyte_root(airbyte_repo_path):
    """
    Change the working directory to the root of the Airbyte repo.
    This makes the root of the Airbyte repo the current working directory for all tests, since autouse=True is set.
    """
    original_dir = Path.cwd()
    os.chdir(airbyte_repo_path)
    yield airbyte_repo_path
    os.chdir(original_dir)

It's a bit less hacky, as the git library finds the repo path.
The session scope is nice because the fixture is not re-evaluated for each test function that depends on it.
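
If depending on the `git` package in the test environment is undesirable, a similar effect can be sketched with the standard library alone by walking up the directory tree until a `.git` entry is found. This is an illustrative alternative, not what either `conftest.py` does; `find_repo_root` and `working_directory` are hypothetical names.

```python
import os
from contextlib import contextmanager
from pathlib import Path


def find_repo_root(start: Path) -> Path:
    """Walk up from `start` until a directory containing a `.git` entry is found."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        # `.git` is a directory in a normal clone and a file in a worktree.
        if (candidate / ".git").exists():
            return candidate
    raise FileNotFoundError(f"No .git entry found at or above {start}")


@contextmanager
def working_directory(path: Path):
    """Temporarily change the working directory, restoring it afterwards."""
    original_dir = Path.cwd()
    os.chdir(path)
    try:
        yield path
    finally:
        os.chdir(original_dir)
```

A session-scoped pytest fixture could wrap `working_directory(find_repo_root(Path.cwd()))` around its `yield` to get the same behaviour as the fixture quoted above.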

bnchrch · 2023-09-12T01:27:28Z

.github/workflows/connectors_weekly_build.yml

@@ -41,4 +41,4 @@ jobs:
          gcp_gsm_credentials: ${{ secrets.GCP_GSM_CREDENTIALS }}
          git_branch: ${{ steps.extract_branch.outputs.branch }}
          github_token: ${{ secrets.GITHUB_TOKEN }}
-          subcommand: "--show-dagger-logs connectors ${{ inputs.test-connectors-options || '--concurrency=3 --support-level=community' }} test"
+          subcommand: "--show-dagger-logs connectors ${{ inputs.test-connectors-options || '--concurrency=3 --metadata-query=\"(data.ab_internal.ql > 100) & (data.ab_internal.sl < 200)\"' }} test"


This will reduce our weekly connector tests from over 200 down to ~70.

The open question is whether we should also test the ql 100 community connectors weekly.

If possible, it would still be nice to define this query as a variable whose name describes what it is for.

That's cool! The parsing knows to convert the input into numbers and do math?!
Edit: Thanks, simple_eval!
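
One detail worth spelling out about the query in this hunk: in Python expression syntax, `&` binds more tightly than the comparison operators, so the parentheses change the result. The sketch uses plain Python with made-up values for `ql` and `sl`. It assumes `simpleeval` follows Python's precedence rules because it evaluates Python-style expressions, and it does not call `simpleeval` itself.

```python
# Hypothetical values chosen so that the two spellings disagree.
ql, sl = 50, 27

# Intended meaning: both comparisons are evaluated first, then combined.
with_parens = (ql > 100) & (sl < 200)

# Without parentheses Python reads this as the chained comparison
# ql > (100 & sl) < 200, because & has higher precedence than > and <.
without_parens = ql > 100 & sl < 200

print(with_parens)     # False: ql is not greater than 100
print(without_parens)  # True: 100 & 27 == 0, and 50 > 0 < 200 holds
```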

evantahler

👍 from me on the basics - I like the metadata query a lot, and you've got good tests around it

evantahler · 2023-09-12T22:36:43Z

.github/workflows/connectors_weekly_build.yml

@@ -41,4 +41,4 @@ jobs:
          gcp_gsm_credentials: ${{ secrets.GCP_GSM_CREDENTIALS }}
          git_branch: ${{ steps.extract_branch.outputs.branch }}
          github_token: ${{ secrets.GITHUB_TOKEN }}
-          subcommand: "--show-dagger-logs connectors ${{ inputs.test-connectors-options || '--concurrency=3 --support-level=community' }} test"
+          subcommand: "--show-dagger-logs connectors ${{ inputs.test-connectors-options || '--concurrency=3 --metadata-query=\"(data.ab_internal.ql > 100) & (data.ab_internal.sl < 200)\"' }} test"


That's cool! The parsing knows to convert the input into numbers and do math?!
Edit: Thanks, simple_eval!

erohmensing

I won't comment on the short-term hack; perhaps I'd pull it out for now to get this in and reconsider it later. Otherwise, I added some documentation notes, but this looks like a really nice solution!

erohmensing · 2023-09-12T22:35:42Z

.github/workflows/connectors_weekly_build.yml

@@ -41,4 +41,4 @@ jobs:
          gcp_gsm_credentials: ${{ secrets.GCP_GSM_CREDENTIALS }}
          git_branch: ${{ steps.extract_branch.outputs.branch }}
          github_token: ${{ secrets.GITHUB_TOKEN }}
-          subcommand: "--show-dagger-logs connectors ${{ inputs.test-connectors-options || '--concurrency=3 --support-level=community' }} test"
+          subcommand: "--show-dagger-logs connectors ${{ inputs.test-connectors-options || '--concurrency=3 --metadata-query=\"(data.ab_internal.ql > 100) & (data.ab_internal.sl < 200)\"' }} test"


If possible, it would still be nice to define this query as a variable whose name describes what it is for.

erohmensing · 2023-09-12T22:36:36Z

airbyte-ci/connectors/connector_ops/connector_ops/utils.py

+        Examples
+        --------
+        >>> connector.metadata_query_match("'s3' in data.name")
+        True
+
+        >>> connector.metadata_query_match("data.supportLevel == 'certified'")
+        False
+
+        >>> connector.metadata_query_match("data.ab_internal.ql >= 100")
+        True


great examples!

erohmensing · 2023-09-12T22:37:43Z

airbyte-ci/connectors/connector_ops/tests/test_utils.py

+            assert connector.icon_path == Path(f"./airbyte-integrations/connectors/{connector.technical_name}/icon.svg")
            assert len(connector.version.split(".")) == 3
        else:
            assert connector.metadata is None
            assert connector.support_level is None
            assert connector.acceptance_test_config is None
-            assert connector.icon_path == Path(f"./airbyte-config-oss/init-oss/src/main/resources/icons/{connector.name}.svg")
+            assert connector.icon_path == Path(f"./airbyte-integrations/connectors/{connector.technical_name}/icon.svg")


I guess these were tests that weren't being run, and didn't pass after we made them run?

Exactly! I want to get them running automatically again: #30330 (comment)

I'll address that in another PR.

Great, thanks for fixing them in any case :D

erohmensing · 2023-09-12T22:39:57Z

airbyte-ci/connectors/pipelines/README.md

@@ -122,6 +122,7 @@ Available commands:
 | `--use-remote-secrets`                                         | False    | True                             | If True, connectors configuration will be pulled from Google Secret Manager. Requires the GCP_GSM_CREDENTIALS environment variable to be set with a service account with permission to read GSM secrets. If False the connector configuration will be read from the local connector `secrets` folder. |
 | `--name`                                                       | True     |                                  | Select a specific connector for which the pipeline will run. Can be used multiple time to select multiple connectors. The expected name is the connector technical name. e.g. `source-pokeapi`                                                                                                        |
 | `--support-level`                                              | True     |                                  | Select connectors with a specific support level: `community`, `certified`.  Can be used multiple times to select multiple support levels.                                                                                                                                                             |
+| `--metadata-query`                                              | False     |                                | Filter connectors by metadata query using `simpleeval`. e.g. 'data.ab_internal.ql == 200' |


Can we add a link to simpleeval? Also, will everyone need to backslash the quotes in the usage like here? Some more concrete examples of using this in the README might be helpful,

especially pointing out that `data` points to the top level of the metadata file. I'm not sure people will know that.
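
On the quoting question, a small sketch may help. It uses the standard `shlex` module to show how a POSIX shell would split the first command from the PR description: wrapping the query in double quotes lets the single-quoted string literal through untouched, so no backslashes are needed on the command line. The backslashes in the workflow file are presumably there only because that query sits inside a double-quoted YAML string; that reading is an assumption, not something stated in the PR.

```python
import shlex

# The command as typed in a POSIX shell (copied from the PR description).
command = """airbyte-ci connectors --metadata-query="data.name == 'Postgres'" test"""

# shlex.split applies POSIX shell quoting rules, so this is the argv the CLI would receive.
argv = shlex.split(command)
print(argv)

# The query itself: `data` is the top-level key of metadata.yaml,
# and nested keys are reached with dots (e.g. data.ab_internal.ql).
query = argv[2].split("=", 1)[1]
print(query)
```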

erohmensing · 2023-09-12T22:41:14Z

airbyte-ci/connectors/pipelines/pipelines/commands/groups/connectors.py

    non_empty_connector_sets = [
        connector_set
        for connector_set in [
            selected_connectors_by_name,
            selected_connectors_by_support_level,
            selected_connectors_by_language,
+            selected_connectors_by_query,
            selected_modified_connectors,
        ]
        if connector_set


Nit: line 100, add "by query" to the intersection of listed groups comment
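
For readers unfamiliar with this part of the code: the hunk keeps only the filters that selected something and, per the comment being discussed, intersects them. A minimal sketch of that idea follows, with hypothetical connector names and a hypothetical `intersect_non_empty` helper. The real function body is not shown in this hunk, so the details are assumptions.

```python
def intersect_non_empty(*connector_sets: set) -> set:
    """Intersect the non-empty sets; an empty set is treated as 'filter not used'."""
    non_empty = [connector_set for connector_set in connector_sets if connector_set]
    if not non_empty:
        return set()
    return set.intersection(*non_empty)


selected_by_name = set()  # --name was not passed
selected_by_support_level = {"source-postgres", "source-s3", "source-pokeapi"}
selected_by_query = {"source-postgres", "source-s3"}

selected = intersect_non_empty(selected_by_name, selected_by_support_level, selected_by_query)
print(sorted(selected))  # ['source-postgres', 'source-s3']
```

Note that in this sketch a filter that was used but matched nothing is indistinguishable from a filter that was not used at all.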

alafanechere

Very cool feature! I added a suggestion related to the test execution from the repo root.

alafanechere · 2023-09-13T13:04:31Z

.github/workflows/connectors_weekly_build.yml

@@ -41,4 +41,4 @@ jobs:
          gcp_gsm_credentials: ${{ secrets.GCP_GSM_CREDENTIALS }}
          git_branch: ${{ steps.extract_branch.outputs.branch }}
          github_token: ${{ secrets.GITHUB_TOKEN }}
-          subcommand: "--show-dagger-logs connectors ${{ inputs.test-connectors-options || '--concurrency=3 --support-level=community' }} test"
+          subcommand: "--show-dagger-logs connectors ${{ inputs.test-connectors-options || '--concurrency=3 --metadata-query=\"(data.ab_internal.ql > 100) & (data.ab_internal.sl < 200)\"' }} test"


@bnchrch looks like simple_eval is providing safe eval 👍

alafanechere · 2023-09-13T13:06:45Z

airbyte-ci/connectors/connector_ops/tests/conftest.py

+
+
+@pytest.fixture()
+def set_working_dir_to_repo_root(monkeypatch):


Good catch.
I've set it in airbyte-ci/pipelines in a slightly different way:

import os
from pathlib import Path

import git
import pytest


@pytest.fixture(scope="session")
def airbyte_repo_path() -> Path:
    return Path(git.Repo(search_parent_directories=True).working_tree_dir)


@pytest.fixture(autouse=True, scope="session")
def from_airbyte_root(airbyte_repo_path):
    """
    Change the working directory to the root of the Airbyte repo.
    This makes the root of the Airbyte repo the current working directory for all tests, since autouse=True is set.
    """
    original_dir = Path.cwd()
    os.chdir(airbyte_repo_path)
    yield airbyte_repo_path
    os.chdir(original_dir)

It's a bit less hacky, as the git library finds the repo path.
The session scope is nice because the fixture is not re-evaluated for each test function that depends on it.

bnchrch requested review from alafanechere and a team September 12, 2023 01:22

bnchrch commented Sep 12, 2023

bnchrch added 7 commits September 12, 2023 13:59

Add simpleeval to connector ops

9ba5662

Add metadata query option

d3657f3

Update weekly test to use query

8ae4fee

Format

318e149

Update lock file

d6a5c34

Update PR number

59b8eee

Fix tests

33f3a60

bnchrch force-pushed the bnchrch/ci/weekly-ql-test branch from 6267363 to 33f3a60 Compare September 12, 2023 21:12

Automated Commit - Format and Process Resources Changes

1e8685b

evantahler approved these changes Sep 12, 2023

Merge branch 'master' into bnchrch/ci/weekly-ql-test

6b04534

erohmensing approved these changes Sep 12, 2023

bnchrch added 3 commits September 12, 2023 18:28

auto conf

d081e24

Address comments

a61a90e

Merge remote-tracking branch 'origin/master' into bnchrch/ci/weekly-q…

1de9ff3

…l-test

bnchrch enabled auto-merge (squash) September 13, 2023 01:34

bnchrch merged commit 13a5a40 into master Sep 13, 2023

bnchrch deleted the bnchrch/ci/weekly-ql-test branch September 13, 2023 01:53

alafanechere reviewed Sep 13, 2023
