-
Notifications
You must be signed in to change notification settings - Fork 4.5k
Airbyte-ci: Add --metadata-query
option
#30330
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
||
|
||
@pytest.fixture() | ||
def set_working_dir_to_repo_root(monkeypatch): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alafanechere I discovered we are not currently running these tests.
I imagine its because many of them depend on connectors being preset in the main repo.
This was a shortcut I took to have them passing again. But I want to raise a flag that it is a hack. Just not sure if its an acceptable short term hack.
Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch.
I've set it in airbyte-ci/pipelines
in a slighlty different way:
@pytest.fixture(scope="session")
def airbyte_repo_path() -> Path:
return Path(git.Repo(search_parent_directories=True).working_tree_dir)
@pytest.fixture(autouse=True, scope="session")
def from_airbyte_root(airbyte_repo_path):
"""
Change the working directory to the root of the Airbyte repo.
This will make all the tests current working directory to be the root of the Airbyte repo as we've set autouse=True.
"""
original_dir = Path.cwd()
os.chdir(airbyte_repo_path)
yield airbyte_repo_path
os.chdir(original_dir)
It's a bit less hacky as the git
library finds the repo path.
The sessions
scope is cool to not re-evaluate this fixture on each test function depending on it.
@@ -41,4 +41,4 @@ jobs: | |||
gcp_gsm_credentials: ${{ secrets.GCP_GSM_CREDENTIALS }} | |||
git_branch: ${{ steps.extract_branch.outputs.branch }} | |||
github_token: ${{ secrets.GITHUB_TOKEN }} | |||
subcommand: "--show-dagger-logs connectors ${{ inputs.test-connectors-options || '--concurrency=3 --support-level=community' }} test" | |||
subcommand: "--show-dagger-logs connectors ${{ inputs.test-connectors-options || '--concurrency=3 --metadata-query=\"(data.ab_internal.ql > 100) & (data.ab_internal.sl < 200)\"' }} test" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will reduce our weekly connector tests from over 200 down to ~70.
The question is should we test the 100 level ql community connectors weekly as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if possible, it would still be nice to define this query as a variable that describes what it is for
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's cool! The parsing knows to convert the input into numbers and do math?!
Edit: Thanks, simple_eval!
6267363
to
33f3a60
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 from me on the basics - I like the metadata query a lot, and you've got good tests around it
@@ -41,4 +41,4 @@ jobs: | |||
gcp_gsm_credentials: ${{ secrets.GCP_GSM_CREDENTIALS }} | |||
git_branch: ${{ steps.extract_branch.outputs.branch }} | |||
github_token: ${{ secrets.GITHUB_TOKEN }} | |||
subcommand: "--show-dagger-logs connectors ${{ inputs.test-connectors-options || '--concurrency=3 --support-level=community' }} test" | |||
subcommand: "--show-dagger-logs connectors ${{ inputs.test-connectors-options || '--concurrency=3 --metadata-query=\"(data.ab_internal.ql > 100) & (data.ab_internal.sl < 200)\"' }} test" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's cool! The parsing knows to convert the input into numbers and do math?!
Edit: Thanks, simple_eval!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Won't comment on the short term hack - perhaps i'd pull it out for now to get this in and reconsider it later. Otherwise, added some documentation notes, but looks like a really nice solution!
@@ -41,4 +41,4 @@ jobs: | |||
gcp_gsm_credentials: ${{ secrets.GCP_GSM_CREDENTIALS }} | |||
git_branch: ${{ steps.extract_branch.outputs.branch }} | |||
github_token: ${{ secrets.GITHUB_TOKEN }} | |||
subcommand: "--show-dagger-logs connectors ${{ inputs.test-connectors-options || '--concurrency=3 --support-level=community' }} test" | |||
subcommand: "--show-dagger-logs connectors ${{ inputs.test-connectors-options || '--concurrency=3 --metadata-query=\"(data.ab_internal.ql > 100) & (data.ab_internal.sl < 200)\"' }} test" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if possible, it would still be nice to define this query as a variable that describes what it is for
Examples | ||
-------- | ||
>>> connector.metadata_query_match("'s3' in data.name") | ||
True | ||
|
||
>>> connector.metadata_query_match("data.supportLevel == 'certified'") | ||
False | ||
|
||
>>> connector.metadata_query_match("data.ab_internal.ql >= 100") | ||
True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great examples!
assert connector.icon_path == Path(f"./airbyte-integrations/connectors/{connector.technical_name}/icon.svg") | ||
assert len(connector.version.split(".")) == 3 | ||
else: | ||
assert connector.metadata is None | ||
assert connector.support_level is None | ||
assert connector.acceptance_test_config is None | ||
assert connector.icon_path == Path(f"./airbyte-config-oss/init-oss/src/main/resources/icons/{connector.name}.svg") | ||
assert connector.icon_path == Path(f"./airbyte-integrations/connectors/{connector.technical_name}/icon.svg") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess these were tests that weren't being run, and didn't pass after we made them run?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
exactly! I want to get them run automatically again #30330 (comment)
Ill address that in another PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great, thanks for fixing them in any case :D
@@ -122,6 +122,7 @@ Available commands: | |||
| `--use-remote-secrets` | False | True | If True, connectors configuration will be pulled from Google Secret Manager. Requires the GCP_GSM_CREDENTIALS environment variable to be set with a service account with permission to read GSM secrets. If False the connector configuration will be read from the local connector `secrets` folder. | | |||
| `--name` | True | | Select a specific connector for which the pipeline will run. Can be used multiple time to select multiple connectors. The expected name is the connector technical name. e.g. `source-pokeapi` | | |||
| `--support-level` | True | | Select connectors with a specific support level: `community`, `certified`. Can be used multiple times to select multiple support levels. | | |||
| `--metadata-query` | False | | Filter connectors by metadata query using `simpleeval`. e.g. 'data.ab_internal.ql == 200' | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add the link to simpleeval? and will everyone need to backslash the quotes in the usage like here? some more concrete examples of using this in the readme might be helpful
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
especially pointing out that data
points to the top level of the metadata file - not sure people will know that.
non_empty_connector_sets = [ | ||
connector_set | ||
for connector_set in [ | ||
selected_connectors_by_name, | ||
selected_connectors_by_support_level, | ||
selected_connectors_by_language, | ||
selected_connectors_by_query, | ||
selected_modified_connectors, | ||
] | ||
if connector_set |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: line 100, add "by query" to the intersection of listed groups comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very cool feature! I added a suggestion related to the test execution from the repo root.
@@ -41,4 +41,4 @@ jobs: | |||
gcp_gsm_credentials: ${{ secrets.GCP_GSM_CREDENTIALS }} | |||
git_branch: ${{ steps.extract_branch.outputs.branch }} | |||
github_token: ${{ secrets.GITHUB_TOKEN }} | |||
subcommand: "--show-dagger-logs connectors ${{ inputs.test-connectors-options || '--concurrency=3 --support-level=community' }} test" | |||
subcommand: "--show-dagger-logs connectors ${{ inputs.test-connectors-options || '--concurrency=3 --metadata-query=\"(data.ab_internal.ql > 100) & (data.ab_internal.sl < 200)\"' }} test" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bnchrch looks like simple_eval is providing safe eval 👍
|
||
|
||
@pytest.fixture() | ||
def set_working_dir_to_repo_root(monkeypatch): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch.
I've set it in airbyte-ci/pipelines
in a slighlty different way:
@pytest.fixture(scope="session")
def airbyte_repo_path() -> Path:
return Path(git.Repo(search_parent_directories=True).working_tree_dir)
@pytest.fixture(autouse=True, scope="session")
def from_airbyte_root(airbyte_repo_path):
"""
Change the working directory to the root of the Airbyte repo.
This will make all the tests current working directory to be the root of the Airbyte repo as we've set autouse=True.
"""
original_dir = Path.cwd()
os.chdir(airbyte_repo_path)
yield airbyte_repo_path
os.chdir(original_dir)
It's a bit less hacky as the git
library finds the repo path.
The sessions
scope is cool to not re-evaluate this fixture on each test function depending on it.
Problem(s)
Solution
Allow for filters based on metadata.yaml contents
related to #30218
e.g.
airbyte-ci connectors --metadata-query="data.name == 'Postgres'" test
airbyte-ci connectors --metadata-query="(data.ab_internal.ql > 100) & (data.ab_internal.sl < 200)" test