
Regression testing version comparison #35534


Closed

Conversation

clnoll
Contributor

@clnoll clnoll commented Feb 22, 2024

What

Introduces the first phase of regression testing, which allows us to compare two arbitrary versions of a connector against each other.

The regression test run can be invoked with the following:

airbyte-ci connectors --name=<connector_name> regression_test
                     [--control-version=<version>]  # where version is a docker image name & tag, e.g. airbyte/source-faker:0.1.1
                     [--target-version=<version>]
                     [--fail-fast]
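
For example, comparing a published source-faker image against a local dev build might look like this (the tags below are illustrative, not taken from this PR):

    airbyte-ci connectors --name=source-faker regression_test \
                          --control-version=airbyte/source-faker:0.1.1 \
                          --target-version=airbyte/source-faker:dev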

Implementation

At a high level, we are using the airbyte-ci framework to do 2 CAT runs:

  1. A "control" run, which runs CATs against an input version of the container (intended to be a trusted version). During this run we write the records to a file and skip the expected record check.
  2. A "target" run, which runs CATs against an input version of the container (the version to be tested), whose records are compared against the stored records from 1.

To make this work, we pass the control CAT run a flag instructing it to store records at a specific location; when run via airbyte-ci/dagger, the records are then exported to the host machine. After the control CATs are complete, we prepare the target run by writing a new acceptance test config file that points at the new location for expected records; when run via airbyte-ci/dagger, the new config file and the control run's records are mounted into the target CAT container.
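
As a rough sketch of that config rewrite (not the PR's actual code; the directory, file name, and the expect_records layout are assumptions about the acceptance-test-config format):

    import yaml
    from pathlib import Path

    # Assumed location where the control run's records were exported to the host.
    CONTROL_RECORDS_DIR = Path("/tmp/regression/control_records")

    def write_target_config(original_config: Path, target_config: Path) -> None:
        """Copy acceptance-test-config.yml, pointing expected records at the control run's output."""
        config = yaml.safe_load(original_config.read_text())
        for test in config.get("acceptance_tests", {}).get("basic_read", {}).get("tests", []):
            test["expect_records"] = {"path": str(CONTROL_RECORDS_DIR / "expected_records.jsonl")}
        target_config.write_text(yaml.safe_dump(config))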

High-level overview (source): [diagram image]

CAT modifications:

  • --acceptance-test-config-filepath: Path to an acceptance test config yml, to override the acceptance-test-config.yml in the default location.
  • --connector-version: The image tag for the version of the connector to test. If not set, we use the connector version in acceptance-test-config.yml.
  • --store-expected-records: Folder to store expected records, e.g. for inspection or use in a subsequent test. If not set, records will not be stored.
  • --container-id-filepath: Path to the container ID, when one is present and differs from the default (/tmp/container_id.txt).
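
Purely as an illustration of how these options fit together, a control run could invoke CAT programmatically along these lines (the plugin entrypoint, image tag, and record path are assumptions; only the option names come from the list above):

    import pytest

    # Hypothetical control run: pin the trusted image and store its records
    # so that a later target run can be compared against them.
    exit_code = pytest.main([
        "-p", "connector_acceptance_test.plugin",
        "--acceptance-test-config-filepath", "acceptance-test-config.yml",
        "--connector-version", "airbyte/source-faker:0.1.1",
        "--store-expected-records", "/tmp/regression/control_records",
    ])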

airbyte-ci regression_test subcommand:

  • --control-version: Control version of the connector to be tested. If not set, defaults to the latest version.
  • --target-version: Target version of the connector being tested. If not set, defaults to a dev version.
  • --fail-fast: When enabled, tests will fail fast.

Closes https://github.com/airbytehq/airbyte-internal-issues/issues/2855.

@clnoll clnoll requested a review from a team February 22, 2024 02:35
@clnoll clnoll marked this pull request as draft February 22, 2024 02:35

@clnoll clnoll closed this Feb 22, 2024
@clnoll clnoll deleted the regression-testing-version-comparison branch February 22, 2024 02:36
@clnoll clnoll restored the regression-testing-version-comparison branch February 22, 2024 02:36
@clnoll clnoll deleted the regression-testing-version-comparison branch February 22, 2024 02:38
@clnoll clnoll restored the regression-testing-version-comparison branch February 22, 2024 02:38
@clnoll clnoll reopened this Feb 22, 2024
@clnoll clnoll force-pushed the regression-testing-version-comparison branch 2 times, most recently from 1289099 to cc7dd6c Compare February 22, 2024 05:35
@clnoll clnoll force-pushed the regression-testing-version-comparison branch from 808fcd9 to 299a543 Compare February 25, 2024 02:31
@clnoll clnoll marked this pull request as ready for review February 25, 2024 02:31
@clnoll clnoll requested a review from alafanechere February 25, 2024 02:31
Contributor

@alafanechere alafanechere left a comment


Before raising my concerns I really want to thank you for the care you've put into crafting this PR!

I'm sorry this feedback comes after an implementation. But our recent discussions on "blowing up your test suites" shed some light on how clearly defined test suite scopes could help our connector teams.

I have the following concern with the design you're introducing:

Regression testing, as a tool, is totally coupled to two components:

  • airbyte-ci for interface declaration and CAT orchestration
  • CAT to run the expected record tests
    It means we'll have to make the regression testing framework evolve with changes to these projects, and vice versa. It also means regression tests can't easily be run independently, which is something we might want if we ever use regression testing in blue-green deployments.

Conceptually, your change is not making regression tests a new test suite; it's a new CAT orchestration mode. I think that makes it hard to reason about what regression tests are, and to reason about CAT too, as it's being made more complex.

In light of the "blowing up our test suites" discussions we had last week, I'd love to push for introducing a separate and independent regression test suite (which could be run independently or via airbyte-ci). This would allow us to:

  • Not add more things to CAT, whose scope is not clearly identified: it currently does too many things.
  • Keep airbyte-ci as an orchestrator with no "business logic" awareness.

In practice I think this could look like this:

  • We create a new Python package (in airbyte-ci/connectors/regression_testing?)
  • This package uses dagger and reuses the ConnectorRunner to run commands against connectors
  • This package has a CLI entry point (using click) - we can reuse the one you've defined: regression-test --control-version=0.1.0 --target-version=1.0.0 (see the sketch after this list)
  • Actual tests are declared inside this package (we could programmatically call pytest if we want to declare tests with it)
  • If we want to manually run regression tests we could directly call its CLI.
  • If we want to automatically run regression tests in CI we would add a new Step in the airbyte-ci connectors test pipeline.
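
A minimal sketch of what that click entry point could look like (module layout and option handling are hypothetical; only the command name and flags come from the proposal above):

    import click

    @click.command(name="regression-test")
    @click.option("--control-version", required=True, help="Trusted connector image, e.g. airbyte/source-faker:0.1.0")
    @click.option("--target-version", required=True, help="Candidate connector image, e.g. airbyte/source-faker:1.0.0")
    def regression_test(control_version: str, target_version: str) -> None:
        """Run both connector versions (e.g. via dagger + ConnectorRunner) and compare their outputs."""
        click.echo(f"Comparing control={control_version} against target={target_version}")

    if __name__ == "__main__":
        regression_test()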

Let me know how this sounds to you. I'd happily jump in to help you make these changes!
I think it's the kind of move I took for connectors_qa: a new Python package with clearly scoped checks, orchestrated by airbyte-ci connector tests.

@clnoll
Contributor Author

clnoll commented Feb 26, 2024

Hey @alafanechere thanks for taking a look. The tldr is that I think I agree with you.

A few thoughts -

  • If the point of regression testing is to identify differences in the behavior of two connector runs, we can see that CATs already do regression testing - they just do it against a hardcoded "control" version instead of a dynamically generated one. With that in mind, one of the primary reasons I tied this to CATs to begin with was that I felt most of the basic tests we'd want to run in a regression test suite were already being handled by CATs. I still do think that's true, but don't mind having some overlap between what's covered by which suite.
  • If people like more separation of concerns, I suppose we could get rid of regression-style tests in CATs, and only have them ensure that connectors are conforming to the Airbyte protocol. I think that could work, as long as in other suites we're not too strict about what kind of test goes where. I think too much strictness could put us at greater risk of missing coverage if it causes people to think in terms of test suites instead of what overall tests there should be for a given change.
  • I agree with the drawbacks to coupling them too closely that you brought up, and I do think there's value in letting the regression tests evolve on their own. So, I'm going to close this PR for now.

I played around with your suggestion and like the result - I think we'll put something very useful together quickly. It's very WIP and needs some cleanup, but the basic structure is there and it runs (writing output to different directories for each version of the connector; I haven't implemented a comparison of the output yet). LMK what you think.
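
As a rough illustration of the missing comparison step, the two output directories could be diffed along these lines (file names and JSONL format are assumptions, not what's in the WIP branch):

    import json
    from pathlib import Path

    def load_records(output_dir: Path) -> list[dict]:
        """Read the records a connector run wrote, one JSON object per line."""
        return [json.loads(line) for line in (output_dir / "records.jsonl").read_text().splitlines() if line]

    def missing_from_target(control_dir: Path, target_dir: Path) -> list[dict]:
        """Return control records that do not appear in the target run's output."""
        target = {json.dumps(r, sort_keys=True) for r in load_records(target_dir)}
        return [r for r in load_records(control_dir) if json.dumps(r, sort_keys=True) not in target]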

@clnoll clnoll closed this Feb 26, 2024
@clnoll clnoll deleted the regression-testing-version-comparison branch May 8, 2024 19:11