# Connector Live Testing

This project contains utilities for running connector tests against live data.

## Requirements
* `docker`
* `Python ^3.10`
* `pipx`
* `poetry`
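Before installing, you can check that these tools are available. This is a minimal sketch, assuming the tools are invoked as `docker`, `pipx`, `poetry` and `python3` from your `PATH`. `check_requirements` is a hypothetical helper and is not part of this project.

```bash
# Hypothetical helper (not part of live-tests): report any missing requirement.
# Assumes the binaries are named docker, pipx, poetry and python3.
check_requirements() {
  local missing=0 tool
  for tool in docker pipx poetry python3; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  # `Python ^3.10` (Poetry caret syntax) means >=3.10,<4.0.
  # The version is only checked when python3 was found above.
  if command -v python3 >/dev/null 2>&1 &&
    ! python3 -c 'import sys; sys.exit(0 if (3, 10) <= sys.version_info[:2] < (4, 0) else 1)'; then
    echo "python3 must satisfy ^3.10 (found $(python3 -V 2>&1))" >&2
    missing=1
  fi
  return "$missing"
}
```

For example, `check_requirements && echo "all requirements found"` prints each missing tool to stderr and returns a non-zero status if anything is missing.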

## Install
```bash
# From tools/connectors/live-tests
pipx install .
# To install in editable mode for development
pipx install . --force --editable
```

## Commands

### `debug`

```
Usage: live-tests debug [OPTIONS] {check|discover|read|read-with-state|spec}

  Run a specific command on one or multiple connectors and persists the
  outputs to local storage.

Options:
  --connection-id TEXT
  --config-path FILE
  --catalog-path FILE
  --state-path FILE
  -c, --connector-image TEXT  Docker image name of the connector to debug
                              (e.g. `airbyte/source-faker:latest`,
                              `airbyte/source-faker:dev`)  [required]
  -hc, --http-cache           Use the HTTP cache for the connector.
  --help                      Show this message and exit.
```

This command runs one of the following connector commands against one or more connector images.

**Available connector commands:**
* `spec`
* `check`
* `discover`
* `read`
* `read-with-state` (requires a `--state-path` to be passed)

It writes the following artifacts to an output directory:
* `stdout.log`: The collected standard output following the command execution.
* `stderr.log`: The collected standard error following the command execution.
* `http_dump.mitm`: An `mitmproxy` HTTP stream log. It can be consumed with `mitmweb` (version `>=10`) for debugging.
* `duck.db`: A DuckDB database containing the messages produced by the connector.
* `airbyte_messages`: A directory containing `.jsonl` files for each message type (logs, records, traces, controls, states, etc.) produced by the connector.
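
The `.jsonl` files are newline-delimited JSON, so they are easy to inspect outside the tool. The sketch below counts records per stream in a `records.jsonl` file. It assumes `python3` is available and that each line is a serialized Airbyte `RECORD` message of the form `{"type": "RECORD", "record": {"stream": ..., "data": ...}}`. That line format is an assumption, and `count_records_per_stream` is a hypothetical helper.

```bash
# Hypothetical helper: count records per stream in an airbyte_messages/records.jsonl file.
# ASSUMPTION: one Airbyte RECORD message per line, e.g.
# {"type": "RECORD", "record": {"stream": "pokemon", "data": {...}, "emitted_at": 0}}
count_records_per_stream() {
  python3 - "$1" <<'EOF'
import collections
import json
import sys

counts = collections.Counter()
with open(sys.argv[1]) as f:
    for line in f:
        if line.strip():  # skip blank lines
            counts[json.loads(line)["record"]["stream"]] += 1

# Print "<stream>\t<count>", sorted by stream name.
for stream, n in sorted(counts.items()):
    print(f"{stream}\t{n}")
EOF
}
```

Running it on the `records.jsonl` of each connector version gives a quick per-stream comparison.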

#### Example
Let's run `debug` to check the output of `read` on two different versions of the same connector:

```bash
live-tests debug read \
  --connector-image=airbyte/source-pokeapi:dev \
  --connector-image=airbyte/source-pokeapi:latest \
  --config-path=poke_config.json \
  --catalog-path=configured_catalog.json
```

It stores the results in a `live_tests_debug_reports` directory under the current working directory:

```
live_tests_debug_reports
└── 1709547771
    └── source-pokeapi
        └── read
            ├── dev
            │   ├── airbyte_messages
            │   │   ├── duck.db # DuckDB database
            │   │   ├── logs.jsonl
            │   │   ├── records.jsonl
            │   │   └── traces.jsonl
            │   ├── stderr.log
            │   └── stdout.log
            └── latest
                ├── airbyte_messages
                │   ├── duck.db # DuckDB database
                │   ├── logs.jsonl
                │   ├── records.jsonl
                │   └── traces.jsonl
                ├── stderr.log
                └── stdout.log
```
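
With both versions side by side, a simple first check is to compare how many messages of each type each version produced. This is a minimal sketch that assumes the directory layout shown above and one message per line in each `.jsonl` file. `compare_jsonl_counts` is a hypothetical helper, not a `live-tests` command.

```bash
# Hypothetical helper: compare line counts of each dev .jsonl file with its
# latest counterpart. $1 is a command directory such as
# live_tests_debug_reports/1709547771/source-pokeapi/read
compare_jsonl_counts() {
  local read_dir="$1" f name dev_count latest_count
  for f in "$read_dir"/dev/airbyte_messages/*.jsonl; do
    [ -f "$f" ] || continue  # the glob matched nothing
    name="$(basename "$f")"
    dev_count="$(wc -l < "$f" | tr -d ' ')"
    if [ -f "$read_dir/latest/airbyte_messages/$name" ]; then
      latest_count="$(wc -l < "$read_dir/latest/airbyte_messages/$name" | tr -d ' ')"
    else
      latest_count="missing"
    fi
    printf '%s\tdev=%s\tlatest=%s\n' "$name" "$dev_count" "$latest_count"
  done
}
```

The loop walks the `dev` files only, so a file that exists only under `latest` is not reported.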

You can also run the `debug` command on a live connection by passing the `--connection-id` option:

```bash
live-tests debug read \
  --connector-image=airbyte/source-pokeapi:dev \
  --connector-image=airbyte/source-pokeapi:latest \
  --connection-id=<CONNECTION-ID>
```

##### Consuming `http_dump.mitm`
Install [`mitmproxy`](https://mitmproxy.org/):
```bash
pipx install mitmproxy
```

Then open the dump in the `mitmweb` web interface:
```bash
mitmweb --rfile=http_dump.mitm
```

## Regression tests
The regression test suite compares the outputs of connector commands on two versions of the same connector: a control version (e.g. `latest`) and a target version (e.g. `dev`).

You can run the existing test suites with one of the following commands:

#### With local connection objects (`config.json`, `catalog.json`, `state.json`)
```bash
poetry run pytest src/live_tests/regression_tests \
  --connector-image=airbyte/source-faker \
  --config-path=<path-to-config-path> \
  --catalog-path=<path-to-catalog-path> \
  --target-version=dev \
  --control-version=latest \
  --pr-url=<PR-URL> # The URL of the PR you are testing
```

#### Using a live connection
The connection objects are fetched from the live connection.

```bash
poetry run pytest src/live_tests/regression_tests \
  --connector-image=airbyte/source-faker \
  --connection-id=<CONNECTION-ID> \
  --target-version=dev \
  --control-version=latest \
  --pr-url=<PR-URL> # The URL of the PR you are testing
```

You can also override any of the live connection objects with a local file by passing `--config-path`, `--state-path` or `--catalog-path`.

#### Test artifacts
A test suite run produces test artifacts in the `/tmp/regression_tests_artifacts/` folder.
**They are cleared after each test run when you exit the prompt. Do not copy them elsewhere on your filesystem: they contain sensitive data that must not be stored outside of your debugging session!**

##### Artifact types
* `report.html`: A report of the test run.
* `stdout.log`: The collected standard output following the command execution.
* `stderr.log`: The collected standard error following the command execution.
* `http_dump.mitm`: An `mitmproxy` HTTP stream log. It can be consumed with `mitmweb` (version `>=10`) for debugging.
* `http_dump.har`: An `mitmproxy` HTTP stream log in HAR format (a JSON-encoded version of the mitm dump).
* `airbyte_messages`: A directory containing `.jsonl` files for each message type (logs, records, traces, controls, states, etc.) produced by the connector.
* `duck.db`: A DuckDB database containing the messages produced by the connector.
* `dagger.log`: The log of the Dagger session, useful for debugging errors unrelated to the tests.

**Tests can also write specific artifacts, such as diffs, under a directory named after the test function.**

```
/tmp/regression_tests_artifacts
└── session_1710754231
    ├── duck.db
    ├── report.html
    ├── command_execution_artifacts
    │   └── source-orb
    │       ├── check
    │       │   ├── dev
    │       │   │   ├── airbyte_messages
    │       │   │   │   ├── connection_status.jsonl
    │       │   │   │   └── logs.jsonl
    │       │   │   ├── http_dump.har
    │       │   │   ├── http_dump.mitm
    │       │   │   ├── stderr.log
    │       │   │   └── stdout.log
    │       │   └── latest
    │       │       ├── airbyte_messages
    │       │       │   ├── connection_status.jsonl
    │       │       │   └── logs.jsonl
    │       │       ├── http_dump.har
    │       │       ├── http_dump.mitm
    │       │       ├── stderr.log
    │       │       └── stdout.log
    │       ├── discover
    │       │   ├── dev
    │       │   │   ├── airbyte_messages
    │       │   │   │   └── catalog.jsonl
    │       │   │   ├── http_dump.har
    │       │   │   ├── http_dump.mitm
    │       │   │   ├── stderr.log
    │       │   │   └── stdout.log
    │       │   └── latest
    │       │       ├── airbyte_messages
    │       │       │   └── catalog.jsonl
    │       │       ├── http_dump.har
    │       │       ├── http_dump.mitm
    │       │       ├── stderr.log
    │       │       └── stdout.log
    │       ├── read-with-state
    │       │   ├── dev
    │       │   │   ├── airbyte_messages
    │       │   │   │   ├── logs.jsonl
    │       │   │   │   ├── records.jsonl
    │       │   │   │   ├── states.jsonl
    │       │   │   │   └── traces.jsonl
    │       │   │   ├── http_dump.har
    │       │   │   ├── http_dump.mitm
    │       │   │   ├── stderr.log
    │       │   │   └── stdout.log
    │       │   └── latest
    │       │       ├── airbyte_messages
    │       │       │   ├── logs.jsonl
    │       │       │   ├── records.jsonl
    │       │       │   ├── states.jsonl
    │       │       │   └── traces.jsonl
    │       │       ├── http_dump.har
    │       │       ├── http_dump.mitm
    │       │       ├── stderr.log
    │       │       └── stdout.log
    │       └── spec
    │           ├── dev
    │           │   ├── airbyte_messages
    │           │   │   └── spec.jsonl
    │           │   ├── stderr.log
    │           │   └── stdout.log
    │           └── latest
    │               ├── airbyte_messages
    │               │   └── spec.jsonl
    │               ├── stderr.log
    │               └── stdout.log
    └── dagger.log
```
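
In the example above, the session directory is named `session_<timestamp>`. You can therefore locate the most recent run's `report.html` by picking the directory with the highest numeric suffix. This is a minimal sketch that assumes this naming pattern holds. The pattern is inferred from the example, not documented. `latest_session_dir` is a hypothetical helper.

```bash
# Hypothetical helper: print the session_<number> directory with the highest number.
# ASSUMPTION: sessions are named session_<unix timestamp>, as in the example tree.
latest_session_dir() {
  local root="${1:-/tmp/regression_tests_artifacts}" best="" best_ts=-1 dir ts
  for dir in "$root"/session_*; do
    [ -d "$dir" ] || continue
    ts="${dir##*session_}"
    case "$ts" in
      ''|*[!0-9]*) continue ;;  # skip names without a purely numeric suffix
    esac
    # Compare numerically, so session_300 beats session_100 and session_20.
    if [ "$ts" -gt "$best_ts" ]; then
      best_ts="$ts"
      best="$dir"
    fi
  done
  [ -n "$best" ] && printf '%s\n' "$best"
}
```

For example, `xdg-open "$(latest_session_dir)/report.html"` on Linux, or `open` instead of `xdg-open` on macOS. The function returns a non-zero status when no session directory exists.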

#### HTTP Proxy and caching
We use a containerized `mitmproxy` to capture the HTTP traffic between the connector and the source API. Each connector command run produces two artifacts: `http_dump.mitm`, which can be consumed with `mitmproxy` (version `>=10`) for debugging, and `http_dump.har`, a JSON-encoded version of the same dump.
The traffic recorded on the control connector is passed to the target connector's proxy, which caches the responses for requests with the same URL. This avoids hitting the source API multiple times when running the same command on different versions of the connector.
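
Because the cache is keyed on the request URL, it can help to list which URLs a run requested and how often. This is a minimal sketch that reads an `http_dump.har` artifact and assumes `python3` is available. It relies only on the standard HAR 1.2 layout (`log.entries[].request.url`). `har_url_counts` is a hypothetical helper.

```bash
# Hypothetical helper: print "<count>\t<url>" for every request URL in a HAR file,
# sorted by URL. Uses only the standard HAR layout: log.entries[].request.url
har_url_counts() {
  python3 - "$1" <<'EOF'
import collections
import json
import sys

with open(sys.argv[1]) as f:
    har = json.load(f)

counts = collections.Counter(entry["request"]["url"] for entry in har["log"]["entries"])
for url, n in sorted(counts.items()):
    print(f"{n}\t{url}")
EOF
}
```

For example, you can run `har_url_counts http_dump.har` on the HAR files of the control and target versions of the same command and compare the results.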

## Changelog

### 0.14.2
Fix KeyError when target & control streams differ.

### 0.14.1
Improve performance when reading records per stream.

### 0.14.0
Track usage via Segment.

### 0.13.0
Show test docstring in the test report.

### 0.12.0
Implement a test comparing the schemas inferred on the control and target versions.

### 0.11.0
Create a global DuckDB instance to store messages produced by the connector in the target and control versions.

### 0.10.0
Show record count per stream in report and list untested streams.

### 0.9.0
Make the regression test suite better at handling large connector outputs.

### 0.8.1
Improve diff output.

### 0.8.0
Regression tests: add an HTML report.

### 0.7.0
Improve the proxy workflow and caching logic, and generate HAR files.

### 0.6.6
Exit pytest if connection can't be retrieved.

### 0.6.6
Clean up debug files when prompt is closed.

### 0.6.5
Improve ConnectorRunner logging.

### 0.6.4
Add more data integrity checks to the regression test suite.

### 0.6.3
Make catalog diffs more readable.

### 0.6.2
Clean up regression test artifacts on any exception.

### 0.6.1
Modify diff output for `discover` and `read` tests.

### 0.5.1
Handle connector command execution errors.

### 0.5.0
Add new tests and confirmation prompts.

### 0.4.0
Introduce DuckDB to store the messages produced by the connector.

### 0.3.0
Pass connection id to the regression test suite.

### 0.2.0
Declare the regression test suite.

### 0.1.0
Implement initial primitives and a `debug` command to run connector commands and persist the outputs to local storage.
| 310 | +Implement initial primitives and a `debug` command to run connector commands and persist the outputs to local storage. |
0 commit comments