
Commit 02750d4

Move live-tests to airbyte repo (#37318)
1 parent 653113b commit 02750d4

39 files changed

+8080
-0
lines changed
@@ -0,0 +1,2 @@
1+
regression_tests_artifacts
2+
live_tests_debug_reports
+310
@@ -0,0 +1,310 @@
1+
# Connector Live Testing
2+
3+
This project contains utilities for running connector tests against live data.
4+
5+
## Requirements
6+
* `docker`
7+
* `Python ^3.10`
8+
* `pipx`
9+
* `poetry`
10+
11+
## Install
12+
```bash
13+
# From tools/connectors/live-tests
14+
pipx install .
15+
# To install in editable mode for development
16+
pipx install . --force --editable
17+
```
18+
19+
## Commands
20+
21+
### `debug`
22+
23+
```
24+
Usage: live-tests debug [OPTIONS] {check|discover|read|read-with-state|spec}

  Run a specific command on one or multiple connectors and persists the
  outputs to local storage.

Options:
  --connection-id TEXT
  --config-path FILE
  --catalog-path FILE
  --state-path FILE
  -c, --connector-image TEXT  Docker image name of the connector to debug
                              (e.g. `airbyte/source-faker:latest`,
                              `airbyte/source-faker:dev`)  [required]
  -hc, --http-cache           Use the HTTP cache for the connector.
  --help                      Show this message and exit.
39+
```
40+
41+
This command runs any of the following connector commands against one or more connector images.
42+
43+
**Available connector commands:**
44+
* `spec`
45+
* `check`
46+
* `discover`
47+
* `read` or `read-with-state` (`read-with-state` requires a `--state-path` to be passed)
48+
49+
It will write artifacts to an output directory:
50+
* `stdout.log`: The collected standard output following the command execution
51+
* `stderr.log`: The collected standard error following the command execution
52+
* `http_dump.txt`: An `mitmproxy` http stream log. Can be consumed with `mitmweb` (version `9.0.1`) for debugging.
53+
* `airbyte_messages.db`: A DuckDB database containing the messages produced by the connector.
54+
* `airbyte_messages`: A directory containing `.jsonl` files for each message type (logs, records, traces, controls, states etc.) produced by the connector.
55+
56+
#### Example
57+
Let's run `debug` to check the output of `read` on two different versions of the same connector:
58+
59+
```bash
60+
live-tests debug read \
61+
--connector-image=airbyte/source-pokeapi:dev \
62+
--connector-image=airbyte/source-pokeapi:latest \
63+
--config-path=poke_config.json \
64+
--catalog-path=configured_catalog.json
65+
```
66+
67+
It will store the results in a `live_tests_debug_reports` directory under the current working directory:
68+
69+
```
70+
live_tests_debug_reports
└── 1709547771
    └── source-pokeapi
        └── read
            ├── dev
            │   ├── airbyte_messages
            │   │   ├── duck.db # DuckDB database
            │   │   ├── logs.jsonl
            │   │   ├── records.jsonl
            │   │   └── traces.jsonl
            │   ├── stderr.log
            │   └── stdout.log
            └── latest
                ├── airbyte_messages
                │   ├── duck.db # DuckDB database
                │   ├── logs.jsonl
                │   ├── records.jsonl
                │   └── traces.jsonl
                ├── stderr.log
                └── stdout.log
90+
91+
```
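The report layout above can be inspected with a few lines of Python. The following is a minimal sketch, not part of the `live-tests` tooling: the function name `count_messages` is hypothetical, and it assumes each version directory (`dev`, `latest`) contains an `airbyte_messages` folder of `.jsonl` files as shown in the tree.

```python
from collections import Counter
from pathlib import Path


def count_messages(report_dir: Path) -> dict[str, Counter]:
    """Count non-empty lines in each airbyte_messages/*.jsonl file.

    Returns a mapping of version directory name (e.g. "dev") to a Counter
    keyed by message file stem (e.g. "records", "logs").
    """
    counts: dict[str, Counter] = {}
    for jsonl_path in sorted(report_dir.rglob("*.jsonl")):
        # Only consider files that sit directly inside an airbyte_messages folder.
        if jsonl_path.parent.name != "airbyte_messages":
            continue
        # Assumed layout: <version>/airbyte_messages/<type>.jsonl
        version = jsonl_path.parent.parent.name
        with jsonl_path.open() as f:
            line_count = sum(1 for line in f if line.strip())
        counts.setdefault(version, Counter())[jsonl_path.stem] = line_count
    return counts


# Example (hypothetical path):
# print(count_messages(Path("live_tests_debug_reports/1709547771")))
```

Comparing the per-version counts is a quick first check before looking at the individual records.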
92+
93+
You can also run the `debug` command on a live connection by passing the `--connection-id` option:
94+
95+
```bash
96+
live-tests debug read \
97+
--connector-image=airbyte/source-pokeapi:dev \
98+
--connector-image=airbyte/source-pokeapi:latest \
99+
--connection-id=<CONNECTION-ID>
100+
```
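When scripting several `debug` runs, it can help to assemble the argument list in code. The sketch below is an illustration only: `build_debug_command` is a hypothetical helper, and it uses just the options listed in the help output above.

```python
from typing import Optional, Sequence

# Connector commands accepted by `live-tests debug`, per the usage line above.
COMMANDS = {"check", "discover", "read", "read-with-state", "spec"}


def build_debug_command(
    command: str,
    connector_images: Sequence[str],
    *,
    connection_id: Optional[str] = None,
    config_path: Optional[str] = None,
    catalog_path: Optional[str] = None,
    state_path: Optional[str] = None,
    http_cache: bool = False,
) -> list[str]:
    """Build the argv list for a `live-tests debug` invocation."""
    if command not in COMMANDS:
        raise ValueError(f"Unknown connector command: {command}")
    if not connector_images:
        raise ValueError("At least one --connector-image is required")

    argv = ["live-tests", "debug", command]
    argv += [f"--connector-image={image}" for image in connector_images]

    # Only emit the optional flags that were actually provided.
    optional = {
        "--connection-id": connection_id,
        "--config-path": config_path,
        "--catalog-path": catalog_path,
        "--state-path": state_path,
    }
    argv += [f"{flag}={value}" for flag, value in optional.items() if value is not None]
    if http_cache:
        argv.append("--http-cache")
    return argv
```

The resulting list can be passed to `subprocess.run(argv, check=True)`; building a list rather than a shell string avoids quoting issues with paths.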
101+
102+
##### Consuming `http_dump.mitm`
103+
You can install [`mitmproxy`](https://mitmproxy.org/):
104+
```bash
105+
pipx install mitmproxy
106+
```
107+
108+
And run:
109+
```bash
110+
mitmweb --rfile=http_dump.mitm
111+
```
112+
113+
## Regression tests
114+
The regression test suite compares the outputs of connector commands across different versions of the same connector.
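To make the idea of comparing outputs concrete, here is a minimal sketch that diffs two `records.jsonl` files, one from the control version and one from the target version. It is not the suite's actual implementation: the function names are hypothetical, and the comparison deliberately ignores record order and duplicates.

```python
import json
from pathlib import Path


def load_records(path: Path) -> list[str]:
    """Load a JSONL file, returning each record as canonical JSON (sorted keys)."""
    with path.open() as f:
        return [
            json.dumps(json.loads(line), sort_keys=True)
            for line in f
            if line.strip()
        ]


def diff_records(control_path: Path, target_path: Path) -> dict[str, list[str]]:
    """Return the records that appear in only one of the two files."""
    control = set(load_records(control_path))
    target = set(load_records(target_path))
    return {
        "only_in_control": sorted(control - target),
        "only_in_target": sorted(target - control),
    }
```

Serializing with sorted keys means two records that differ only in key order are treated as equal.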
115+
116+
You can run the existing test suites with the following commands:
117+
118+
#### With local connection objects (`config.json`, `catalog.json`, `state.json`)
119+
```bash
120+
poetry run pytest src/live_tests/regression_tests \
121+
--connector-image=airbyte/source-faker \
122+
--config-path=<path-to-config-path> \
123+
--catalog-path=<path-to-catalog-path> \
124+
--target-version=dev \
125+
--control-version=latest \
126+
--pr-url=<PR-URL> # The URL of the PR you are testing
127+
```
128+
129+
#### Using a live connection
130+
The connection objects (config, catalog and state) are fetched from the live connection.
131+
132+
```bash
133+
poetry run pytest src/live_tests/regression_tests \
134+
--connector-image=airbyte/source-faker \
135+
--connection-id=<CONNECTION-ID> \
136+
--target-version=dev \
137+
--control-version=latest \
138+
--pr-url=<PR-URL> # The URL of the PR you are testing
139+
```
140+
141+
You can also pass local connection object paths with `--config-path`, `--state-path` or `--catalog-path` to override the live connection objects.
142+
143+
#### Test artifacts
144+
The test suite run will produce test artifacts in the `/tmp/regression_tests_artifacts/` folder.
145+
**They are cleared after each test run, when the prompt exits. Please do not copy them elsewhere in your filesystem: they contain sensitive data that is not meant to be stored outside of your debugging session!**
146+
147+
##### Artifact types
148+
* `report.html`: A report of the test run.
149+
* `stdout.log`: The collected standard output following the command execution
150+
* `stderr.log`: The collected standard error following the command execution
151+
* `http_dump.mitm`: An `mitmproxy` http stream log. Can be consumed with `mitmweb` (version `>=10`) for debugging.
152+
* `http_dump.har`: An `mitmproxy` http stream log in HAR format (a JSON encoded version of the mitm dump).
153+
* `airbyte_messages`: A directory containing `.jsonl` files for each message type (logs, records, traces, controls, states etc.) produced by the connector.
154+
* `duck.db`: A DuckDB database containing the messages produced by the connector.
155+
* `dagger.log`: The log of the Dagger session, useful for debugging errors unrelated to the tests.
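Because `http_dump.har` is JSON, it can be inspected without `mitmproxy`. The sketch below lists the method, URL and response status of each recorded request. It assumes the file follows the standard HAR layout (`log.entries[].request` and `log.entries[].response`); the function name `summarize_har` is hypothetical.

```python
import json
from pathlib import Path


def summarize_har(har_path: Path) -> list[tuple[str, str, int]]:
    """Return (method, url, status) for every entry in a HAR file."""
    har = json.loads(har_path.read_text())
    return [
        (
            entry["request"]["method"],
            entry["request"]["url"],
            entry["response"]["status"],
        )
        for entry in har["log"]["entries"]
    ]


# Example (hypothetical path):
# for method, url, status in summarize_har(Path("http_dump.har")):
#     print(status, method, url)
```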
156+
157+
**Tests can also write specific artifacts like diffs under a directory named after the test function.**
158+
159+
160+
```
161+
/tmp/regression_tests_artifacts
└── session_1710754231
    ├── duck.db
    ├── report.html
    ├── command_execution_artifacts
    │   └── source-orb
    │       ├── check
    │       │   ├── dev
    │       │   │   ├── airbyte_messages
    │       │   │   │   ├── connection_status.jsonl
    │       │   │   │   └── logs.jsonl
    │       │   │   ├── http_dump.har
    │       │   │   ├── http_dump.mitm
    │       │   │   ├── stderr.log
    │       │   │   └── stdout.log
    │       │   └── latest
    │       │       ├── airbyte_messages
    │       │       │   ├── connection_status.jsonl
    │       │       │   └── logs.jsonl
    │       │       ├── http_dump.har
    │       │       ├── http_dump.mitm
    │       │       ├── stderr.log
    │       │       └── stdout.log
    │       ├── discover
    │       │   ├── dev
    │       │   │   ├── airbyte_messages
    │       │   │   │   └── catalog.jsonl
    │       │   │   ├── http_dump.har
    │       │   │   ├── http_dump.mitm
    │       │   │   ├── stderr.log
    │       │   │   └── stdout.log
    │       │   └── latest
    │       │       ├── airbyte_messages
    │       │       │   └── catalog.jsonl
    │       │       ├── http_dump.har
    │       │       ├── http_dump.mitm
    │       │       ├── stderr.log
    │       │       └── stdout.log
    │       ├── read-with-state
    │       │   ├── dev
    │       │   │   ├── airbyte_messages
    │       │   │   │   ├── logs.jsonl
    │       │   │   │   ├── records.jsonl
    │       │   │   │   ├── states.jsonl
    │       │   │   │   └── traces.jsonl
    │       │   │   ├── http_dump.har
    │       │   │   ├── http_dump.mitm
    │       │   │   ├── stderr.log
    │       │   │   └── stdout.log
    │       │   └── latest
    │       │       ├── airbyte_messages
    │       │       │   ├── logs.jsonl
    │       │       │   ├── records.jsonl
    │       │       │   ├── states.jsonl
    │       │       │   └── traces.jsonl
    │       │       ├── http_dump.har
    │       │       ├── http_dump.mitm
    │       │       ├── stderr.log
    │       │       └── stdout.log
    │       └── spec
    │           ├── dev
    │           │   ├── airbyte_messages
    │           │   │   └── spec.jsonl
    │           │   ├── stderr.log
    │           │   └── stdout.log
    │           └── latest
    │               ├── airbyte_messages
    │               │   └── spec.jsonl
    │               ├── stderr.log
    │               └── stdout.log
    └── dagger.log
232+
```
233+
234+
#### HTTP Proxy and caching
235+
We use a containerized `mitmproxy` to capture the HTTP traffic between the connector and the source. Each connector command run produces two artifacts: `http_dump.mitm`, which can be opened with `mitmproxy` (version `>=10`) for debugging, and `http_dump.har`, a JSON-encoded version of the mitm dump.
236+
The traffic recorded on the control connector is passed to the target connector proxy to cache the responses for requests with the same URL. This is useful to avoid hitting the source API multiple times when running the same command on different versions of the connector.
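The replay behavior described above can be pictured as a cache keyed by URL. The following is a simplified sketch of that idea, not the proxy's actual implementation: `Response`, `UrlReplayCache` and the `send` callback are hypothetical names, and a real proxy would also have to deal with methods, headers and bodies.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Response:
    status: int
    body: str


class UrlReplayCache:
    """Replay responses recorded for one run when another run requests the same URL."""

    def __init__(self) -> None:
        self._responses: dict[str, Response] = {}

    def record(self, url: str, response: Response) -> None:
        # Called while the control connector runs: remember what the source returned.
        self._responses[url] = response

    def fetch(self, url: str, send: Callable[[str], Response]) -> Response:
        # Called while the target connector runs: serve the recorded response
        # when the URL matches, otherwise forward the request to the source.
        cached = self._responses.get(url)
        if cached is not None:
            return cached
        return send(url)
```

With this scheme, only requests that the control version never made reach the source API during the target run.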
237+
238+
## Changelog
239+
240+
### 0.14.2
241+
Fix KeyError when target & control streams differ.
242+
243+
### 0.14.1
244+
Improve performance when reading records per stream.
245+
246+
### 0.14.0
247+
Track usage via Segment.
248+
249+
### 0.13.0
250+
Show test docstring in the test report.
251+
252+
### 0.12.0
253+
Implement a test to compare schema inferred on both control and target version.
254+
255+
### 0.11.0
256+
Create a global duckdb instance to store messages produced by the connector in target and control version.
257+
258+
### 0.10.0
259+
Show record count per stream in report and list untested streams.
260+
261+
### 0.9.0
262+
Make the regression test suite better at handling large connector outputs.
263+
264+
### 0.8.1
265+
Improve diff output.
266+
267+
### 0.8.0
268+
Regression tests: add an HTML report.
269+
270+
### 0.7.0
271+
Improve the proxy workflow and caching logic + generate HAR files.
272+
273+
### 0.6.6
274+
Exit pytest if connection can't be retrieved.
275+
276+
### 0.6.6
277+
Clean up debug files when prompt is closed.
278+
279+
### 0.6.5
280+
Improve ConnectorRunner logging.
281+
282+
### 0.6.4
283+
Add more data integrity checks to the regression tests suite.
284+
285+
### 0.6.3
286+
Make catalog diffs more readable.
287+
288+
### 0.6.2
289+
Clean up regression test artifacts on any exception.
290+
291+
### 0.6.1
292+
Modify diff output for `discover` and `read` tests.
293+
294+
### 0.5.1
295+
Handle connector command execution errors.
296+
297+
### 0.5.0
298+
Add new tests and confirmation prompts.
299+
300+
### 0.4.0
301+
Introduce DuckDB to store the messages produced by the connector.
302+
303+
### 0.3.0
304+
Pass connection id to the regression tests suite.
305+
306+
### 0.2.0
307+
Declare the regression tests suite.
308+
309+
### 0.1.0
310+
Implement initial primitives and a `debug` command to run connector commands and persist the outputs to local storage.
