Commit 5a510dc

evantahler and edgao authored
[docs] replace _airbyte_meta.errors with _airbyte_meta.changes (#38319)
Co-authored-by: Edward Gao <[email protected]>
1 parent 3a971c3 commit 5a510dc

1 file changed: +15 -12 lines changed

docs/using-airbyte/core-concepts/typing-deduping.md

+15 -12
@@ -15,46 +15,49 @@ replicated. Please check each destination to learn if Typing and Deduping is sup
 
 - One-to-one table mapping: Data in one stream will always be mapped to one table in your data
 warehouse. No more sub-tables.
-- Improved per-row error handling with `_airbyte_meta`: Airbyte will now populate typing errors in
+- Improved per-row error/change handling with `_airbyte_meta`: Airbyte will now populate typing changes in
 the `_airbyte_meta` column instead of failing your sync. You can query these results to audit
 misformatted or unexpected data.
 - Internal Airbyte tables in the `airbyte_internal` schema: Airbyte will now generate all raw tables
 in the `airbyte_internal` schema. We no longer clutter your desired schema with raw data tables.
 - Incremental delivery for large syncs: Data will be incrementally delivered to your final tables
 when possible. No more waiting hours to see the first rows in your destination table.
 
-## `_airbyte_meta` Errors
+## `_airbyte_meta` Changes
 
-"Per-row error handling" is a new paradigm for Airbyte which provides greater flexibility for our
+"Per-row change handling" is a new paradigm for Airbyte which provides greater flexibility for our
 users. Airbyte now separates `data-moving problems` from `data-content problems`. Prior to
 Destinations V2, both types of errors were handled the same way: by failing the sync. Now, a failing
 sync means that Airbyte could not _move_ all of your data. You can query the `_airbyte_meta` column
 to see which rows failed for _content_ reasons, and why. This is a more flexible approach, as you
-can now decide how to handle rows with errors on a case-by-case basis.
+can now decide how to handle rows with errors/changes on a case-by-case basis.
 
 :::tip
 
 When using data downstream from Airbyte, we generally recommend you only include rows which do not
-have an error, e.g:
+have an change, e.g:
 
 ```sql
 -- postgres syntax
-SELECT COUNT(*) FROM _table_ WHERE json_array_length(_airbyte_meta ->> errors) = 0
+SELECT COUNT(*) FROM _table_ WHERE json_array_length(_airbyte_meta ->> changes) = 0
 ```
 
 :::
 
-The types of errors which will be stored in `_airbyte_meta.errors` include:
+The types of changes which will be stored in `_airbyte_meta.changes` include:
 
-- **Typing errors**: the source declared that the type of the column `id` should be an integer, but
+- **Typing changes**: the source declared that the type of the column `id` should be an integer, but
 a string value was returned.
-- **Size errors (coming soon)**: the source returned content which cannot be stored within this this
+- **Size changes**: the source returned content which cannot be stored within this this
 row or column (e.g.
 [a Redshift Super column has a 16mb limit](https://docs.aws.amazon.com/redshift/latest/dg/limitations-super.html)).
 Destinations V2 will allow us to trim records which cannot fit into destinations, but retain the
-primary key(s) and cursors and include "too big" error messages.
+primary key(s) and cursors and include "too big" changes messages.
 
-Depending on your use-case, it may still be valuable to consider rows with errors, especially for
+Also, sources can make use of the same tooling to denote that there was a problem emitting the Airbyte record to begin with,
+possibly also creating an entry in `_airbyte_meta.changes`.
+
+Depending on your use-case, it may still be valuable to consider rows with changes, especially for
 aggregations. For example, you may have a table `user_reviews`, and you would like to know the count
 of new reviews received today. You can choose to include reviews regardless of whether your data
 warehouse had difficulty storing the full contents of the `message` column. For this use case,
@@ -83,7 +86,7 @@ The data from one stream will now be mapped to one table in your schema as below
 | _(note, not in actual table)_ | \_airbyte_raw_id | \_airbyte_extracted_at | \_airbyte_meta | id | first_name | age | address |
 | -------------------------------------------- | ---------------- | ---------------------- | -------------------------------------------------------------- | --- | ---------- | ---- | ----------------------------------------- |
 | Successful typing and de-duping ⟶ | xxx-xxx-xxx | 2022-01-01 12:00:00 | `{}` | 1 | sarah | 39 | `{ city: “San Francisco”, zip: “94131” }` |
-| Failed typing that didn’t break other rows ⟶ | yyy-yyy-yyy | 2022-01-01 12:00:00 | `{ errors: {[“fish” is not a valid integer for column “age”]}` | 2 | evan | NULL | `{ city: “Menlo Park”, zip: “94002” }` |
+| Failed typing that didn’t break other rows ⟶ | yyy-yyy-yyy | 2022-01-01 12:00:00 | `{ changes: {"field": "age", "change": "NULLED", "reason": "DESTINATION_TYPECAST_ERROR"}}` | 2 | evan | NULL | `{ city: “Menlo Park”, zip: “94002” }` |
 | Not-yet-typed ⟶ | | | | | | | |
 
 In legacy normalization, columns of
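
For readers consuming these tables outside SQL, the filter that the tip in this diff describes (keep only rows whose `_airbyte_meta.changes` is empty) can be sketched in Python. This is a minimal illustration, not Airbyte code: the helper name `has_no_changes` and the sample rows are hypothetical, and it assumes `changes` is a JSON array of objects with the `field`, `change`, and `reason` keys shown in the updated table row.

```python
import json


def has_no_changes(airbyte_meta):
    """Return True when an `_airbyte_meta` value records no changes.

    Accepts a JSON string (as read from a text column), an already
    parsed dict, or None. A missing `changes` key counts as "no changes".
    """
    if isinstance(airbyte_meta, str):
        meta = json.loads(airbyte_meta)
    else:
        meta = airbyte_meta or {}
    return len(meta.get("changes") or []) == 0


# Hypothetical rows, loosely modelled on the example table in the diff.
# Assumption: `changes` is an array of change objects.
rows = [
    {"id": 1, "first_name": "sarah", "age": 39, "_airbyte_meta": "{}"},
    {
        "id": 2,
        "first_name": "evan",
        "age": None,
        "_airbyte_meta": json.dumps(
            {
                "changes": [
                    {
                        "field": "age",
                        "change": "NULLED",
                        "reason": "DESTINATION_TYPECAST_ERROR",
                    }
                ]
            }
        ),
    },
]

# Rows that were typed without any recorded change.
clean_rows = [row for row in rows if has_no_changes(row["_airbyte_meta"])]

# For aggregations such as simple counts, rows with changes may still be useful.
total_row_count = len(rows)
```

Keeping the two results separate mirrors the trade-off in the changed text: `clean_rows` suits downstream use that needs fully typed values, while `total_row_count` still counts rows that carry a recorded change.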
