
Commit 8276d03

authored
Normalization: handle non-object top-level schemas; treat binary data as string (#22165)
* handle dumb top-level schemas
* version bump
* also definitions
* treat binary as string
* fallback case
* format
* new variable
1 parent 2c97aa3 commit 8276d03


5 files changed with 27 additions and 13 deletions


airbyte-config/init/src/main/resources/seed/destination_definitions.yaml

Lines changed: 9 additions & 9 deletions
@@ -45,7 +45,7 @@
   icon: bigquery.svg
   normalizationConfig:
     normalizationRepository: airbyte/normalization
-    normalizationTag: 0.3.1
+    normalizationTag: 0.3.2
     normalizationIntegrationType: bigquery
   supportsDbt: true
   resourceRequirements:
@@ -91,7 +91,7 @@
   releaseStage: alpha
   normalizationConfig:
     normalizationRepository: airbyte/normalization-clickhouse
-    normalizationTag: 0.3.1
+    normalizationTag: 0.3.2
     normalizationIntegrationType: clickhouse
   supportsDbt: true
 - name: Cloudflare R2
@@ -213,7 +213,7 @@
   releaseStage: alpha
   normalizationConfig:
     normalizationRepository: airbyte/normalization-mssql
-    normalizationTag: 0.3.1
+    normalizationTag: 0.3.2
     normalizationIntegrationType: mssql
   supportsDbt: true
 - name: MeiliSearch
@@ -239,7 +239,7 @@
   releaseStage: alpha
   normalizationConfig:
     normalizationRepository: airbyte/normalization-mysql
-    normalizationTag: 0.3.1
+    normalizationTag: 0.3.2
     normalizationIntegrationType: mysql
   supportsDbt: true
 - name: Oracle
@@ -251,7 +251,7 @@
   releaseStage: alpha
   normalizationConfig:
     normalizationRepository: airbyte/normalization-oracle
-    normalizationTag: 0.3.1
+    normalizationTag: 0.3.2
     normalizationIntegrationType: oracle
   supportsDbt: true
 - name: Postgres
@@ -263,7 +263,7 @@
   releaseStage: alpha
   normalizationConfig:
     normalizationRepository: airbyte/normalization
-    normalizationTag: 0.3.1
+    normalizationTag: 0.3.2
     normalizationIntegrationType: postgres
   supportsDbt: true
 - name: Pulsar
@@ -295,7 +295,7 @@
   icon: redshift.svg
   normalizationConfig:
     normalizationRepository: airbyte/normalization-redshift
-    normalizationTag: 0.3.1
+    normalizationTag: 0.3.2
     normalizationIntegrationType: redshift
   supportsDbt: true
   resourceRequirements:
@@ -353,7 +353,7 @@
   icon: snowflake.svg
   normalizationConfig:
     normalizationRepository: airbyte/normalization-snowflake
-    normalizationTag: 0.3.1
+    normalizationTag: 0.3.2
     normalizationIntegrationType: snowflake
   supportsDbt: true
   resourceRequirements:
@@ -407,7 +407,7 @@
   releaseStage: alpha
   normalizationConfig:
     normalizationRepository: airbyte/normalization-tidb
-    normalizationTag: 0.3.1
+    normalizationTag: 0.3.2
     normalizationIntegrationType: tidb
   supportsDbt: true
 - name: Typesense
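The nine hunks above make the same one-line change, `normalizationTag: 0.3.1` to `0.3.2`, for every destination that runs normalization. A minimal sketch of doing that bump mechanically, assuming the seed file is treated as plain text and each tag sits on its own `normalizationTag: <version>` line as in the diff; `bump_normalization_tag` is a hypothetical helper, not something in the Airbyte repo.

```python
import re
from typing import Tuple


def bump_normalization_tag(yaml_text: str, old: str, new: str) -> Tuple[str, int]:
    """Rewrite every `normalizationTag: <old>` line to `<new>` (hypothetical helper).

    Works on raw text instead of parsing YAML, so comments and key order
    are preserved. Returns the updated text and the number of lines changed.
    """
    # Match the key, then the old version exactly (dots escaped) up to end of line,
    # so bumping 0.3.1 never touches a line that reads 0.3.10.
    pattern = re.compile(
        r"^([ \t]*normalizationTag:[ \t]*)" + re.escape(old) + r"[ \t]*$",
        re.MULTILINE,
    )
    # A function replacement avoids backslash/group escaping issues in `new`.
    return pattern.subn(lambda m: m.group(1) + new, yaml_text)


sample = """- name: Postgres
  normalizationConfig:
    normalizationRepository: airbyte/normalization
    normalizationTag: 0.3.1
    normalizationIntegrationType: postgres
  supportsDbt: true
"""
updated, count = bump_normalization_tag(sample, "0.3.1", "0.3.2")
print(count)  # 1
```

Editing text rather than round-tripping through a YAML library keeps the diff limited to the tag lines, which matches the shape of the hunks above.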

airbyte-integrations/bases/base-normalization/Dockerfile

Lines changed: 1 addition & 1 deletion
@@ -28,5 +28,5 @@ WORKDIR /airbyte
 ENV AIRBYTE_ENTRYPOINT "/airbyte/entrypoint.sh"
 ENTRYPOINT ["/airbyte/entrypoint.sh"]
 
-LABEL io.airbyte.version=0.3.1
+LABEL io.airbyte.version=0.3.2
 LABEL io.airbyte.name=airbyte/normalization
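The seed file's `normalizationTag` values and this `io.airbyte.version` label were bumped together in this commit, so a small consistency check can catch a release where one moved without the other. This is a sketch only: `dockerfile_version` and `stale_normalization_tags` are hypothetical helpers, and both parse their input with regular expressions rather than a real Dockerfile or YAML parser.

```python
import re
from typing import List


def dockerfile_version(dockerfile_text: str) -> str:
    """Return the value of the `io.airbyte.version` LABEL (hypothetical helper)."""
    match = re.search(
        r"^LABEL\s+io\.airbyte\.version=(\S+)[ \t]*$", dockerfile_text, re.MULTILINE
    )
    if match is None:
        raise ValueError("no io.airbyte.version label found")
    return match.group(1)


def stale_normalization_tags(seed_yaml_text: str, expected: str) -> List[str]:
    """List every normalizationTag value in the seed text that differs from `expected`."""
    tags = re.findall(
        r"^[ \t]*normalizationTag:[ \t]*(\S+)[ \t]*$", seed_yaml_text, re.MULTILINE
    )
    return [tag for tag in tags if tag != expected]


dockerfile = (
    'ENTRYPOINT ["/airbyte/entrypoint.sh"]\n\n'
    "LABEL io.airbyte.version=0.3.2\n"
    "LABEL io.airbyte.name=airbyte/normalization\n"
)
seed = "    normalizationTag: 0.3.2\n    normalizationTag: 0.3.1\n"
version = dockerfile_version(dockerfile)
print(version)  # 0.3.2
print(stale_normalization_tags(seed, version))  # ['0.3.1']
```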

airbyte-integrations/bases/base-normalization/normalization/transform_catalog/catalog_processor.py

Lines changed: 11 additions & 1 deletion
@@ -135,7 +135,17 @@ def build_stream_processor(
                 primary_key = get_field(configured_stream, "primary_key", f"Undefined primary key for stream {stream_name}")
 
             message = f"'json_schema'.'properties' are not defined for stream {stream_name}"
-            properties = get_field(get_field(stream_config, "json_schema", message), "properties", message)
+            stream_schema = get_field(stream_config, "json_schema", f"'json_schema' is not defined for stream {stream_name}")
+            if "properties" in stream_schema:
+                properties = get_field(stream_schema, "properties", message)
+            elif "oneOf" in stream_schema:
+                options = list(filter(lambda option: "properties" in option, stream_schema["oneOf"]))
+                if len(options) == 0:
+                    raise KeyError(f"Stream {stream_name} does not have any properties")
+                # If there are multiple oneOf options, just pick the first one - we don't really support oneOf to begin with
+                properties = options[0]["properties"]
+            else:
+                raise KeyError(f"Stream {stream_name} does not have any properties and no oneOf option with properties")
 
             from_table = dbt_macro.Source(schema_name, raw_table_name)
 
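The new lookup order (top-level `properties`, then the first `oneOf` option that has `properties`, otherwise a `KeyError`) can be exercised in isolation. The sketch below restates the diff as a standalone function; `resolve_stream_properties` is a hypothetical name, and `_get_field` is a stand-in for the repo's `get_field` helper that is assumed to raise `KeyError(message)` on a missing key.

```python
from typing import Any, Dict, Mapping


def _get_field(config: Mapping[str, Any], key: str, message: str) -> Any:
    # Stand-in for the repo's get_field; assumed to raise KeyError(message) when `key` is missing.
    if key not in config:
        raise KeyError(message)
    return config[key]


def resolve_stream_properties(stream_name: str, stream_config: Mapping[str, Any]) -> Dict[str, Any]:
    """Mirror of the new branch logic in build_stream_processor (hypothetical helper)."""
    message = f"'json_schema'.'properties' are not defined for stream {stream_name}"
    stream_schema = _get_field(
        stream_config, "json_schema", f"'json_schema' is not defined for stream {stream_name}"
    )
    if "properties" in stream_schema:
        return _get_field(stream_schema, "properties", message)
    if "oneOf" in stream_schema:
        # Keep only object-like options; e.g. a bare {"type": "string"} has no columns to expand.
        options = [option for option in stream_schema["oneOf"] if "properties" in option]
        if not options:
            raise KeyError(f"Stream {stream_name} does not have any properties")
        # As in the commit, options are not merged: the first object-like one wins.
        return options[0]["properties"]
    raise KeyError(f"Stream {stream_name} does not have any properties and no oneOf option with properties")


# A top-level oneOf where only the second option is an object.
stream = {
    "json_schema": {
        "oneOf": [
            {"type": "string"},
            {"type": "object", "properties": {"id": {"type": "integer"}}},
        ]
    }
}
print(resolve_stream_properties("users", stream))  # {'id': {'type': 'integer'}}
```

Taking the first object-like option, as the commit's own comment says, avoids merging columns from several `oneOf` branches; a stream whose top-level schema is, say, a bare `string` still raises `KeyError`.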
airbyte-integrations/bases/base-normalization/normalization/transform_catalog/utils.py

Lines changed: 5 additions & 2 deletions
@@ -31,11 +31,14 @@ def is_reftype(definition: dict) -> bool:
 
 
 def is_string(definition: dict) -> bool:
-    return is_type_included(definition, get_reftype_function(data_type.STRING_TYPE))
+    return is_type_included(definition, get_reftype_function(data_type.STRING_TYPE)) or is_type_included(
+        definition, get_reftype_function(data_type.BINARY_DATA_TYPE)
+    )
 
 
 def is_binary_datatype(definition: dict) -> bool:
-    return is_type_included(definition, get_reftype_function(data_type.BINARY_DATA_TYPE))
+    return False
+    # return is_type_included(definition, get_reftype_function(data_type.BINARY_DATA_TYPE))
 
 
 def is_datetime(definition: dict) -> bool:
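To see what the `utils.py` change does to type classification, here is a simplified, self-contained re-implementation. It assumes that `data_type.STRING_TYPE` and `data_type.BINARY_DATA_TYPE` are the protocol v1 references `WellKnownTypes.json#/definitions/String` and `WellKnownTypes.json#/definitions/BinaryData`, and that `is_type_included` recurses into `oneOf` options; the helper bodies are approximations, not copies of the repo code.

```python
from typing import Any, Callable, Dict

# Assumed values of data_type.STRING_TYPE / data_type.BINARY_DATA_TYPE (protocol v1 well-known types).
STRING_TYPE = "WellKnownTypes.json#/definitions/String"
BINARY_DATA_TYPE = "WellKnownTypes.json#/definitions/BinaryData"


def get_reftype_function(ref: str) -> Callable[[Dict[str, Any]], bool]:
    # Approximation: a predicate matching definitions whose "$ref" equals `ref`.
    return lambda definition: definition.get("$ref") == ref


def is_type_included(definition: Dict[str, Any], is_type: Callable[[Dict[str, Any]], bool]) -> bool:
    # Approximation: a oneOf matches if any of its options matches.
    if "oneOf" in definition:
        return any(is_type_included(option, is_type) for option in definition["oneOf"])
    return is_type(definition)


def is_string(definition: Dict[str, Any]) -> bool:
    # After the commit, a BinaryData reference is classified as a string as well.
    return is_type_included(definition, get_reftype_function(STRING_TYPE)) or is_type_included(
        definition, get_reftype_function(BINARY_DATA_TYPE)
    )


def is_binary_datatype(definition: Dict[str, Any]) -> bool:
    # Disabled by the commit: nothing is classified as binary any more.
    return False


blob = {"$ref": BINARY_DATA_TYPE}
print(is_string(blob), is_binary_datatype(blob))  # True False
```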

docs/understanding-airbyte/basic-normalization.md

Lines changed: 1 addition & 0 deletions
@@ -353,6 +353,7 @@ Therefore, in order to "upgrade" to the desired normalization version, you need
 
 | Airbyte Version | Normalization Version | Date | Pull Request | Subject |
 |:----------------|:----------------------|:-----------|:-------------------------------------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------|
+| | 0.3.2 | 2023-01-31 | [\#22165](https://github.com/airbytehq/airbyte/pull/22165) | Fix support for non-object top-level schemas |
 | | 0.3.1 | 2023-01-31 | [\#22161](https://github.com/airbytehq/airbyte/pull/22161) | Fix handling for combined primitive types |
 | | 0.3.0 | 2023-01-30 | [\#19721](https://github.com/airbytehq/airbyte/pull/19721) | Update normalization to airbyte-protocol v1.0.0 |
 | | 0.2.25 | 2022-12-05 | [\#19573](https://github.com/airbytehq/airbyte/pull/19573) | Update Clickhouse dbt version |
