
Commit d780141

Source File: fix schema generation for json files containing an array (#16772)
* #547 oncall Source File: fix schema generation for json files containing arrays
* source file: update changelog
* #547 oncall: source file - upgrade source-file-secure
* auto-bump connector version [ci skip]

Co-authored-by: Octavia Squidington III <[email protected]>
1 parent 4dc394c commit d780141


7 files changed, +64 -61 lines changed


airbyte-config/init/src/main/resources/seed/source_definitions.yaml

Lines changed: 1 addition & 1 deletion
```diff
@@ -287,7 +287,7 @@
 - name: File
   sourceDefinitionId: 778daa7c-feaf-4db6-96f3-70fd645acc77
   dockerRepository: airbyte/source-file
-  dockerImageTag: 0.2.20
+  dockerImageTag: 0.2.22
   documentationUrl: https://docs.airbyte.io/integrations/sources/file
   icon: file.svg
   sourceType: file
```

airbyte-config/init/src/main/resources/seed/source_specs.yaml

Lines changed: 2 additions & 1 deletion
```diff
@@ -2705,7 +2705,7 @@
 supportsNormalization: false
 supportsDBT: false
 supported_destination_sync_modes: []
-- dockerImage: "airbyte/source-file:0.2.20"
+- dockerImage: "airbyte/source-file:0.2.22"
 spec:
 documentationUrl: "https://docs.airbyte.io/integrations/sources/file"
 connectionSpecification:
@@ -2731,6 +2731,7 @@
 - "json"
 - "jsonl"
 - "excel"
+- "excel_binary"
 - "feather"
 - "parquet"
 - "yaml"
```

airbyte-integrations/connectors/source-file-secure/Dockerfile

Lines changed: 2 additions & 2 deletions
```diff
@@ -1,4 +1,4 @@
-FROM airbyte/source-file:0.2.21
+FROM airbyte/source-file:0.2.22
 
 WORKDIR /airbyte/integration_code
 COPY source_file_secure ./source_file_secure
@@ -9,5 +9,5 @@ RUN pip install .
 ENV AIRBYTE_ENTRYPOINT "python /airbyte/integration_code/main.py"
 ENTRYPOINT ["python", "/airbyte/integration_code/main.py"]
 
-LABEL io.airbyte.version=0.2.21
+LABEL io.airbyte.version=0.2.22
 LABEL io.airbyte.name=airbyte/source-file-secure
```

airbyte-integrations/connectors/source-file/Dockerfile

Lines changed: 1 addition & 1 deletion
```diff
@@ -17,5 +17,5 @@ COPY source_file ./source_file
 ENV AIRBYTE_ENTRYPOINT "python /airbyte/integration_code/main.py"
 ENTRYPOINT ["python", "/airbyte/integration_code/main.py"]
 
-LABEL io.airbyte.version=0.2.21
+LABEL io.airbyte.version=0.2.22
 LABEL io.airbyte.name=airbyte/source-file
```

airbyte-integrations/connectors/source-file/source_file/client.py

Lines changed: 4 additions & 0 deletions
```diff
@@ -263,6 +263,10 @@ def load_nested_json_schema(self, fp) -> dict:
         builder.add_object(json.load(fp))
 
         result = builder.to_schema()
+        if "items" in result:
+            # this means we have a json list e.g. [{...}, {...}]
+            # but need to emit schema of an inside dict
+            result = result["items"]
         result["$schema"] = "http://json-schema.org/draft-07/schema#"
         return result
 
```
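The four added lines amount to a small post-processing step on the inferred schema. The sketch below isolates that step in stdlib-only Python; it is an illustration, not the connector's code. `unwrap_array_schema` is a hypothetical helper name, and `array_schema` is a hand-written dict standing in for what a genson-style `builder.to_schema()` might return for a file shaped like `[{...}, {...}]`.

```python
import copy

DRAFT_07 = "http://json-schema.org/draft-07/schema#"


def unwrap_array_schema(result: dict) -> dict:
    """Hypothetical helper mirroring the post-processing in load_nested_json_schema.

    If the inferred schema describes a top-level JSON array, return the schema
    of its elements instead, then stamp the draft-07 $schema URI on the result.
    """
    # Unlike the diff, copy first so the caller's dict is left untouched.
    result = copy.deepcopy(result)
    if "items" in result:
        # Top-level list such as [{...}, {...}]: describe one element, not the list.
        result = result["items"]
    result["$schema"] = DRAFT_07
    return result


# Hand-written stand-in for a genson-style schema of a list of objects
# (the "$schema" value here is illustrative and gets replaced).
array_schema = {
    "$schema": "http://json-schema.org/schema#",
    "type": "array",
    "items": {
        "type": "object",
        "properties": {"id": {"type": "string"}, "ppu": {"type": "number"}},
        "required": ["id", "ppu"],
    },
}

record_schema = unwrap_array_schema(array_schema)
print(record_schema["type"])  # object
```

Like the diff, the sketch keys on a top-level `items` key, which in this example only an array schema carries; an object schema passes through with just its `$schema` replaced.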

airbyte-integrations/connectors/source-file/unit_tests/test_nested_json_schema.py

Lines changed: 26 additions & 29 deletions
```diff
@@ -122,39 +122,36 @@
 
 expected_array_schema = {
     "$schema": "http://json-schema.org/draft-07/schema#",
-    "items": {
-        "properties": {
-            "batters": {
-                "properties": {
-                    "batter": {
-                        "items": {
-                            "properties": {"id": {"type": "string"}, "type": {"type": "string"}},
-                            "required": ["id", "type"],
-                            "type": "object",
-                        },
-                        "type": "array",
-                    }
-                },
-                "required": ["batter"],
-                "type": "object",
+    "properties": {
+        "batters": {
+            "properties": {
+                "batter": {
+                    "items": {
+                        "properties": {"id": {"type": "string"}, "type": {"type": "string"}},
+                        "required": ["id", "type"],
+                        "type": "object",
+                    },
+                    "type": "array",
+                }
             },
-            "id": {"type": "string"},
-            "name": {"type": "string"},
-            "ppu": {"type": "number"},
-            "topping": {
-                "items": {
-                    "properties": {"id": {"type": "string"}, "type": {"type": "string"}},
-                    "required": ["id", "type"],
-                    "type": "object",
-                },
-                "type": "array",
+            "required": ["batter"],
+            "type": "object",
+        },
+        "id": {"type": "string"},
+        "name": {"type": "string"},
+        "ppu": {"type": "number"},
+        "topping": {
+            "items": {
+                "properties": {"id": {"type": "string"}, "type": {"type": "string"}},
+                "required": ["id", "type"],
+                "type": "object",
             },
-            "type": {"type": "string"},
+            "type": "array",
         },
-        "required": ["batters", "id", "name", "ppu", "topping", "type"],
-        "type": "object",
+        "type": {"type": "string"},
     },
-    "type": "array",
+    "required": ["batters", "id", "name", "ppu", "topping", "type"],
+    "type": "object",
 }
 
 
```
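To see where the top-level `items` wrapper in the old `expected_array_schema` comes from, the sketch below pairs a toy schema inference with the same unwrapping step. `infer_schema` and `stream_schema` are hypothetical helpers, not genson: the toy describes a list by its first element only, whereas a real schema builder combines all elements. The sample records are illustrative, borrowing field names from the test.

```python
import json
from typing import Any


def infer_schema(value: Any) -> dict:
    """Toy JSON Schema inference; a simplified, hypothetical stand-in for genson."""
    if isinstance(value, dict):
        return {
            "type": "object",
            "properties": {key: infer_schema(val) for key, val in value.items()},
            "required": sorted(value),
        }
    if isinstance(value, list):
        if not value:
            return {"type": "array"}
        # Simplification: describe every element by the first one.
        return {"type": "array", "items": infer_schema(value[0])}
    if isinstance(value, bool):  # must precede the int check: bool subclasses int
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "integer"}
    if isinstance(value, float):
        return {"type": "number"}
    if isinstance(value, str):
        return {"type": "string"}
    return {"type": "null"}


def stream_schema(text: str) -> dict:
    """Infer a schema for a JSON document, unwrapping a top-level array as in the fix."""
    result = infer_schema(json.loads(text))
    if "items" in result:
        result = result["items"]
    result["$schema"] = "http://json-schema.org/draft-07/schema#"
    return result


donuts = '[{"id": "0001", "name": "Cake", "ppu": 0.55}, {"id": "0002", "name": "Raised", "ppu": 0.55}]'
schema = stream_schema(donuts)
print(schema["type"], schema["required"])  # object ['id', 'name', 'ppu']
```

Without the unwrap, `infer_schema` returns `{"type": "array", "items": {...}}`, the shape the test expected before this commit; with it, the top-level `type` is `object`, matching the updated expectation.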

docs/integrations/sources/file.md

Lines changed: 28 additions & 27 deletions
```diff
@@ -127,30 +127,31 @@ In order to read large files from a remote location, this connector uses the [sm
 
 ## Changelog
 
-| Version | Date | Pull Request | Subject |
-|---------|------------|----------------------------------------------------------|---------------------------------------------------|
-| 0.2.21 | 2022-08-26 | [15568](https://github.com/airbytehq/airbyte/pull/15568) | Specify `pyxlsb` library for Excel Binary Workbook files
-| 0.2.20 | 2022-08-23 | [15870](https://github.com/airbytehq/airbyte/pull/15870) | Fix CSV schema discovery |
-| 0.2.19 | 2022-08-19 | [15768](https://github.com/airbytehq/airbyte/pull/15768) | Convert 'nan' to 'null' |
-| 0.2.18 | 2022-08-16 | [15698](https://github.com/airbytehq/airbyte/pull/15698) | Cache binary stream to file for discover |
-| 0.2.17 | 2022-08-11 | [15501](https://github.com/airbytehq/airbyte/pull/15501) | Cache binary stream to file |
-| 0.2.16 | 2022-08-10 | [15293](https://github.com/airbytehq/airbyte/pull/15293) | added support for encoding reader option |
-| 0.2.15 | 2022-08-05 | [15269](https://github.com/airbytehq/airbyte/pull/15269) | Bump `smart-open` version to 6.0.0 |
-| 0.2.12 | 2022-07-12 | [14535](https://github.com/airbytehq/airbyte/pull/14535) | Fix invalid schema generation for JSON files |
-| 0.2.11 | 2022-07-12 | [9974](https://github.com/airbytehq/airbyte/pull/14588) | Add support to YAML format |
-| 0.2.9 | 2022-02-01 | [9974](https://github.com/airbytehq/airbyte/pull/9974) | Update airbyte-cdk 0.1.47 |
-| 0.2.8 | 2021-12-06 | [8524](https://github.com/airbytehq/airbyte/pull/8524) | Update connector fields title/description |
-| 0.2.7 | 2021-10-28 | [7387](https://github.com/airbytehq/airbyte/pull/7387) | Migrate source to CDK structure, add SAT testing. |
-| 0.2.6 | 2021-08-26 | [5613](https://github.com/airbytehq/airbyte/pull/5613) | Add support to xlsb format |
-| 0.2.5 | 2021-07-26 | [4953](https://github.com/airbytehq/airbyte/pull/4953) | Allow non-default port for SFTP type |
-| 0.2.4 | 2021-06-09 | [3973](https://github.com/airbytehq/airbyte/pull/3973) | Add AIRBYTE\_ENTRYPOINT for Kubernetes support |
-| 0.2.3 | 2021-06-01 | [3771](https://github.com/airbytehq/airbyte/pull/3771) | Add Azure Storage Blob Files option |
-| 0.2.2 | 2021-04-16 | [2883](https://github.com/airbytehq/airbyte/pull/2883) | Fix CSV discovery memory consumption |
-| 0.2.1 | 2021-04-03 | [2726](https://github.com/airbytehq/airbyte/pull/2726) | Fix base connector versioning |
-| 0.2.0 | 2021-03-09 | [2238](https://github.com/airbytehq/airbyte/pull/2238) | Protocol allows future/unknown properties |
-| 0.1.10 | 2021-02-18 | [2118](https://github.com/airbytehq/airbyte/pull/2118) | Support JSONL format |
-| 0.1.9 | 2021-02-02 | [1768](https://github.com/airbytehq/airbyte/pull/1768) | Add test cases for all formats |
-| 0.1.8 | 2021-01-27 | [1738](https://github.com/airbytehq/airbyte/pull/1738) | Adopt connector best practices |
-| 0.1.7 | 2020-12-16 | [1331](https://github.com/airbytehq/airbyte/pull/1331) | Refactor Python base connector |
-| 0.1.6 | 2020-12-08 | [1249](https://github.com/airbytehq/airbyte/pull/1249) | Handle NaN values |
-| 0.1.5 | 2020-11-30 | [1046](https://github.com/airbytehq/airbyte/pull/1046) | Add connectors using an index YAML file |
+| Version | Date | Pull Request | Subject |
+|---------|------------|----------------------------------------------------------|----------------------------------------------------------|
+| 0.2.22 | 2022-09-15 | [16772](https://github.com/airbytehq/airbyte/pull/16772) | Fix schema generation for JSON files containing arrays |
+| 0.2.21 | 2022-08-26 | [15568](https://github.com/airbytehq/airbyte/pull/15568) | Specify `pyxlsb` library for Excel Binary Workbook files |
+| 0.2.20 | 2022-08-23 | [15870](https://github.com/airbytehq/airbyte/pull/15870) | Fix CSV schema discovery |
+| 0.2.19 | 2022-08-19 | [15768](https://github.com/airbytehq/airbyte/pull/15768) | Convert 'nan' to 'null' |
+| 0.2.18 | 2022-08-16 | [15698](https://github.com/airbytehq/airbyte/pull/15698) | Cache binary stream to file for discover |
+| 0.2.17 | 2022-08-11 | [15501](https://github.com/airbytehq/airbyte/pull/15501) | Cache binary stream to file |
+| 0.2.16 | 2022-08-10 | [15293](https://github.com/airbytehq/airbyte/pull/15293) | added support for encoding reader option |
+| 0.2.15 | 2022-08-05 | [15269](https://github.com/airbytehq/airbyte/pull/15269) | Bump `smart-open` version to 6.0.0 |
+| 0.2.12 | 2022-07-12 | [14535](https://github.com/airbytehq/airbyte/pull/14535) | Fix invalid schema generation for JSON files |
+| 0.2.11 | 2022-07-12 | [9974](https://github.com/airbytehq/airbyte/pull/14588) | Add support to YAML format |
+| 0.2.9 | 2022-02-01 | [9974](https://github.com/airbytehq/airbyte/pull/9974) | Update airbyte-cdk 0.1.47 |
+| 0.2.8 | 2021-12-06 | [8524](https://github.com/airbytehq/airbyte/pull/8524) | Update connector fields title/description |
+| 0.2.7 | 2021-10-28 | [7387](https://github.com/airbytehq/airbyte/pull/7387) | Migrate source to CDK structure, add SAT testing. |
+| 0.2.6 | 2021-08-26 | [5613](https://github.com/airbytehq/airbyte/pull/5613) | Add support to xlsb format |
+| 0.2.5 | 2021-07-26 | [4953](https://github.com/airbytehq/airbyte/pull/4953) | Allow non-default port for SFTP type |
+| 0.2.4 | 2021-06-09 | [3973](https://github.com/airbytehq/airbyte/pull/3973) | Add AIRBYTE\_ENTRYPOINT for Kubernetes support |
+| 0.2.3 | 2021-06-01 | [3771](https://github.com/airbytehq/airbyte/pull/3771) | Add Azure Storage Blob Files option |
+| 0.2.2 | 2021-04-16 | [2883](https://github.com/airbytehq/airbyte/pull/2883) | Fix CSV discovery memory consumption |
+| 0.2.1 | 2021-04-03 | [2726](https://github.com/airbytehq/airbyte/pull/2726) | Fix base connector versioning |
+| 0.2.0 | 2021-03-09 | [2238](https://github.com/airbytehq/airbyte/pull/2238) | Protocol allows future/unknown properties |
+| 0.1.10 | 2021-02-18 | [2118](https://github.com/airbytehq/airbyte/pull/2118) | Support JSONL format |
+| 0.1.9 | 2021-02-02 | [1768](https://github.com/airbytehq/airbyte/pull/1768) | Add test cases for all formats |
+| 0.1.8 | 2021-01-27 | [1738](https://github.com/airbytehq/airbyte/pull/1738) | Adopt connector best practices |
+| 0.1.7 | 2020-12-16 | [1331](https://github.com/airbytehq/airbyte/pull/1331) | Refactor Python base connector |
+| 0.1.6 | 2020-12-08 | [1249](https://github.com/airbytehq/airbyte/pull/1249) | Handle NaN values |
+| 0.1.5 | 2020-11-30 | [1046](https://github.com/airbytehq/airbyte/pull/1046) | Add connectors using an index YAML file |
```
