Source S3 - fix schema inference #17991

Contributor

davydov-d commented Oct 14, 2022

What

https://github.com/airbytehq/oncall/issues/678

How

Complex types are a special case when converting between pyarrow types and JSON schema types. Arrays are now processed the same way as objects (see #16607). This is a hotfix to resolve the on-call issue, not a long-term solution.

A higher-quality solution will require reworking our type conversion in the future.
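
To illustrate the idea of treating arrays like objects, here is a minimal sketch. It is not the connector's actual code and does not import pyarrow. The mapping table, the function names, and the choice to coerce complex types to `large_string` are all assumptions made for the example; the only behavior taken from the description above is that `array` follows the same path as `object` in both directions.

```python
from typing import Dict, Tuple

# Hypothetical table: JSON schema type -> candidate pyarrow type names.
# The first name in each tuple is the one used for JSON schema -> pyarrow.
JSON_TO_PYARROW: Dict[str, Tuple[str, ...]] = {
    "boolean": ("bool_", "bool"),
    "integer": ("int64", "int8", "int16", "int32"),
    "number": ("float64", "float16", "float32"),
    "string": ("large_string", "string"),
    # Assumption: complex types are coerced to a string type rather than
    # being mapped field by field; "array" mirrors "object".
    "object": ("large_string",),
    "array": ("large_string",),
}


def json_type_to_pyarrow_type(json_type: str) -> str:
    """Return the preferred pyarrow type name for a JSON schema type."""
    return JSON_TO_PYARROW[json_type][0]


def pyarrow_type_to_json_type(pyarrow_type: str) -> str:
    """Return the JSON schema type for a pyarrow type name."""
    # Complex pyarrow types carry parameters in their string form,
    # e.g. "struct<a: int64>" or "list<item: int64>", so match on the prefix.
    if pyarrow_type.startswith("struct<"):
        return "object"
    if pyarrow_type.startswith(("list<", "large_list<")):
        return "array"
    for json_type, names in JSON_TO_PYARROW.items():
        if pyarrow_type in names:
            return json_type
    # Unknown types fall back to string in this sketch.
    return "string"
```

Matching on the type-name prefix keeps parameterized types such as `list<item: int64>` from falling through to the string fallback, which is one way a schema inferred from a Parquet file could otherwise disagree with the declared JSON schema.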

github-actions bot added the area/connectors and area/documentation labels Oct 14, 2022
Contributor Author

davydov-d commented Oct 14, 2022

/test connector=connectors/source-s3

🕑 connectors/source-s3 https://github.com/airbytehq/airbyte/actions/runs/3248902210
✅ connectors/source-s3 https://github.com/airbytehq/airbyte/actions/runs/3248902210
Python tests coverage:

	 Name                                                 Stmts   Miss  Cover   Missing
	 ----------------------------------------------------------------------------------
	 source_acceptance_test/base.py                          10      4    60%   15-18
	 source_acceptance_test/config.py                        83      6    93%   78-80, 84-86
	 source_acceptance_test/conftest.py                     164    164     0%   6-282
	 source_acceptance_test/plugin.py                        48     48     0%   6-104
	 source_acceptance_test/tests/test_core.py              329    111    66%   39, 50-58, 63-70, 74-75, 79-80, 164, 202-219, 228-236, 240-245, 251, 284-289, 327-334, 374-376, 379, 439-448, 477-478, 484, 487, 520-530, 543-568, 573-577
	 source_acceptance_test/tests/test_full_refresh.py       52      2    96%   34, 65
	 source_acceptance_test/tests/test_incremental.py       145     20    86%   21-23, 29-31, 36-43, 48-61, 224
	 source_acceptance_test/utils/asserts.py                 37      2    95%   57-58
	 source_acceptance_test/utils/common.py                  77     10    87%   15-16, 24-30, 64, 67
	 source_acceptance_test/utils/compare.py                 62     23    63%   21-51, 68, 97-99
	 source_acceptance_test/utils/connector_runner.py       112     50    55%   23-26, 32, 36, 39-67, 70-72, 75-77, 80-82, 85-87, 90-92, 95-113, 147-149
	 source_acceptance_test/utils/json_schema_helper.py     105     13    88%   30-31, 38, 41, 65-68, 96, 120, 190-192
	 ----------------------------------------------------------------------------------
	 TOTAL                                                 1351    453    66%

Name                                                              Stmts   Miss  Cover
-------------------------------------------------------------------------------------
source_s3/source_files_abstract/formats/parquet_spec.py               9      0   100%
source_s3/source_files_abstract/formats/jsonl_spec.py                13      0   100%
source_s3/source_files_abstract/formats/csv_spec.py                  16      0   100%
source_s3/source_files_abstract/formats/avro_spec.py                  5      0   100%
source_s3/s3file.py                                                  37      0   100%
source_s3/s3_utils.py                                                19      0   100%
source_s3/__init__.py                                                 2      0   100%
source_s3/source.py                                                  27      1    96%
source_s3/source_files_abstract/storagefile.py                       23      1    96%
source_s3/stream.py                                                  43      3    93%
source_s3/source_files_abstract/stream.py                           238     17    93%
source_s3/source_files_abstract/formats/abstract_file_parser.py      39      3    92%
source_s3/source_files_abstract/formats/csv_parser.py                76     18    76%
source_s3/source_files_abstract/file_info.py                         26      8    69%
source_s3/utils.py                                                   31     10    68%
source_s3/source_files_abstract/source.py                            37     14    62%
source_s3/source_files_abstract/spec.py                              44     22    50%
source_s3/source_files_abstract/formats/jsonl_parser.py              44     27    39%
source_s3/source_files_abstract/formats/avro_parser.py               38     25    34%
source_s3/source_files_abstract/formats/parquet_parser.py            61     44    28%
-------------------------------------------------------------------------------------
TOTAL                                                               828    193    77%

Name                                                              Stmts   Miss  Cover
-------------------------------------------------------------------------------------
source_s3/source_files_abstract/storagefile.py                       23      0   100%
source_s3/source_files_abstract/spec.py                              44      0   100%
source_s3/source_files_abstract/formats/parquet_spec.py               9      0   100%
source_s3/source_files_abstract/formats/jsonl_spec.py                13      0   100%
source_s3/source_files_abstract/formats/csv_spec.py                  16      0   100%
source_s3/source_files_abstract/formats/avro_spec.py                  5      0   100%
source_s3/source.py                                                  27      0   100%
source_s3/s3file.py                                                  37      0   100%
source_s3/s3_utils.py                                                19      0   100%
source_s3/__init__.py                                                 2      0   100%
source_s3/source_files_abstract/formats/parquet_parser.py            61      1    98%
source_s3/source_files_abstract/formats/jsonl_parser.py              44      1    98%
source_s3/stream.py                                                  43      1    98%
source_s3/source_files_abstract/formats/abstract_file_parser.py      39      1    97%
source_s3/source_files_abstract/source.py                            37      2    95%
source_s3/source_files_abstract/formats/avro_parser.py               38      3    92%
source_s3/source_files_abstract/file_info.py                         26      3    88%
source_s3/source_files_abstract/stream.py                           238     40    83%
source_s3/source_files_abstract/formats/csv_parser.py                76     18    76%
source_s3/utils.py                                                   31      8    74%
-------------------------------------------------------------------------------------
TOTAL                                                               828     78    91%

Build Passed

Test summary info:

All Passed

Contributor Author

davydov-d commented Oct 14, 2022

/publish connector=connectors/source-s3

🕑 Publishing the following connectors:
connectors/source-s3
https://github.com/airbytehq/airbyte/actions/runs/3249514467


Connector              Did it publish?   Were definitions generated?
connectors/source-s3

If you have connectors that published successfully but failed definition generation, follow step 4 here ▶️

@davydov-d davydov-d merged commit 5aa25a1 into master Oct 14, 2022
@davydov-d davydov-d deleted the ddavydov/#678-oncall-source-s3-fix-schema-inference-for-arrays branch October 14, 2022 11:53
letiescanciano added a commit that referenced this pull request Oct 14, 2022
…vation

* master: (98 commits)
  🐛 Source Bing Ads - Fix Campaigns stream misses Audience and Shopping (#17873)
  Source S3 - fix schema inference (#17991)
  🎉 JDBC sources: store cursor record count in db state (#15535)
  Introduce webhook configs into workspace api and persistence (#17950)
  ci: upload test results to github for analysis (#17953)
  Trigger the connectors build if there are worker changes. (#17976)
  Add additional sync timing information (#17643)
  Use page_token_option instead of page_token (#17892)
  capture metrics around json messages size (#17973)
  🐛 Correct kube annotations variable as per the docs. (#17972)
  🪟 🎉 Add /connector-builder page with embedded YAML editor (#17482)
  fix `est_num_metrics_emitted_by_reporter` not being emitted (#17929)
  Update schema dumps (#17960)
  Remove the bump in the value.yml (#17959)
  Ensure database initialization in test container (#17697)
  Remove typo line from incremental reads docs (#17920)
  DocS: Update authentication.md (#17931)
  Use MessageMigration for Source Connection Check. (#17656)
  fixed links (#17949)
  remove usages of YamlSeedConfigPersistence (#17895)
  ...
jhammarstedt pushed a commit to jhammarstedt/airbyte that referenced this pull request Oct 31, 2022
* airbytehq#678 oncall. Source S3 - fix schema inference

* source s3: upd changelog

* auto-bump connector version [ci skip]

Co-authored-by: Octavia Squidington III <[email protected]>
Labels: area/connectors, area/documentation, connectors/source/s3