Skip to content

Google Sheets: add row_id to rows and use as primary key #19215

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Nov 14, 2022

Conversation

mdibaiee
Copy link
Contributor

@mdibaiee mdibaiee commented Nov 9, 2022

What

Google Sheets connector does not have a primary key and there is no information on the row_id (changes to a row are not easily distinguishable)

How

  • Add row_id to data emitted by google sheets source and use this value as the primary key

Recommended reading order

  1. airbyte-integrations/connectors/source-google-sheets/google_sheets_source/google_sheets_source.py
  2. airbyte-integrations/connectors/source-google-sheets/google_sheets_source/helpers.py

🚨 User Impact 🚨

Are there any breaking changes? What is the end result perceived by the user? If yes, please merge this PR with the 🚨🚨 emoji so changelog authors can further highlight this if needed.

Not sure if addition of a new data field is considered a breaking change, it might be.

Pre-merge Checklist

Expand the relevant checklist and delete the others.

Updating a connector

Community member or Airbyter

  • Grant edit access to maintainers (instructions)
  • Secrets in the connector's spec are annotated with airbyte_secret
  • Unit & integration tests added and passing. Community members, please provide proof of success locally e.g: screenshot or copy-paste unit, integration, and acceptance test output. To run acceptance tests for a Python connector, follow instructions in the README. For java connectors run ./gradlew :airbyte-integrations:connectors:<name>:integrationTest.
  • Code reviews completed
  • Documentation updated
    • Connector's README.md
    • Connector's bootstrap.md. See description and examples
    • Changelog updated in docs/integrations/<source or destination>/<name>.md including changelog. See changelog example
  • PR name follows PR naming conventions

Airbyter

If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.

  • Create a non-forked branch based on this PR and test the below items on it
  • Build is successful
  • If new credentials are required for use in CI, add them to GSM. Instructions.
  • /test connector=connectors/<name> command is passing
  • New Connector version released on Dockerhub and connector version bumped by running the /publish command described here

Tests

Unit image
Integration

Put your integration tests output here.

Acceptance

Put your acceptance tests output here.

@github-actions github-actions bot added area/connectors Connector related issues area/documentation Improvements or additions to documentation labels Nov 9, 2022
@vincentkoc vincentkoc self-requested a review November 13, 2022 12:28
@vincentkoc vincentkoc self-assigned this Nov 13, 2022
@vincentkoc
Copy link
Contributor

@mdibaiee thanks for your contribution. I'm one of the Airbyte maintainers and will take a look and review your PR.

@vincentkoc
Copy link
Contributor

vincentkoc commented Nov 13, 2022

/test connector=connectors/source-google-sheets

🕑 connectors/source-google-sheets https://github.com/airbytehq/airbyte/actions/runs/3455260927
✅ connectors/source-google-sheets https://github.com/airbytehq/airbyte/actions/runs/3455260927
Python tests coverage:

Name                                                Stmts   Miss  Cover
-----------------------------------------------------------------------
google_sheets_source/models/spreadsheet_values.py      12      0   100%
google_sheets_source/models/spreadsheet.py             34      0   100%
google_sheets_source/models/__init__.py                 2      0   100%
google_sheets_source/__init__.py                        2      0   100%
google_sheets_source/helpers.py                       141     26    82%
google_sheets_source/client.py                         23      6    74%
google_sheets_source/google_sheets_source.py          110     88    20%
-----------------------------------------------------------------------
TOTAL                                                 324    120    63%
	 Name                                                 Stmts   Miss  Cover   Missing
	 ----------------------------------------------------------------------------------
	 source_acceptance_test/base.py                          12      4    67%   16-19
	 source_acceptance_test/config.py                       139      5    96%   87, 93, 235, 239-240
	 source_acceptance_test/conftest.py                     196     92    53%   35, 41-43, 48, 54, 60, 66, 72-74, 93, 98-100, 106-108, 114-115, 120-121, 126, 132, 141-150, 156-161, 176, 200, 231, 237, 243-248, 256-261, 269-282, 287-293, 300-311, 318-334
	 source_acceptance_test/plugin.py                        69     25    64%   22-23, 31, 36, 120-140, 144-148
	 source_acceptance_test/tests/test_core.py              394    111    72%   53, 58, 87-95, 100-107, 111-112, 116-117, 293, 331-348, 357-365, 369-374, 380, 413-418, 456-463, 506-508, 511, 576-584, 596-599, 604, 660-661, 667, 670, 706-716, 729-754
	 source_acceptance_test/tests/test_incremental.py       158     14    91%   52-59, 64-77, 240
	 source_acceptance_test/utils/asserts.py                 37      2    95%   57-58
	 source_acceptance_test/utils/common.py                  94     10    89%   16-17, 32-38, 72, 75
	 source_acceptance_test/utils/compare.py                 62     23    63%   21-51, 68, 97-99
	 source_acceptance_test/utils/connector_runner.py       112     50    55%   23-26, 32, 36, 39-68, 71-73, 76-78, 81-83, 86-88, 91-93, 96-114, 148-150
	 source_acceptance_test/utils/json_schema_helper.py     105     13    88%   30-31, 38, 41, 65-68, 96, 120, 190-192
	 ----------------------------------------------------------------------------------
	 TOTAL                                                 1557    349    78%

Build Passed

Test summary info:

=========================== short test summary info ============================
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/source_acceptance_test/plugin.py:63: Skipping TestIncremental.test_two_sequential_reads: not found in the config.
============ 30 passed, 1 skipped, 29 warnings in 111.36s (0:01:51) ============

@vincentkoc vincentkoc changed the title source-google-sheets: add row_id to rows and use as primary key Google Sheets: add row_id to rows and use as primary key Nov 13, 2022
@vincentkoc
Copy link
Contributor

vincentkoc commented Nov 13, 2022

/publish connector=connectors/source-google-sheets run-tests=false

🕑 Publishing the following connectors:
connectors/source-google-sheets
https://github.com/airbytehq/airbyte/actions/runs/3457044598


Connector Did it publish? Were definitions generated?
connectors/source-google-sheets

if you have connectors that successfully published but failed definition generation, follow step 4 here ▶️

@sajarin sajarin merged commit 44ac470 into airbytehq:master Nov 14, 2022
@sajarin sajarin added bounty-L Maintainer program: claimable large bounty PR and removed bounty labels Nov 14, 2022
@philippeboyd
Copy link
Contributor

@koconder Not sure how you reviewed the code... Why did we jump from 0.2.21 to 0.2.30 ?

@danieldiamond
Copy link
Contributor

@koconder @mdibaiee i'm experiencing issues now on version 0.2.30 with full-refresh overwrite tables only pulling half the rows

@danieldiamond
Copy link
Contributor

Screenshot 2022-11-18 at 10 28 09 am

@danieldiamond
Copy link
Contributor

@vincentkoc
Copy link
Contributor

vincentkoc commented Nov 18, 2022

@danieldiamond we are rolling this back as we speak.

@danieldiamond
Copy link
Contributor

@koconder please add regression tests.

@vincentkoc
Copy link
Contributor

@danieldiamond agree we need better tests to catch this. I have flagged with the team. Sorry for the inconvenience caused, we will be more diligent with such changes in future.

@mdibaiee
Copy link
Contributor Author

@danieldiamond I don't see how this change would cause the connector to "pull half the rows", unless it used to pull duplicate documents (two for each row) and now that it knows the primary key it's only pulling them once. With that said, I don't know what the test is doing so I don't have all the context.

akashkulk pushed a commit that referenced this pull request Dec 2, 2022
* source-google-sheets: add row_id to rows and use as primary key

* Update Dockerfile

* Update google-sheets.md

* Update Dockerfile

* Update google-sheets.md

* auto-bump connector version

Co-authored-by: Vincent Koc <[email protected]>
Co-authored-by: Octavia Squidington III <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/connectors Connector related issues area/documentation Improvements or additions to documentation bounty bounty-L Maintainer program: claimable large bounty PR community connectors/source/google-sheets
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

7 participants