Skip to content

🎉 Source Table Storage: Add incremental append capability #14212

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

maikelpenz
Copy link
Contributor

What

This is enhancing the existing azure table storage connector to load data incrementally.

  • The existing full refresh capability remains the same as previously developed.
  • Incremental append is the new functionality. Deduplication is not part of this.
  • This implementation uses a source-defined cursor named "PartitionKey". All table storage tables have a "PartitionKey" column as part of its definition. Incremental loads will work as long as the PartitionKey is an increasing number. Check the table storage filter on line 53 of the streams module (filter_query).
  • The code had to be changed quite a bit to include incremental append.
    • The current "reader" module has been updated and renamed to "azure_table"
    • the current "source" module has been split into "source" and "streams". Streams is now necessary because of the state manipulation.

In the UI you will now see a new Sync mode drop down option "Incremental | Append" and the Cursor field is automatically set to "Partition Key"

image

Recommended reading order

  1. x.java
  2. y.python

🚨 User Impact 🚨

Are there any breaking changes? What is the end result perceived by the user? If yes, please merge this PR with the 🚨🚨 emoji so changelog authors can further highlight this if needed.

Pre-merge Checklist

Expand the relevant checklist and delete the others.

New Connector

Community member or Airbyter

  • Community member? Grant edit access to maintainers (instructions)
  • Secrets in the connector's spec are annotated with airbyte_secret
  • Unit & integration tests added and passing. Community members, please provide proof of success locally e.g: screenshot or copy-paste unit, integration, and acceptance test output. To run acceptance tests for a Python connector, follow instructions in the README. For java connectors run ./gradlew :airbyte-integrations:connectors:<name>:integrationTest.
  • Code reviews completed
  • Documentation updated
    • Connector's README.md
    • Connector's bootstrap.md. See description and examples
    • docs/integrations/<source or destination>/<name>.md including changelog. See changelog example
    • docs/integrations/README.md
    • airbyte-integrations/builds.md
  • PR name follows PR naming conventions

Airbyter

If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.

  • Create a non-forked branch based on this PR and test the below items on it
  • Build is successful
  • If new credentials are required for use in CI, add them to GSM. Instructions.
  • /test connector=connectors/<name> command is passing
  • New Connector version released on Dockerhub by running the /publish command described here
  • After the connector is published, connector added to connector index as described here
  • Seed specs have been re-generated by building the platform and committing the changes to the seed spec files, as described here
Updating a connector

Community member or Airbyter

  • Grant edit access to maintainers (instructions)
  • Secrets in the connector's spec are annotated with airbyte_secret
  • Unit & integration tests added and passing. Community members, please provide proof of success locally e.g: screenshot or copy-paste unit, integration, and acceptance test output. To run acceptance tests for a Python connector, follow instructions in the README. For java connectors run ./gradlew :airbyte-integrations:connectors:<name>:integrationTest.
  • Code reviews completed
  • Documentation updated
    • Connector's README.md
    • Connector's bootstrap.md. See description and examples
    • Changelog updated in docs/integrations/<source or destination>/<name>.md including changelog. See changelog example
  • PR name follows PR naming conventions

Airbyter

If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.

  • Create a non-forked branch based on this PR and test the below items on it
  • Build is successful
  • If new credentials are required for use in CI, add them to GSM. Instructions.
  • /test connector=connectors/<name> command is passing
  • New Connector version released on Dockerhub and connector version bumped by running the /publish command described here
Connector Generator
  • Issue acceptance criteria met
  • PR name follows PR naming conventions
  • If adding a new generator, add it to the list of scaffold modules being tested
  • The generator test modules (all connectors with -scaffold in their name) have been updated with the latest scaffold by running ./gradlew :airbyte-integrations:connector-templates:generator:testScaffoldTemplates then checking in your changes
  • Documentation which references the generator is updated as needed

Tests

Unit

Put your unit tests output here.

Integration

Put your integration tests output here.

Acceptance

Put your acceptance tests output here.

@CLAassistant
Copy link

CLAassistant commented Jun 28, 2022

CLA assistant check
All committers have signed the CLA.

@github-actions github-actions bot added area/connectors Connector related issues area/documentation Improvements or additions to documentation labels Jun 28, 2022
@maikelpenz maikelpenz force-pushed the issue-11275-add-table-storage-incremental-read-support branch from 8b4818f to 0eeedc4 Compare June 28, 2022 09:35
@maikelpenz
Copy link
Contributor Author

maikelpenz commented Jun 28, 2022

This closes the issue 11275. CLA has already been agreed but the workflow still shows as pending. 🤔

@maikelpenz maikelpenz force-pushed the issue-11275-add-table-storage-incremental-read-support branch from 3c841d1 to 97195a4 Compare June 29, 2022 21:17
@harshithmullapudi
Copy link
Contributor

Hey @maikelpenz looks like you have committed with a different user. Could you check that?

@maikelpenz maikelpenz force-pushed the issue-11275-add-table-storage-incremental-read-support branch 3 times, most recently from 57e783b to 7ae1be4 Compare June 30, 2022 09:07
@maikelpenz maikelpenz closed this Jun 30, 2022
@maikelpenz maikelpenz reopened this Jun 30, 2022
@maikelpenz
Copy link
Contributor Author

Hey @maikelpenz looks like you have committed with a different user. Could you check that?

Hey @harshithmullapudi thanks for picking this up. My bad I committed with my work email address by mistake. This has now been fixed :)

@harshithmullapudi harshithmullapudi self-requested a review July 5, 2022 04:20
@harshithmullapudi harshithmullapudi self-assigned this Jul 5, 2022
@harshithmullapudi
Copy link
Contributor

harshithmullapudi commented Jul 6, 2022

/test connector=connectors/source-azure-table-storage

🕑 connectors/source-azure-table-storage https://github.com/airbytehq/airbyte/actions/runs/2621807623
❌ connectors/source-azure-table-storage https://github.com/airbytehq/airbyte/actions/runs/2621807623
🐛

Build Failed

Test summary info:

Could not find result summary

@harshithmullapudi harshithmullapudi force-pushed the issue-11275-add-table-storage-incremental-read-support branch from 7ae1be4 to cc0150b Compare July 6, 2022 09:29
Copy link
Contributor

@harshithmullapudi harshithmullapudi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you fix the above comments

@maikelpenz maikelpenz force-pushed the issue-11275-add-table-storage-incremental-read-support branch 2 times, most recently from 0793030 to 363ae5e Compare July 9, 2022 21:04
@maikelpenz
Copy link
Contributor Author

maikelpenz commented Jul 9, 2022

/test connector=connectors/source-azure-table-storage

🕑 connectors/source-azure-table-storage https://github.com/airbytehq/airbyte/actions/runs/2621807623
❌ connectors/source-azure-table-storage https://github.com/airbytehq/airbyte/actions/runs/2621807623
🐛

Build Failed

Test summary info:

Could not find result summary

@harshithmullapudi the execution of this pipeline is using a wrong connector name as an input source-azure-table-storage. The correct name is source-azure-table.

This causes the error:
Connector 'connectors/source-azure-table-storage' not found...

From what I see this is coming from ${{github.event.inputs.connector}}. I believe this has been manually informed to trigger this pipeline. @davinchia it looks like you triggered it?

Note: I don't think I have permission to trigger it myself. Let me know if I do.

@harshithmullapudi
Copy link
Contributor

harshithmullapudi commented Jul 11, 2022

/test connector=connectors/source-azure-table

🕑 connectors/source-azure-table https://github.com/airbytehq/airbyte/actions/runs/2647524531
❌ connectors/source-azure-table https://github.com/airbytehq/airbyte/actions/runs/2647524531
🐛 https://gradle.com/s/2mnrodnb4rwxc

Build Failed

Test summary info:

=========================== short test summary info ============================
FAILED test_core.py::TestDiscovery::test_defined_cursors_exist_in_schema[inputs0]
FAILED test_core.py::TestBasicRead::test_read[inputs0] - docker.errors.Contai...
FAILED test_full_refresh.py::TestFullRefresh::test_sequential_reads[inputs0]
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/source_acceptance_test/plugin.py:56: Skipping TestIncremental.test_two_sequential_reads because not found in the config
=================== 3 failed, 18 passed, 1 skipped in 17.00s ===================

@harshithmullapudi
Copy link
Contributor

Hey @maikelpenz could you confirm if the tests are passing local? If so could you share the screenshot?

@maikelpenz
Copy link
Contributor Author

maikelpenz commented Jul 12, 2022

Hey @maikelpenz could you confirm if the tests are passing local? If so could you share the screenshot?

Hey @harshithmullapudi. Unit tests are working but I see some functional tests are failing. I'll just need some time to go through them. Will push an update once I get them fixed up.

@maikelpenz maikelpenz force-pushed the issue-11275-add-table-storage-incremental-read-support branch from 6adbd83 to efab818 Compare July 13, 2022 04:35
@maikelpenz
Copy link
Contributor Author

Hey @maikelpenz could you confirm if the tests are passing local? If so could you share the screenshot?

@harshithmullapudi tests are now passing on my local. Is there a way I can trigger it through an action to validate if it works here as well? Might be simpler than going through this back and forward until it's working.

@harshithmullapudi
Copy link
Contributor

harshithmullapudi commented Jul 15, 2022

/test connector=connectors/source-azure-table

🕑 connectors/source-azure-table https://github.com/airbytehq/airbyte/actions/runs/2675576299
❌ connectors/source-azure-table https://github.com/airbytehq/airbyte/actions/runs/2675576299
🐛 https://gradle.com/s/cehjar6tulnac

Build Failed

Test summary info:

=========================== short test summary info ============================
FAILED test_core.py::TestBasicRead::test_read[inputs0] - docker.errors.Contai...
FAILED test_full_refresh.py::TestFullRefresh::test_sequential_reads[inputs0]
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/source_acceptance_test/plugin.py:56: Skipping TestIncremental.test_two_sequential_reads because not found in the config
=================== 2 failed, 19 passed, 1 skipped in 17.41s ===================

@maikelpenz
Copy link
Contributor Author

maikelpenz commented Jul 16, 2022

@harshithmullapudi I might need some clarification around the functional tests.

I see this is failing to read data from table storage. It works on my local because I have credentials to pull Table Storage under secrets/config.json. Obviously the CICD pipeline doesn't have access to these credentials so it fails here output = docker_runner.call_read(connector_config, configured_catalog) on both test_read and test_sequential_reads methods.

How is it meant to work through the pipeline without the secrets file in place?

@maikelpenz maikelpenz force-pushed the issue-11275-add-table-storage-incremental-read-support branch from 2a1bef5 to 676befd Compare July 18, 2022 08:45
@harshithmullapudi
Copy link
Contributor

I understood the issue will take a look at it tomorrow

@harshithmullapudi
Copy link
Contributor

harshithmullapudi commented Jul 19, 2022

/test connector=connectors/source-azure-table

🕑 connectors/source-azure-table https://github.com/airbytehq/airbyte/actions/runs/2699522950
✅ connectors/source-azure-table https://github.com/airbytehq/airbyte/actions/runs/2699522950
Python tests coverage:

Name                                                 Stmts   Miss  Cover
------------------------------------------------------------------------
source_acceptance_test/utils/__init__.py                 6      0   100%
source_acceptance_test/tests/__init__.py                 4      0   100%
source_acceptance_test/__init__.py                       2      0   100%
source_acceptance_test/tests/test_full_refresh.py       52      2    96%
source_acceptance_test/utils/asserts.py                 37      2    95%
source_acceptance_test/config.py                        77      6    92%
source_acceptance_test/utils/json_schema_helper.py     105     13    88%
source_acceptance_test/tests/test_incremental.py       121     25    79%
source_acceptance_test/utils/common.py                  80     17    79%
source_acceptance_test/tests/test_core.py              294    106    64%
source_acceptance_test/utils/compare.py                 62     23    63%
source_acceptance_test/base.py                          10      4    60%
source_acceptance_test/utils/connector_runner.py       110     48    56%
------------------------------------------------------------------------
TOTAL                                                  960    246    74%
Name                                Stmts   Miss  Cover
-------------------------------------------------------
source_azure_table/constants.py         5      0   100%
source_azure_table/__init__.py          2      0   100%
source_azure_table/streams.py          32     15    53%
source_azure_table/source.py           84     42    50%
source_azure_table/azure_table.py      35     19    46%
-------------------------------------------------------
TOTAL                                 158     76    52%

Build Passed

Test summary info:

=========================== short test summary info ============================
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/source_acceptance_test/plugin.py:56: Skipping TestIncremental.test_two_sequential_reads because not found in the config
======================== 21 passed, 1 skipped in 20.73s ========================

@harshithmullapudi
Copy link
Contributor

harshithmullapudi commented Jul 19, 2022

/test connector=connectors/source-azure-table

🕑 connectors/source-azure-table https://github.com/airbytehq/airbyte/actions/runs/2699799228
✅ connectors/source-azure-table https://github.com/airbytehq/airbyte/actions/runs/2699799228
Python tests coverage:

Name                                                 Stmts   Miss  Cover
------------------------------------------------------------------------
source_acceptance_test/utils/__init__.py                 6      0   100%
source_acceptance_test/tests/__init__.py                 4      0   100%
source_acceptance_test/__init__.py                       2      0   100%
source_acceptance_test/tests/test_full_refresh.py       52      2    96%
source_acceptance_test/utils/asserts.py                 37      2    95%
source_acceptance_test/config.py                        77      6    92%
source_acceptance_test/utils/json_schema_helper.py     105     13    88%
source_acceptance_test/tests/test_incremental.py       121     25    79%
source_acceptance_test/utils/common.py                  80     17    79%
source_acceptance_test/tests/test_core.py              294    106    64%
source_acceptance_test/utils/compare.py                 62     23    63%
source_acceptance_test/base.py                          10      4    60%
source_acceptance_test/utils/connector_runner.py       110     48    56%
------------------------------------------------------------------------
TOTAL                                                  960    246    74%
Name                                Stmts   Miss  Cover
-------------------------------------------------------
source_azure_table/constants.py         5      0   100%
source_azure_table/__init__.py          2      0   100%
source_azure_table/streams.py          32     15    53%
source_azure_table/source.py           84     42    50%
source_azure_table/azure_table.py      35     19    46%
-------------------------------------------------------
TOTAL                                 158     76    52%

Build Passed

Test summary info:

All Passed

@harshithmullapudi
Copy link
Contributor

harshithmullapudi commented Jul 19, 2022

/publish connector=connectors/source-azure-table

🕑 Publishing the following connectors:
connectors/source-azure-table
https://github.com/airbytehq/airbyte/actions/runs/2699851427


Connector Did it publish? Were definitions generated?
connectors/source-azure-table

if you have connectors that successfully published but failed definition generation, follow step 4 here ▶️

@harshithmullapudi harshithmullapudi merged commit 8f85cff into airbytehq:master Jul 19, 2022
@maikelpenz
Copy link
Contributor Author

Thanks for following up on this @harshithmullapudi and merging my pull request. I understand from your fix commits that the issue with the integration tests was because I was pointing to a custom table and not "Test"?

mfsiega-airbyte pushed a commit that referenced this pull request Jul 21, 2022
* Add incremental append load to azure table storage. Resolves #11275

* fix: revert table name

* fix: integration tests are failing

* chore: update the version

* auto-bump connector version [ci skip]

Co-authored-by: Harshith Mullapudi <[email protected]>
Co-authored-by: Octavia Squidington III <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/connectors Connector related issues area/documentation Improvements or additions to documentation bounty community connectors/source/azure-table
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

4 participants